You don't need to model the environment to build an optimal AI agent: learning action values directly can be just as powerful.
This paper introduces AIQI, presented as the first agent that learns optimal behavior in general reinforcement learning without building an explicit model of its environment. Instead of predicting how the world works, it directly learns which actions produce the best outcomes. The result is theoretical: it shows that model-free approaches can match the performance of model-based agents in this setting.
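
To make "learning action values directly" concrete, here is a minimal sketch of textbook tabular Q-learning in Python. It is not the AIQI algorithm from the paper, only a familiar model-free method used for illustration. The four-state chain environment, the function names `step` and `q_learning`, and all hyperparameters are assumptions invented for this example.

```python
import random


def step(state, action, n_states=4):
    """Hypothetical toy environment: a chain of states.

    Action 1 moves right, action 0 moves left. Reaching the last
    state ends the episode with reward 1; every other step gives 0.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Learn action values from experience only.

    The agent calls `step` to act, but never stores or estimates
    transition probabilities: there is no model of the environment.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # Exploit, breaking ties between equal values at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a in (0, 1) if q[state][a] == best]
                )
            next_state, reward, done = step(state, action, n_states)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = q_learning()
# Greedy action in each non-terminal state, read off the learned values.
greedy = [max((0, 1), key=lambda a: q[s][a]) for s in range(len(q) - 1)]
print(greedy)
```

The learned table `q` is the only thing the agent keeps. The greedy policy is read straight from it, which is the sense in which a model-free agent skips predicting how the world works.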