Reinforcement learning can turn passive video understanding models into active task evaluators by training them to generate explicit reasoning about progress toward a goal, which lets smaller models outperform much larger ones on robot manipulation tasks.
This paper introduces PRIMO R1, a 7B-parameter video model trained with reinforcement learning to generate step-by-step reasoning about how far a robot manipulation task has progressed. Unlike standard models that passively recognize what is happening in a video, PRIMO R1 compares the robot's current state to the task goal and predicts failures, achieving higher accuracy than much larger models on robot manipulation tasks.
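
To make the training idea concrete, here is a minimal sketch of the kind of rule-based reward that could score a reasoning-then-answer response on a progress estimation task. It is an illustration under stated assumptions, not the paper's actual reward: the `<think>`/`<answer>` output format, the 0-100 progress scale, the `progress_reward` function name, and the `format_weight` parameter are all hypothetical.

```python
import re

# Hypothetical output format (an assumption, not taken from the paper):
# free-form reasoning inside <think> tags, followed by a progress
# percentage (0-100) inside <answer> tags.
_RESPONSE_PATTERN = re.compile(
    r"^<think>(.+?)</think>\s*<answer>\s*(\d+(?:\.\d+)?)\s*</answer>$",
    re.DOTALL,
)


def progress_reward(response: str, true_progress: float,
                    format_weight: float = 0.2) -> float:
    """Score a response against the ground-truth progress percentage.

    Returns 0.0 if the response does not follow the expected format.
    Otherwise returns a format bonus plus an accuracy term that falls
    linearly with the absolute error of the predicted progress.
    """
    match = _RESPONSE_PATTERN.match(response.strip())
    if match is None:
        return 0.0  # no reasoning block or no parseable answer

    # Clamp the prediction to the valid progress range.
    predicted = min(max(float(match.group(2)), 0.0), 100.0)
    accuracy = 1.0 - abs(predicted - true_progress) / 100.0
    return format_weight + (1.0 - format_weight) * accuracy
```

Under these assumptions, a well-formed response whose estimate matches the ground truth receives the maximum reward of 1.0, a well-formed but inaccurate response receives partial credit, and a response that skips the reasoning block receives nothing. A reward of this shape pays off only when the model both explains its judgment and gets the progress estimate right.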