From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation

Yibin Liu, Yaxing Lyu, Daqi Gao, Zhixuan Liang, Weiliang Tang et al. | March 16, 2026 | arXiv

Key Takeaway

Reinforcement learning can turn passive video understanding models into active task evaluators by training them to generate explicit reasoning about progress toward a goal, which lets smaller models outperform much larger ones on robot manipulation tasks.

Summary

This paper introduces PRIMO R1, a 7B-parameter video model that learns to actively evaluate robot manipulation progress, using reinforcement learning to elicit step-by-step reasoning. Unlike standard models that passively recognize what is happening, PRIMO R1 compares the current robot state to the task goal and predicts failures, achieving better accuracy than much larger models on robotic manipulation tasks.

reasoning, agents, multimodal

Key Terms

chain-of-thought, reinforcement-learning, vision-language-model, zero-shot-generalization, process-reward-model