Robots can now autonomously refine their social interactions by using vision-language models (VLMs) to evaluate and improve their own behavior plans, eliminating the need for predefined motions or constant human guidance.
This paper presents CRISP, a framework that lets robots automatically improve their social behaviors by critiquing and replanning their own actions. Using a vision-language model as a virtual social critic, the system generates robot motions, evaluates them for social appropriateness, and iteratively refines them—all without human feedback.
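The generate-critique-refine loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `critique` function is a hypothetical stand-in for the VLM social critic (a real system would render the robot's motion and query a vision-language model), and the plan representation, scoring rule, and threshold are all assumptions made for the example.

```python
# Minimal sketch of a critique-and-replan loop in the spirit of CRISP.
# The critic below is a hypothetical stand-in for a VLM; a real system
# would render the candidate motion and ask the model to judge it.

def critique(plan):
    """Stand-in VLM critic: score a motion plan in [0, 1] for social
    appropriateness and suggest a fix. Here, plans lacking an
    'eye_contact' step are penalized, mimicking a social check."""
    score = 0.9 if "eye_contact" in plan else 0.4
    suggestion = None if score >= 0.5 else "eye_contact"
    return score, suggestion

def refine(plan, suggestion):
    """Replanner: fold the critic's suggestion back into the plan."""
    return plan + [suggestion] if suggestion else plan

def critique_and_replan(plan, threshold=0.8, max_iters=5):
    """Iterate critique -> refine until the plan meets the threshold,
    with no human feedback in the loop."""
    score, suggestion = critique(plan)
    for _ in range(max_iters):
        if score >= threshold:
            break
        plan = refine(plan, suggestion)
        score, suggestion = critique(plan)
    return plan, score

final_plan, final_score = critique_and_replan(["approach", "wave"])
```

In this toy run, the critic flags the missing eye contact, the replanner adds it, and the loop terminates once the score clears the threshold.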