Robots can now autonomously refine their social interactions by using vision-language models (VLMs) to evaluate and improve their own behavior plans, eliminating the need for predefined motions or constant human guidance.
This paper presents CRISP, a framework that lets robots automatically improve their social behaviors by critiquing and replanning their own actions. Using a vision-language model as a virtual social critic, the system generates robot motions, evaluates them for social appropriateness, and iteratively refines them—all without human feedback.
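The generate-critique-refine loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `critique` function is a hypothetical stand-in for the VLM social critic (a real system would render the robot's motion and query a vision-language model), and the plan representation, scoring rule, and threshold are all assumptions made for the example.

```python
# Minimal sketch of a critique-and-replan loop in the spirit of CRISP.
# The critic below is a hypothetical stand-in for a VLM; a real system
# would render the candidate motion and ask the model to judge it.

def critique(plan):
    """Stand-in VLM critic: score a motion plan in [0, 1] for social
    appropriateness and suggest a fix. Here, plans lacking an
    'eye_contact' step are penalized, mimicking a social check."""
    score = 0.9 if "eye_contact" in plan else 0.4
    suggestion = None if score >= 0.5 else "eye_contact"
    return score, suggestion

def refine(plan, suggestion):
    """Replanner: fold the critic's suggestion back into the plan."""
    return plan + [suggestion] if suggestion else plan

def critique_and_replan(plan, threshold=0.8, max_iters=5):
    """Iterate critique -> refine until the plan meets the threshold,
    with no human feedback in the loop."""
    score, suggestion = critique(plan)
    for _ in range(max_iters):
        if score >= threshold:
            break
        plan = refine(plan, suggestion)
        score, suggestion = critique(plan)
    return plan, score

final_plan, final_score = critique_and_replan(["approach", "wave"])
```

In this toy run, the critic flags the missing eye contact, the replanner adds it, and the loop terminates once the score clears the threshold.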