When building embodied AI systems, measure what actually matters: task completion time, motion quality, and energy use—not just model size or inference speed. Optimizing the wrong metrics can make robots perform worse in practice.
This paper shows that conventional efficiency metrics for vision-language-action (VLA) models, such as parameter count and compute cost, do not track real-world robot performance. The researchers measured actual robotic execution (task completion time, motion smoothness, and energy use) and found that methods optimized for the conventional metrics often degrade motion quality or lengthen execution, even when the task success rate stays the same.
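To make the execution-level quantities named above concrete, here is a minimal sketch of how they could be computed from a logged trajectory. The summary does not give the paper's exact definitions, so everything here is an assumption for illustration: the function name `embodied_metrics`, the use of mean squared joint jerk as a smoothness proxy (lower is smoother), and the estimate of energy as the time integral of absolute mechanical power, |torque × joint velocity|, summed over joints.

```python
import numpy as np


def embodied_metrics(t, q, tau):
    """Illustrative execution metrics for one logged episode (hypothetical helper).

    t   : (T,)   timestamps in seconds, strictly increasing
    q   : (T, J) joint positions in radians
    tau : (T, J) joint torques in newton-metres
    """
    t = np.asarray(t, dtype=float)
    q = np.asarray(q, dtype=float)
    tau = np.asarray(tau, dtype=float)

    # Task completion time: wall-clock duration of the episode.
    completion_time = float(t[-1] - t[0])

    # Velocity, acceleration, and jerk by repeated finite differences.
    vel = np.gradient(q, t, axis=0)
    acc = np.gradient(vel, t, axis=0)
    jerk = np.gradient(acc, t, axis=0)

    # Assumed smoothness proxy: mean squared jerk, summed over joints.
    mean_sq_jerk = float(np.mean(np.sum(jerk ** 2, axis=1)))

    # Assumed energy proxy: integrate |tau * qdot| over time (trapezoidal rule).
    power = np.sum(np.abs(tau * vel), axis=1)
    energy = float(np.sum(0.5 * (power[1:] + power[:-1]) * np.diff(t)))

    return {
        "completion_time_s": completion_time,
        "mean_sq_jerk": mean_sq_jerk,
        "energy_j": energy,
    }
```

Running this on episodes from a baseline policy and from a compressed or accelerated variant, at matched task success, would give a side-by-side comparison of the kind the paper argues for. A real evaluation would likely measure electrical power at the actuators rather than relying on this mechanical estimate.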