From Inference Efficiency to Embodied Efficiency: Revisiting Efficiency Metrics for Vision-Language-Action Models

Zhuofan Li, Hongkun Yang, Zhenyang Chen, Yangxuan Chen, Yingyan et al. | March 19, 2026 | arXiv

Key Takeaway

When building embodied AI systems, measure what actually matters: task completion time, motion quality, and energy use—not just model size or inference speed. Optimizing the wrong metrics can make robots perform worse in practice.

Summary

This paper shows that conventional efficiency metrics for vision-language-action (VLA) models, such as parameter count and computation, do not predict real-world performance. The researchers measured actual robotic execution (task completion time, motion smoothness, and energy use) and found that methods optimized for the conventional metrics often degrade motion quality or lengthen execution, even when task success stays the same.
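To make the three execution-level measurements concrete, here is a minimal sketch of how they could be computed from a logged robot episode. It is not the paper's evaluation code: the `Trajectory` record, the use of mean squared jerk as a smoothness proxy, and the torque-times-velocity energy estimate are all assumptions chosen for illustration, and the paper may define these quantities differently.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Trajectory:
    """Hypothetical log of one robot episode, sampled at a fixed rate."""
    dt: float                          # sampling period in seconds
    positions: List[Sequence[float]]   # joint positions per step (rad)
    torques: List[Sequence[float]]     # joint torques per step (N*m)


def _diff(rows, dt):
    """Finite-difference derivative of a list of equal-length vectors."""
    return [[(b - a) / dt for a, b in zip(r0, r1)]
            for r0, r1 in zip(rows, rows[1:])]


def completion_time(traj: Trajectory) -> float:
    """Duration of the episode in seconds (number of intervals times dt)."""
    return (len(traj.positions) - 1) * traj.dt


def mean_squared_jerk(traj: Trajectory) -> float:
    """Average squared jerk over all joints; lower means smoother motion.

    Assumption: jerk is used here as a smoothness proxy.
    """
    jerk = _diff(_diff(_diff(traj.positions, traj.dt), traj.dt), traj.dt)
    values = [j * j for row in jerk for j in row]
    return sum(values) / len(values) if values else 0.0


def mechanical_energy(traj: Trajectory) -> float:
    """Approximate energy in joules as the sum of |torque * velocity| * dt.

    Assumption: absolute mechanical power, ignoring electrical losses.
    """
    vel = _diff(traj.positions, traj.dt)
    return sum(abs(t * v) * traj.dt
               for tau, qd in zip(traj.torques, vel)
               for t, v in zip(tau, qd))
```

Comparing these numbers for a baseline policy and a compressed or token-sparsified one, at equal task success, is the kind of check the summary describes: a model can be cheaper per inference step and still yield a longer, less smooth, or more energy-hungry episode.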

efficiency evaluation applications

Key Terms

vision-language-action-model, embodied-efficiency, inference-efficiency, model-compression, token-sparsification