Back to Basics: Revisiting ASR in the Age of Voice Agents

Geeyang Tay, Wentao Ma, Jaewon Lee, Yuzhi Tang, Daniel Lee et al. | March 26, 2026 | arXiv

Key Takeaway

Speech recognition systems hallucinate content that was never spoken when audio is degraded, which creates safety risks for voice agents. Benchmark scores alone will not tell you when and where your ASR will fail; you need diagnostic testing across real-world conditions.

Summary

This paper shows that automatic speech recognition (ASR) systems fail in real-world voice agents despite high benchmark scores. The authors introduce WildASR, a multilingual test set built from real human speech that measures robustness across environmental noise, speaker differences, and languages.
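As a rough illustration of per-condition diagnostics, the sketch below breaks an error rate down by recording condition instead of reporting one pooled score. It assumes word error rate (WER) as the metric, which this summary does not specify, and the function names, condition labels, and sample transcripts are all hypothetical; none of it is taken from WildASR.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_condition(samples):
    """samples: iterable of (condition, reference, hypothesis) strings.

    Returns {condition: total word errors / total reference words}.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for condition, reference, hypothesis in samples:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors[condition] += edit_distance(ref, hyp)
        words[condition] += len(ref)
    return {c: errors[c] / words[c] for c in words if words[c] > 0}


# Hypothetical transcripts, invented for illustration only.
samples = [
    ("clean", "turn on the kitchen lights", "turn on the kitchen lights"),
    ("noisy", "turn on the kitchen lights", "turn on the chicken lights please"),
    ("noisy", "call my doctor", "call my daughter"),
]
report = wer_by_condition(samples)
```

On this toy data the clean condition scores 0.0 and the noisy condition 0.375, a gap that a single pooled score would blur. Note that the inserted word in the second hypothesis counts as an error even though no reference word corresponds to it, which is how hallucinated content shows up in WER.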

evaluation safety multimodal

Key Terms

speech-representation-model hallucination multilingual-capability benchmark-dataset