Speech recognition systems can hallucinate content that was never spoken when the audio is degraded, which creates safety risks for voice agents. Benchmark scores alone will not tell you when and where your ASR will fail; you need diagnostic testing across real-world conditions.
This paper shows that speech recognition systems can fail in real-world voice agents despite high benchmark scores. The authors created WildASR, a multilingual test set built from real human speech that measures robustness along three axes: environmental noise, speaker differences, and language.
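
To make the contrast between a single benchmark score and per-condition diagnostics concrete, here is a minimal sketch in Python. It is not the WildASR evaluation code: the function names, the condition labels, and the toy transcripts are all hypothetical, and the metric is plain word error rate (word-level edit distance divided by reference length) with naive lowercasing and whitespace tokenization.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_condition(
    samples: Iterable[Tuple[str, str, str]]
) -> Dict[str, float]:
    """Aggregate WER per condition from (condition, reference, hypothesis)."""
    errors: Dict[str, int] = defaultdict(int)
    words: Dict[str, int] = defaultdict(int)
    for condition, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[condition] += e
        words[condition] += n
    return {c: errors[c] / words[c] for c in errors if words[c] > 0}


# Invented toy data (assumption): the "noisy" hypothesis appends words
# that were never spoken, mimicking a hallucination.
SAMPLES = [
    ("clean", "turn on the lights", "turn on the lights"),
    ("noisy", "turn on the lights", "turn on the lights please call my bank"),
    ("noisy", "cancel my order", "cancel my order"),
]

result = wer_by_condition(SAMPLES)
```

On this toy data the pooled WER is 4 errors over 11 reference words (about 0.36), while the breakdown in `result` shows 0.0 for "clean" and 4/7 (about 0.57) for "noisy". The same grouping key could just as well be a speaker attribute or a language code; the point is only that the aggregate hides where the errors concentrate.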