The ability to process and comprehend spoken language and other audio signals, converting them into meaningful interpretations or responses.
Quality of vision, audio, and image understanding (distinct from modality support).