A task that identifies who is speaking in an audio clip, or confirms whether the audio was spoken by a specific person, using characteristics unique to that person's voice.
Quality of vision, audio, and image understanding (distinct from modality support)