A system designed to understand and work with multiple types of content, such as text and images, even if it directly processes only one of them.
Quality of vision, audio, and image understanding (distinct from modality support)