A shared mathematical space in which different types of data (text and audio) are represented so that related concepts from each type are positioned near each other.
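A minimal sketch of this idea, assuming hand-picked three-dimensional toy vectors rather than the output of any real encoder: the labels, file names, and the `nearest_text` helper are all hypothetical and exist only for illustration. It shows that when text and audio items live in the same space, an audio item can be matched to its closest text item by cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: both modalities share the same 3-d space.
# The numbers are invented for illustration, not produced by a model.
text_embeddings = {
    "dog barking": [0.9, 0.1, 0.0],
    "rainfall": [0.0, 0.2, 0.9],
}
audio_embeddings = {
    "bark.wav": [0.8, 0.2, 0.1],
    "rain.wav": [0.1, 0.1, 0.95],
}

def nearest_text(audio_vec):
    """Return the text label whose vector is most similar to audio_vec."""
    return max(
        text_embeddings,
        key=lambda label: cosine_similarity(audio_vec, text_embeddings[label]),
    )

for name, vec in audio_embeddings.items():
    print(name, "is closest to", repr(nearest_text(vec)))
```

In a real system the vectors would come from trained encoders for each data type; only the comparison step is sketched here.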
The quality of vision, audio, and image understanding, as distinct from whether those modalities are supported at all.