A shared numerical space where different types of data (such as audio and text) are represented together, allowing the model to find relationships between them.
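As a rough illustration of the idea above, the sketch below places a few text items and one audio item in the same small vector space and uses cosine similarity to find the text closest to the audio. The vectors, labels, and function names (`cosine_similarity`, `nearest_text`) are all hypothetical and made up by hand; in a real system the vectors would come from trained encoders, which are not shown here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumption: hand-written 3-dimensional vectors standing in for text embeddings.
text_embeddings = {
    "a dog barking": [0.9, 0.1, 0.0],
    "rain falling": [0.1, 0.9, 0.1],
    "a piano melody": [0.0, 0.2, 0.9],
}

# Assumption: a hand-written vector standing in for the embedding of an audio clip.
audio_embedding = [0.8, 0.2, 0.1]


def nearest_text(audio_vec, texts):
    """Return the text label whose vector is most similar to the audio vector."""
    return max(texts, key=lambda label: cosine_similarity(audio_vec, texts[label]))


best_match = nearest_text(audio_embedding, text_embeddings)
print(best_match)
```

Because both kinds of data live in one space, a single similarity measure can compare an audio item directly with a text item, which is what lets a model relate the two.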
Quality of the model's vision, audio, and image understanding (distinct from whether it supports those modalities at all).