The ability of an AI model to understand and reason about multiple types of input data (like images and text) simultaneously.
Quality of visual and audio understanding (distinct from modality support, i.e. whether the model accepts those inputs at all)
Multi-step reasoning, logic puzzles, mathematical problem-solving