The ability to connect information from different input types (such as audio and video) and reason over it jointly to draw conclusions.
Quality of vision, audio, and image understanding (distinct from modality support)
Multi-step reasoning, logic puzzles, mathematical problem-solving