The spatial coordinates or locations of text elements within a document, used to understand where words and phrases appear on the page.
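
As an illustration of the definition above, here is a minimal sketch of how text positions might be represented and used to recover reading order. The `TextBox` class, its field names, the `reading_order` helper, and the `line_tolerance` parameter are all hypothetical, and the sketch assumes a top-left origin with y increasing down the page; real document formats and OCR tools may use different conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    """A word or phrase with its bounding box (hypothetical structure).

    Assumes a top-left origin: x grows rightward, y grows downward.
    """
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def reading_order(boxes, line_tolerance=5.0):
    """Order boxes top-to-bottom, then left-to-right within each line.

    Boxes whose top edges lie within `line_tolerance` of the first box
    on the current line are treated as belonging to that line.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b.y0):
        if lines and abs(box.y0 - lines[-1][0].y0) <= line_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [b for line in lines for b in sorted(line, key=lambda b: b.x0)]


boxes = [
    TextBox("world", 60, 11, 100, 21),
    TextBox("Hello", 10, 10, 50, 20),
    TextBox("Next", 10, 40, 45, 50),
]
print([b.text for b in reading_order(boxes)])  # ['Hello', 'world', 'Next']
```

Grouping by top edge before sorting by left edge is one simple choice; it handles single-column pages but would need refinement for multi-column layouts.
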
The quality of vision, audio, and image understanding, as distinct from whether a given modality is supported at all.