Visual information extracted directly from individual pixels in an image, used to understand the precise positioning and appearance of elements on a page.
Quality of vision, audio, and image understanding (distinct from modality support).