The size, in pixels, of the image patches the model processes; smaller patches capture finer details but yield more patches per image and therefore require more computation.
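As a rough illustration of that trade-off, here is a minimal sketch assuming square, non-overlapping patches and an image padded up to a multiple of the patch size. The function name `patch_count` and the 224x224 input are hypothetical, chosen only for the example; the pairwise figure assumes a model that compares every patch with every other patch, which not all architectures do.

```python
def patch_count(image_height: int, image_width: int, patch_size: int) -> int:
    """Count the square, non-overlapping patches needed to cover an image.

    Assumption: the image is padded up to a multiple of patch_size,
    so partial patches at the edges count as full patches.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    rows = -(-image_height // patch_size)  # ceiling division
    cols = -(-image_width // patch_size)
    return rows * cols


if __name__ == "__main__":
    # Halving the patch size quadruples the number of patches.
    for size in (32, 16, 8):
        n = patch_count(224, 224, size)
        # n * n approximates the cost of all-pairs comparison between patches.
        print(f"patch_size={size}: {n} patches, {n * n} patch pairs")
```

Halving the patch size quadruples the patch count, and any all-pairs step grows by roughly sixteen times, which is why finer patches are more expensive.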
How well the model understands vision, audio, and image inputs (distinct from whether it supports those modalities at all).