LLaVA Architecture

architecture

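An architecture pattern, named after the LLaVA (Large Language and Vision Assistant) model, that connects a pretrained vision encoder to a large language model through a trainable projection layer. In the original model, the encoder is CLIP ViT-L/14 and the language model is Vicuna. The projection maps image features into the language model's token-embedding space, so the model can process image tokens alongside text and can therefore understand and describe images.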
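The core wiring can be sketched in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not the LLaVA implementation. It assumes a two-layer MLP projector with GELU, as reported for LLaVA-1.5; the original LLaVA used a single linear layer. The names `Projector` and `build_input_sequence`, the `image_slot` position, and the toy dimensions are all hypothetical. The sketch also assumes the vision encoder has already produced one feature vector per image patch and the language model has already embedded the text tokens.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = x @ W + b to each row vector in x (plain-Python matmul)."""
    return [
        [sum(xi * weight[i][j] for i, xi in enumerate(row)) + bias[j]
         for j in range(len(bias))]
        for row in x
    ]


def gelu(v):
    """Tanh approximation of the GELU activation."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def init_weights(rows, cols, rng):
    """Small random weights; in practice these are learned during training."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class Projector:
    """Hypothetical two-layer MLP mapping vision features (vision_dim) to LLM embedding width (text_dim)."""

    def __init__(self, vision_dim, text_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = init_weights(vision_dim, text_dim, rng)
        self.b1 = [0.0] * text_dim
        self.w2 = init_weights(text_dim, text_dim, rng)
        self.b2 = [0.0] * text_dim

    def __call__(self, patches):
        # patches: one feature vector per image patch from the vision encoder
        hidden = [[gelu(v) for v in row] for row in linear(patches, self.w1, self.b1)]
        return linear(hidden, self.w2, self.b2)


def build_input_sequence(image_patches, text_embeddings, projector, image_slot):
    """Project image patches into "image tokens" and splice them into the
    text embedding sequence at position image_slot (a hypothetical stand-in
    for wherever the image placeholder sits in the prompt)."""
    image_tokens = projector(image_patches)
    return text_embeddings[:image_slot] + image_tokens + text_embeddings[image_slot:]


if __name__ == "__main__":
    rng = random.Random(1)
    patches = [[rng.random() for _ in range(8)] for _ in range(3)]  # 3 patches, width 8
    text = [[0.0] * 4 for _ in range(5)]                            # 5 text tokens, width 4
    seq = build_input_sequence(patches, text, Projector(8, 4), image_slot=1)
    print(len(seq), len(seq[0]))  # 8 tokens, each of width 4
```

The only component that bridges the two modalities here is the projector. Both the encoder and the language model see ordinary vectors, which is why the pattern can reuse pretrained models largely unchanged. After projection, the language model treats the image tokens as a prefix or infix of the prompt, just like any other embedded tokens.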

Related Capabilities

Quality of vision, audio, and image understanding (distinct from modality support)