Q-Former

architecture

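A lightweight transformer connector, introduced with BLIP-2, that bridges a frozen image encoder and a language model. A small, fixed set of learnable query vectors cross-attends to the encoder's image features and yields a fixed number of embeddings, which are projected into the language model's input embedding space so the model can consume them like ordinary tokens.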
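
As a rough illustration of the bridging idea, the sketch below uses a fixed set of query vectors that attend over image features from a frozen encoder and then projects the result to the language model's embedding width. It is a deliberately simplified, assumption-laden toy: a single attention head and a single layer, random rather than trained weights, no self-attention among queries, no text branch, and no residual connections or layer normalization. The names `TinyQFormer`, `num_queries`, `vision_dim`, `hidden_dim`, and `lm_dim` are hypothetical and do not come from any library.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class TinyQFormer:
    """Toy single-head, single-layer connector (hypothetical, illustrative only)."""

    def __init__(self, num_queries, vision_dim, hidden_dim, lm_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Query vectors: learned in a real model, random here.
        self.queries = random_matrix(num_queries, hidden_dim, rng)
        # Key/value projections applied to the frozen encoder's features.
        self.w_k = random_matrix(hidden_dim, vision_dim, rng)
        self.w_v = random_matrix(hidden_dim, vision_dim, rng)
        # Final projection into the language model's embedding width.
        self.w_out = random_matrix(lm_dim, hidden_dim, rng)

    def forward(self, image_features):
        """Map any number of patch features to num_queries LM-sized vectors."""
        keys = [matvec(self.w_k, f) for f in image_features]
        values = [matvec(self.w_v, f) for f in image_features]
        outputs = []
        for query in self.queries:
            # Scaled dot-product attention of one query over all patches.
            scores = [
                sum(q * k for q, k in zip(query, key)) / math.sqrt(self.hidden_dim)
                for key in keys
            ]
            weights = softmax(scores)
            attended = [
                sum(w * value[i] for w, value in zip(weights, values))
                for i in range(self.hidden_dim)
            ]
            outputs.append(matvec(self.w_out, attended))
        return outputs


if __name__ == "__main__":
    rng = random.Random(1)
    # Pretend the frozen encoder produced 10 patch features of width 8.
    patches = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(10)]
    connector = TinyQFormer(num_queries=4, vision_dim=8, hidden_dim=5, lm_dim=6)
    tokens = connector.forward(patches)
    print(len(tokens), len(tokens[0]))
```

The point of the design is that the output length is set by the number of queries, not by the number of image patches, so the language model always receives the same small number of visual tokens regardless of image resolution.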

Related Capabilities

Multimodal

Quality of vision, audio, and image understanding (distinct from modality support)
