An AI model that takes both images and text as input, allowing it to answer questions about an image or describe the image's contents.