AI systems that process both images and text, allowing them to answer questions about an image or describe its contents.