A model that combines visual perception, language understanding, and robotic action generation to interpret instructions and control robot movements.
Adhering to complex, structured, or constrained instructions
Function calling, structured output, and agent-style tool orchestration