A model that handles many types of visual tasks without separate training for each one.
Adhering to complex, structured, or constrained instructions.