The ability to understand and interact with user interfaces by reading screenshots and generating commands to control applications or websites.
Function calling, structured output, and agent-style tool orchestration.
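
To make the three terms concrete, here is a minimal sketch in Python. It assumes a model emits each tool call as a JSON object with `name` and `arguments` fields; that shape, the `TOOLS` registry, and the `add`, `upper`, `dispatch`, and `orchestrate` names are all hypothetical and chosen for illustration, not taken from any particular API.

```python
import json
from typing import Any, Callable


# Hypothetical tools; real systems would register their own functions.
def add(a: float, b: float) -> float:
    return a + b


def upper(text: str) -> str:
    return text.upper()


# Registry mapping tool names to callables (illustrative assumption).
TOOLS: dict[str, Callable[..., Any]] = {"add": add, "upper": upper}


def dispatch(call_json: str) -> dict[str, Any]:
    """Parse one structured tool call and run the matching function."""
    try:
        call = json.loads(call_json)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "call must be a JSON object"}

    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name!r}"}
    if not isinstance(args, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}

    try:
        return {"ok": True, "result": TOOLS[name](**args)}
    except TypeError as exc:
        # Raised when arguments do not match the function signature.
        return {"ok": False, "error": str(exc)}


def orchestrate(calls: list[str]) -> list[dict[str, Any]]:
    """Run a sequence of tool calls in order and collect every result."""
    return [dispatch(c) for c in calls]


if __name__ == "__main__":
    results = orchestrate([
        '{"name": "add", "arguments": {"a": 2, "b": 3}}',
        '{"name": "upper", "arguments": {"text": "done"}}',
    ])
    print(json.dumps(results))
```

Returning a uniform `{"ok": ..., ...}` record for both success and failure keeps the output structured, so a calling loop can decide whether to retry, report the error, or continue with the next tool.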