OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

Zehao Li, Zhenyu Wu, Yibo Zhao, Bowen Yang, Jingjing Xie et al. | March 19, 2026 | arXiv

Key Takeaway

Breaking reward evaluation into smaller, verifiable steps and having multiple reviewers judge them produces more reliable feedback for training GUI agents, and improves task success by 10% in online learning scenarios.

Summary

OS-Themis is a reward evaluation system for GUI agents that breaks down task trajectories into verifiable milestones and uses multiple reviewers to judge whether agents completed tasks correctly. This approach improves both the accuracy of reward signals and the performance of agents trained with reinforcement learning on mobile and desktop interfaces.
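
As a rough illustration of the idea described above, the sketch below scores a trajectory by checking each milestone with several reviewers. It is not the OS-Themis implementation: the names `Milestone`, `milestone_passed`, and `trajectory_reward`, the majority-vote rule, and the binary reward are all assumptions made for this example, and the toy reviewers stand in for whatever model-based judges the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Milestone:
    """One verifiable step of a task (hypothetical structure)."""
    description: str
    evidence: str  # e.g. a text summary of what the screen showed


# A reviewer is any function that returns True if it accepts the milestone.
Reviewer = Callable[[Milestone], bool]


def milestone_passed(milestone: Milestone, reviewers: Sequence[Reviewer]) -> bool:
    """Assumed rule: a milestone passes on a strict majority of reviewer votes."""
    votes = [reviewer(milestone) for reviewer in reviewers]
    return sum(votes) * 2 > len(votes)


def trajectory_reward(milestones: Sequence[Milestone],
                      reviewers: Sequence[Reviewer]) -> float:
    """Assumed rule: reward 1.0 only if every milestone passes, else 0.0."""
    if not milestones:
        return 0.0
    all_passed = all(milestone_passed(m, reviewers) for m in milestones)
    return 1.0 if all_passed else 0.0


# Toy reviewers for demonstration only.
def lenient(m: Milestone) -> bool:
    return True


def needs_evidence(m: Milestone) -> bool:
    return bool(m.evidence.strip())


def strict(m: Milestone) -> bool:
    return False


reviewers = [lenient, needs_evidence, strict]

complete = [
    Milestone("Open the settings app", "settings screen shown"),
    Milestone("Enable Wi-Fi", "Wi-Fi toggle switched on"),
]
incomplete = [
    Milestone("Open the settings app", "settings screen shown"),
    Milestone("Enable Wi-Fi", ""),  # no evidence recorded for this step
]

reward_complete = trajectory_reward(complete, reviewers)
reward_incomplete = trajectory_reward(incomplete, reviewers)
```

In this toy setup, a milestone with recorded evidence gets two of three votes and passes, while a milestone with no evidence gets only one vote and fails, so a single unverified step withholds the reward for the whole trajectory.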

agents evaluation training

Key Terms

gui-agent reward-model reinforcement-learning trajectory mechanism-linked-evidence