Breaking reward evaluation into smaller, verifiable steps and judging them with multiple reviewers produces more reliable feedback for training GUI agents, improving task success by 10% in online learning scenarios.
OS-Themis is a reward evaluation system for GUI agents that breaks down task trajectories into verifiable milestones and uses multiple reviewers to judge whether agents completed tasks correctly. This approach improves both the accuracy of reward signals and the performance of agents trained with reinforcement learning on mobile and desktop interfaces.
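The milestone-plus-reviewers idea can be illustrated with a small sketch. This is not OS-Themis's actual implementation: how milestones are extracted, what the reviewers are, and how their verdicts are combined are not specified here. The sketch assumes a strict-majority vote per milestone and a binary reward that is 1.0 only when every milestone passes. `Milestone`, its `evidence` field, and the three string-matching reviewers are hypothetical stand-ins for whatever judges a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Milestone:
    """A hypothetical verifiable checkpoint extracted from a task."""
    description: str
    evidence: str  # text a toy reviewer looks for in the trajectory (assumption)


# A reviewer judges one milestone against the whole trajectory and votes True/False.
Reviewer = Callable[[Milestone, Sequence[str]], bool]


def milestone_passes(milestone: Milestone, trajectory: Sequence[str],
                     reviewers: Sequence[Reviewer]) -> bool:
    """Strict-majority vote across reviewers for a single milestone (ties fail)."""
    votes = [review(milestone, trajectory) for review in reviewers]
    return sum(votes) * 2 > len(votes)


def trajectory_reward(milestones: Sequence[Milestone], trajectory: Sequence[str],
                      reviewers: Sequence[Reviewer]) -> float:
    """Binary reward: 1.0 only if every milestone passes review, else 0.0."""
    if not milestones or not reviewers:
        raise ValueError("need at least one milestone and one reviewer")
    passed = all(milestone_passes(m, trajectory, reviewers) for m in milestones)
    return 1.0 if passed else 0.0


# --- Toy reviewers standing in for real judges (hypothetical) ---
def exact_step_reviewer(m: Milestone, traj: Sequence[str]) -> bool:
    # Passes only if some step matches the evidence string exactly.
    return m.evidence in traj


def substring_reviewer(m: Milestone, traj: Sequence[str]) -> bool:
    # Case-insensitive substring match against each step.
    return any(m.evidence.lower() in step.lower() for step in traj)


def skeptical_reviewer(m: Milestone, traj: Sequence[str]) -> bool:
    # Simulates an unreliable judge that always rejects.
    return False


if __name__ == "__main__":
    trajectory = ["open Settings", "tap Wi-Fi", "toggle Wi-Fi off"]
    milestones = [
        Milestone("Settings app is open", "open Settings"),
        Milestone("Wi-Fi switched off", "toggle Wi-Fi off"),
    ]
    reviewers = [exact_step_reviewer, substring_reviewer, skeptical_reviewer]
    print(trajectory_reward(milestones, trajectory, reviewers))  # prints 1.0
```

With three reviewers, the always-rejecting `skeptical_reviewer` is outvoted, which shows how aggregating several judges can dampen a single faulty one. Requiring every milestone to pass gives no credit for partial progress; returning the fraction of passed milestones instead would be one way to get a denser signal, again as an assumption rather than the paper's choice.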