You can make GUI agents roughly 3x faster by selectively pruning screenshots and action history instead of compressing everything uniformly.
This paper tackles a major speed bottleneck for AI agents that operate computer screens by selectively removing unnecessary information from screenshots and action history. Rather than treating every part of an image equally, it keeps the important interactive elements and discards redundant details, achieving 3.3x faster processing with minimal accuracy loss.