Learning from rankings instead of numeric feedback is fundamentally harder, but it becomes tractable when the environment changes slowly, with applications to game theory and LLM routing systems.
This paper studies online learning when the learner receives only ranking feedback (e.g., "action A is better than action B") rather than numeric scores. The authors characterize when learning from such feedback is impossible and develop algorithms that perform well when the underlying utility changes slowly. They prove these algorithms let players converge to fair game equilibria and evaluate them on routing queries among large language models.
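To make the feedback model concrete, here is a minimal illustrative sketch (all names and utilities are hypothetical, not from the paper) of ranking feedback: the learner never observes numeric scores, only which of two actions is better, and a simple strategy is to track pairwise win counts.

```python
import random

# Hypothetical fixed utilities for three actions (illustrative only;
# the paper's setting allows these to drift slowly over time).
utilities = {"A": 0.8, "B": 0.5, "C": 0.3}

def ranking_feedback(a, b):
    """Return the better of two actions -- the learner never sees the scores."""
    return a if utilities[a] >= utilities[b] else b

# A simple learner: count pairwise wins and prefer the most-winning action.
wins = {action: 0 for action in utilities}
random.seed(0)
for _ in range(200):
    a, b = random.sample(list(utilities), 2)
    wins[ranking_feedback(a, b)] += 1

best = max(wins, key=wins.get)  # best action inferred from comparisons alone
```

With deterministic comparisons this win-count heuristic recovers the top action; the difficulty the paper addresses is that comparisons carry no magnitude information, so numeric-feedback algorithms cannot be applied directly.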