LLMs perform well on familiar games but fail when payoff structures change, suggesting they rely on memorized patterns rather than on an understanding of the underlying strategic principles.
This paper tests whether large language models can genuinely reason about game theory or merely reproduce answers seen in training. The researchers created modified versions of classic games (Prisoner's Dilemma and Rock-Paper-Scissors), altering payoffs and relabeling actions, to see whether LLMs adapted their strategies to the new incentive structure.
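To make the experimental idea concrete, the sketch below shows why a payoff change invalidates a memorized answer: a small dominant-strategy solver applied to a standard Prisoner's Dilemma and to a modified matrix where cooperation dominates. The payoff values and function names here are illustrative assumptions, not taken from the paper.

```python
def dominant_strategy(payoffs):
    """Return the row player's weakly dominant action, or None.

    payoffs[action][opponent_action] -> row player's payoff.
    """
    actions = list(payoffs)
    for a in actions:
        if all(
            all(payoffs[a][opp] >= payoffs[b][opp] for opp in payoffs[a])
            for b in actions if b != a
        ):
            return a
    return None

# Standard Prisoner's Dilemma payoffs: defection dominates.
standard = {
    "cooperate": {"cooperate": 3, "defect": 0},
    "defect":    {"cooperate": 5, "defect": 1},
}

# Hypothetical modified payoffs: cooperation now dominates, so a model
# that recalls the memorized "always defect" answer gets it wrong.
modified = {
    "cooperate": {"cooperate": 5, "defect": 3},
    "defect":    {"cooperate": 2, "defect": 1},
}

print(dominant_strategy(standard))  # defect
print(dominant_strategy(modified))  # cooperate
```

The same check generalizes to relabeled actions: the solver's answer depends only on the numbers, which is exactly the invariance the paper probes in LLMs.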