Continuously updated coding benchmark that draws new competitive programming problems from LeetCode, AtCoder, and Codeforces to limit training-data contamination
Collects competitive programming problems published after the training cutoff dates of the evaluated models. Tasks cover code generation, self-repair, code execution prediction, and test output prediction. The problem set is refreshed automatically, so evaluated models are unlikely to have seen the problems during training.
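The date-based filtering described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual pipeline: the `Problem` record, its field names, the `uncontaminated` helper, the sample problems, and the cutoff date are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record; field names are illustrative, not the benchmark's schema.
    title: str
    source: str
    released: date


def uncontaminated(problems, training_cutoff):
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.released > training_cutoff]


# Made-up sample data, one problem per source named in the description.
problems = [
    Problem("two-sum-variant", "LeetCode", date(2023, 5, 14)),
    Problem("grid-paths", "AtCoder", date(2023, 11, 4)),
    Problem("tree-queries", "Codeforces", date(2024, 2, 20)),
]

# Assumed training cutoff for a hypothetical model.
cutoff = date(2023, 9, 30)
eval_set = uncontaminated(problems, cutoff)
print([p.title for p in eval_set])  # ['grid-paths', 'tree-queries']
```

Because the cutoff differs per model, each model would be scored on its own date-filtered subset under this scheme; a later cutoff yields a smaller evaluation set.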
No model scores recorded yet
Scores will appear here as the pipeline processes model data