Tests models on research-level scientific programming problems drawn from real scientific papers across physics, chemistry, biology, and mathematics.
Each problem asks the model to implement an algorithm described in the scientific literature. The resulting code is then checked for correctness against test cases. Problems cover numerical methods, simulations, and domain-specific computations across STEM fields.
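As a rough picture of the checking step, here is a minimal sketch. It assumes, hypothetically, that each problem supplies test cases as (arguments, expected value) pairs and that numerical answers are compared within a relative and absolute tolerance. The names `trapezoid` and `passes_all`, the tolerance values, and the rule that every case must pass are illustrative choices, not the benchmark's actual harness.

```python
import math
from typing import Callable, Sequence

# Hypothetical test-case format: (positional arguments, expected result).
TestCase = tuple[tuple, float]


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Example candidate solution: composite trapezoidal rule on [a, b] with n intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def passes_all(candidate: Callable[..., float], cases: Sequence[TestCase],
               rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    """Return True only if every case matches within tolerance (assumed scoring rule)."""
    for args, expected in cases:
        try:
            result = candidate(*args)
        except Exception:
            return False  # a crash counts as a failed problem
        if not math.isclose(result, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True


cases = [
    ((math.sin, 0.0, math.pi, 10_000), 2.0),       # integral of sin(x) over [0, pi]
    ((lambda x: x * x, 0.0, 1.0, 10_000), 1 / 3),  # integral of x^2 over [0, 1]
]

print(passes_all(trapezoid, cases))  # prints True
```

Tolerance-based comparison is used here because floating-point results from correct numerical code rarely match a reference value bit for bit. Requiring all cases to pass is one possible design; a benchmark could instead award partial credit per test case.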