A tiny, transparent model built more for studying than shipping. Pythia 160M is part of a carefully controlled research suite, meaning its training process is unusually well-documented — researchers can inspect checkpoints at every stage of training. At this scale, it handles simple text completion but struggles with anything requiring reasoning or factual depth.
| Benchmark | Score | Type | Recorded |
|---|---|---|---|
| IFEval | 18.2 | accuracy | 26d ago |
| GPQA Diamond | 1.1 | accuracy | 26d ago |
| MATH | 0.9 | accuracy | 26d ago |
| BBH | 2.2 | accuracy | 26d ago |
| MuSR | 10.7 | accuracy | 26d ago |
| MMLU-Pro | 1.3 | accuracy | 26d ago |