GPT-2 Large is a text-completion engine from an earlier era of language models: it predicts the next token based on patterns learned from web text, with no instruction-following or chat capabilities. It generates fluent prose and can continue a passage convincingly, but it does not deliberately follow instructions or answer questions. Think of it as a statistical storyteller rather than an assistant.
| Benchmark | Score (accuracy) |
|---|---|
| BBH | 3.3 |
| IFEval | 20.5 |
| MATH | 1.2 |
| MMLU-Pro | 1.6 |
| MuSR | 5.7 |
| GPQA Diamond | 1.2 |
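The "statistical storyteller" behavior described above can be sketched with a toy bigram model: count which token tends to follow which, then greedily append the most frequent continuation. This is a deliberately tiny stand-in for what a causal LM like GPT-2 learns at vastly larger scale (the corpus, function names, and greedy decoding here are illustrative choices, not GPT-2's actual training or sampling setup).

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count next-token frequencies per token -- a toy analogue of the
    statistical patterns a causal language model learns from web text."""
    tokens = text.split()
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def complete(model, prompt, max_new_tokens=5):
    """Continue the prompt by greedily appending the most frequent next
    token: pure pattern continuation, no instructions, no answers."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = model.get(tokens[-1])
        if not candidates:
            break  # unseen context: a real LM never hits this wall
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

corpus = "the cat sat on the mat and the cat ran"
model = train_bigram(corpus)
print(complete(model, "the", max_new_tokens=4))  # → "the cat sat on the"
```

The model happily continues any prefix it has statistics for, but it has no notion of a question versus a statement; that is exactly the completion-only behavior the benchmark scores above reflect.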