Qwen2.5 32B is a methodical generalist: comfortable across coding, reasoning, and multilingual tasks, with notably strong handling of Chinese and English. At 32B parameters it sits in a practical middle ground, capable enough to tackle complex multi-step problems yet small enough to run on a single high-end GPU, particularly when quantized. It is occasionally verbose where a concise answer would serve better.

| Benchmark | Score | Type | Recorded |
|---|---|---|---|
| BBH | 54.0 | accuracy | 26d ago |
| MuSR | 22.7 | accuracy | 26d ago |
| GPQA Diamond | 21.6 | accuracy | 26d ago |
| IFEval | 40.8 | accuracy | 26d ago |
| MMLU-Pro | 53.4 | accuracy | 26d ago |
| MATH | 35.6 | accuracy | 26d ago |
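
The single-GPU claim in the description can be sanity-checked with back-of-envelope arithmetic. The sketch below is only a rough estimate: it assumes a nominal 32 billion parameters (read off the model name, not an exact count), counts weight storage only, and ignores KV cache, activations, and framework overhead. The function name and the precision table are illustrative, not part of any library.

```python
# Rough weight-memory estimate for a dense model at several precisions.
# Assumption: 32e9 parameters, taken from the "32B" in the model name.
PARAMS = 32e9

# Bytes needed to store one parameter at each (assumed) precision.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for precision, nbytes in BYTES_PER_PARAM.items():
        print(f"{precision}: {weight_memory_gb(PARAMS, nbytes):.0f} GB")
```

Under these assumptions the weights alone come to about 64 GB at 16-bit precision, 32 GB at 8-bit, and 16 GB at 4-bit, so whether one GPU is enough depends on the card's memory and the precision chosen. Real deployments need additional headroom for the KV cache and activations.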