Qwen2.5 32B Instruct sits in a productive middle ground: large enough to handle complex reasoning and nuanced instruction-following, yet compact enough to run on accessible hardware. It handles multilingual tasks fluently, especially in Chinese and English, and tends to follow structured prompts reliably. Like many instruction-tuned models, it can occasionally over-explain or hedge, but its outputs are generally coherent and well organized.

| Benchmark | Score | Type | Recorded |
|---|---|---|---|
| MMLU-Pro | 51.9 | accuracy | 26d ago |
| BBH | 56.5 | accuracy | 26d ago |
| IFEval | 83.5 | accuracy | 26d ago |
| GPQA Diamond | 11.7 | accuracy | 26d ago |
| MuSR | 13.5 | accuracy | 26d ago |
| MATH | 62.5 | accuracy | 26d ago |