Qwen2.5 14B Instruct sits in a practical middle ground: large enough to handle nuanced reasoning and multilingual tasks, yet compact enough to run on consumer hardware. It follows instructions reliably and produces structured outputs such as JSON or code with reasonable precision. Its Chinese-English bilingual capabilities are notably strong, reflecting Alibaba's training priorities, though it can be verbose when brevity is called for.

| Benchmark | Score | Type | Recorded |
|---|---|---|---|
| MMLU-Pro | 43.4 | accuracy | 26d ago |
| MuSR | 10.2 | accuracy | 26d ago |
| BBH | 48.4 | accuracy | 26d ago |
| IFEval | 81.6 | accuracy | 26d ago |
| GPQA Diamond | 9.6 | accuracy | 26d ago |
| MATH | 54.8 | accuracy | 26d ago |
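
Because structured output is described above as reasonably precise rather than guaranteed, callers may want to validate JSON replies before using them. The following is a minimal sketch, not part of any Qwen tooling: `parse_json_reply` and `ask_for_json` are hypothetical helper names, and `generate` stands in for whatever function sends a prompt to the model and returns its text reply.

```python
import json
from typing import Callable, Optional

# Markdown code-fence marker, built programmatically to keep this listing simple.
FENCE = "`" * 3


def parse_json_reply(reply: str) -> Optional[dict]:
    """Return the JSON object embedded in reply, or None if it does not parse."""
    text = reply.strip()
    # Models sometimes wrap JSON in a Markdown code fence; strip it if present.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip().startswith(FENCE):
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    # Keep only the span between the outermost braces, which tolerates
    # a chatty preamble or trailing remark around the object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        value = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def ask_for_json(
    generate: Callable[[str], str],  # hypothetical: prompt in, model text out
    prompt: str,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Call generate until a reply parses as a JSON object, or give up."""
    for _ in range(max_attempts):
        parsed = parse_json_reply(generate(prompt))
        if parsed is not None:
            return parsed
    return None
```

Passing the model call in as a plain callable keeps the sketch independent of any particular inference library; a stricter version could also check the parsed object against an expected set of keys.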