A compact, nimble model that punches above its weight for everyday language tasks — following instructions, drafting short-form text, and handling conversational exchanges. Its small footprint makes it practical for edge deployment or resource-constrained environments, though it noticeably struggles with complex multi-step reasoning or deep domain knowledge that larger models handle more comfortably.
| Benchmark | Score | Type | Recorded |
|---|---|---|---|
| GPQA Diamond | 3.8 | accuracy | 26d ago |
| BBH | 24.1 | accuracy | 26d ago |
| IFEval | 73.9 | accuracy | 26d ago |
| MMLU-Pro | 24.4 | accuracy | 26d ago |
| MuSR | 1.4 | accuracy | 26d ago |
| MATH | 17.7 | accuracy | 26d ago |