A compact, efficient model that punches above its weight for a 3B parameter system. It handles instruction-following tasks with reasonable coherence and can manage everyday language tasks like summarization and Q&A. At this size, it naturally struggles with complex multi-step reasoning or nuanced tasks where larger siblings would have an edge.
| Benchmark | Score | Type | Recorded |
|---|---|---|---|
| MATH | 36.8 | accuracy | 26d ago |
| BBH | 25.8 | accuracy | 26d ago |
| IFEval | 64.7 | accuracy | 26d ago |
| GPQA Diamond | 3.0 | accuracy | 26d ago |
| MuSR | 7.6 | accuracy | 26d ago |
| MMLU-Pro | 25.1 | accuracy | 26d ago |