A lightweight model that pushes the limits of what half a billion parameters can do. It handles simple instructions and short conversational exchanges reasonably well (instruction following is its best result below, at 30.7 on IFEval), but it struggles with complex reasoning and multi-step tasks, scoring at or near zero on MATH, MuSR, and GPQA Diamond. Think of it as the intern who's great at quick lookups but needs supervision on anything nuanced.

| Benchmark | Score | Type | Recorded |
|---|---|---|---|
| MuSR | 0.9 | accuracy | 26d ago |
| MMLU-Pro | 7.7 | accuracy | 26d ago |
| GPQA Diamond | 1.0 | accuracy | 26d ago |
| BBH | 8.4 | accuracy | 26d ago |
| MATH | 0.0 | accuracy | 26d ago |
| IFEval | 30.7 | accuracy | 26d ago |