The scrappy minimalist of the Llama family — it runs on modest hardware and responds quickly, but its small parameter count means it struggles with complex reasoning, nuanced instruction-following, and longer context tasks. It tends to work best when the task is narrow and well-defined. Think of it as a fast, lightweight workhorse rather than a deep thinker.
| Benchmark | Score | Type | Recorded |
|---|---|---|---|
| IFEval | 14.8 | accuracy | 26d ago |
| MMLU-Pro | 2.3 | accuracy | 26d ago |
| MuSR | 2.6 | accuracy | 26d ago |
| MATH | 1.2 | accuracy | 26d ago |
| BBH | 4.4 | accuracy | 26d ago |
| GPQA Diamond | 0.0 | accuracy | 26d ago |