TinyLlama punches above its weight for a 1.1B-parameter model, handling casual conversation and simple instruction-following with surprising coherence. It runs on modest hardware, even CPU-only setups, making it accessible where larger models simply won't fit. Expect limited reasoning depth and knowledge gaps relative to larger models, but it remains serviceable for lightweight tasks.
| Benchmark | Score (%) | Type | Recorded |
|---|---|---|---|
| MMLU-Pro | 1.1 | accuracy | 26d ago |
| MuSR | 4.3 | accuracy | 26d ago |
| BBH | 4.0 | accuracy | 26d ago |
| MATH | 1.5 | accuracy | 26d ago |
| IFEval | 6.0 | accuracy | 26d ago |
| GPQA Diamond | 0.0 | accuracy | 26d ago |
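The CPU-only claim above is easy to try with Hugging Face `transformers`. A minimal sketch, assuming the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` checkpoint (swap in whichever TinyLlama variant you actually use):

```python
# Minimal CPU-only inference sketch with Hugging Face transformers.
# Assumes the TinyLlama/TinyLlama-1.1B-Chat-v1.0 checkpoint; adjust as needed.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    device=-1,  # -1 pins the pipeline to CPU; no GPU required
)

prompt = "Explain what a hash map is in one sentence."
out = generator(prompt, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"])
```

Greedy decoding (`do_sample=False`) keeps runs reproducible; expect responses in seconds rather than milliseconds on CPU, which is the trade-off for fitting in a couple of gigabytes of RAM.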