GPT-4o is OpenAI's general-purpose multimodal model. It accepts both text and image inputs, so a single conversation can move between analyzing images and generating text. It handles a wide range of tasks, from parsing complex documents to interpreting visual content. Its 128k-token context window lets it sustain long conversations or process substantial documents in a single request.

| Benchmark | Score | Setting | Recorded |
|---|---|---|---|
| HumanEval | 90.2% | pass@1 | 1y ago |
| MT-Bench | 9.3 / 10 | GPT-4-judge | 1y ago |
| MATH | 76.6% | 4-shot | 1y ago |
| Chatbot Arena | 1285 | Bradley-Terry Elo | 1y ago |
| Humanity's Last Exam | 9.4% | 0-shot | 1y ago |
| GPQA Diamond | 53.6% | 0-shot-CoT | 1y ago |
| MMLU | 88.7% | 5-shot | 1y ago |
| HumanEval+ | 86.6% | pass@1 | 1y ago |