LMSYS Chatbot Arena
A live, crowd-sourced evaluation in which users chat with two anonymous models side by side and vote for the better response; votes are converted into Elo-style ratings.
Each user submits a prompt, receives responses from two anonymous models, and selects the one they prefer (or declares a tie). Votes are aggregated into Bradley-Terry (Elo-style) ratings. Over one million human votes have been collected, spanning categories such as overall, coding, math, and hard prompts.
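To make the rating mechanics concrete, here is a minimal sketch of sequential Elo-style updates from pairwise votes. The actual leaderboard fits a Bradley-Terry model over the full vote set rather than applying updates one at a time, and the model names, starting ratings, and K-factor below are illustrative assumptions, not Arena parameters.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo/Bradley-Terry model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def record_vote(ratings: dict, a: str, b: str, outcome: float, k: float = 32) -> None:
    """Update both ratings in place.

    outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    """
    e_a = expected_score(ratings[a], ratings[b])
    ratings[a] += k * (outcome - e_a)
    ratings[b] += k * ((1 - outcome) - (1 - e_a))

# Hypothetical models, both starting at a 1000 rating.
ratings = {"model-x": 1000.0, "model-y": 1000.0}
record_vote(ratings, "model-x", "model-y", 1.0)  # one vote for model-x
print(ratings)  # model-x gains what model-y loses
```

Batch Bradley-Terry fitting (as used in practice) is order-independent and more statistically efficient, but the per-vote update above conveys the same underlying pairwise-comparison model.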
Covers both commercial API models (GPT-4o, Claude, Gemini) and open-weight models, all ranked through the same blind voting rather than provider-reported benchmarks. (This distinguishes it from the Hugging Face Open LLM Leaderboard, which lists open-weight models only.)
| # | Model | Arena Elo |
|---|---|---|
| 1 | GPT-4o | 1,285 |