Which AI model is actually the best? We aggregate 20+ benchmarks so you don't have to.
Tired of cherry-picked benchmarks and marketing hype? Showdown provides transparent, community-maintained rankings of AI language models across real-world categories.
All data is open. All methodology is transparent. All contributions are welcome.
Visit showdown.best to explore the rankings.
Want to run it locally?
```bash
git clone https://github.com/verseles/showdown.git
cd showdown
npm install
npm run dev
```
We aggregate scores from 20+ industry benchmarks, weighted by practical importance:
| Category | Weight | What it measures |
|---|---|---|
| Coding | 25% | Real GitHub issues, live coding challenges |
| Reasoning | 25% | PhD science questions, novel problem solving |
| Agents & Tools | 18% | API usage, multi-step tasks, browser automation |
| Conversation | 12% | Creative writing, following complex instructions |
| Math | 10% | Competition math, word problems |
| Multimodal | 7% | Understanding images, charts, diagrams |
| Multilingual | 3% | Performance across languages |
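As a rough illustration of how these weights might combine into an overall score, here is a minimal TypeScript sketch. The names (`CATEGORY_WEIGHTS`, `overallScore`, `categoryScores`) are hypothetical and not the actual site code; the weights are the ones from the table above.

```typescript
// Hypothetical names for illustration; not the actual Showdown code.
const CATEGORY_WEIGHTS: Record<string, number> = {
  coding: 0.25,
  reasoning: 0.25,
  agentsAndTools: 0.18,
  conversation: 0.12,
  math: 0.1,
  multimodal: 0.07,
  multilingual: 0.03,
};

// categoryScores: average benchmark score per category, on a 0-100 scale.
function overallScore(categoryScores: Record<string, number>): number {
  let total = 0;
  for (const [category, weight] of Object.entries(CATEGORY_WEIGHTS)) {
    total += weight * (categoryScores[category] ?? 0);
  }
  return total;
}

// Example: a model that is strong at coding and reasoning, weaker elsewhere.
const score = overallScore({
  coding: 80,
  reasoning: 75,
  agentsAndTools: 60,
  conversation: 70,
  math: 65,
  multimodal: 50,
  multilingual: 55,
});
console.log(score.toFixed(1)); // "69.6"
```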
Scoring:
When benchmark data is missing, we use two estimation methods (illustrated in the sketch below):

- **Superior Model Imputation** (green *): for "thinking" variants, we compute the expected improvement over the base model from benchmarks where both have real data, then apply that ratio to the missing benchmarks. This is the more reliable method, since it is grounded in real performance differences.
- **Category Average** (yellow *): falls back to averaging the other benchmarks in the same category. Less reliable, but it ensures every model can be compared.
Note: Estimated values are clearly marked and should be replaced with real data when available. See UPDATE.md for details.
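A minimal sketch of how these two fallbacks might work. The type and function names (`Scores`, `imputeFromBaseModel`, `categoryAverage`) are hypothetical; the actual implementation lives in the repository and may differ.

```typescript
// Hypothetical types and helpers for illustration; not the real Showdown code.
type Scores = Record<string, number | undefined>; // benchmark name -> score (0-100)

// Superior Model Imputation (green *): estimate a "thinking" variant's missing score
// from its average ratio over the base model on benchmarks where both have real data.
function imputeFromBaseModel(
  variant: Scores,
  base: Scores,
  benchmark: string,
): number | undefined {
  const ratios: number[] = [];
  for (const name of Object.keys(base)) {
    const v = variant[name];
    const b = base[name];
    if (v !== undefined && b !== undefined && b > 0) ratios.push(v / b);
  }
  const baseScore = base[benchmark];
  if (ratios.length === 0 || baseScore === undefined) return undefined;
  const avgRatio = ratios.reduce((sum, r) => sum + r, 0) / ratios.length;
  return baseScore * avgRatio;
}

// Category Average (yellow *): fall back to the mean of the model's other
// benchmarks in the same category.
function categoryAverage(model: Scores, categoryBenchmarks: string[]): number | undefined {
  const known = categoryBenchmarks
    .map((name) => model[name])
    .filter((s): s is number => s !== undefined);
  if (known.length === 0) return undefined;
  return known.reduce((sum, s) => sum + s, 0) / known.length;
}
```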
- Have real data for an estimated or incorrect value? Open an issue with the correct value and source.
- Want a model added? Open an issue with available benchmark scores.
- Prefer to contribute directly? Edit `data/showdown.json`, then run `./precommit.sh` to validate your changes.

Rankings aggregate data from trusted sources.
AGPL-3.0 - Keep it open!
Built with Svelte. Hosted on Cloudflare. Made for the community.