Coding

Best models for code generation and debugging.

Updated June 11, 2026

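| # | Model | Provider | Tags | Score | AI Index | Context | Input / 1M | Output / 1M |
|---|-------|----------|------|-------|----------|---------|------------|-------------|
| 1 | Anthropic: Claude Fable 5 | anthropic | New, Top Pick | 94.7 | 64.9 | 1M | $10.00 | $50.00 |
| 2 | Anthropic: Claude Opus 4.8 | anthropic | New, Top Pick, In-House Pick | 92.4 | 61.4 | 1M | $5.00 | $25.00 |
| 3 | Google: Gemini 3.1 Pro Preview | google | Best for Agents | 91.7 | 57.2 | 1M | $2.00 | $12.00 |
| 4 | OpenAI: GPT-5.5 | openai | Top Pick | 88.8 | 60.2 | 1.1M | $5.00 | $30.00 |
| 5 | Anthropic: Claude Opus 4.7 | anthropic | | 88.3 | 57.3 | 1M | $5.00 | $25.00 |
| 6 | Anthropic: Claude Sonnet 4.6 | anthropic | In-House Pick | 84.4 | 51.7 | 1M | $3.00 | $15.00 |
| 7 | Qwen: Qwen3.7 Max | qwen | New | 83.9 | 56.6 | 1M | $1.25 | $3.75 |
| 8 | OpenAI: GPT-5.3-Codex | openai | | 83.5 | 53.6 | 400K | $1.75 | $14.00 |
| 9 | Google: Gemini 3 Flash Preview | google | | 82.5 | 35 | 1M | $0.5000 | $3.00 |
| 10 | OpenAI: GPT-5.1-Codex | openai | | 82 | 43.1 | 400K | $1.25 | $10.00 |
| 11 | DeepSeek: DeepSeek V3.1 | deepseek | | 82 | 27.7 | 164K | $0.2100 | $0.7900 |
| 12 | Microsoft: Phi 4 | microsoft | | 82 | 10.4 | 16K | $0.0650 | $0.1400 |
| 13 | o1-preview | OpenAI | | 82 | 23.7 | - | $16.50 | $66.00 |
| 14 | Mistral: Devstral 2 2512 | mistralai | | 82 | 22 | 262K | $0.4000 | $2.00 |
| 15 | Mistral: Devstral Small 1.1 | mistralai | | 82 | 15.2 | 131K | $0.1000 | $0.3000 |
| 16 | Google: Gemma 4 31B | google | Updated | 82 | 39.2 | 262K | $0.1200 | $0.3500 |
| 17 | OpenAI: GPT-5.1-Codex-Mini | openai | | 82 | 38.6 | 400K | $0.2500 | $2.00 |
| 18 | Qwen: Qwen3 30B A3B | qwen | | 82 | 15.3 | 131K | $0.1200 | $0.5000 |
| 19 | GPT-5.5 (high) | OpenAI | Best for Coding | 82 | 58.9 | - | $5.00 | $30.00 |
| 20 | OpenAI: gpt-oss-120b | openai | | 82 | 24.5 | 131K | $0.0390 | $0.1800 |
| 21 | DeepSeek: DeepSeek V3 | deepseek | | 82 | 16.5 | 131K | $0.2002 | $0.8001 |
| 22 | o1-mini | OpenAI | | 82 | 20.4 | - | Free | Free |
| 23 | Nex AGI: DeepSeek V3.1 Nex N1 | nex-agi | | 82 | 28.1 | 131K | $0.1350 | $0.5000 |
| 24 | xAI: Grok 4 | x-ai | | 82 | 41.5 | 256K | $3.00 | $15.00 |
| 25 | Qwen: Qwen3.6 Plus | qwen | | 82 | 50 | 1M | $0.3250 | $1.95 |
| 26 | Google: Gemini 3.5 Flash | google | New | 82 | 43.3 | 1M | $1.50 | $9.00 |
| 27 | Kwaipilot: KAT-Coder-Pro V1 | kwaipilot | | 82 | 36 | 256K | $0.2070 | $0.8280 |
| 28 | OpenAI: o3 | openai | | 82 | 38.4 | 200K | $2.00 | $8.00 |
| 29 | Muse Spark | Meta | | 82 | 52.2 | - | Free | Free |
| 30 | OpenAI: gpt-oss-20b | openai | | 82 | 20.8 | 131K | $0.0290 | $0.1400 |
| 31 | Meta: Llama 3.3 70B Instruct (free) | meta-llama | | 82 | 14.5 | 131K | Free | Free |
| 32 | Gemini 1.5 Pro (May ’24) | Google | | 82 | 12 | - | Free | Free |
| 33 | DeepSeek: DeepSeek V3.2 Speciale | deepseek | | 82 | 29.4 | 164K | $0.2870 | $0.4310 |
| 34 | TNG: DeepSeek R1T2 Chimera | tngtech | | 82 | 27 | 164K | $0.3000 | $1.10 |
| 35 | Z.ai: GLM 4.6 | z-ai | | 82 | 32.5 | 203K | $0.4300 | $1.74 |
| 36 | OpenAI: o4 Mini | openai | | 82 | 33.1 | 200K | $1.10 | $4.40 |
| 37 | Mistral Large 3 | Mistral | | 82 | 22.8 | - | $0.5000 | $1.50 |
| 38 | Anthropic: Claude Opus 4.6 | anthropic | | 82 | 52.9 | 1M | $5.00 | $25.00 |
| 39 | Qwen2.5 Coder 32B Instruct | qwen | | 82 | 12.9 | 128K | $0.6600 | $1.00 |
| 40 | Magistral Small 1 | Mistral | | 82 | 16.8 | - | Free | Free |
| 41 | DeepSeek: DeepSeek V3.2 | deepseek | | 82 | 41.7 | 131K | $0.2288 | $0.3432 |
| 42 | Google: Gemini 2.5 Flash | google | | 82 | 27 | 1M | $0.3000 | $2.50 |
| 43 | Xiaomi: MiMo-V2.5-Pro | xiaomi | | 82 | 35.6 | 1M | $0.4350 | $0.8700 |
| 44 | GPT-5.5 Instant (May 2026) | OpenAI | | 82 | 41.8 | - | $5.00 | $30.00 |
| 45 | Anthropic: Claude Sonnet 4.5 | anthropic | | 82 | 37.1 | 1M | $3.00 | $15.00 |
| 46 | Devstral Small 2 | Mistral | | 82 | 19.5 | - | Free | Free |
| 47 | Qwen: Qwen3 Coder Next | qwen | | 82 | 28.3 | 262K | $0.1100 | $0.8000 |
| 48 | Qwen: Qwen3 Coder 30B A3B Instruct | qwen | | 82 | 20 | 160K | $0.0700 | $0.2700 |
| 49 | Anthropic: Claude 3.5 Sonnet | anthropic | | 82 | 15.9 | 200K | $6.00 | $30.00 |
| 50 | Magistral Medium 1 | Mistral | | 82 | 18.8 | - | Free | Free |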
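
To turn the Input / 1M and Output / 1M columns into a per-request figure, the arithmetic can be sketched as below. This is a minimal illustration, assuming the prices are US dollars per one million tokens and that "Free" means a price of zero; the function names `parse_price` and `request_cost_usd` and the token counts are hypothetical, not part of the leaderboard.

```python
def parse_price(cell: str) -> float:
    """Convert a price cell such as '$0.5000' or 'Free' to dollars per 1M tokens.

    Assumption: 'Free' is treated as a price of zero.
    """
    text = cell.strip()
    if text.lower() == "free":
        return 0.0
    return float(text.lstrip("$"))


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_1m: float,
    output_per_1m: float,
) -> float:
    """Estimate the cost of one request from per-1M-token prices."""
    total = input_tokens * input_per_1m + output_tokens * output_per_1m
    return total / 1_000_000


# Prices copied from the table rows for Claude Opus 4.8 and Gemini 3.1 Pro Preview.
# The token counts (20,000 in, 2,000 out) are made up for illustration.
opus_cost = request_cost_usd(20_000, 2_000, parse_price("$5.00"), parse_price("$25.00"))
gemini_cost = request_cost_usd(20_000, 2_000, parse_price("$2.00"), parse_price("$12.00"))

print(f"Claude Opus 4.8: ${opus_cost:.3f}")
print(f"Gemini 3.1 Pro Preview: ${gemini_cost:.3f}")
```

Output tokens are priced several times higher than input tokens for most rows, so the mix of prompt and completion length can matter as much as the headline input price.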

How we rank AI models

The Design for Online AI Model Leaderboard scores 577 models on a single 0–100 scale built from four weighted dimensions: intelligence (reasoning and knowledge benchmarks), technical capability (coding and tool use), content quality (writing and instruction-following) and value (capability per dollar).

Underlying data is aggregated from the OpenRouter API for pricing and availability, Artificial Analysis for intelligence, coding and agentic indices, and the Hugging Face Open LLM Leaderboard for open-model benchmarks. We refresh these sources daily and layer our own editorial review on top, so a model that benchmarks well but is impractical to deploy will not automatically top the table.

Models are grouped into tiers (Frontier, Professional, Specialist, Efficient, Emerging and Legacy) to make like-for-like comparison easier, and newly released models are flagged so you can see what has just landed.

Leaderboard FAQ

How often is the leaderboard updated?

Pricing, availability and benchmark data are synced daily from our sources, and editorial scores are reviewed whenever a significant new model is released.

How is the overall score calculated?

Each model is graded 0–10 on intelligence, technical capability, content quality and value; those dimensions are weighted and combined into the 0–100 overall score used to rank the table.
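
The combination step can be sketched as a weighted mean of the four 0–10 grades, rescaled to 0–100. The weights below are hypothetical placeholders, since the actual weighting is not published here, and `overall_score` is an illustrative name rather than the leaderboard's own code.

```python
# Hypothetical percentage weights; the leaderboard's real weights are not stated.
WEIGHTS = {
    "intelligence": 35,
    "technical": 35,
    "content": 15,
    "value": 15,
}


def overall_score(grades: dict[str, float], weights: dict[str, int] = WEIGHTS) -> float:
    """Combine 0-10 dimension grades into a single 0-100 score.

    Computes the weighted mean of the grades, then multiplies by 10.
    """
    if set(grades) != set(weights):
        raise ValueError("grades must cover exactly the weighted dimensions")
    for name, grade in grades.items():
        if not 0 <= grade <= 10:
            raise ValueError(f"{name} grade {grade} is outside 0-10")

    total_weight = sum(weights.values())
    weighted_mean = sum(grades[name] * weights[name] for name in weights) / total_weight
    return round(weighted_mean * 10, 1)


# Made-up grades for an imaginary model, for illustration only.
example = {"intelligence": 9, "technical": 10, "content": 8, "value": 6}
print(overall_score(example))
```

Dividing by the sum of the weights keeps the result on the 0–100 scale even if the weights are later changed and no longer add up to 100.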

Where does the data come from?

From the OpenRouter API, Artificial Analysis and the Hugging Face Open LLM Leaderboard, combined with hands-on editorial testing by the Design for Online team.