AI Agents

Models optimised for autonomous agent workflows.

Updated June 30, 2026

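| # | Model | Provider | Score | AI Index | Context | Input / 1M tokens | Output / 1M tokens | Badges |
|---|---|---|---|---|---|---|---|---|
| 1 | Anthropic: Claude Fable 5 | anthropic | 93.3 | 59.9 | 1M | $10.00 | $50.00 | New, Top Pick |
| 2 | Anthropic: Claude Sonnet 5 | anthropic | 93 | 53.4 | 1M | $2.00 | $10.00 | New, In-House Pick |
| 3 | Anthropic: Claude Opus 4.8 | anthropic | 92.3 | 55.7 | 1M | $5.00 | $25.00 | Top Pick, In-House Pick |
| 4 | Google: Gemini 3.1 Pro Preview | google | 89.8 | 46.5 | 1M | $2.00 | $12.00 | |
| 5 | Anthropic: Claude Opus 4.7 | anthropic | 89.4 | 53.5 | 1M | $5.00 | $25.00 | |
| 6 | Google: Gemini 3.5 Flash | google | 88.9 | 50.2 | 1M | $1.50 | $9.00 | |
| 7 | OpenAI: GPT-5.5 | openai | 88.3 | 54.8 | 1.1M | $5.00 | $30.00 | Top Pick |
| 8 | OpenAI: GPT-5.4 | openai | 87 | 51.4 | 1.1M | $2.50 | $15.00 | |
| 9 | Z.ai: GLM 5.2 | z-ai | 86.1 | 51.1 | 1M | $0.9300 | $3.00 | New |
| 10 | Anthropic: Claude Sonnet 4.6 | anthropic | 85.1 | 47.2 | 1M | $3.00 | $15.00 | |
| 11 | Qwen: Qwen3.7 Max | qwen | 83.5 | 46 | 1M | $1.25 | $3.75 | |
| 12 | Anthropic: Claude Opus 4.6 | anthropic | 83 | 43.7 | 1M | $5.00 | $25.00 | |
| 13 | DeepSeek: DeepSeek V4 Pro | deepseek | 82.1 | 44.3 | 1M | $0.4350 | $0.8700 | |
| 14 | OpenAI: GPT-5 | openai | 82 | 31.2 | 400K | $1.25 | $10.00 | |
| 15 | Google: Gemini 3 Flash Preview | google | 82 | 37.8 | 1M | $0.5000 | $3.00 | |
| 16 | Anthropic: Claude Opus 4 | anthropic | 82 | 31 | 200K | $15.00 | $75.00 | |
| 17 | inclusionAI: Ling-2.6-1T (free) | inclusionai | 82 | 33.6 | 262K | Free | Free | |
| 18 | MiniMax: MiniMax M2 | minimax | 82 | 28.3 | 205K | $0.2550 | $1.00 | |
| 19 | ERNIE 5.0 Thinking Preview | Baidu | 82 | 21.9 | | Free | Free | |
| 20 | OpenAI: GPT-5 Mini | openai | 82 | 33 | 400K | $0.2500 | $2.00 | |
| 21 | xAI: Grok 4.20 | x-ai | 82 | 37 | 2M | $1.25 | $2.50 | |
| 22 | Xiaomi: MiMo-V2-Flash | xiaomi | 82 | 24.7 | 262K | $0.1000 | $0.3000 | |
| 23 | Anthropic: Claude Sonnet 4 | anthropic | 82 | 28.9 | 1M | $3.00 | $15.00 | |
| 24 | Nex AGI: Nex-N2-Pro | nex-agi | 82 | 41 | 262K | $0.2500 | $1.00 | New, Best for Agents |
| 25 | Anthropic: Claude Haiku 4.5 | anthropic | 82 | 29.6 | 200K | $1.00 | $5.00 | |
| 26 | xAI: Grok 4.20 Multi-Agent Beta | x-ai | 82 | 48.5 | 2M | $2.00 | $6.00 | |
| 27 | Qwen3.6 35B A3B (Non-reasoning) | Alibaba | 82 | 24.2 | | $0.3750 | $2.25 | |
| 28 | Qwen: Qwen3.5 397B A17B | qwen | 82 | 33.7 | 256K | $0.3850 | $2.45 | |
| 29 | OpenAI: gpt-oss-20b | openai | 82 | 14.9 | 131K | $0.0290 | $0.1400 | |
| 30 | Z.ai: GLM 5V Turbo | z-ai | 82 | 34.5 | 203K | $1.20 | $4.00 | |
| 31 | Google: Gemini 2.5 Pro Preview 05-06 | google | 82 | 29.5 | 1M | $1.25 | $10.00 | |
| 32 | DeepSeek: DeepSeek V4 Flash | deepseek | 82 | 40.3 | 1M | $0.0980 | $0.1960 | Best Value |
| 33 | Grok Build 0.1 0616 | xAI | 82 | 39.8 | | $1.00 | $2.00 | New |
| 34 | OpenAI: o3 Deep Research | openai | 82 | 38.3 | 200K | $10.00 | $40.00 | |
| 35 | Qwen: Qwen3.6 35B A3B | qwen | 82 | 31.6 | 262K | $0.1400 | $1.00 | |
| 36 | MiniMax: MiniMax M2.5 | minimax | 82 | 33.7 | 205K | $0.1200 | $0.4800 | |
| 37 | Anthropic: Claude Opus 4.1 | anthropic | 82 | 33.7 | 200K | $15.00 | $75.00 | |
| 38 | Z.ai: GLM 5.1 | z-ai | 82 | 40.2 | 203K | $0.9750 | $4.30 | |
| 39 | OpenAI: o3 | openai | 82 | 30.4 | 200K | $2.00 | $8.00 | |
| 40 | GPT-5.5 (medium) | OpenAI | 82 | 50.4 | | $5.00 | $30.00 | |
| 41 | OpenAI: o4 Mini Deep Research | openai | 82 | 33 | 200K | $2.00 | $8.00 | |
| 42 | Z.ai: GLM 5 Turbo | z-ai | 82 | 38.1 | 262K | $1.20 | $4.00 | |
| 43 | Qwen: Qwen3.6 27B | qwen | 82 | 37.1 | 262K | $0.2850 | $2.40 | |
| 44 | Z.ai: GLM 5 | z-ai | 82 | 39.5 | 203K | $0.6000 | $1.92 | |
| 45 | Z.ai: GLM 4.5 | z-ai | 82 | 19.5 | 131K | $0.6000 | $2.20 | |
| 46 | OpenAI: GPT-5.2 | openai | 82 | 26 | 400K | $1.75 | $14.00 | |
| 47 | OpenAI: o4 Mini | openai | 82 | 25.6 | 200K | $1.10 | $4.40 | |
| 48 | GPT-5.5 (low) | OpenAI | 82 | 43.5 | | $5.00 | $30.00 | |
| 49 | Anthropic: Claude Sonnet 4.5 | anthropic | 82 | 29.3 | 1M | $3.00 | $15.00 | |
| 50 | OpenAI: GPT-5.4 Nano | openai | 82 | 38.2 | 400K | $0.2000 | $1.25 | |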

How we rank AI models

The Design for Online AI Model Leaderboard scores 590 models on a single 0–100 scale built from four weighted dimensions: intelligence (reasoning and knowledge benchmarks), technical capability (coding and tool use), content quality (writing and instruction-following) and value (capability per dollar).
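The page does not say how "capability per dollar" is computed. As a minimal sketch, assuming capability is proxied by the AI Index column and cost by a blended price that weights input and output tokens 3:1 (both assumptions, not the leaderboard's published method), the value idea could be approximated like this:

```python
import math

# Hypothetical sketch: the leaderboard does not publish its value formula.
# Assumptions: capability = the "AI Index" column; cost = a blended price in
# USD per 1M tokens that weights input:output tokens 3:1.
INPUT_WEIGHT = 3
OUTPUT_WEIGHT = 1


def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Weighted average of input and output price, USD per 1M tokens."""
    total = INPUT_WEIGHT + OUTPUT_WEIGHT
    return (INPUT_WEIGHT * input_per_m + OUTPUT_WEIGHT * output_per_m) / total


def capability_per_dollar(ai_index: float, input_per_m: float, output_per_m: float) -> float:
    """AI Index points per blended dollar; free models return infinity."""
    price = blended_price(input_per_m, output_per_m)
    if price == 0:
        return math.inf
    return ai_index / price


# Figures taken from the leaderboard table.
print(capability_per_dollar(59.9, 10.00, 50.00))  # Claude Fable 5
print(capability_per_dollar(40.3, 0.098, 0.196))  # DeepSeek V4 Flash
```

Under these assumptions DeepSeek V4 Flash scores roughly a hundred times more AI Index points per dollar than Claude Fable 5, which fits its "Best Value" badge. A real value grade would still need to be mapped onto the 0–10 range; that step is omitted because the site does not describe it.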

Underlying data comes from three sources: the OpenRouter API (pricing and availability), Artificial Analysis (intelligence, coding and agentic indices) and the Hugging Face Open LLM Leaderboard (open-model benchmarks). We refresh these sources daily and layer our own editorial review on top, so a model that benchmarks well but is impractical to deploy will not automatically top the table.
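Prices in the table are quoted per 1M tokens, while OpenRouter's public models listing reports each model's `pricing.prompt` and `pricing.completion` as USD-per-token strings. The sketch below converts one such record into the table's display format. The record shape is assumed from OpenRouter's models endpoint, the sample record is illustrative, and the formatting rule (four decimals under $1, two otherwise, "Free" for zero) is inferred from the table rather than documented by the site.

```python
from decimal import Decimal

# Sketch only: assumes an OpenRouter-style model record whose "pricing"
# object holds USD-per-token amounts as strings, e.g. "0.000003".
TOKENS_PER_MILLION = Decimal(1_000_000)


def per_million(per_token: str) -> str:
    """Convert a USD-per-token string into the table's per-1M display format."""
    # Decimal avoids binary floating-point error when scaling tiny prices.
    amount = Decimal(per_token) * TOKENS_PER_MILLION
    if amount == 0:
        return "Free"
    # Display rule inferred from the table: 4 decimals below $1, else 2.
    return f"${amount:.4f}" if amount < 1 else f"${amount:.2f}"


def table_prices(record: dict) -> tuple[str, str]:
    """Return (input, output) display prices for one model record."""
    pricing = record["pricing"]
    return per_million(pricing["prompt"]), per_million(pricing["completion"])


# Illustrative record; the id is a placeholder, not a real model.
sample = {
    "id": "example/placeholder-model",
    "pricing": {"prompt": "0.000003", "completion": "0.000015"},
}
print(table_prices(sample))
```

Using `Decimal` rather than `float` keeps values such as $0.4350 exact after scaling, which matters when the display shows four decimal places.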

Models are grouped into tiers (Frontier, Professional, Specialist, Efficient, Emerging and Legacy) to make like-for-like comparison easier, and newly released models are flagged so you can see what has just landed.

Leaderboard FAQ

How often is the leaderboard updated?

Pricing, availability and benchmark data are synced daily from our sources, and editorial scores are reviewed whenever a significant new model is released.

How is the overall score calculated?

Each model is graded 0–10 on intelligence, technical capability, content quality and value; those dimensions are weighted and combined into the 0–100 overall score used to rank the table.
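The site does not publish the weights it uses. As a minimal sketch of the arithmetic only, with placeholder weights and shorthand dimension keys (`technical`, `content`) that are hypothetical, a weighted mean of the four 0–10 grades scaled by 10 yields a 0–100 score:

```python
# Hypothetical sketch: the leaderboard does not publish its weights, so the
# values below are placeholders chosen only to show the arithmetic.
WEIGHTS = {
    "intelligence": 0.35,
    "technical": 0.30,
    "content": 0.20,
    "value": 0.15,
}


def overall_score(grades: dict[str, float]) -> float:
    """Combine four 0-10 dimension grades into a 0-100 overall score."""
    if set(grades) != set(WEIGHTS):
        raise ValueError("expected exactly one grade per dimension")
    for name, grade in grades.items():
        if not 0 <= grade <= 10:
            raise ValueError(f"{name} grade {grade} is outside 0-10")
    # Dividing by the weight total means the weights need not sum to 1.
    weighted_mean = sum(WEIGHTS[k] * grades[k] for k in WEIGHTS) / sum(WEIGHTS.values())
    # A 0-10 mean times 10 lands on the 0-100 scale; keep one decimal place.
    return round(weighted_mean * 10, 1)


print(overall_score({"intelligence": 9.5, "technical": 9.4, "content": 9.0, "value": 6.0}))
```

Normalising by the weight total keeps the result on the 0–100 scale even if the editorial team adjusts individual weights.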

Where does the data come from?

From the OpenRouter API, Artificial Analysis and the Hugging Face Open LLM Leaderboard, combined with hands-on editorial testing by the Design for Online team.