AI Agents

Models optimised for autonomous agent workflows.

Updated July 2, 2026

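| # | Model | Provider | Badges | Score | AI Index | Context | Input / 1M | Output / 1M |
|---|-------|----------|--------|-------|----------|---------|------------|-------------|
| 1 | Anthropic: Claude Fable 5 | anthropic | New, Top Pick | 93.3 | 59.9 | 1M | $10.00 | $50.00 |
| 2 | Anthropic: Claude Sonnet 5 | anthropic | New, In-House Pick | 93 | 53.4 | 1M | $2.00 | $10.00 |
| 3 | Anthropic: Claude Opus 4.8 | anthropic | Top Pick, In-House Pick | 92.3 | 55.7 | 1M | $5.00 | $25.00 |
| 4 | Google: Gemini 3.1 Pro Preview | google | | 89.8 | 46.5 | 1M | $2.00 | $12.00 |
| 5 | Anthropic: Claude Opus 4.7 | anthropic | | 89.4 | 53.5 | 1M | $5.00 | $25.00 |
| 6 | Google: Gemini 3.5 Flash | google | | 88.9 | 50.2 | 1M | $1.50 | $9.00 |
| 7 | OpenAI: GPT-5.5 | openai | Top Pick | 88.3 | 54.8 | 1.1M | $5.00 | $30.00 |
| 8 | OpenAI: GPT-5.4 | openai | | 87 | 51.4 | 1.1M | $2.50 | $15.00 |
| 9 | Z.ai: GLM 5.2 | z-ai | New | 86.1 | 51.1 | 1M | $0.9300 | $3.00 |
| 10 | Anthropic: Claude Sonnet 4.6 | anthropic | | 85.1 | 47.2 | 1M | $3.00 | $15.00 |
| 11 | Qwen: Qwen3.7 Max | qwen | | 83.5 | 46 | 1M | $1.25 | $3.75 |
| 12 | Anthropic: Claude Opus 4.6 | anthropic | | 83 | 43.7 | 1M | $5.00 | $25.00 |
| 13 | DeepSeek: DeepSeek V4 Pro | deepseek | | 82.1 | 44.3 | 1M | $0.4350 | $0.8700 |
| 14 | DeepSeek: DeepSeek V3.1 Terminus | deepseek | | 82 | 26.3 | 164K | $0.2700 | $0.9500 |
| 15 | MiniMax: MiniMax M2.7 | minimax | | 82 | 38.1 | 205K | $0.1800 | $0.7200 |
| 16 | Grok 4.20 0309 (Reasoning) | xAI | | 82 | 36.5 | | $2.00 | $6.00 |
| 17 | Z.ai: GLM 4.7 Flash | z-ai | | 82 | 15.5 | 203K | $0.0600 | $0.4000 |
| 18 | MoonshotAI: Kimi K2 0711 | moonshotai | | 82 | 19.4 | 131K | $0.5700 | $2.30 |
| 19 | MoonshotAI: Kimi K2.6 | moonshotai | Updated | 82 | 42.8 | 262K | $0.6600 | $3.41 |
| 20 | Qwen: Qwen3.7 Plus | qwen | New | 82 | 39 | 1M | $0.3200 | $1.28 |
| 21 | OpenAI: GPT-5.1-Codex | openai | | 82 | 34.7 | 400K | $1.25 | $10.00 |
| 22 | Anthropic: Claude 3.7 Sonnet | anthropic | | 82 | 23.5 | 200K | $3.00 | $15.00 |
| 23 | Muse Spark | Meta | | 82 | 43.1 | | Free | Free |
| 24 | xAI: Grok 4 Fast | x-ai | | 82 | 27.4 | 2M | $0.2000 | $0.5000 |
| 25 | Xiaomi: MiMo-V2-Omni | xiaomi | | 82 | 35 | 262K | $0.4000 | $2.00 |
| 26 | Qwen3 Coder 480B A35B Instruct | Alibaba | | 82 | 18 | | $1.50 | $7.50 |
| 27 | OpenAI: GPT-5.2-Codex | openai | | 82 | 40.1 | 400K | $1.75 | $14.00 |
| 28 | xAI: Grok 4 | x-ai | | 82 | 33.3 | 256K | $3.00 | $15.00 |
| 29 | Xiaomi: MiMo-V2.5-Pro | xiaomi | | 82 | 42.2 | 1M | $0.4350 | $0.8700 |
| 30 | NVIDIA: Nemotron 3 Ultra | nvidia | New | 82 | 37.8 | 1M | $0.5000 | $2.20 |
| 31 | OpenAI: GPT-5.1-Codex-Mini | openai | | 82 | 30.6 | 400K | $0.2500 | $2.00 |
| 32 | Anthropic: Claude 3.7 Sonnet (thinking) | anthropic | | 82 | 27.1 | 200K | $3.00 | $15.00 |
| 33 | Nova 2.0 Lite (high) | Amazon | | 82 | 18.2 | | $0.3000 | $2.50 |
| 34 | MoonshotAI: Kimi K2 0905 | moonshotai | | 82 | 23.5 | 262K | $0.6000 | $2.50 |
| 35 | Xiaomi: MiMo-V2-Pro | xiaomi | | 82 | 40.3 | 1M | $1.00 | $3.00 |
| 36 | xAI: Grok 4.3 | x-ai | | 82 | 37.6 | 1M | $1.25 | $2.50 |
| 37 | MiniMax: MiniMax M2.1 | minimax | | 82 | 31.4 | 205K | $0.3000 | $1.20 |
| 38 | Google: Gemini 2.5 Pro | google | | 82 | 25.8 | 1M | $1.25 | $10.00 |
| 39 | Xiaomi: MiMo-V2.5 | xiaomi | | 82 | 40.1 | 1M | $0.1050 | $0.2800 |
| 40 | Kwaipilot: KAT-Coder-Pro V1 | kwaipilot | | 82 | 34.6 | 256K | $0.2070 | $0.8280 |
| 41 | OpenAI: o1 | openai | | 82 | 23.4 | 200K | $15.00 | $60.00 |
| 42 | Nova 2.0 Lite (medium) | Amazon | | 82 | 19 | | $0.3000 | $2.50 |
| 43 | OpenAI: GPT-5.3-Codex | openai | | 82 | 44.3 | 400K | $1.75 | $14.00 |
| 44 | xAI: Grok Code Fast 1 | x-ai | | 82 | 21.6 | 256K | $0.2000 | $1.50 |
| 45 | Kwaipilot: KAT-Coder-Pro V2 | kwaipilot | | 82 | 35.4 | 256K | $0.3000 | $1.20 |
| 46 | Mistral: Mistral Medium 3.5 | mistralai | | 82 | 29.9 | 262K | $1.50 | $7.50 |
| 47 | Z.ai: GLM 4.7 | z-ai | | 82 | 33.7 | 203K | $0.4000 | $1.75 |
| 48 | Google: Gemini 2.5 Pro Preview 06-05 | google | | 82 | 23 | 1M | $1.25 | $10.00 |
| 49 | Tencent: Hy3 preview (free) | tencent | | 82 | 41.9 | 262K | Free | Free |
| 50 | MoonshotAI: Kimi K2.7 Code | moonshotai | New | 82 | 41.9 | 262K | $0.7400 | $3.50 |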
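
The Input / 1M and Output / 1M prices can be turned into a per-task figure by scaling each price by the token count divided by one million. The sketch below is only an illustration: the function name `estimate_cost` and the workload of 200,000 input tokens and 20,000 output tokens are assumptions, and the prices are copied from the rows for ranks 1, 2 and 13.

```python
def estimate_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Return the USD cost of one task, given prices quoted per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * input_per_m
    output_cost = (output_tokens / 1_000_000) * output_per_m
    return input_cost + output_cost


# (input price, output price) in USD per 1M tokens, taken from the table.
PRICES = {
    "Anthropic: Claude Fable 5": (10.00, 50.00),
    "Anthropic: Claude Sonnet 5": (2.00, 10.00),
    "DeepSeek: DeepSeek V4 Pro": (0.4350, 0.8700),
}

# Hypothetical agent task: 200K tokens read, 20K tokens generated.
for name, (input_price, output_price) in PRICES.items():
    cost = estimate_cost(200_000, 20_000, input_price, output_price)
    print(f"{name}: ${cost:.4f}")
```

Because agent workflows tend to re-read large contexts on every step, the input price often dominates the total, so it is worth running this with token counts measured from a real task rather than the placeholder values used here.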

How we rank AI models

The Design for Online AI Model Leaderboard scores 592 models on a single 0–100 scale built from four weighted dimensions: intelligence (reasoning and knowledge benchmarks), technical capability (coding and tool use), content quality (writing and instruction-following) and value (capability per dollar).

Underlying data is aggregated from the OpenRouter API for pricing and availability, Artificial Analysis for intelligence, coding and agentic indices, and the Hugging Face Open LLM Leaderboard for open-model benchmarks. We refresh these sources daily and layer our own editorial review on top, so a model that benchmarks well but is impractical to deploy will not automatically top the table.

Models are grouped into tiers (Frontier, Professional, Specialist, Efficient, Emerging and Legacy) to make like-for-like comparison easier, and newly released models are flagged so you can see what has just landed.

Leaderboard FAQ

How often is the leaderboard updated?

Pricing, availability and benchmark data are synced daily from our sources, and editorial scores are reviewed whenever a significant new model is released.

How is the overall score calculated?

Each model is graded 0–10 on intelligence, technical capability, content quality and value; those dimensions are weighted and combined into the 0–100 overall score used to rank the table.
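
As a rough illustration of that calculation, the sketch below combines four 0–10 grades into a 0–100 score using a weighted average. The actual weights are not published here, so the values in `WEIGHTS`, the dimension keys and the function name `overall_score` are all assumptions made for the example, not the leaderboard's real formula.

```python
# Hypothetical weights; the leaderboard's real weighting is not stated.
WEIGHTS = {
    "intelligence": 0.35,
    "technical": 0.30,
    "content": 0.20,
    "value": 0.15,
}


def overall_score(grades, weights=WEIGHTS):
    """Combine per-dimension grades (0-10) into a 0-100 overall score."""
    if set(grades) != set(weights):
        raise ValueError("grades must cover exactly the weighted dimensions")
    for dimension, grade in grades.items():
        if not 0 <= grade <= 10:
            raise ValueError(f"{dimension} grade must be between 0 and 10")
    # Normalise by the weight total so the weights need not sum to exactly 1.
    total_weight = sum(weights.values())
    weighted_grade = sum(weights[d] * grades[d] for d in weights) / total_weight
    # Scale the 0-10 weighted grade up to the 0-100 range.
    return round(weighted_grade * 10, 1)


example = {"intelligence": 9.0, "technical": 9.5, "content": 9.0, "value": 8.0}
print(overall_score(example))
```

With weights like these, a model that is strong on benchmarks but expensive loses points on the value dimension, which is one way a cheaper model could end up ranked above a more capable one.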

Where does the data come from?

From the OpenRouter API, Artificial Analysis and the Hugging Face Open LLM Leaderboard, combined with hands-on editorial testing by the Design for Online team.