Best AI Models for Tool Use

Models with strong tool-use and function-calling support.

316 models tracked · 46 providers · Data refreshed daily · 10 editorial picks


#	Model	Provider	Badges	DFO Score	Tok/s	In $/1M	Out $/1M	Ctx
1	Claude Fable 5	Anthropic	TOP PICK	91.5	56.5	$10.00	$50.00	1M
2	Claude Opus 4.8	Anthropic	TOP PICK, IN-HOUSE PICK	89.3	52.4	$5.00	$25.00	1M
3	Gemini 3.1 Pro Preview	Google	FRONTIER	85.4	126	$2.00	$12.00	1M
4	GLM 5.2	Z.ai	BEST VALUE	84.7	167	$0.2338	$0.7348	1M
5	Claude Sonnet 5	Anthropic	IN-HOUSE PICK, NEW	84.6	82.8	$2.00	$10.00	1M
6	Grok 4.5	xAI	BEST FOR CODING, NEW	84.3	73.8	$2.00	$6.00	500K
7	GPT-5.5	OpenAI	BEST FOR CODING	83.3	68.5	$5.00	$30.00	1.1M
8	MiniMax M2.1	MiniMax	–	82.0	83.9	$0.3000	$1.20	205K
9	Nemotron Nano 12B 2 VL	NVIDIA	–	82.0	207	$0.2000	$0.6000	131K
10	GLM 4.5V	Z.ai	–	82.0	76.0	$0.6000	$1.80	66K
11	Gemini 2.5 Flash	Google	–	82.0	228	$0.3000	$2.50	1M
12	QwQ 32B	Qwen	–	82.0	31.9	$0.1500	$0.5800	131K
13	GPT-4o (2024-08-06)	OpenAI	–	82.0	97.6	$2.50	$10.00	128K
14	GPT-5.4 Mini	OpenAI	–	82.0	176	$0.7500	$4.50	400K
15	GPT-5.5 (high)	OpenAI	BEST FOR AGENTS	82.0	64.8	$5.00	$30.00	–
16	GPT-5.6 Luna	OpenAI	NEW	82.0	178	$1.00	$6.00	1.1M
17	GLM 4.6V	Z.ai	–	82.0	48.0	$0.3000	$0.9000	131K
18	GLM 4.6	Z.ai	–	82.0	55.5	$0.5000	$2.00	203K
19	Qwen3 Coder 30B A3B Instruct	Qwen	–	82.0	102	$0.0700	$0.2700	160K
20	o3 Mini	OpenAI	–	82.0	231	$1.10	$4.40	200K

Showing 1–20 of 316 · Data from OpenRouter, Artificial Analysis, Hugging Face & our own testing. Scores editorially curated.
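
To turn the In $/1M and Out $/1M columns into a budget figure, a rough sketch like the one below multiplies token counts by the listed prices. The function name and the monthly token volumes are illustrative assumptions, not figures from our deployments; the two price pairs are copied from the table above.

```python
def workload_cost_usd(input_tokens: int, output_tokens: int,
                      in_price_per_m: float, out_price_per_m: float) -> float:
    """Estimate spend in USD from token counts and per-million-token prices."""
    input_cost = (input_tokens / 1_000_000) * in_price_per_m
    output_cost = (output_tokens / 1_000_000) * out_price_per_m
    return input_cost + output_cost


# Hypothetical monthly volume: 2M input tokens and 0.5M output tokens.
INPUT_TOKENS, OUTPUT_TOKENS = 2_000_000, 500_000

# (In $/1M, Out $/1M) as listed in the table for these two models.
prices = {
    "Claude Opus 4.8": (5.00, 25.00),
    "GLM 5.2": (0.2338, 0.7348),
}

for model, (in_price, out_price) in prices.items():
    cost = workload_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, in_price, out_price)
    print(f"{model}: ${cost:.2f}")
```

Tool-heavy workloads tend to resend tool definitions and results on every turn, so the input volume is usually the number worth measuring first before comparing models on price.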



Leaderboards by use case

The overall table, re-ranked for the job you're hiring a model for.

AI Agents (118 models)
1. Claude Fable 5	91.5
2. Claude Opus 4.8	89.3
3. Gemini 3.1 Pro Preview	85.4

Coding (167 models)
1. Claude Fable 5	91.5
2. Claude Opus 4.8	89.3
3. Gemini 3.1 Pro Preview	85.4

Content Writing (183 models)
1. Claude Fable 5	91.5
2. Claude Opus 4.8	89.3
3. Gemini 3.1 Pro Preview	85.4

General (69 models)
1. Claude Fable 5	91.5
2. Claude Opus 4.8	89.3
3. Gemini 3.1 Pro Preview	85.4

SEO (36 models)
1. GPT-5 Nano	82.0
2. Gemma 4 26B A4B	82.0
3. Gemini 1.5 Flash (Sep ’24)	82.0

How we rank AI models

The Design for Online AI Model Leaderboard scores 612 models on a single 0–100 scale built from four weighted dimensions: intelligence (reasoning and knowledge benchmarks), technical capability (coding and tool use), content quality (writing and instruction-following) and value (capability per dollar).

Underlying data is aggregated from the OpenRouter API for pricing and availability, Artificial Analysis for intelligence, coding and agentic indices, and the Hugging Face Open LLM Leaderboard for open-model benchmarks. The fourth source is our own: we deploy these models in client agents, chatbots and automations every week, and that internal testing feeds the editorial layer, so a model that benchmarks well but is impractical to deploy will not automatically top the table.
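
As an illustration of the pricing step, the sketch below converts a per-token price record into the $/1M figures shown in the table. The record shape (string prices per token under `pricing.prompt` and `pricing.completion`, plus `context_length`) is an assumption about what the OpenRouter model list returns and should be checked against their current API reference; `sample_entry`, its values, and `to_table_row` are hypothetical and not part of our pipeline.

```python
from decimal import Decimal

# Assumed shape of one model record; the id and values are made up.
sample_entry = {
    "id": "example/model",
    "context_length": 131072,
    "pricing": {"prompt": "0.0000002", "completion": "0.0000006"},
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def to_table_row(entry: dict) -> dict:
    """Convert per-token USD prices (strings) into per-million-token prices."""
    pricing = entry["pricing"]
    return {
        "model": entry["id"],
        # Decimal avoids float rounding noise on very small per-token prices.
        "in_usd_per_1m": Decimal(pricing["prompt"]) * TOKENS_PER_MILLION,
        "out_usd_per_1m": Decimal(pricing["completion"]) * TOKENS_PER_MILLION,
        "ctx": entry["context_length"],
    }


row = to_table_row(sample_entry)
print(f"{row['model']}: in ${row['in_usd_per_1m']:.4f}, out ${row['out_usd_per_1m']:.4f}")
```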

Models are grouped into tiers (Frontier, Professional, Specialist, Efficient, Emerging and Legacy) to make like-for-like comparison easier, and newly released models are flagged so you can see what has just landed.

Leaderboard FAQ

How often is the leaderboard updated?

Pricing, availability and benchmark data are synced daily from our sources, and editorial scores are reviewed whenever a significant new model is released.

How is the overall score calculated?

Each model is graded 0–10 on intelligence, technical capability, content quality and value; those dimensions are weighted and combined into the 0–100 overall score used to rank the table.
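
A minimal sketch of that combination, assuming a simple weighted average: each 0–10 grade is multiplied by its weight, the result is normalised by the total weight, and then scaled to 0–100. The weights below are placeholders chosen only to make the example run, since the actual weighting is not stated here, and the function and key names are hypothetical.

```python
# Placeholder weights (assumption): not the leaderboard's real weighting.
HYPOTHETICAL_WEIGHTS = {
    "intelligence": 0.35,
    "technical": 0.30,
    "content": 0.20,
    "value": 0.15,
}


def overall_score(grades: dict, weights: dict = HYPOTHETICAL_WEIGHTS) -> float:
    """Combine 0-10 dimension grades into a 0-100 weighted score."""
    for name in weights:
        if not 0 <= grades[name] <= 10:
            raise ValueError(f"{name} grade must be between 0 and 10")
    total_weight = sum(weights.values())
    weighted_mean = sum(weights[n] * grades[n] for n in weights) / total_weight
    return weighted_mean * 10  # scale the 0-10 mean to 0-100


# Made-up grades for a single model.
example = {"intelligence": 9.0, "technical": 9.5, "content": 8.5, "value": 6.0}
print(round(overall_score(example), 1))
```

Dividing by the total weight means the weights do not have to sum to exactly 1, which keeps the scale stable if a dimension's weight is adjusted later.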

Where does the data come from?

From four sources: the OpenRouter API, Artificial Analysis, the Hugging Face Open LLM Leaderboard, and internal testing from real deployments by the Design for Online team.