Anthropic: Claude 3.7 Sonnet (thinking)

anthropic · Released Feb 24, 2025 · Specialist
Intelligence #58 / 523
69.6 Our Score
AA Index #65 / 351
34.7 Artificial Analysis
Input #473 / 523
$3.00 per 1M tokens
Output #481 / 523
$15.00 per 1M tokens
Context #144 / 523
200,000 tokens

Analysis Summary

Anthropic: Claude 3.7 Sonnet (thinking) sits in the Specialist tier on our leaderboard, ranked #58 of 523 published models on overall intelligence. At $3.00 input and $15.00 output per 1M tokens, it ranks #473 and #481 of 523 on input and output price respectively, placing it among the most expensive models listed. Its 200,000-token context window (#144 of 523) suits extended reasoning and code review, and it supports tool use, function calling, vision, and reasoning.

Editorial notes

Claude 3.7 Sonnet (thinking) is the extended-reasoning variant of Anthropic's Claude 3.7 Sonnet, delivering meaningfully stronger performance across maths, coding, and long-context tasks. With vision support, tool use, and a 200K context window, it is a strong choice for complex business workflows that require deeper analytical capability.

Assessed April 23, 2026

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch.

Performance Profile

Intelligence 6.2/10
Technical 5.1/10
Content 7/10
Value 6/10

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and..

Capabilities

Tool Use, Function Calling, Vision

Performance Indices

Source: Artificial Analysis

34.7 Intelligence Index
27.6 Coding Index
38 Agentic Index
56.3 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 77.2% Graduate-level scientific reasoning
HLE 10.3% Humanity's Last Exam
MMLU Pro 83.7% Multi-task language understanding
MATH 500 94.7% Mathematical problem-solving
AIME 48.7% Competition mathematics
AIME 2025 56.3% Competition mathematics (2025)
SciCode 40.3% Scientific computing

Technical

LiveCodeBench 47.3% Live coding evaluation
TerminalBench Hard 21.2% Agentic terminal tasks
τ²-Bench 54.7% Conversational agent benchmark

Content

IFBench 48.3% Instruction following
LCR 60.7% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID anthropic/claude-3.7-sonnet:thinking
Provider anthropic
Model Family Claude 3
Release Date February 24, 2025
Context Length 200,000 tokens
Max Completion 64,000 tokens
Status Active
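
The two limits above can be turned into a quick pre-flight check before sending a request. The sketch below is illustrative only: the function name `fits_limits` is hypothetical, and it assumes that the prompt and the completion share the 200,000-token context window, which this listing does not state explicitly.

```python
# Limits taken from the Model Information listing above.
CONTEXT_LENGTH = 200_000   # tokens
MAX_COMPLETION = 64_000    # tokens


def fits_limits(prompt_tokens: int, requested_completion: int) -> bool:
    """Return True if a request stays within both listed limits.

    Assumption (not confirmed by the listing): prompt tokens and
    completion tokens together must fit inside the context window.
    """
    if prompt_tokens < 0 or requested_completion < 0:
        raise ValueError("token counts must be non-negative")
    if requested_completion > MAX_COMPLETION:
        return False
    return prompt_tokens + requested_completion <= CONTEXT_LENGTH


if __name__ == "__main__":
    print(fits_limits(120_000, 64_000))  # within both limits
    print(fits_limits(150_000, 64_000))  # exceeds the shared window
```

Under that assumption, a request asking for the full 64,000-token completion leaves at most 136,000 tokens for the prompt.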

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $3.00 $0.003000
Output $15.00 $0.015000
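
To make the per-token prices concrete, here is a minimal cost estimate using the rates in the table above. The function name `estimate_cost_usd` and the example token counts are hypothetical, and the sketch assumes reasoning ("thinking") tokens are billed at the output rate, which this listing does not state.

```python
# Rates taken from the Pricing table above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: any reasoning tokens are included in output_tokens
    and billed at the output rate.
    """
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 20,000 input tokens, 4,000 output tokens.
    print(f"${estimate_cost_usd(20_000, 4_000):.4f}")  # prints $0.1200
```

Because output tokens cost five times as much as input tokens, the 4,000 output tokens in this example account for half of the total even though they are a sixth of the traffic.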

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

Avg Uptime 100%
Best Latency (TTFT) 1,003ms
Best Throughput 49 tok/s
Active Endpoints 1/1
Available via: Google

Leaderboard Categories