Inception: Mercury 2

inception · Released Mar 4, 2026 · Specialist
Our Score: 62.5

Performance Profile

Intelligence 5.8/10
Technical 5.8/10
Content 5.5/10
Value 7.8/10

Mercury 2 is an extremely fast reasoning LLM and the first reasoning diffusion LLM (dLLM).
Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving more than 1,000 tokens/sec on standard GPUs. It is more than 5x faster than leading speed-optimized LLMs such as Claude 4.5 Haiku and GPT-5 Mini, at a fraction of the cost. Mercury 2 supports tunable reasoning levels, a 128K context window, native tool use, and schema-aligned JSON output. It is built for coding workflows where latency compounds, for real-time voice and search, and for agent loops, and its API is OpenAI-compatible (a minimal call is sketched below). Read more in the blog post.
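
OpenAI API compatibility means the standard openai Python client should work with a swapped base URL. Here is a minimal sketch; the base URL, model ID, and JSON-mode support are assumptions for illustration, so check Inception's docs for the exact names.

```python
# Minimal sketch: calling Mercury 2 through its OpenAI-compatible API.
# ASSUMPTIONS: the base URL and model ID below are illustrative, not confirmed.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mercury-2",  # assumed model ID
    messages=[{
        "role": "user",
        "content": 'Describe diffusion LLMs as JSON with keys "summary" and "speed_claim".',
    }],
    max_tokens=200,
    response_format={"type": "json_object"},  # schema-aligned JSON, if supported
)
print(response.choices[0].message.content)
```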

Input Price: $0.25 / 1M tokens
Output Price: $0.75 / 1M tokens
Context Window: 128,000 tokens
Max Output: 50,000 tokens

Capabilities

Tool Use · Function Calling
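
A hedged sketch of exercising tool use through the same OpenAI-compatible interface; the get_weather tool is invented for illustration, and the endpoint and model ID remain assumptions as above.

```python
# Sketch: declaring a tool and reading back a function call.
# The get_weather tool, endpoint, and model ID are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="https://api.inceptionlabs.ai/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mercury-2",  # assumed model ID
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as a JSON string.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
```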

Architecture

Modality: Text → Text
Tokenizer: Other

Performance Indices

Source: Artificial Analysis

Intelligence Index: 32.8
Coding Index: 30.6
Agentic Index: 48.7

Benchmark Scores

Intelligence

GPQA Diamond: 77% (graduate-level scientific reasoning)
HLE: 15.5% (Humanity's Last Exam)
SciCode: 38.7% (scientific computing)

Technical

TerminalBench Hard: 26.5% (agentic terminal tasks)
τ²-Bench: 70.8% (conversational agent benchmark)

Content

IFBench: 69.8% (instruction following)
LCR: 36.3% (long-context reasoning)

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID: inception/mercury-2
Provider: inception
Release Date: March 4, 2026
Context Length: 128,000 tokens
Max Completion: 50,000 tokens
Status: Active

Pricing

Token Type | Cost per 1M tokens | Cost per 1K tokens
Input      | $0.25              | $0.000250
Output     | $0.75              | $0.000750
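
The per-1K column is just the per-1M rate divided by 1,000, so request cost is simple arithmetic. A worked example at the listed rates:

```python
# Worked cost arithmetic at the listed rates:
# $0.25 per 1M input tokens, $0.75 per 1M output tokens.
INPUT_USD_PER_M = 0.25
OUTPUT_USD_PER_M = 0.75

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at Mercury 2's listed rates."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# A full 128K-token prompt with a maximum 50K-token completion:
print(f"${request_cost(128_000, 50_000):.4f}")  # $0.0695
```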

Live Performance

Live endpoint metrics — refreshed every 30 minutes. A client-side measurement sketch follows the stats below.

Avg Uptime: 100%
Best Latency (TTFT): 321 ms
Best Throughput: 141 tok/s
Active Endpoints: 1/1
Available via: Inception
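
TTFT and throughput of this kind can be estimated client-side by streaming a completion and timing the chunks. A sketch under the same assumed endpoint and model ID as above:

```python
# Sketch: measuring TTFT and throughput client-side with streaming.
# Endpoint and model ID are assumptions, as in the sketches above.
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.inceptionlabs.ai/v1", api_key="YOUR_API_KEY")

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="mercury-2",  # assumed model ID
    messages=[{"role": "user", "content": "Count to twenty."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks += 1  # one chunk is roughly one token; good enough for a rough read

if first_token_at is not None:
    total = time.perf_counter() - first_token_at
    print(f"TTFT: {(first_token_at - start) * 1000:.0f} ms")
    print(f"Throughput: {chunks / total:.0f} chunks/s (~tokens/s)")
```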
