Inception: Mercury

inception · Released Jun 26, 2025
Our Score: 33

Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed lets developers build responsive user experiences such as voice agents, search interfaces, and chatbots. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury).

Input Price: $0.25 / 1M tokens
Output Price: $0.75 / 1M tokens
Context Window: 128,000 tokens
Max Output: 32,000 tokens

Capabilities

Tool Use, Function Calling

Architecture

Modality: Text → Text
Tokenizer: Other

Model Information

OpenRouter ID: inception/mercury
Provider: inception
Release Date: June 26, 2025
Context Length: 128,000 tokens
Max Completion: 32,000 tokens
Status: Active

Pricing

| Token Type | Cost per 1M tokens | Cost per 1K tokens |
| --- | --- | --- |
| Input | $0.25 | $0.000250 |
| Output | $0.75 | $0.000750 |
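
As a rough illustration of how the listed prices translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the example token counts are hypothetical; the two prices come from the table above, and the sketch assumes billing is strictly linear in token count with no other fees.

```python
# Prices taken from the pricing table (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 0.75


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumes cost scales linearly with token counts (an assumption,
    not a statement about how any provider actually bills).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens and 2,000 completion tokens.
    print(f"${estimate_cost(10_000, 2_000):.4f}")
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens comes to about $0.004.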

Live Performance

Live endpoint metrics, refreshed every 30 minutes.

Best Latency (TTFT): 886 ms
Best Throughput: 35 tok/s
Active Endpoints: 0/1
Available via: Inception
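
To put the two live metrics in context, the sketch below combines them into a back-of-the-envelope response time. The function name `estimate_response_seconds` and the 500-token example are hypothetical, and the model is deliberately simple: it assumes total time is time to first token plus output tokens divided by a constant throughput. The figures are the best observed values listed above, so real requests may be slower.

```python
# Best observed live metrics from the listing above.
BEST_TTFT_SECONDS = 0.886          # 886 ms time to first token
BEST_THROUGHPUT_TOK_PER_S = 35.0   # 35 tokens per second


def estimate_response_seconds(
    output_tokens: int,
    ttft: float = BEST_TTFT_SECONDS,
    throughput: float = BEST_THROUGHPUT_TOK_PER_S,
) -> float:
    """Estimate wall-clock seconds to receive a full completion.

    Simplified model (an assumption): TTFT plus a constant generation rate.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return ttft + output_tokens / throughput


if __name__ == "__main__":
    # Hypothetical 500-token completion.
    print(f"{estimate_response_seconds(500):.1f} s")
```

With these inputs, a 500-token completion would take roughly 15 seconds, most of it spent generating tokens and under a second waiting for the first one.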

Leaderboard Categories