DeepSeek: DeepSeek V3.1

DeepSeek: DeepSeek V3.1

deepseek · Released Aug 21, 2025
58
Our Score

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows. It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.

$0.15 / 1M Input Price
$0.75 / 1M Output Price
32,768 tokens Context Window
7,168 tokens Max Output

Capabilities

Tool Use Function Calling

Architecture

ModalityText → Text
TokenizerDeepSeek
Instruct Typedeepseek-v3.1

Performance Indices

Source: Artificial Analysis

27.7 Intelligence Index
29.7 Coding Index
31.2 Agentic Index
89.7 Math Index

Benchmark Scores

Evaluations

GPQA Diamond 77.9%
Graduate-level scientific reasoning
HLE 13%
Humanity's Last Exam
MMLU Pro 85.1%
Multi-task language understanding
LiveCodeBench 78.4%
Live coding evaluation
SciCode 39.1%
Scientific computing
AIME 2025 89.7%
Competition mathematics (2025)
IFBench 41.5%
Instruction following
LCR 53.3%
Long-context reasoning
TerminalBench Hard 25%
Agentic terminal tasks
τ²-Bench 37.4%
Conversational agent benchmark

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID deepseek/deepseek-chat-v3.1
Providerdeepseek
Model FamilyDeepSeek
Release Date August 21, 2025
Context Length32,768 tokens
Max Completion7,168 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.15 $0.000150
Output $0.75 $0.000750

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

96.8%
Avg Uptime
433ms
Best Latency (TTFT)
60 tok/s
Best Throughput
11/11
Active Endpoints
Available via: SambaNova, Chutes, DeepInfra, Novita, SiliconFlow, AtlasCloud, WandB, Fireworks +2 more

Leaderboard Categories