DeepSeek: R1 Distill Llama 70B

deepseek · Released Jan 23, 2025
38
Our Score

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, fine-tuned on outputs from DeepSeek R1. Distillation transfers the reasoning behavior of the larger DeepSeek R1 model into the smaller Llama backbone, yielding strong results across multiple benchmarks:

- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces Rating: 1633

This enables competitive performance comparable to larger frontier models.

$0.70 / 1M Input Price
$0.80 / 1M Output Price
131,072 tokens Context Window
16,384 tokens Max Output
70B Parameters
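The context window caps prompt plus completion tokens, while max output separately caps the completion. A request's completion budget is therefore the smaller of the max-output limit and whatever context remains after the prompt. A minimal sketch of that arithmetic (the function name is illustrative, not part of any API):

```python
# Completion-budget arithmetic using this model's published limits.
CONTEXT_WINDOW = 131_072   # total tokens (prompt + completion)
MAX_OUTPUT = 16_384        # hard cap on completion tokens

def completion_budget(prompt_tokens: int) -> int:
    """Tokens available for the completion, given a prompt length."""
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT, remaining))
```

For example, a 120,000-token prompt leaves only 11,072 tokens of completion room, below the 16,384 cap.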

Architecture

Modality Text → Text
Tokenizer Llama3
Instruct Type deepseek-r1
Parameters 70B

Performance Indices

Source: Artificial Analysis

16 Intelligence Index
11.4 Coding Index
11.7 Agentic Index
53.7 Math Index

Benchmark Scores

Evaluations

GPQA Diamond 40.2%
Graduate-level scientific reasoning
HLE 6.1%
Humanity's Last Exam
MMLU Pro 79.5%
Multi-task language understanding
LiveCodeBench 26.6%
Live coding evaluation
SciCode 31.2%
Scientific computing
MATH 500 93.5%
Mathematical problem-solving
AIME 67%
Competition mathematics
AIME 2025 53.7%
Competition mathematics (2025)
IFBench 27.6%
Instruction following
LCR 11%
Long-context reasoning
TerminalBench Hard 1.5%
Agentic terminal tasks
τ²-Bench 21.9%
Conversational agent benchmark

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID deepseek/deepseek-r1-distill-llama-70b
Provider deepseek
Model Family DeepSeek
Release Date January 23, 2025
Context Length 131,072 tokens
Max Completion 16,384 tokens
Status Active
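The OpenRouter ID above is what you pass as the `model` field when calling OpenRouter's OpenAI-compatible chat completions endpoint. A minimal sketch of building the request body (endpoint URL and payload shape follow OpenRouter's public API; verify against the current docs before relying on them):

```python
import json

# OpenRouter's OpenAI-compatible chat completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

# Request body for this model; max_tokens must stay within
# the 16,384-token completion cap listed above.
payload = {
    "model": "deepseek/deepseek-r1-distill-llama-70b",
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
    "max_tokens": 1024,
}

body = json.dumps(payload)
```

Send `body` with any HTTP client as a POST to `OPENROUTER_URL`, including an `Authorization: Bearer <your-key>` header.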

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.70 $0.000700
Output $0.80 $0.000800
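At these rates, a request's cost is a straightforward per-token product over input and output counts. A minimal sketch, with prices hard-coded from the table above:

```python
# Per-token prices derived from the per-1M rates in the table.
INPUT_PRICE = 0.70 / 1_000_000   # USD per input token
OUTPUT_PRICE = 0.80 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Example: a 10,000-token prompt with a 2,000-token completion
# costs 10,000 * $0.70/1M + 2,000 * $0.80/1M = $0.0086.
```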

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

96.1%
Avg Uptime
117ms
Best Latency (TTFT)
198 tok/s
Best Throughput
3/3
Active Endpoints
Available via: DeepInfra, SambaNova, Novita