NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

nvidia · Released Apr 8, 2025 Efficient
41.4
Our Score

Performance Profile

Intelligence3.6Technical2.1Value7.3Content4
Intelligence 3.6/10
Technical 2.1/10
Content 4/10
Value 7.3/10

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and improved inference latency. The model supports a context length of up to 128K tokens and can operate efficiently on an 8x NVIDIA H100 node. Note: you must include detailed thinking on in the system prompt to enable reasoning. Please see Usage Recommendations for more.

$0.60 / 1M
Input Price
$1.80 / 1M
Output Price
131,072 tokens
Context Window
253B Parameters

Architecture

ModalityText → Text
TokenizerLlama3
Parameters253B

Performance Indices

Source: Artificial Analysis

15 Intelligence Index
13.1 Coding Index
6.9 Agentic Index
63.7 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 72.8% Graduate-level scientific reasoning
HLE 8.1% Humanity's Last Exam
MMLU Pro 82.5% Multi-task language understanding
MATH 500 95.2% Mathematical problem-solving
AIME 74.7% Competition mathematics
AIME 2025 63.7% Competition mathematics (2025)
SciCode 34.7% Scientific computing

Technical

LiveCodeBench 64.1% Live coding evaluation
TerminalBench Hard 2.3% Agentic terminal tasks
τ²-Bench 11.4% Conversational agent benchmark

Content

IFBench 38.2% Instruction following
LCR 7.3% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID nvidia/llama-3.1-nemotron-ultra-253b-v1
Providernvidia
Release Date April 8, 2025
Context Length131,072 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.60 $0.000600
Output $1.80 $0.001800

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

100%
Avg Uptime
387ms
Best Latency (TTFT)
37 tok/s
Best Throughput
1/1
Active Endpoints
Available via: Nebius