NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

nvidia · Released Apr 8, 2025 Efficient
Intelligence #204 / 556
40.6 Our Score
Speed #225 / 257
41.5 tokens / sec
Input #381 / 557
$0.600 per 1M tokens
Output #363 / 557
$1.80 per 1M tokens
Context #220 / 557
131,072 tokens

Analysis Summary

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 sits in the Efficient tier on our leaderboard, ranked #204 of 556 published models on overall intelligence. At $0.600 input and $1.80 output per 1M tokens, it is among the most expensive on the market. It offers a standard large context window and supports reasoning.

Editorial notes

NVIDIA Llama 3.1 Nemotron Ultra 253B v1 shows strong maths scores but very low intelligence and agentic indices, with near-zero terminal and long-context reliability, limiting its practical business utility.

Assessed May 14, 2026

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?

Performance Profile

Intelligence3.6Technical2.1Value7.3Content3.5
Intelligence 3.6/10
Technical 2.1/10
Content 3.5/10
Value 7.3/10

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural..

253B Parameters

Performance Indices

Source: Artificial Analysis

15 Intelligence Index
13.1 Coding Index
6.8 Agentic Index
63.7 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 72.8% Graduate-level scientific reasoning
HLE 8.1% Humanity's Last Exam
MMLU Pro 82.5% Multi-task language understanding
MATH 500 95.2% Mathematical problem-solving
AIME 74.7% Competition mathematics
AIME 2025 63.7% Competition mathematics (2025)
SciCode 34.7% Scientific computing

Technical

LiveCodeBench 64.1% Live coding evaluation
TerminalBench Hard 2.3% Agentic terminal tasks
τ²-Bench 11.4% Conversational agent benchmark

Content

IFBench 38.2% Instruction following
LCR 7.3% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

How does NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 stack up?

Compare side-by-side with other efficient models.

Compare Models

Model Information

OpenRouter ID nvidia/llama-3.1-nemotron-ultra-253b-v1
Providernvidia
Release Date April 8, 2025
Context Length131,072 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.60 $0.000600
Output $1.80 $0.001800