Home > AI Models > NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

Name: NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 Review
Item: NVIDIA: Llama 3.1 Nemotron Ultra 253B v1
Author: Design for Online Editorial

LEADERBOARD LATEST

NEWKimi K3in at #9 NEWKAT-Coder-Air V2.5in at #560 NEWKAT-Coder-Pro V2.5in at #568 NEWMuse Spark 1.1in at #392 NEWUncensoredin at #487 NEWGPT-5.6 Terrain at #11 NEWGPT-5.6 Sol Proin at #416 NEWGPT-5.6 Solin at #2

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

nvidia · Released Apr 8, 2025

Intelligence #9 / 612

82.0 our score

Speed #220 / 287

53.6 tok/s

Input Price #406 / 612

$0.600 per 1M tokens

Output Price #390 / 612

$1.80 per 1M tokens

Context #262 / 612

131,072 tokens

Llama 3.1 Nemotron Ultra 253B v1 is NVIDIA's large reasoning-focused model built on the Llama 3.1 architecture, with a 131K context window and strong performance on math (index 63.7) and coding (livecodebench 0.641, AIME 0.637). The MMLU Pro score of 0.825 confirms broad academic knowledge depth.

However, the agentic index of just 6.8 and terminalbench hard score of 0.023 are very low, indicating poor reliability for multi-step tool use or autonomous workflows. The long-context retrieval score of 0.073 is also weak, limiting its usefulness for large document analysis despite the 131K window. Instruction following (ifbench 0.382) is below average.

At $0.60 input and $1.80 output, it offers reasonable value for pure math and coding tasks. Teams needing a strong mathematical reasoner or code generator for batch, non-agentic workloads will find it capable, but it should not be used in agentic or tool-orchestrated pipelines.

Assessed July 10, 2026

Editorial notes

NVIDIA Nemotron Ultra 253B has strong math and coding benchmarks including a livecodebench of 0.641, but very low agentic and long-context scores limit its business versatility.

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?

DFO Verdict

NVIDIA Nemotron Ultra 253B has strong math and coding benchmarks including a livecodebench of 0.641, but very low agentic and long-context scores limit its business versatility.

#9 of 612 overall

Benchmark scores

GPQA Diamond 72.8%

HLE 8.1%

MMLU Pro 82.5%

MATH 500 95.2%

AIME 74.7%

AIME 2025 63.7%

SciCode 34.7%

LiveCodeBench 64.1%

TerminalBench Hard 2.3%

τ²-Bench 11.4%

IFBench 38.2%

LCR 7.3%

Magenta = intelligence · Ink = technical/agentic · Cyan = content & long-context · Grey = community benchmarks. Data: Artificial Analysis, Hugging Face.

9.1 Intelligence Index·6.8 Agentic Index·63.7 Math Index

How NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 compares

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 ranks #251 of 393 AI models we track for overall intelligence, #273 of 300 for agentic tasks. Its 131K-token context window is larger than 57% of the models we list. At $0.60 per million input tokens it is cheaper than 34% of comparable models.

Position in the field

Intelligence: smarter than 99% of models #9

Speed: faster than 23% of models #220

Price: cheaper than 34% of models #406

Context: larger than 57% of models #262

worst in fieldmedianbest in field

Price vs frontier peers · $ per 1M tokens

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 $0.60 in $1.80 out

Anthropic: Claude Fable 5 $10.00 in $50.00 out

Anthropic: Claude Opus 4.8 $5.00 in $25.00 out

Google: Gemini 3.1 Pro Preview $2.00 in $12.00 out

Dark bar = input · light bar = output, scaled to the priciest peer.

Context window vs peers · tokens

Google: Gemini 3.1 Pro Preview 1M

Anthropic: Claude Fable 5 1M

Anthropic: Claude Opus 4.8 1M

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 131K

1M tokens ≈ 8 full-length novels or ~2,500 pages of business documents in a single request.

Performance profile

Strongest on value. The pulled-in technical corner is the trade-off, and if the shape matters more than the price, this is your model.

Compare shapes side-by-side →

Pricing

Token Type	Cost per 1M tokens	Cost per 1K tokens
Input	$0.60	$0.000600
Output	$1.80	$0.001800

What would NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 cost your business?

Pick the job that looks most like yours, then fine-tune with the sliders. Estimates update live.

A website chatbot handling around 100 customer conversations a day, a few short messages each.

Requests per month 3,000

One request is one message, email, draft or automation call.

Size of each request 1,200 tokens

$0/mo NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

$0/mo Anthropic: Claude Fable 5

$0/mo Z.ai: GLM 5.2 · best value

Full calculator with 612 models → Price Calculator

DFO AI AUTOMATION

These numbers get smaller with the right architecture.

We route routine calls to cheap models and save NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 for the hard ones. Most clients cut their estimate by 60-80%.

Talk to our team

About NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural..

Embed this ranking

Writing about this model? Add the badge to your site. It always shows the current rank and score, and links back to this page.

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 rank badge, Dark

<a href="https://designforonline.com/ai-models/nvidia-llama-3-1-nemotron-ultra-253b-v1/"><img src="https://designforonline.com/?aiml_badge=nvidia-llama-3-1-nemotron-ultra-253b-v1&theme=dark" alt="NVIDIA: Llama 3.1 Nemotron Ultra 253B v1, ranked #9 on the Design for Online AI Leaderboard" width="400" height="76"></a>

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 rank badge, Light

<a href="https://designforonline.com/ai-models/nvidia-llama-3-1-nemotron-ultra-253b-v1/"><img src="https://designforonline.com/?aiml_badge=nvidia-llama-3-1-nemotron-ultra-253b-v1&theme=light" alt="NVIDIA: Llama 3.1 Nemotron Ultra 253B v1, ranked #9 on the Design for Online AI Leaderboard" width="400" height="76"></a>

Frequently asked questions about NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

How much does NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 cost?

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 costs $0.60 per million input tokens and $1.80 per million output tokens.

What is the context window of NVIDIA: Llama 3.1 Nemotron Ultra 253B v1?

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 has a context window of 131,072 tokens (131K).

Who created NVIDIA: Llama 3.1 Nemotron Ultra 253B v1?

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 is developed by NVIDIA and was released on April 8, 2025.

Performance profile

Intelligence 2.6

Technical 1.3

Content 2.6

Value 7.3

Reasoning: Yes
Input
Output
Context: 131,072 tokens
Tokenizer: Llama3
Released: Apr 8, 2025

Modality data from OpenRouter; may understate provider-native audio/video/image output.

Model information

Provider nvidia

OpenRouter ID nvidia/llama-3.1-nemotron-ultra-253b-v1

Status Active

Ranked in

Coding

External resources View on OpenRouter API access, playground & provider details API Quickstart Sample code and integration guide

Data sourced from the OpenRouter API, Artificial Analysis, the Hugging Face Open LLM Leaderboard and our own internal testing. Scores are editorially curated by our team.

Last updated: July 19, 2026 8:38 pm

Issues with our rankings? Contact us

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

DFO Verdict

Benchmark scores

How NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 compares

Pricing

What would NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 cost your business?

About NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

Explore Related Models

Embed this ranking

Frequently asked questions about NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

How much does NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 cost?

What is the context window of NVIDIA: Llama 3.1 Nemotron Ultra 253B v1?

Who created NVIDIA: Llama 3.1 Nemotron Ultra 253B v1?