NVIDIA: Llama 3.1 Nemotron Ultra 253B v1
Analysis Summary
NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 sits in the Efficient tier on our leaderboard, ranked #206 of 561 published models on overall intelligence. At $0.600 input and $1.80 output per 1M tokens, it is among the most expensive on the market. It offers a standard large context window and supports reasoning.
Editorial notes
NVIDIA Llama 3.1 Nemotron Ultra 253B v1 shows strong maths scores but very low intelligence and agentic indices, with near-zero terminal and long-context reliability, limiting its practical business utility.
Assessed May 14, 2026
Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?
Performance Profile
Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural..
Performance Indices
Source: Artificial Analysis
Benchmark Scores
Intelligence
Technical
Content
Benchmark data from Artificial Analysis and Hugging Face
How does NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 stack up?
Compare side-by-side with other efficient models.
Model Information
| OpenRouter ID |
nvidia/llama-3.1-nemotron-ultra-253b-v1
|
| Provider | nvidia |
| Release Date | April 8, 2025 |
| Context Length | 131,072 tokens |
| Status | Active |
Pricing
| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|---|---|---|
| Input | $0.60 | $0.000600 |
| Output | $1.80 | $0.001800 |
External Resources
Explore Related Models
Data sourced from OpenRouter API, Artificial Analysis and Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team.
Last updated: May 23, 2026 8:38 pm