NVIDIA: Nemotron 3 Super

NVIDIA: Nemotron 3 Super

nvidia · Released Mar 11, 2026 Specialist New
66.7
Our Score

Performance Profile

Intelligence6.4Technical5.9Value8Content6.5
Intelligence 6.4/10
Technical 5.9/10
Content 6.5/10
Value 8/10

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere — from workstation to cloud.

$0.10 / 1M
Input Price
$0.50 / 1M
Output Price
262,144 tokens
Context Window
120B Parameters

Capabilities

Tool Use Function Calling

Architecture

ModalityText → Text
TokenizerOther
Parameters120B

Performance Indices

Source: Artificial Analysis

36 Intelligence Index
31.2 Coding Index
48.3 Agentic Index

Benchmark Scores

Intelligence

GPQA Diamond 80% Graduate-level scientific reasoning
HLE 19.2% Humanity's Last Exam
SciCode 36% Scientific computing

Technical

TerminalBench Hard 28.8% Agentic terminal tasks
τ²-Bench 67.8% Conversational agent benchmark

Content

IFBench 71.5% Instruction following
LCR 60% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID nvidia/nemotron-3-super-120b-a12b
Providernvidia
Release Date March 11, 2026
Context Length262,144 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.10 $0.000100
Output $0.50 $0.000500

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

100%
Avg Uptime
1,112ms
Best Latency (TTFT)
219 tok/s
Best Throughput
2/2
Active Endpoints
Available via: DeepInfra, Nebius

Leaderboard Categories