NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

nvidia · Released Oct 10, 2025
Our Score: 44

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta's Llama-3.3-70B-Instruct, with a 128K context window. It is post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior.

A distillation-driven Neural Architecture Search ("Puzzle") replaces some attention blocks and varies FFN widths to shrink the memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality.

In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53.

The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit "reasoning on/off" modes (chat-first defaults; greedy decoding recommended when reasoning is disabled). It is suitable for building agents, assistants, and long-context retrieval systems where a balanced accuracy-to-cost ratio and reliable tool use matter.
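As a rough illustration of how the "up to 16 runs" evaluation numbers above are typically aggregated (a general sketch of the pass@1-over-repeats convention, not a confirmed detail of the NeMo-Skills harness), pass@1 averaged over repeated samples is just the fraction of generations that solve the task:

```python
def pass_at_1(results):
    """Average pass@1 over repeated runs: fraction of sampled generations
    (each a bool: solved or not) that solved the task."""
    return sum(results) / len(results)

# Hypothetical example: 14 correct answers out of 16 sampled generations.
runs = [True] * 14 + [False] * 2
print(round(pass_at_1(runs) * 100, 2))  # 87.5
```

Scores like the AIME figures above are percentages of this form, averaged again across the problems in the benchmark.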

$0.10 / 1M Input Price
$0.40 / 1M Output Price
131,072 tokens Context Window
49B Parameters

Capabilities

Tool Use · Function Calling

Architecture

Modality: Text → Text
Tokenizer: Llama3
Parameters: 49B

Performance Indices

Source: Artificial Analysis

18.5 Intelligence Index
9.4 Coding Index
26.9 Agentic Index
54.7 Math Index

Benchmark Scores

Evaluations

GPQA Diamond 64.3%
Graduate-level scientific reasoning
HLE 6.5%
Humanity's Last Exam
MMLU Pro 78.5%
Multi-task language understanding
LiveCodeBench 27.7%
Live coding evaluation
SciCode 28.2%
Scientific computing
MATH 500 95.9%
Mathematical problem-solving
AIME 58.3%
Competition mathematics
AIME 2025 54.7%
Competition mathematics (2025)
IFBench 38.1%
Instruction following
LCR 17%
Long-context reasoning
τ²-Bench 26.9%
Conversational agent benchmark

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID nvidia/llama-3.3-nemotron-super-49b-v1.5
Provider nvidia
Release Date October 10, 2025
Context Length 131,072 tokens
Status Active

Pricing

Token Type | Cost per 1M tokens | Cost per 1K tokens
Input      | $0.10              | $0.000100
Output     | $0.40              | $0.000400
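For a quick sense of scale, the per-token prices above translate into per-request costs with simple arithmetic (a minimal sketch; the token counts are hypothetical):

```python
# Listed prices in USD per 1M tokens.
INPUT_PER_M = 0.10
OUTPUT_PER_M = 0.40

def request_cost(input_tokens, output_tokens):
    """Estimated USD cost of a single request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PER_M

# e.g., a 100K-token prompt with a 2K-token completion:
print(f"${request_cost(100_000, 2_000):.4f}")  # $0.0108
```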

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

237ms
Best Latency (TTFT)
86 tok/s
Best Throughput
0/1
Active Endpoints
Available via: DeepInfra
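Using the best-case figures above, a back-of-the-envelope end-to-end latency estimate can be sketched as TTFT plus decode time (assuming steady linear decoding at the quoted throughput, ignoring batching and network variance):

```python
# Best-case live metrics from the listing (may vary by endpoint and load).
TTFT_S = 0.237       # time to first token, seconds
TOKENS_PER_S = 86    # decode throughput

def estimated_latency(output_tokens):
    """Rough end-to-end seconds to stream `output_tokens` tokens."""
    return TTFT_S + output_tokens / TOKENS_PER_S

# e.g., a 500-token completion:
print(f"{estimated_latency(500):.2f}s")  # 6.05s
```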