NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

nvidia · Released Oct 10, 2025 · Efficient
Intelligence #166 / 525: 44.8 (Our Score)
Speed #164 / 244: 67.6 tokens / sec
Input #175 / 525: $0.100 per 1M tokens
Output #204 / 525: $0.400 per 1M tokens
Context #185 / 525: 131,072 tokens

Analysis Summary

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 sits in the Efficient tier on our leaderboard, ranked #166 of 525 published models on overall intelligence. At $0.100 input and $0.400 output per 1M tokens, it is inexpensive, in line with its 7.8/10 Value score. It offers a 131,072-token context window and supports tool use, function calling, and reasoning.

Editorial notes

NVIDIA's Llama 3.3 Nemotron Super 49B V1.5 is a cost-effective open-weight model with tool use support, but its intelligence and coding results (3.9/10 Intelligence, 2.8/10 Technical) are notably weak for a 49B-parameter model, which limits its usefulness for demanding business tasks. It may suit lightweight inference workloads where cost is the primary concern.

Assessed April 23, 2026

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch.

Performance Profile

Intelligence 3.9/10
Technical 2.8/10
Content 3.5/10
Value 7.8/10

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and..

49B Parameters

Capabilities

Tool Use · Function Calling

Performance Indices

Source: Artificial Analysis

18.5 Intelligence Index
9.4 Coding Index
26.9 Agentic Index
54.7 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 64.3% Graduate-level scientific reasoning
HLE 6.5% Humanity's Last Exam
MMLU Pro 78.5% Multi-task language understanding
MATH 500 95.9% Mathematical problem-solving
AIME 58.3% Competition mathematics
AIME 2025 54.7% Competition mathematics (2025)
SciCode 28.2% Scientific computing

Technical

LiveCodeBench 27.7% Live coding evaluation
τ²-Bench 26.9% Conversational agent benchmark

Content

IFBench 38.1% Instruction following
LCR 17% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID: nvidia/llama-3.3-nemotron-super-49b-v1.5
Provider: nvidia
Release Date: October 10, 2025
Context Length: 131,072 tokens
Max Completion: 16,384 tokens
Status: Active
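
The context and completion limits above bound how large a prompt can be. The sketch below works out that budget in Python. It assumes that prompt and completion tokens share the same 131,072-token window, which is a common convention but is not stated on this page; `max_prompt_tokens` is a hypothetical helper name, not part of any provider API.

```python
# Limits copied from the Model Information list above.
CONTEXT_LENGTH = 131_072   # total context window, in tokens
MAX_COMPLETION = 16_384    # largest completion the listing allows, in tokens


def max_prompt_tokens(requested_completion: int = MAX_COMPLETION) -> int:
    """Return the prompt budget left after reserving room for the completion.

    Assumption: prompt and completion tokens count against the same window.
    """
    if not 0 < requested_completion <= MAX_COMPLETION:
        raise ValueError(
            f"requested_completion must be between 1 and {MAX_COMPLETION}"
        )
    return CONTEXT_LENGTH - requested_completion


if __name__ == "__main__":
    # Reserving the full completion allowance leaves 114,688 prompt tokens.
    print(max_prompt_tokens())
    # A short 1,024-token completion leaves 130,048 prompt tokens.
    print(max_prompt_tokens(1_024))
```

If the provider counts the two separately, the prompt budget would simply be the full context length, so treat the result as a conservative estimate.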

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.10 $0.000100
Output $0.40 $0.000400
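
As a worked example of the rates above, the following Python sketch estimates the cost of a single request. The prices are taken from the table; the function name `estimate_cost_usd` and the token counts in the usage lines are illustrative assumptions, and actual billing may differ by provider.

```python
from decimal import Decimal

# Prices per 1M tokens, copied from the pricing table above.
INPUT_USD_PER_M = Decimal("0.10")
OUTPUT_USD_PER_M = Decimal("0.40")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 10,000 prompt tokens plus 2,000 completion tokens costs $0.0018.
    print(estimate_cost_usd(10_000, 2_000))
```

`Decimal` is used instead of floats so that very small per-token amounts do not pick up binary rounding noise.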

Live Performance

Live endpoint metrics, refreshed every 30 minutes.

Avg Uptime: 100%
Best Latency (TTFT): 404 ms
Best Throughput: 42 tok/s
Active Endpoints: 1/1
Available via: DeepInfra

Leaderboard Categories