Home > AI Models > NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Name: NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 Review
Item: NVIDIA: Llama 3.3 Nemotron Super 49B V1.5
Rating: 4.5
Author: Design for Online

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

nvidia · Released Oct 10, 2025 Efficient

Intelligence #177 / 557

44.8 Our Score

Speed #157 / 259

67.6 tokens / sec

Input #192 / 560

$0.100 per 1M tokens

Output #219 / 560

$0.400 per 1M tokens

Context #222 / 560

131,072 tokens

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 sits in the Efficient tier on our leaderboard, ranked #177 of 557 published models on overall intelligence. At $0.100 input and $0.400 output per 1M tokens, it is among the most expensive on the market. It offers a standard large context window and supports tool use, function calling, and reasoning.

Editorial notes

NVIDIA Llama 3.3 Nemotron Super 49B V1.5 offers very low pricing and tool use, but its intelligence and coding indices are limited, with a particularly weak coding score of 9.4.

Assessed May 14, 2026

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?

Reasoning: Yes
Input
Output
Context: 131,072 tokens
Max output: 16,384 tokens
Tokenizer: Llama3
Released: Oct 10, 2025

Modality data from OpenRouter; may understate provider-native audio/video/image output.

Performance Profile

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and..

49B Parameters

Capabilities

Tool Use Function Calling

Performance Indices

Source: Artificial Analysis

18.5 Intelligence Index

9.4 Coding Index

26.9 Agentic Index

54.7 Math Index

Benchmark Scores

GPQA Diamond 64.3% Graduate-level scientific reasoning

HLE 6.5% Humanity's Last Exam

MMLU Pro 78.5% Multi-task language understanding

MATH 500 95.9% Mathematical problem-solving

AIME 58.3% Competition mathematics

AIME 2025 54.7% Competition mathematics (2025)

SciCode 28.2% Scientific computing

LiveCodeBench 27.7% Live coding evaluation

τ²-Bench 26.9% Conversational agent benchmark

IFBench 38.1% Instruction following

LCR 17% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

How does NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 stack up?

Compare side-by-side with other efficient models.

Compare Models

Model Information

OpenRouter ID	`nvidia/llama-3.3-nemotron-super-49b-v1.5`
Provider	nvidia
Release Date	October 10, 2025
Context Length	131,072 tokens
Max Completion	16,384 tokens
Status	Active

Pricing

Token Type	Cost per 1M tokens	Cost per 1K tokens
Input	$0.10	$0.000100
Output	$0.40	$0.000400

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

147ms

Best Latency (TTFT)

48 tok/s

Best Throughput

0/1

Active Endpoints

Available via: DeepInfra

Leaderboard Categories

Tool Use

External Resources

View on OpenRouter API access, playground, and provider details

API Quickstart Sample code and integration guide

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Analysis Summary

Performance Profile

Capabilities

Performance Indices

Benchmark Scores

Intelligence

Technical

Content

Model Information

Pricing

Live Performance

Leaderboard Categories

External Resources

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Performance Profile

Capabilities

Performance Indices

Benchmark Scores

Intelligence

Technical

Content

Model Information

Pricing

Live Performance

Leaderboard Categories

External Resources

Explore Related Models