NVIDIA: Llama 3.1 Nemotron 70B Instruct

NVIDIA: Llama 3.1 Nemotron 70B Instruct

nvidia · Released Oct 15, 2024 Efficient
Intelligence #219 / 557
38.8 Our Score
Speed #10 / 259
293.8 tokens / sec
Input #441 / 560
$1.20 per 1M tokens
Output #326 / 560
$1.20 per 1M tokens
Context #222 / 560
131,072 tokens

Analysis Summary

NVIDIA: Llama 3.1 Nemotron 70B Instruct sits in the Efficient tier on our leaderboard, ranked #219 of 557 published models on overall intelligence. At $1.20 input and $1.20 output per 1M tokens, it is among the most expensive on the market. It offers a standard large context window and supports tool use and function calling.

Editorial notes

NVIDIA Llama 3.1 Nemotron 70B delivers above-average MMLU Pro and GPQA scores with tool use and function calling, but is priced at $1.20 per million tokens and benchmarks well below current frontier models.

Assessed May 14, 2026

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?

Performance Profile

Intelligence2.9Technical2Value6.8Content4.5
Intelligence 2.9/10
Technical 2/10
Content 4.5/10
Value 6.8/10

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging Llama 3.1 70B architecture and Reinforcement Learning from Human Feedback (RLHF), it excels..

70B Parameters

Capabilities

Tool Use Function Calling

Architecture Detail

Instruct Typellama3

Performance Indices

Source: Artificial Analysis

13.4 Intelligence Index
10.8 Coding Index
13.8 Agentic Index
11 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 46.5% Graduate-level scientific reasoning
HLE 4.6% Humanity's Last Exam
MMLU Pro 69% Multi-task language understanding
MATH 500 73.3% Mathematical problem-solving
AIME 24.7% Competition mathematics
AIME 2025 11% Competition mathematics (2025)
SciCode 23.3% Scientific computing

Technical

LiveCodeBench 16.9% Live coding evaluation
TerminalBench Hard 4.5% Agentic terminal tasks
τ²-Bench 23.1% Conversational agent benchmark

Content

IFBench 30.7% Instruction following
LCR 7% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

How does NVIDIA: Llama 3.1 Nemotron 70B Instruct stack up?

Compare side-by-side with other efficient models.

Compare Models

Model Information

OpenRouter ID nvidia/llama-3.1-nemotron-70b-instruct
Providernvidia
Release Date October 15, 2024
Context Length131,072 tokens
Max Completion16,384 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $1.20 $0.001200
Output $1.20 $0.001200

Leaderboard Categories