Home > AI Models > NVIDIA: Llama 3.1 Nemotron 70B Instruct

NVIDIA: Llama 3.1 Nemotron 70B Instruct

Name: NVIDIA: Llama 3.1 Nemotron 70B Instruct Review
Item: NVIDIA: Llama 3.1 Nemotron 70B Instruct
Rating: 3.9
Author: Design for Online

NVIDIA: Llama 3.1 Nemotron 70B Instruct

nvidia · Released Oct 15, 2024 Efficient

Intelligence #219 / 557

38.8 Our Score

Speed #10 / 259

293.8 tokens / sec

Input #441 / 560

$1.20 per 1M tokens

Output #326 / 560

$1.20 per 1M tokens

Context #222 / 560

131,072 tokens

NVIDIA: Llama 3.1 Nemotron 70B Instruct sits in the Efficient tier on our leaderboard, ranked #219 of 557 published models on overall intelligence. At $1.20 input and $1.20 output per 1M tokens, it is among the most expensive on the market. It offers a standard large context window and supports tool use and function calling.

Editorial notes

NVIDIA Llama 3.1 Nemotron 70B delivers above-average MMLU Pro and GPQA scores with tool use and function calling, but is priced at $1.20 per million tokens and benchmarks well below current frontier models.

Assessed May 14, 2026

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?

Reasoning: No
Input
Output
Context: 131,072 tokens
Max output: 16,384 tokens
Tokenizer: Llama3
Released: Oct 15, 2024

Modality data from OpenRouter; may understate provider-native audio/video/image output.

Performance Profile

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging Llama 3.1 70B architecture and Reinforcement Learning from Human Feedback (RLHF), it excels..

70B Parameters

Capabilities

Tool Use Function Calling

Architecture Detail

Instruct Type llama3

Performance Indices

Source: Artificial Analysis

13.4 Intelligence Index

10.8 Coding Index

13.8 Agentic Index

11 Math Index

Benchmark Scores

GPQA Diamond 46.5% Graduate-level scientific reasoning

HLE 4.6% Humanity's Last Exam

MMLU Pro 69% Multi-task language understanding

MATH 500 73.3% Mathematical problem-solving

AIME 24.7% Competition mathematics

AIME 2025 11% Competition mathematics (2025)

SciCode 23.3% Scientific computing

LiveCodeBench 16.9% Live coding evaluation

TerminalBench Hard 4.5% Agentic terminal tasks

τ²-Bench 23.1% Conversational agent benchmark

IFBench 30.7% Instruction following

LCR 7% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

How does NVIDIA: Llama 3.1 Nemotron 70B Instruct stack up?

Compare side-by-side with other efficient models.

Compare Models

Model Information

OpenRouter ID	`nvidia/llama-3.1-nemotron-70b-instruct`
Provider	nvidia
Release Date	October 15, 2024
Context Length	131,072 tokens
Max Completion	16,384 tokens
Status	Active

Pricing

Token Type	Cost per 1M tokens	Cost per 1K tokens
Input	$1.20	$0.001200
Output	$1.20	$0.001200

Leaderboard Categories

Tool Use

External Resources

View on OpenRouter API access, playground, and provider details

API Quickstart Sample code and integration guide

NVIDIA: Llama 3.1 Nemotron 70B Instruct

NVIDIA: Llama 3.1 Nemotron 70B Instruct

Analysis Summary

Performance Profile

Capabilities

Architecture Detail

Performance Indices

Benchmark Scores

Intelligence

Technical

Content

Model Information

Pricing

Leaderboard Categories

External Resources

NVIDIA: Llama 3.1 Nemotron 70B Instruct

Performance Profile

Capabilities

Architecture Detail

Performance Indices

Benchmark Scores

Intelligence

Technical

Content

Model Information

Pricing

Leaderboard Categories

External Resources

Explore Related Models