NVIDIA: Llama 3.1 Nemotron 70B Instruct
Analysis Summary
NVIDIA: Llama 3.1 Nemotron 70B Instruct sits in the Efficient tier on our leaderboard, ranked #219 of 557 published models on overall intelligence. At $1.20 input and $1.20 output per 1M tokens, it is among the most expensive on the market. It offers a standard large context window and supports tool use and function calling.
Editorial notes
NVIDIA Llama 3.1 Nemotron 70B delivers above-average MMLU Pro and GPQA scores with tool use and function calling, but is priced at $1.20 per million tokens and benchmarks well below current frontier models.
Assessed May 14, 2026
Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?
Performance Profile
NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging Llama 3.1 70B architecture and Reinforcement Learning from Human Feedback (RLHF), it excels..
Capabilities
Architecture Detail
| Instruct Type | llama3 |
Performance Indices
Source: Artificial Analysis
Benchmark Scores
Intelligence
Technical
Content
Benchmark data from Artificial Analysis and Hugging Face
How does NVIDIA: Llama 3.1 Nemotron 70B Instruct stack up?
Compare side-by-side with other efficient models.
Model Information
| OpenRouter ID |
nvidia/llama-3.1-nemotron-70b-instruct
|
| Provider | nvidia |
| Release Date | October 15, 2024 |
| Context Length | 131,072 tokens |
| Max Completion | 16,384 tokens |
| Status | Active |
Pricing
| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|---|---|---|
| Input | $1.20 | $0.001200 |
| Output | $1.20 | $0.001200 |
Leaderboard Categories
External Resources
Explore Related Models
Data sourced from OpenRouter API, Artificial Analysis and Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team.
Last updated: May 20, 2026 8:38 pm