Hermes 4 – Llama-3.1 70B (Reasoning)

Nous Research · Released Aug 27, 2025 · Professional
Intelligence #10 / 576
82.0 Our Score
Speed #134 / 271
95.5 tokens / sec
Input #235 / 577
$0.130 per 1M tokens
Output #226 / 577
$0.400 per 1M tokens
Context
— Not reported

Analysis Summary

Hermes 4 on Llama-3.1 70B (Reasoning) is the extended-thinking variant of Nous Research's 70B fine-tune, and the reasoning mode delivers a substantial uplift on math and coding benchmarks relative to its non-reasoning sibling. A Math Index of 68.7, GPQA Diamond of 69.9%, and LiveCodeBench of 65.3% are strong for a 70B-class model, and MMLU Pro of 81.1% is competitive.

For businesses, this model offers a cost-efficient path to reasoning-capable performance. At $0.13/$0.40 per million input/output tokens, it is among the more affordable options for tasks requiring deliberate multi-step thinking. Its Agentic Index is low (13.5), which limits its use in tool-calling or autonomous workflows, and long-context reliability is weak (6.7% on LCR).

Teams with budget constraints who need reasoning capability for analytical, math-heavy, or coding tasks will find this a strong value proposition, though the 405B reasoning variant offers additional headroom for the most demanding workloads.

Assessed June 6, 2026

Editorial notes

Hermes 4 – Llama-3.1 70B (Reasoning) from Nous Research delivers strong math performance and a LiveCodeBench score of 65.3% at a low price of $0.13/$0.40 per million input/output tokens.

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch.

Performance Profile

Intelligence 3.2/10
Technical 2.3/10
Content 2.1/10
Value 7/10

How Hermes 4 – Llama-3.1 70B (Reasoning) compares

Among the AI models we track, Hermes 4 – Llama-3.1 70B (Reasoning) ranks #215 of 378 for overall intelligence, #188 of 315 for coding, and #233 of 289 for agentic tasks. At $0.13 per million input tokens, it is cheaper than 59% of comparable models.
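
To put these placements on a common scale, a rank can be converted into the share of tracked models that sit below it. The sketch below is illustrative only: the function name is hypothetical, and it assumes rank 1 is the best (or cheapest) model with no ties, which the page does not state. Under that reading, the input-price rank shown in the header (#235 of 577) works out to roughly 59%, which lines up with the "cheaper than 59%" figure.

```python
def share_ranked_below(rank: int, total: int) -> float:
    """Fraction of tracked models placed below the given rank.

    Assumption (not stated on the page): rank 1 is best/cheapest and
    there are no ties.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) / total


# Placements quoted on this page: (rank, number of models tracked).
placements = {
    "overall intelligence": (215, 378),
    "coding": (188, 315),
    "agentic tasks": (233, 289),
    "input price": (235, 577),
}

for name, (rank, total) in placements.items():
    share = share_ranked_below(rank, total)
    print(f"{name}: ahead of {share:.0%} of tracked models")
```

Read this way, the model is ahead of fewer than half of the tracked models on intelligence, coding, and agentic tasks, while its input price beats a majority of them.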

Performance Indices

Source: Artificial Analysis

16 Intelligence Index
14.4 Coding Index
13.5 Agentic Index
68.7 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 69.9% Graduate-level scientific reasoning
HLE 7.9% Humanity's Last Exam
MMLU Pro 81.1% Multi-task language understanding
AIME 2025 68.7% Competition mathematics (2025)
SciCode 34.1% Scientific computing

Technical

LiveCodeBench 65.3% Live coding evaluation
TerminalBench Hard 4.5% Agentic terminal tasks
τ²-Bench 22.5% Conversational agent benchmark

Content

IFBench 31.3% Instruction following
LCR 6.7% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

Model Information

Provider Nous Research
Release Date August 27, 2025
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.13 $0.000130
Output $0.40 $0.000400
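
As a rough illustration of how these rates translate into per-request spend, the sketch below multiplies token counts by the listed prices. The function and constant names are hypothetical, and it assumes that any reasoning tokens the model emits are billed at the output rate, which this page does not confirm; check your provider's billing rules before relying on it.

```python
# Listed prices from the table above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.13
OUTPUT_USD_PER_MILLION = 0.40


def per_thousand(usd_per_million: float) -> float:
    """Convert a per-1M-token price to a per-1K-token price."""
    return usd_per_million / 1000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: reasoning tokens are counted in output_tokens and
    billed at the output rate.
    """
    input_cost = input_tokens * INPUT_USD_PER_MILLION
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) / 1_000_000


# Example: a 2,000-token prompt that yields 8,000 output tokens.
cost = estimate_cost_usd(input_tokens=2_000, output_tokens=8_000)
print(f"Estimated cost: ${cost:.6f}")
```

For that example request, the estimate comes to about a third of a cent, with most of the cost coming from output tokens. Reasoning models tend to produce long outputs, so the output rate usually dominates the bill.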

Frequently asked questions about Hermes 4 – Llama-3.1 70B (Reasoning)

How much does Hermes 4 – Llama-3.1 70B (Reasoning) cost?

Hermes 4 – Llama-3.1 70B (Reasoning) costs $0.13 per million input tokens and $0.40 per million output tokens.

Is Hermes 4 – Llama-3.1 70B (Reasoning) good for coding?

On our coding benchmark index, Hermes 4 – Llama-3.1 70B (Reasoning) ranks #188 of 315 models, placing it in the lower half of the field for code generation and debugging. Its LiveCodeBench score of 65.3% is stronger than that overall placement suggests.

Who created Hermes 4 – Llama-3.1 70B (Reasoning)?

Hermes 4 – Llama-3.1 70B (Reasoning) is developed by Nous Research and was released on August 27, 2025.