Hermes 4 – Llama-3.1 405B (Reasoning)

Nous Research · Released Aug 27, 2025 · Professional
Intelligence #10 / 576
82.0 Our Score
Speed #246 / 271
39.1 tokens / sec
Input #433 / 577
$1.00 per 1M tokens
Output #429 / 577
$3.00 per 1M tokens
Context
— Not reported

Analysis Summary

Hermes 4 on Llama-3.1 405B (Reasoning) is Nous Research's extended-thinking variant of its large fine-tune, and the benchmark uplift from enabling reasoning is clear. Its math index of 69.7, GPQA of 0.727, and LiveCodeBench of 0.686 all represent a meaningful step up from the non-reasoning sibling, and its MMLU-Pro of 0.829 is competitive in the mid-tier.

For businesses, this model suits tasks that benefit from deliberate, multi-step reasoning: complex analysis, technical writing, and coding assistance. Its agentic index remains limited, so it is not a strong fit for autonomous agent workflows, but it can handle harder single-turn tasks well. Pricing at $1/$3 per million tokens is moderate for a 405B reasoning model.

Teams needing a capable reasoning model for analytical or coding tasks, without moving to frontier pricing, will find this a practical option, particularly where the math and science benchmarks are relevant to the workload.

Assessed June 6, 2026

Editorial notes

Hermes 4 Llama-3.1 405B (Reasoning) from Nous Research shows strong math and GPQA scores with a LiveCodeBench of 0.686, making it a capable reasoning-mode option at $1/$3 per million tokens.

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch.

Performance Profile

Intelligence 3.6/10
Technical 2.7/10
Content 3/10
Value 6/10

How Hermes 4 – Llama-3.1 405B (Reasoning) compares

Hermes 4 – Llama-3.1 405B (Reasoning) ranks #188 of 378 AI models we track for overall intelligence, #170 of 315 for coding, and #199 of 289 for agentic tasks. At $1.00 per million input tokens, it is cheaper than 25% of comparable models.

Performance Indices

Source: Artificial Analysis

18.6 Intelligence Index
16 Coding Index
16.8 Agentic Index
69.7 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 72.7% Graduate-level scientific reasoning
HLE 10.3% Humanity's Last Exam
MMLU Pro 82.9% Multi-task language understanding
AIME 2025 69.7% Competition mathematics (2025)
SciCode 25.2% Scientific computing

Technical

LiveCodeBench 68.6% Live coding evaluation
TerminalBench Hard 11.4% Agentic terminal tasks
τ²-Bench 22.2% Conversational agent benchmark

Content

IFBench 32.7% Instruction following
LCR 20.7% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

Model Information

Provider Nous Research
Release Date August 27, 2025
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $1.00 $0.001000
Output $3.00 $0.003000
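
As a rough illustration of how the listed rates translate into per-request cost, the sketch below multiplies token counts by the per-million prices above. The function name and the example token counts are hypothetical, and it assumes any reasoning tokens are billed at the output rate, which this listing does not state.

```python
# Listed rates from the pricing table (USD per 1M tokens).
INPUT_USD_PER_MILLION = 1.00
OUTPUT_USD_PER_MILLION = 3.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: reasoning tokens are counted as output tokens.
    Check the provider's billing rules before relying on this.
    """
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens, 8,000 generated tokens.
    cost = estimate_cost_usd(2_000, 8_000)
    print(f"Estimated cost: ${cost:.4f}")
```

Because output is priced at three times the input rate, long reasoning traces are likely to dominate the bill for this kind of workload.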

Frequently asked questions about Hermes 4 – Llama-3.1 405B (Reasoning)

How much does Hermes 4 – Llama-3.1 405B (Reasoning) cost?

Hermes 4 – Llama-3.1 405B (Reasoning) costs $1.00 per million input tokens and $3.00 per million output tokens.

Is Hermes 4 – Llama-3.1 405B (Reasoning) good for coding?

On our coding benchmark index, Hermes 4 – Llama-3.1 405B (Reasoning) ranks #170 of 315 models, placing it near the middle of the field for code generation and debugging.

Who created Hermes 4 – Llama-3.1 405B (Reasoning)?

Hermes 4 – Llama-3.1 405B (Reasoning) is developed by Nous Research and was released on August 27, 2025.