Qwen3 4B 2507 (Reasoning)

Alibaba · Released Aug 6, 2025 · Emerging
Intelligence: #385 / 561 (Our Score: 24.3)
AA Index: #183 / 368 (Artificial Analysis: 18.2)
Input: Not priced
Output: Not priced
Context: Not reported

Analysis Summary

Qwen3 4B 2507 (Reasoning) sits in the Emerging tier on our leaderboard, ranked #385 of 561 published models on overall intelligence. Input and output prices per 1M tokens are not reported, so the model cannot be placed on cost.
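
To put the two rank figures on this page in context, a rank can be converted into the share of listed models it places ahead of. The following is a minimal sketch, assuming rank 1 is the best position and ties are ignored; `share_outranked` is a hypothetical helper name used for illustration, not part of any leaderboard API.

```python
def share_outranked(rank: int, total: int) -> float:
    """Return the fraction of listed models that sit below `rank` (rank 1 is best)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) / total


# Rank figures taken from this page.
print(f"Leaderboard intelligence: {share_outranked(385, 561):.1%}")  # 31.4%
print(f"AA Index: {share_outranked(183, 368):.1%}")  # 50.3%
```

On these numbers the model outranks roughly a third of the models on this leaderboard and about half of those in the Artificial Analysis index.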

Editorial notes

Qwen3 4B 2507 (Reasoning) is a compact model with a strong math score (82.7% on AIME 2025) and solid LiveCodeBench performance (64.1%) for its size, though overall intelligence and agentic capability remain limited (TerminalBench Hard 1.5%, τ²-Bench 25.4%).

Assessed May 14, 2026

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch.

Performance Profile

Intelligence 3.8/10
Technical 2/10
Content 4.5/10
Value 0/10

Performance Indices

Source: Artificial Analysis

18.2 Intelligence Index
9.5 Coding Index
13.5 Agentic Index
82.7 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 66.7% Graduate-level scientific reasoning
HLE 5.9% Humanity's Last Exam
MMLU Pro 74.3% Multi-task language understanding
AIME 2025 82.7% Competition mathematics (2025)
SciCode 25.6% Scientific computing

Technical

LiveCodeBench 64.1% Live coding evaluation
TerminalBench Hard 1.5% Agentic terminal tasks
τ²-Bench 25.4% Conversational agent benchmark

Content

IFBench 49.8% Instruction following
LCR 37.7% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

Model Information

Provider Alibaba
Release Date August 6, 2025
Status Active