Qwen3 4B 2507 Instruct

Alibaba · Released Aug 6, 2025 · Emerging
Intelligence #470 / 561
18.9 Our Score
AA Index #256 / 368
12.9 Artificial Analysis
Input Not priced
Output Not priced
Context Not reported

Analysis Summary

Qwen3 4B 2507 Instruct sits in the Emerging tier on our leaderboard, ranked #470 of 561 published models on overall intelligence. Input and output pricing are not reported, so the model cannot be placed on cost relative to the rest of the market.

Editorial notes

Qwen3 4B 2507 Instruct posts strong math and LiveCodeBench scores for a 4B model (52.3% on AIME 2025, 37.7% on LiveCodeBench) but shows limited agentic and long-context reliability (4.5% on TerminalBench Hard, 7.3% on LCR). It is suitable for lightweight structured tasks only.

Assessed May 14, 2026

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch.

Performance Profile

Intelligence 2.9/10
Technical 1.9/10
Content 3/10
Value 0/10

Performance Indices

Source: Artificial Analysis

12.9 Intelligence Index
9 Coding Index
15.6 Agentic Index
52.3 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 51.7% Graduate-level scientific reasoning
HLE 4.7% Humanity's Last Exam
MMLU Pro 67.2% Multi-task language understanding
AIME 2025 52.3% Competition mathematics (2025)
SciCode 18.1% Scientific computing

Technical

LiveCodeBench 37.7% Live coding evaluation
TerminalBench Hard 4.5% Agentic terminal tasks
τ²-Bench 26.6% Conversational agent benchmark

Content

IFBench 33.5% Instruction following
LCR 7.3% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

Model Information

Provider Alibaba
Release Date August 6, 2025
Status Active