Qwen3 VL 4B (Reasoning)

Alibaba · Released Oct 14, 2025 · Professional
Intelligence: #10 / 576 (Our Score: 82.0)
AA Index: #257 / 378 (Artificial Analysis: 13.7)
Input: Not priced
Output: Not priced
Context: Not reported

Analysis Summary

Qwen3 VL 4B (Reasoning) is the thinking-mode variant of Alibaba's 4-billion-parameter vision-language model, released in October 2025. Its GPQA score of 0.494 and MMLU Pro of 0.700 are strong for a model of this size, and vision support adds multimodal utility. However, its coding index of 6.7 and agentic index of 8.5 are low, limiting its use in technical or autonomous workflows.

For businesses, it suits tasks where compact multimodal reasoning is needed: image-based content analysis, visual document understanding, or structured Q&A over mixed media. Its instruction-following (IFBench: 0.366) is reasonable, but long-context reliability is modest (LCR: 0.213) and terminal/agentic capability is minimal (TerminalBench Hard: 0.015).

No per-token pricing is listed, although the model is currently free to use via OpenRouter. As a small reasoning-capable vision model, it offers a useful capability profile for teams needing multimodal reasoning on a budget, but it should not be the primary model for coding, agents, or high-stakes content generation.

Assessed June 6, 2026

Editorial notes

Qwen3 VL 4B (Reasoning) adds thinking-mode capability to Alibaba's compact vision-language model. This improves its GPQA and MMLU Pro scores, but coding and agentic performance remain limited for business workflows.

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch.

Performance Profile

Intelligence 2.6/10
Technical 1.2/10
Content 3.3/10
Value 0/10

How Qwen3 VL 4B (Reasoning) compares

Qwen3 VL 4B (Reasoning) ranks #257 of 378 AI models we track for overall intelligence, #265 of 315 for coding, and #265 of 289 for agentic tasks. It is currently free to use via OpenRouter.
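
To put those ranks in perspective, the short sketch below converts each one into the share of tracked models that rank lower. The helper name `share_ranked_below` is illustrative and not part of any published tooling; the sketch assumes rank 1 is the best position and that the totals are the ones quoted above.

```python
def share_ranked_below(rank: int, total: int) -> float:
    """Fraction of tracked models that sit below the given rank (1 = best)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) / total


# (rank, number of tracked models) as quoted in the comparison above.
rankings = {
    "intelligence": (257, 378),
    "coding": (265, 315),
    "agentic": (265, 289),
}

shares = {name: share_ranked_below(r, t) for name, (r, t) in rankings.items()}

for name, share in shares.items():
    print(f"{name}: ranks above {share:.1%} of tracked models")
```

Under these assumptions the model ranks above roughly 32.0% of tracked models for intelligence, 15.9% for coding, and 8.3% for agentic tasks.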

Performance Indices

Source: Artificial Analysis

13.7 Intelligence Index
6.7 Coding Index
8.5 Agentic Index
25.7 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 49.4% Graduate-level scientific reasoning
HLE 4.4% Humanity's Last Exam
MMLU Pro 70% Multi-task language understanding
AIME 2025 25.7% Competition mathematics (2025)
SciCode 17.1% Scientific computing

Technical

LiveCodeBench 32% Live coding evaluation
TerminalBench Hard 1.5% Agentic terminal tasks
τ²-Bench 15.5% Conversational agent benchmark

Content

IFBench 36.6% Instruction following
LCR 21.3% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

Model Information

Provider Alibaba
Release Date October 14, 2025
Status Active

Frequently asked questions about Qwen3 VL 4B (Reasoning)

How much does Qwen3 VL 4B (Reasoning) cost?

Qwen3 VL 4B (Reasoning) is currently available for free via OpenRouter.

Is Qwen3 VL 4B (Reasoning) good for coding?

On our coding benchmark index, Qwen3 VL 4B (Reasoning) ranks #265 of 315 models, which places it in the bottom fifth of the field for code generation and debugging.

Who created Qwen3 VL 4B (Reasoning)?

Qwen3 VL 4B (Reasoning) is developed by Alibaba and was released on October 14, 2025.