Qwen: Qwen2.5 VL 32B Instruct

Qwen: Qwen2.5 VL 32B Instruct

qwen · Released Mar 24, 2025 Efficient
Intelligence #264 / 583
33.6 Our Score
AA Index #271 / 380
7.5 Artificial Analysis
Input #274 / 583
$0.200 per 1M tokens
Output #272 / 583
$0.600 per 1M tokens
Context #335 / 583
128,000 tokens

Analysis Summary

Qwen2.5 VL 32B Instruct is a vision-language model from Qwen (Alibaba) with a 128K context window and strong multimodal focus. Its MMLU-Pro score of 0.697 and GPQA of 0.466 reflect moderate general reasoning, and its LiveCodeBench score of 0.248 suggests basic coding capability. Agentic and coding indices are not available, limiting full technical assessment.

For businesses, it is best suited to vision-enabled content tasks: image analysis, document understanding, and multimodal content generation. The absence of tool use or function calling data means it is less suited to structured agentic pipelines compared to peers with those capabilities confirmed.

At $0.20 input and $0.60 output per million tokens, pricing is competitive for a 32B vision-language model. Teams needing affordable multimodal processing for content or document workflows will find it a reasonable option, though provider accessibility may be a consideration for some enterprise environments.

Assessed June 17, 2026

Editorial notes

Qwen2.5 VL 32B Instruct from Qwen delivers vision-language capability with a 128K context window at low cost, suited to multimodal content tasks but with limited agentic benchmark data.

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?

Performance Profile

Intelligence2Technical0Value7.8Content5
Intelligence 2/10
Technical 0/10
Content 5/10
Value 7.8/10

How Qwen: Qwen2.5 VL 32B Instruct compares

Qwen: Qwen2.5 VL 32B Instruct ranks #271 of 380 AI models we track for overall intelligence. Its 128K-token context window is larger than 43% of the models we list. At $0.20 per million input tokens it is cheaper than 53% of comparable models.

About Qwen: Qwen2.5 VL 32B Instruct

Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual..

32B Parameters

Capabilities

Vision

Performance Indices

Source: Artificial Analysis

7.5 Intelligence Index

Benchmark Scores

Intelligence

GPQA Diamond 46.6% Graduate-level scientific reasoning
HLE 3.8% Humanity's Last Exam
MMLU Pro 69.7% Multi-task language understanding
MATH 500 80.5% Mathematical problem-solving
AIME 11% Competition mathematics
SciCode 22.9% Scientific computing

Technical

LiveCodeBench 24.8% Live coding evaluation

Benchmark data from Artificial Analysis and Hugging Face

How does Qwen: Qwen2.5 VL 32B Instruct stack up?

Compare side-by-side with other efficient models.

Compare Models

Model Information

OpenRouter ID qwen/qwen2.5-vl-32b-instruct
Providerqwen
Release Date March 24, 2025
Context Length128,000 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.20 $0.000200
Output $0.60 $0.000600

Leaderboard Categories

Frequently asked questions about Qwen: Qwen2.5 VL 32B Instruct

How much does Qwen: Qwen2.5 VL 32B Instruct cost?

Qwen: Qwen2.5 VL 32B Instruct costs $0.20 per million input tokens and $0.60 per million output tokens.

What is the context window of Qwen: Qwen2.5 VL 32B Instruct?

Qwen: Qwen2.5 VL 32B Instruct has a context window of 128,000 tokens (128K).

What can Qwen: Qwen2.5 VL 32B Instruct do?

Qwen: Qwen2.5 VL 32B Instruct supports image/vision input.

Who created Qwen: Qwen2.5 VL 32B Instruct?

Qwen: Qwen2.5 VL 32B Instruct is developed by Qwen and was released on March 24, 2025.