Qwen: Qwen3 VL 235B A22B Instruct

Qwen: Qwen3 VL 235B A22B Instruct

qwen · Released Sep 23, 2025
42
Our Score

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

$0.20 / 1M Input Price
$0.88 / 1M Output Price
262,144 tokens Context Window
235B Parameters

Capabilities

Tool Use Function Calling Vision

Architecture

ModalityText + Image → Text
TokenizerQwen3
Parameters235B

Performance Indices

Source: Artificial Analysis

20.8 Intelligence Index
16.5 Coding Index
20.9 Agentic Index
70.7 Math Index

Benchmark Scores

Evaluations

GPQA Diamond 71.2%
Graduate-level scientific reasoning
HLE 6.3%
Humanity's Last Exam
MMLU Pro 82.3%
Multi-task language understanding
LiveCodeBench 59.4%
Live coding evaluation
SciCode 35.9%
Scientific computing
AIME 2025 70.7%
Competition mathematics (2025)
IFBench 42.7%
Instruction following
LCR 31.7%
Long-context reasoning
TerminalBench Hard 6.8%
Agentic terminal tasks
τ²-Bench 35.1%
Conversational agent benchmark

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID qwen/qwen3-vl-235b-a22b-instruct
Providerqwen
Release Date September 23, 2025
Context Length262,144 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.20 $0.000200
Output $0.88 $0.000880

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

97.6%
Avg Uptime
1,015ms
Best Latency (TTFT)
53 tok/s
Best Throughput
5/6
Active Endpoints
Available via: DeepInfra, Parasail, Alibaba, AtlasCloud, Novita, SiliconFlow

Leaderboard Categories