Qwen: Qwen3 VL 30B A3B Thinking
Qwen3 VL 30B A3B Thinking adds vision and thinking capabilities at a low price point, with strong math scores, but its intelligence index is in the limited range and agentic performance is weak; a regional accessibility consideration also applies for UK businesses.
Assessment date: March 14, 2026
Our methodology takes into account a range of factors including pricing, functionality, capabilities, benchmark performance, and real-world applicability. Rankings are reviewed and updated regularly as new models are released. Issues with our rankings? Contact us
Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.
Capabilities
Architecture
| Modality | Text + Image → Text |
| Tokenizer | Qwen3 |
| Parameters | 30B |
Performance Indices
Source: Artificial Analysis
Benchmark Scores
Evaluations
Benchmark data from Artificial Analysis and Hugging Face
Model Information
Pricing
| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|---|---|---|
| Input | $0.13 | $0.000130 |
| Output | $1.56 | $0.001560 |
Live Performance
Live endpoint metrics — refreshed every 30 minutes.
External Resources
Data sourced from OpenRouter API, Artificial Analysis and Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team.
Last updated: March 15, 2026 7:52 pm