Qwen: Qwen3 VL 235B A22B Instruct
Qwen3 VL 235B A22B Instruct offers multimodal vision support and a large 262K context window at a competitive price, but its intelligence index is limited, making it better suited to straightforward content and image-understanding tasks.
Assessment date: March 12, 2026
Our methodology takes into account a range of factors including pricing, functionality, capabilities, benchmark performance, and real-world applicability. Rankings are reviewed and updated regularly as new models are released. Issues with our rankings? Contact us
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.
Capabilities
Architecture
| Modality | Text + Image → Text |
| Tokenizer | Qwen3 |
| Parameters | 235B |
Performance Indices
Source: Artificial Analysis
Benchmark Scores
Evaluations
Benchmark data from Artificial Analysis and Hugging Face
Model Information
Pricing
| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|---|---|---|
| Input | $0.20 | $0.000200 |
| Output | $0.88 | $0.000880 |
Live Performance
Live endpoint metrics — refreshed every 30 minutes.
Leaderboard Categories
External Resources
Data sourced from OpenRouter API, Artificial Analysis and Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team.
Last updated: March 13, 2026 7:52 pm