Qwen: Qwen VL Plus
Qwen VL Plus supports vision and a 131K context window at a low price point, but without any benchmark data its performance cannot be verified. It may suit basic multimodal tasks but cannot be recommended for professional workflows without supporting evidence.
Assessment date: March 14, 2026
Our methodology takes into account a range of factors including pricing, functionality, capabilities, benchmark performance, and real-world applicability. Rankings are reviewed and updated regularly as new models are released. Issues with our rankings? Contact us
Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition capabilities and text recognition abilities, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for image input. It delivers significant performance across a broad range of visual tasks.
Capabilities
Architecture
| Modality | Text + Image → Text |
| Tokenizer | Qwen |
Model Information
Pricing
| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|---|---|---|
| Input | $0.14 | $0.000137 |
| Output | $0.41 | $0.000410 |
Live Performance
Live endpoint metrics — refreshed every 30 minutes.
External Resources
Data sourced from OpenRouter API, Artificial Analysis and Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team.
Last updated: March 15, 2026 7:52 pm