Qwen: Qwen2.5-VL 7B Instruct
Review
Qwen's 7B vision-language model offers multimodal input at a very low price point, but with no published benchmark scores its capability ceiling is uncertain, making it better suited to lightweight image-understanding tasks than demanding business workflows.
Assessment date: April 4, 2026
Our methodology takes into account a range of factors including pricing, functionality, capabilities, benchmark performance, and real-world applicability. Rankings are reviewed and updated regularly as new models are released. Issues with our rankings? Contact us
Performance Profile
Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements: - SoTA understanding of images of various resolution & ratio: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. - Understanding videos of 20min+: Qwen2.5-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. - Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2.5-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. - Multilingual Support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. For more details, see this blog post and GitHub repo. Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.
Capabilities
Architecture
| Modality | Text + Image → Text |
| Tokenizer | Qwen |
| Parameters | 7B |
How does Qwen: Qwen2.5-VL 7B Instruct stack up?
Compare side-by-side with other legacy models.
Model Information
| OpenRouter ID |
qwen/qwen-2.5-vl-7b-instruct
|
| Provider | qwen |
| Release Date | August 28, 2024 |
| Context Length | 32,768 tokens |
| Status | Active |
Pricing
| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|---|---|---|
| Input | $0.20 | $0.000200 |
| Output | $0.20 | $0.000200 |
Leaderboard Categories
External Resources
Explore Related Models
Data sourced from OpenRouter API, Artificial Analysis and Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team.
Last updated: April 14, 2026 8:52 pm