Qwen: Qwen2.5-VL 7B Instruct

qwen · Released Aug 28, 2024 · Legacy · Benchmarks pending

Performance Profile

Intelligence 0/10
Technical 0/10
Content 4/10
Value 7.5/10

Qwen2.5-VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:

- SoTA understanding of images of various resolutions and aspect ratios: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding of videos longer than 20 minutes: Qwen2.5-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialog, content creation, etc.
- Agent that can operate your mobile phones, robots, etc.: with its complex reasoning and decision-making abilities, Qwen2.5-VL can be integrated with devices such as mobile phones and robots for automatic operation based on the visual environment and text instructions.
- Multilingual support: to serve global users, besides English and Chinese, Qwen2.5-VL also supports understanding text in other languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

For more details, see the blog post and GitHub repo. Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

$0.20 / 1M
Input Price
$0.20 / 1M
Output Price
32,768 tokens
Context Window
7B Parameters

Capabilities

Vision

Architecture

Modality: Text + Image → Text
Tokenizer: Qwen
Parameters: 7B

How does Qwen: Qwen2.5-VL 7B Instruct stack up?

Compare side-by-side with other legacy models.


Model Information

OpenRouter ID: qwen/qwen-2.5-vl-7b-instruct
Provider: qwen
Release Date: August 28, 2024
Context Length: 32,768 tokens
Status: Active
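The OpenRouter ID above is the value you pass as the `model` field in a chat completion request. A minimal sketch of building a multimodal request payload (the endpoint and message shape follow OpenRouter's OpenAI-compatible chat API; the image URL and `max_tokens` value are placeholder assumptions):

```python
import json

# OpenRouter exposes an OpenAI-compatible chat completions endpoint:
#   POST https://openrouter.ai/api/v1/chat/completions
#   Authorization: Bearer <YOUR_API_KEY>
# Only payload construction is shown here; sending the request is omitted.

payload = {
    "model": "qwen/qwen-2.5-vl-7b-instruct",  # OpenRouter ID from the table above
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                # Placeholder image URL for illustration only.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    "max_tokens": 512,  # placeholder; stay within the 32,768-token context window
}

print(json.dumps(payload, indent=2))
```

Text-only requests use the same shape with a plain string (or a single text part) as the message content.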

Pricing

| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|------------|--------------------|--------------------|
| Input      | $0.20              | $0.000200          |
| Output     | $0.20              | $0.000200          |
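With a flat $0.20 per million tokens in both directions, the cost of a request is a simple linear function of the token counts. A quick sketch, with the prices hard-coded from the table above:

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float = 0.20,
                     output_price_per_m: float = 0.20) -> float:
    """Cost in USD for one request, given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Example: a full 32,768-token input context plus a 1,000-token reply.
print(round(request_cost_usd(32_768, 1_000), 6))
```

Because input and output prices are identical here, total cost depends only on the combined token count.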

Leaderboard Categories