Qwen: Qwen3 VL 8B Instruct

Qwen: Qwen3 VL 8B Instruct

qwen · Released Oct 14, 2025
32
Our Score

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon temporal reasoning, DeepStack for fine-grained visual-text alignment, and text-timestamp alignment for precise event localization. The model supports a native 256K-token context window, extensible to 1M tokens, and handles both static and dynamic media inputs for tasks like document parsing, visual question answering, spatial reasoning, and GUI control. It achieves text understanding comparable to leading LLMs while expanding OCR coverage to 32 languages and enhancing robustness under varied visual conditions.

$0.08 / 1M Input Price
$0.50 / 1M Output Price
131,072 tokens Context Window
32,768 tokens Max Output
8B Parameters

Capabilities

Tool Use Function Calling Vision

Architecture

ModalityText + Image → Text
TokenizerQwen3
Parameters8B

Performance Indices

Source: Artificial Analysis

14.3 Intelligence Index
7.3 Coding Index
15.8 Agentic Index
27.3 Math Index

Benchmark Scores

Evaluations

GPQA Diamond 42.7%
Graduate-level scientific reasoning
HLE 2.9%
Humanity's Last Exam
MMLU Pro 68.6%
Multi-task language understanding
LiveCodeBench 33.2%
Live coding evaluation
SciCode 17.4%
Scientific computing
AIME 2025 27.3%
Competition mathematics (2025)
IFBench 32.3%
Instruction following
LCR 15.3%
Long-context reasoning
TerminalBench Hard 2.3%
Agentic terminal tasks
τ²-Bench 29.2%
Conversational agent benchmark

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID qwen/qwen3-vl-8b-instruct
Providerqwen
Release Date October 14, 2025
Context Length131,072 tokens
Max Completion32,768 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.08 $0.000080
Output $0.50 $0.000500

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

96.1%
Avg Uptime
332ms
Best Latency (TTFT)
69 tok/s
Best Throughput
4/4
Active Endpoints
Available via: Novita, Alibaba, Together, Parasail