OpenGVLab: InternVL3 78B
OpenGVLab's InternVL3 78B supports vision and multimodal input but has no benchmark data available, and its small 32K context window is a limitation for business document workflows; without performance evidence it cannot be scored above the conservative range.
Assessment date: March 14, 2026
Our methodology takes into account a range of factors including pricing, functionality, capabilities, benchmark performance, and real-world applicability. Rankings are reviewed and updated regularly as new models are released. Issues with our rankings? Contact us
The InternVL3 series is an advanced multimodal large language model (MLLM). Compared to InternVL 2.5, InternVL3 demonstrates stronger multimodal perception and reasoning capabilities. In addition, InternVL3 is benchmarked against the Qwen2.5 Chat models, whose pre-trained base models serve as the initialization for its language component. Benefiting from Native Multimodal Pre-Training, the InternVL3 series surpasses the Qwen2.5 series in overall text performance.
Capabilities
Architecture
| Modality | Text + Image → Text |
| Tokenizer | Other |
| Parameters | 78B |
Model Information
Pricing
| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|---|---|---|
| Input | $0.15 | $0.000150 |
| Output | $0.60 | $0.000600 |
External Resources
Data sourced from OpenRouter API, Artificial Analysis and Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team.
Last updated: March 15, 2026 7:52 pm