Baidu: ERNIE 4.5 VL 28B A3B
ERNIE 4.5 VL 28B A3B adds vision capability, but the absence of published benchmark data and a small 30K context window as currently served (despite the architecture's stated 131K native limit) restrict its utility for document-heavy workflows; limited availability for Western businesses further constrains its practical appeal.
Assessment date: March 12, 2026
Our methodology takes into account a range of factors, including pricing, functionality, capabilities, benchmark performance, and real-world applicability. Rankings are reviewed and updated regularly as new models are released. Issues with our rankings? Contact us.
A multimodal Mixture-of-Experts chat model with 28B total parameters, of which 3B are activated per token. Its heterogeneous MoE structure uses modality-isolated routing to deliver strong text and vision understanding, and its scaling-efficient infrastructure supports high-throughput training and inference. Post-training combines SFT, DPO, and UPO, with RLVR alignment for cross-modal reasoning and generation, and the architecture supports a 131K context length.
Capabilities
Model Information

Architecture

| Attribute | Value |
|---|---|
| Modality | Text + Image → Text |
| Tokenizer | Other |
| Parameters | 28B (3B activated per token) |
Pricing
| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|---|---|---|
| Input | $0.14 | $0.000140 |
| Output | $0.56 | $0.000560 |
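At these rates, the cost of a request is straightforward to estimate. A minimal sketch (the token counts are illustrative, not from the source):

```python
# Listed rates in USD per 1M tokens.
INPUT_PER_M = 0.14
OUTPUT_PER_M = 0.56

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a 10,000-token prompt with a 1,000-token reply:
print(f"${request_cost(10_000, 1_000):.4f}")  # $0.0020
```

Note that output tokens cost 4x input tokens, so long generations dominate the bill even for short prompts.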
External Resources
Data sourced from the OpenRouter API, Artificial Analysis, and the Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team.
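Since the model's data is sourced via OpenRouter, a multimodal request would use OpenRouter's OpenAI-compatible chat format, with text and image parts in a single user message. A hedged sketch of the payload only (no network call); the model slug `baidu/ernie-4.5-vl-28b-a3b` is an assumption and should be checked against the platform's model list:

```python
import json

def build_payload(prompt: str, image_url: str) -> dict:
    """Build an OpenAI-compatible multimodal chat payload.

    The model slug below is an assumed identifier, not confirmed by this page.
    """
    return {
        "model": "baidu/ernie-4.5-vl-28b-a3b",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_payload("Describe this chart.", "https://example.com/chart.png")
print(json.dumps(payload, indent=2))
```

The payload would be POSTed to OpenRouter's chat-completions endpoint with an API key; keep the combined prompt and image tokens within the 30K context window noted above.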
Last updated: March 13, 2026 7:52 pm