Meta: Llama 3.2 11B Vision Instruct
Analysis Summary
Meta: Llama 3.2 11B Vision Instruct sits in the Efficient tier on our leaderboard, ranked #260 of 544 published models on overall intelligence. At $0.245 per 1M input tokens and $0.245 per 1M output tokens, it is among the cheapest models on the market. It offers a standard large context window and supports vision.
Editorial notes
Llama 3.2 11B Vision Instruct from Meta adds multimodal vision at low cost but benchmarks show very limited reasoning and coding capability, restricting it to simple visual tasks.
Assessed May 5, 2026
Rankings consider pricing, capabilities, benchmarks, and real-world applicability, and are refreshed as new models launch.
Performance Profile
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks that combine visual and textual data, such as image captioning and visual question answering.
Capabilities
Architecture Detail
| Field | Value |
|---|---|
| Instruct Type | llama3 |
Performance Indices
Source: Artificial Analysis
Benchmark Scores
Benchmark data from Artificial Analysis and Hugging Face
Model Information
| Field | Value |
|---|---|
| OpenRouter ID | meta-llama/llama-3.2-11b-vision-instruct |
| Provider | meta-llama |
| Model Family | Llama 3 |
| Release Date | September 25, 2024 |
| Context Length | 131,072 tokens |
| Max Completion | 16,384 tokens |
| Status | Active |
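As a minimal sketch, the OpenRouter ID above can be used with OpenRouter's OpenAI-compatible chat endpoint. The payload shape below follows that API's public format for vision inputs; the prompt text, image URL, and helper function name are illustrative assumptions, not part of the listing.

```python
# Build a chat request body for this model via OpenRouter's
# OpenAI-compatible endpoint (POST https://openrouter.ai/api/v1/chat/completions).
# MODEL_ID and the 16,384-token completion cap come from the table above.

MODEL_ID = "meta-llama/llama-3.2-11b-vision-instruct"
MAX_COMPLETION = 16_384

def build_chat_payload(prompt: str, image_url: str, max_tokens: int = 1024) -> dict:
    """Return a JSON-serializable request body for one vision prompt."""
    if max_tokens > MAX_COMPLETION:
        raise ValueError(f"max_tokens exceeds the model cap of {MAX_COMPLETION}")
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_chat_payload("Describe this image.", "https://example.com/cat.png")
print(payload["model"])  # meta-llama/llama-3.2-11b-vision-instruct
```

Sending the payload requires an `Authorization: Bearer <API key>` header; the sketch only constructs the body so it can be inspected offline.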
Pricing
| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|---|---|---|
| Input | $0.245 | $0.000245 |
| Output | $0.245 | $0.000245 |
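The per-request cost at these rates is simple arithmetic; a small sketch, using the per-1M figures from the table above (the function name and example token counts are illustrative):

```python
# Estimate the USD cost of one request at the listed per-1M-token rates.
INPUT_PER_M = 0.245   # USD per 1M input tokens
OUTPUT_PER_M = 0.245  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a 10,000-token prompt with a 2,000-token reply:
print(f"${request_cost(10_000, 2_000):.5f}")  # $0.00294
```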
Live Performance
Live endpoint metrics, refreshed every 30 minutes.
Data sourced from OpenRouter API, Artificial Analysis and Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team.
Last updated: May 5, 2026 11:06 am