Meta: Llama 3.2 11B Vision Instruct

Meta: Llama 3.2 11B Vision Instruct

meta-llama · Released Sep 25, 2024 Efficient
Intelligence #260 / 544
32.3 Our Score
Speed #143 / 252
86.3 tokens / sec
Input #282 / 544
$0.245 per 1M tokens
Output #185 / 544
$0.245 per 1M tokens
Context #202 / 544
131,072 tokens

Analysis Summary

Meta: Llama 3.2 11B Vision Instruct sits in the Efficient tier on our leaderboard, ranked #260 of 544 published models on overall intelligence. At $0.245 input and $0.245 output per 1M tokens, it is among the most expensive on the market. It offers a standard large context window and supports vision.

Editorial notes

Llama 3.2 11B Vision Instruct from Meta adds multimodal vision at low cost but benchmarks show very limited reasoning and coding capability, restricting it to simple visual tasks.

Assessed May 5, 2026

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?

Performance Profile

Intelligence1.9Technical0.8Value7.8Content3
Intelligence 1.9/10
Technical 0.8/10
Content 3/10
Value 7.8/10

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and..

11B Parameters

Capabilities

Vision

Architecture Detail

Instruct Typellama3

Performance Indices

Source: Artificial Analysis

8.7 Intelligence Index
4.3 Coding Index
7.7 Agentic Index
1.7 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 22.1% Graduate-level scientific reasoning
HLE 5.2% Humanity's Last Exam
MMLU Pro 46.4% Multi-task language understanding
MATH 500 51.6% Mathematical problem-solving
AIME 9.3% Competition mathematics
AIME 2025 1.7% Competition mathematics (2025)
SciCode 11.2% Scientific computing

Technical

LiveCodeBench 11% Live coding evaluation
TerminalBench Hard 0.8% Agentic terminal tasks
τ²-Bench 14.6% Conversational agent benchmark

Content

IFBench 30.4% Instruction following
LCR 11.7% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

How does Meta: Llama 3.2 11B Vision Instruct stack up?

Compare side-by-side with other efficient models.

Compare Models

Model Information

OpenRouter ID meta-llama/llama-3.2-11b-vision-instruct
Providermeta-llama
Model FamilyLlama 3
Release Date September 25, 2024
Context Length131,072 tokens
Max Completion16,384 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.25 $0.000245
Output $0.25 $0.000245

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

100%
Avg Uptime
456ms
Best Latency (TTFT)
21 tok/s
Best Throughput
1/1
Active Endpoints
Available via: DeepInfra