Meta: Llama 3.2 11B Vision Instruct

meta-llama · Released Sep 25, 2024 · Efficient
Our Score: 33.7

Performance Profile

Intelligence 1.9/10
Technical 0.8/10
Content 3.5/10
Value 8/10

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed for tasks that combine visual and textual data, such as image captioning and visual question answering.
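As a rough illustration of how such a request looks in practice, the sketch below sends an image-captioning prompt to this model through OpenRouter's OpenAI-compatible chat completions endpoint. The model ID is the one listed under Model Information below; the OPENROUTER_API_KEY environment variable, the image URL, and the prompt text are illustrative placeholders, not part of this page.

# Minimal sketch, assuming OpenRouter's OpenAI-compatible chat completions API
# and the `requests` package. Key, image URL, and prompt are placeholders.
import os
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "meta-llama/llama-3.2-11b-vision-instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in one sentence."},
                    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
                ],
            }
        ],
        "max_tokens": 256,
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])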

Input Price: $0.25 / 1M tokens
Output Price: $0.25 / 1M tokens
Context Window: 131,072 tokens
Max Output: 16,384 tokens
Parameters: 11B
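To see how the context window and output cap interact, the sketch below clamps a requested completion length to whatever room the limits leave. The characters-divided-by-four token estimate is a crude stand-in for the actual Llama 3 tokenizer, used only for illustration.

# Rough budget check against the 131,072-token context window and the
# 16,384-token completion cap. The chars/4 token estimate is a crude
# approximation, not the Llama 3 tokenizer.
CONTEXT_WINDOW = 131_072
MAX_OUTPUT = 16_384

def plan_max_tokens(prompt: str, desired_output_tokens: int) -> int:
    prompt_tokens = len(prompt) // 4              # crude token estimate
    remaining = CONTEXT_WINDOW - prompt_tokens    # room left for the completion
    return max(0, min(desired_output_tokens, MAX_OUTPUT, remaining))

print(plan_max_tokens("Summarise the attached chart.", 4_000))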

Capabilities

Vision

Architecture

Modality: Text + Image → Text
Tokenizer: Llama3
Instruct Type: llama3
Parameters: 11B

Performance Indices

Source: Artificial Analysis

Intelligence Index: 8.7
Coding Index: 4.3
Agentic Index: 7.7
Math Index: 1.7

Benchmark Scores

Intelligence

GPQA Diamond 22.1% Graduate-level scientific reasoning
HLE 5.2% Humanity's Last Exam
MMLU Pro 46.4% Multi-task language understanding
MATH 500 51.6% Mathematical problem-solving
AIME 9.3% Competition mathematics
AIME 2025 1.7% Competition mathematics (2025)
SciCode 11.2% Scientific computing

Technical

LiveCodeBench 11% Live coding evaluation
TerminalBench Hard 0.8% Agentic terminal tasks
τ²-Bench 14.6% Conversational agent benchmark

Content

IFBench 30.4% Instruction following
LCR 11.7% Long-context reasoning

Benchmark data from Artificial Analysis and Hugging Face

How does Meta: Llama 3.2 11B Vision Instruct stack up?

Compare side-by-side with other efficient models.

Model Information

OpenRouter ID: meta-llama/llama-3.2-11b-vision-instruct
Provider: meta-llama
Model Family: Llama 3
Release Date: September 25, 2024
Context Length: 131,072 tokens
Max Completion: 16,384 tokens
Status: Active

Pricing

Token Type    Cost per 1M tokens    Cost per 1K tokens
Input         $0.25                 $0.00025
Output        $0.25                 $0.00025
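With a flat $0.25 per million tokens on both input and output, per-request cost is simple arithmetic; the token counts in the sketch below are made-up illustrative values.

# Cost estimate at $0.25 per 1M tokens for both input and output
# (equivalently $0.00025 per 1K tokens). Token counts are illustrative.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 0.25

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 2,000-token prompt with a 500-token reply costs $0.000625.
print(f"${request_cost(2_000, 500):.6f}")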

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

Avg Uptime: 100%
Best Latency (TTFT): 172 ms
Best Throughput: 41 tok/s
Active Endpoints: 1/1
Available via: DeepInfra
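As a back-of-envelope view of how these numbers combine, the sketch below estimates wall-clock time for a streamed completion from the best-case TTFT and throughput listed above; real endpoint performance varies.

# Latency model: time-to-first-token plus streaming time at the reported
# best-case throughput. Both figures come from the metrics above and are
# best-case, not guaranteed.
TTFT_SECONDS = 0.172       # best latency (TTFT)
TOKENS_PER_SECOND = 41.0   # best throughput

def estimated_seconds(output_tokens: int) -> float:
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND

print(f"~{estimated_seconds(500):.1f} s for a 500-token completion")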