Meta: Llama 3.2 11B Vision Instruct

Meta: Llama 3.2 11B Vision Instruct

meta-llama · Released Sep 25, 2024
33
Our Score

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research. Click here for the original model card. Usage of this model is subject to Meta's Acceptable Use Policy.

$0.05 / 1M Input Price
$0.05 / 1M Output Price
131,072 tokens Context Window
16,384 tokens Max Output
11B Parameters

Capabilities

Vision

Architecture

ModalityText + Image → Text
TokenizerLlama3
Instruct Typellama3
Parameters11B

Performance Indices

Source: Artificial Analysis

8.7 Intelligence Index
4.3 Coding Index
7.7 Agentic Index
1.7 Math Index

Benchmark Scores

Evaluations

GPQA Diamond 22.1%
Graduate-level scientific reasoning
HLE 5.2%
Humanity's Last Exam
MMLU Pro 46.4%
Multi-task language understanding
LiveCodeBench 11%
Live coding evaluation
SciCode 11.2%
Scientific computing
MATH 500 51.6%
Mathematical problem-solving
AIME 9.3%
Competition mathematics
AIME 2025 1.7%
Competition mathematics (2025)
IFBench 30.4%
Instruction following
LCR 11.7%
Long-context reasoning
TerminalBench Hard 0.8%
Agentic terminal tasks
τ²-Bench 14.6%
Conversational agent benchmark

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID meta-llama/llama-3.2-11b-vision-instruct
Providermeta-llama
Model FamilyLlama 3
Release Date September 25, 2024
Context Length131,072 tokens
Max Completion16,384 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.05 $0.000049
Output $0.05 $0.000049

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

100%
Avg Uptime
271ms
Best Latency (TTFT)
28 tok/s
Best Throughput
1/2
Active Endpoints
Available via: DeepInfra, Cloudflare