Z.ai: GLM 4.6V

Z.ai: GLM 4.6V

z-ai · Released Dec 8, 2025
38
Our Score

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

$0.30 / 1M Input Price
$0.90 / 1M Output Price
131,072 tokens Context Window
131,072 tokens Max Output

Capabilities

Tool Use Function Calling Vision

Architecture

ModalityText + Image + Video → Text
TokenizerOther

Performance Indices

Source: Artificial Analysis

17.1 Intelligence Index
11.1 Coding Index
16.9 Agentic Index
26.3 Math Index

Benchmark Scores

Evaluations

GPQA Diamond 56.6%
Graduate-level scientific reasoning
HLE 3.7%
Humanity's Last Exam
MMLU Pro 75.2%
Multi-task language understanding
LiveCodeBench 41.1%
Live coding evaluation
SciCode 27.2%
Scientific computing
AIME 2025 26.3%
Competition mathematics (2025)
IFBench 27.9%
Instruction following
LCR 12.3%
Long-context reasoning
TerminalBench Hard 3%
Agentic terminal tasks
τ²-Bench 30.7%
Conversational agent benchmark

Benchmark data from Artificial Analysis and Hugging Face

Model Information

OpenRouter ID z-ai/glm-4.6v
Providerz-ai
Release Date December 8, 2025
Context Length131,072 tokens
Max Completion131,072 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.30 $0.000300
Output $0.90 $0.000900

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

1,180ms
Best Latency (TTFT)
76 tok/s
Best Throughput
0/5
Active Endpoints
Available via: SiliconFlow, DeepInfra, Chutes, Novita, Z.AI