Step3 VL 10B

StepFun · Released Jan 20, 2026 · Emerging
Intelligence rank: #443 / 525 (Our Score: 18.9)
AA Index rank: #205 / 353 (Artificial Analysis: 15.4)
Input price: Not priced
Output price: Not priced
Context: Not reported

Analysis Summary

Step3 VL 10B sits in the Emerging tier on our leaderboard, ranked #443 of 525 published models on overall intelligence. No input or output price per 1M tokens is listed for it, so it cannot be compared with other models on cost.
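
To put the two rank figures on a common scale, the share of models ranked below a given position can be computed directly from the rank and the leaderboard size. The sketch below is illustrative only: the helper name `share_ranked_below` is hypothetical and is not part of either leaderboard's published methodology; the inputs are the ranks quoted on this page (#443 of 525 and #205 of 353).

```python
def share_ranked_below(rank: int, total: int) -> float:
    """Return the fraction of leaderboard entries ranked below `rank`.

    Hypothetical helper for illustration; rank 1 is the best position.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) / total


# Ranks quoted on this page.
our_share = share_ranked_below(443, 525)  # our intelligence leaderboard
aa_share = share_ranked_below(205, 353)   # Artificial Analysis index

print(f"Our leaderboard: {our_share:.1%} of models rank below")
print(f"AA Index:        {aa_share:.1%} of models rank below")
```

On these figures the model sits ahead of roughly 16% of models on our leaderboard and roughly 42% on the AA Index. The two lists cover different sets of models, so the percentages are not directly comparable.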

Editorial notes

Step3 VL 10B from StepFun is a compact vision-language model with limited general reasoning and agentic capability, best suited to lightweight multimodal inference tasks.

Assessed April 26, 2026

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch.

Performance Profile

Intelligence 3.3/10
Technical 1.8/10
Content 2.5/10
Value 0/10

Performance Indices

Source: Artificial Analysis

15.4 Intelligence Index
13.9 Coding Index
10.7 Agentic Index

Benchmark Scores

Intelligence

GPQA Diamond 69% Graduate-level scientific reasoning
HLE 10.2% Humanity's Last Exam
SciCode 31.1% Scientific computing

Technical

TerminalBench Hard 5.3% Agentic terminal tasks
τ²-Bench 16.1% Conversational agent benchmark

Content

IFBench 50.2% Instruction following

Benchmark data from Artificial Analysis and Hugging Face

Model Information

Provider StepFun
Release Date January 20, 2026
Status Active