StepFun: Step 3.7 Flash
Analysis Summary
StepFun Step 3.7 Flash is a multimodal model supporting text, image, and video input, with tool use and function calling enabled. Its intelligence index of 42.6 places it in the very strong tier, and its agentic index of 67.1 is a standout, it among the better agentic performers in the current landscape. The tau2 score of 0.985 suggests highly reliable task completion in structured agentic settings.
For businesses, the model is best positioned for agentic automation, multimodal content processing, and cost-sensitive workflows where near-frontier reasoning is sufficient. Its coding index of 37.1 limits its appeal for software engineering tasks, and instruction following (ifbench 0.67) is below the top tier, which may affect reliability in precise content generation. The 256K context window is adequate for most document tasks.
At $0.20 input and $1.15 output per million tokens, it is one of the more affordable multimodal agentic models available. Teams running high-volume agentic pipelines or multimodal processing at scale will find the cost-performance ratio attractive, provided they validate output quality for their specific use case.
Assessed June 6, 2026
Editorial notes
StepFun Step 3.7 Flash offers very strong agentic performance and multimodal input at a low price point, though reasoning and coding benchmarks sit in the mid-tier range.
Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?
Performance Profile
How StepFun: Step 3.7 Flash compares
StepFun: Step 3.7 Flash ranks #48 of 377 AI models we track for overall intelligence, #53 of 314 for coding, #22 of 289 for agentic tasks. Its 256K-token context window is larger than 71% of the models we list. At $0.20 per million input tokens it is cheaper than 53% of comparable models.
About StepFun: Step 3.7 Flash
Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters..
Capabilities
Performance Indices
Source: Artificial Analysis
This model was released recently. Independent benchmark evaluations are typically completed within days of release, so these figures are preliminary and are likely to be updated as testing is finalised.
Benchmark Scores
Intelligence
Technical
Content
Benchmark data from Artificial Analysis and Hugging Face
How does StepFun: Step 3.7 Flash stack up?
Compare side-by-side with other professional models.
Model Information
| OpenRouter ID |
stepfun/step-3.7-flash
|
| Provider | stepfun |
| Release Date | May 28, 2026 |
| Context Length | 256,000 tokens |
| Max Completion | 256,000 tokens |
| Status | Active |
Pricing
| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|---|---|---|
| Input | $0.20 | $0.000200 |
| Output | $1.15 | $0.001150 |
Live Performance
Live endpoint metrics, refreshed every 30 minutes.
External Resources
Explore Related Models
Frequently asked questions about StepFun: Step 3.7 Flash
How much does StepFun: Step 3.7 Flash cost?
StepFun: Step 3.7 Flash costs $0.20 per million input tokens and $1.15 per million output tokens.
What is the context window of StepFun: Step 3.7 Flash?
StepFun: Step 3.7 Flash has a context window of 256,000 tokens (256K).
Is StepFun: Step 3.7 Flash good for coding?
On our coding benchmark index, StepFun: Step 3.7 Flash ranks #53 of 314 models, placing it in the top quartile of the field for code generation and debugging.
What can StepFun: Step 3.7 Flash do?
StepFun: Step 3.7 Flash supports image/vision input, tool use, and function calling.
Who created StepFun: Step 3.7 Flash?
StepFun: Step 3.7 Flash is developed by StepFun and was released on May 28, 2026.
Data sourced from OpenRouter API, Artificial Analysis and Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team.
Last updated: June 9, 2026 9:57 pm