ByteDance: UI-TARS 7B

ByteDance: UI-TARS 7B

bytedance · Released Jul 22, 2025
30
Our Score

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

$0.10 / 1M Input Price
$0.20 / 1M Output Price
128,000 tokens Context Window
2,048 tokens Max Output
7B Parameters

Capabilities

Vision

Architecture

ModalityText + Image → Text
TokenizerOther
Parameters7B

Model Information

OpenRouter ID bytedance/ui-tars-1.5-7b
Providerbytedance
Release Date July 22, 2025
Context Length128,000 tokens
Max Completion2,048 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $0.10 $0.000100
Output $0.20 $0.000200

Live Performance

Live endpoint metrics — refreshed every 30 minutes.

100%
Avg Uptime
966ms
Best Latency (TTFT)
16 tok/s
Best Throughput
1/1
Active Endpoints
Available via: Parasail