UI-TARS-72B-DPO

Property	Value
Model Size	72B parameters
Paper	arXiv:2501.12326
Repository	https://github.com/bytedance/UI-TARS
Author	ByteDance Research

What is UI-TARS-72B-DPO?

UI-TARS-72B-DPO is a native GUI agent model: a single vision-language model that integrates perception, reasoning, grounding, and memory. This lets it automate tasks end to end without predefined workflows or manual rules.

Implementation Details

The model is designed to understand and interact with graphical user interfaces directly, rather than through hand-written rules. Reported benchmark results include VisualWebBench (82.8%), WebSRC (89.3%), and SQAshort (88.6%).

Integrated perception and reasoning capabilities
End-to-end task automation
Superior performance in GUI interaction tasks
Advanced grounding capabilities across different interface types
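
The end-to-end loop described above can be sketched as follows. This is a minimal illustration, not the UI-TARS implementation or API: the `Action` type, the action names, the `policy` signature, and the `capture_screen` and `execute` callbacks are all hypothetical, and a scripted stand-in replaces the model so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str            # hypothetical action names: "click", "type", "finished"
    argument: str = ""   # e.g. a target element description or text to type


# Hypothetical policy signature: (instruction, screenshot, history) -> next action.
# In a real setup this would wrap a call to the vision-language model.
Policy = Callable[[str, bytes, List[Action]], Action]


def run_episode(
    policy: Policy,
    instruction: str,
    capture_screen: Callable[[], bytes],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Perceive, decide, act, and remember until the policy signals completion."""
    history: List[Action] = []  # memory of the actions taken so far
    for _ in range(max_steps):
        screenshot = capture_screen()                       # perception input
        action = policy(instruction, screenshot, history)   # reasoning + grounding
        history.append(action)
        if action.kind == "finished":
            break
        execute(action)                                     # act on the interface
    return history


def scripted_policy(instruction: str, screenshot: bytes, history: List[Action]) -> Action:
    """Stand-in for the model: replays a fixed three-step plan."""
    script = [
        Action("click", "search box"),
        Action("type", instruction),
        Action("finished"),
    ]
    return script[min(len(history), len(script) - 1)]


executed: List[Action] = []
trace = run_episode(scripted_policy, "weather today", lambda: b"", executed.append)
print([a.kind for a in trace])
```

The point of the sketch is that no workflow is encoded in the loop itself: every decision comes from the policy, which sees the instruction, the current screen, and its own action history.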

Core Capabilities

Mobile interface interaction (94.8% accuracy)
Desktop environment handling (91.2% accuracy)
Web interface manipulation (91.5% accuracy)
Cross-domain task execution (62.1% success rate, SR)
Robust element recognition and operation execution
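
The figures above mix two kinds of metric: per-step accuracy and task-level success rate (SR). The source does not say how each number was computed, so the following is only an illustrative sketch of the usual distinction, with hypothetical function names and made-up sample data: step accuracy counts individual correct actions, while success rate counts episodes in which every step was correct.

```python
from typing import List


def step_accuracy(episodes: List[List[bool]]) -> float:
    """Fraction of individual steps that were correct, pooled over all episodes."""
    steps = [ok for episode in episodes for ok in episode]
    return sum(steps) / len(steps) if steps else 0.0


def task_success_rate(episodes: List[List[bool]]) -> float:
    """Fraction of episodes in which every step was correct."""
    if not episodes:
        return 0.0
    successes = sum(1 for episode in episodes if episode and all(episode))
    return successes / len(episodes)


# Made-up sample data: three episodes, True marks a correct step.
episodes = [
    [True, True, True],
    [True, False, True],
    [True, True],
]
print(step_accuracy(episodes), task_success_rate(episodes))
```

Because a single wrong step fails the whole episode, a success rate is typically well below the step accuracy on the same data, which is one reason a 62.1% SR can sit alongside accuracies above 90%.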

Frequently Asked Questions

Q: What makes this model unique?

UI-TARS-72B-DPO takes a unified approach to GUI interaction: perception, reasoning, grounding, and memory are combined in a single model rather than split across a traditional modular framework. It reports state-of-the-art results on several GUI benchmarks and can handle complex GUI interactions without predefined rules.

Q: What are the recommended use cases?

The model is suited to automated GUI testing, user interface interaction automation, cross-platform task execution, and general-purpose interface manipulation across mobile, desktop, and web platforms. It is particularly effective for complex workflows that require both visual understanding and logical reasoning.
