# ShowUI-2B
| Property | Value |
|---|---|
| Parameter Count | 2.21B |
| Model Type | Vision-Language-Action |
| Base Model | Qwen2-VL-2B-Instruct |
| Paper | arXiv:2411.17465 |
| Tensor Type | BF16 |
## What is ShowUI-2B?
ShowUI-2B is a lightweight vision-language-action model designed for GUI agents. Built on the Qwen2-VL architecture, it understands screen contents and generates precise, coordinate-based actions, enabling automated interaction with computer interfaces through visual understanding and action generation.
## Implementation Details
The model is implemented in PyTorch, with weights stored in the Safetensors format. It processes visual and textual inputs jointly to generate appropriate interface actions.
- Built on Qwen2-VL-2B-Instruct architecture
- Supports multiple action types including CLICK, INPUT, SELECT, HOVER, and more
- Processes images within a flexible pixel budget (256×28×28 to 1344×28×28 pixels, i.e. 256 to 1344 visual tokens of 28×28 pixels each)
- Implements coordinate-based interface interaction
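The pixel range above corresponds to Qwen2-VL-style visual token budgets, where each token covers a 28×28 pixel patch. A minimal sketch of pre-fitting a screenshot into that budget while preserving aspect ratio (the `fit_to_pixel_budget` helper is illustrative, not part of the ShowUI API; the model's processor also snaps dimensions to multiples of 28, which this sketch skips):

```python
import math

# Supported pixel budget from the model card: 256 to 1344 visual tokens,
# each covering a 28x28 pixel patch.
MIN_PIXELS = 256 * 28 * 28    # 200_704
MAX_PIXELS = 1344 * 28 * 28   # 1_053_696

def fit_to_pixel_budget(width: int, height: int) -> tuple[int, int]:
    """Rescale (width, height) so total area lands inside the supported
    pixel range, keeping the aspect ratio. Illustrative sketch only."""
    area = width * height
    if area > MAX_PIXELS:
        # floor() guarantees the rescaled area never exceeds the budget
        scale = math.sqrt(MAX_PIXELS / area)
        return math.floor(width * scale), math.floor(height * scale)
    if area < MIN_PIXELS:
        # ceil() guarantees the rescaled area never undershoots it
        scale = math.sqrt(MIN_PIXELS / area)
        return math.ceil(width * scale), math.ceil(height * scale)
    return width, height

print(fit_to_pixel_budget(1920, 1080))  # a 1080p screenshot is downscaled
```

A 1920×1080 screenshot (about 2.07 MP) exceeds the upper budget, so it is scaled down; images already in range pass through unchanged.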
## Core Capabilities
- UI Grounding: Precise element location identification
- Multi-modal Navigation: Combines vision and language for interface navigation
- Action Generation: Produces contextually appropriate interface actions
- Cross-platform Support: Works with both web and mobile interfaces
- Coordinate System: Uses relative coordinates (0-1 scale) for precise positioning
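Because the model emits relative coordinates on a 0-1 scale, a thin conversion layer is needed before dispatching real clicks on a concrete display. A minimal sketch, assuming simple scaling with clamping (the function name is ours, not part of the model's API):

```python
def to_absolute(rel_x: float, rel_y: float,
                screen_w: int, screen_h: int) -> tuple[int, int]:
    """Map a relative [0, 1] coordinate pair from the model to pixel
    coordinates, clamping out-of-range values to the screen bounds."""
    rel_x = min(max(rel_x, 0.0), 1.0)
    rel_y = min(max(rel_y, 0.0), 1.0)
    return round(rel_x * (screen_w - 1)), round(rel_y * (screen_h - 1))

print(to_absolute(0.5, 0.25, 1920, 1080))  # -> (960, 270)
```

Clamping keeps slightly out-of-range predictions on-screen rather than raising an error, which is usually the safer behavior for an automation backend.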
## Frequently Asked Questions
Q: What makes this model unique?
ShowUI-2B stands out for its specialized focus on GUI interaction, combining vision-language understanding with precise action generation in a lightweight 2B parameter package. Its ability to process screen contents and generate coordinate-based actions makes it particularly suitable for automated interface interaction.
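In practice, those coordinate-based actions get routed to an automation backend by a small dispatcher. The sketch below assumes a simplified action schema (a dict with `action`, optional `position`, and optional `value` keys) chosen for illustration, not the model's exact output format:

```python
from typing import Callable

def dispatch(action: dict, screen_w: int, screen_h: int,
             backends: dict[str, Callable]) -> None:
    """Route a parsed model action (CLICK, INPUT, HOVER, ...) to the
    matching backend handler, converting relative coordinates to pixels."""
    name = action["action"].upper()
    if name not in backends:
        raise ValueError(f"unsupported action: {name}")
    if "position" in action:  # relative [x, y] on the 0-1 scale
        x = round(action["position"][0] * screen_w)
        y = round(action["position"][1] * screen_h)
        backends[name](x=x, y=y, value=action.get("value"))
    else:
        backends[name](value=action.get("value"))

# Usage: record actions instead of executing them; a real agent would
# call an automation library (e.g. mouse/keyboard control) here instead.
log = []
backends = {
    "CLICK": lambda **kw: log.append(("click", kw["x"], kw["y"])),
    "INPUT": lambda **kw: log.append(("input", kw["value"])),
}
dispatch({"action": "CLICK", "position": [0.5, 0.5]}, 1920, 1080, backends)
dispatch({"action": "INPUT", "value": "hello"}, 1920, 1080, backends)
print(log)  # -> [('click', 960, 540), ('input', 'hello')]
```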
Q: What are the recommended use cases?
The model is ideal for automated GUI testing, interface navigation assistance, and developing AI-powered interface agents. It's particularly useful for web automation, mobile app testing, and creating assistive technology for interface interaction.