ShowUI-2B

by showlab

ShowUI-2B is a 2.21B parameter vision-language-action model specialized for GUI agents, built on Qwen2-VL architecture for computer interface interaction.

Parameter Count: 2.21B
Model Type: Vision-Language-Action
Base Model: Qwen2-VL-2B-Instruct
Paper: arXiv:2411.17465
Tensor Type: BF16

What is ShowUI-2B?

ShowUI-2B is a lightweight vision-language-action model designed specifically for GUI agents. Built on the Qwen2-VL architecture, it interacts with computer interfaces by visually understanding screen contents and generating precise, coordinate-based actions, all within a 2.21B parameter footprint.

Implementation Details

The model is implemented in PyTorch and stored as Safetensors in BF16 for efficient loading. It takes a screenshot and a textual instruction as input and generates the corresponding interface action; a minimal loading sketch follows the list below.

  • Built on Qwen2-VL-2B-Instruct architecture
  • Supports multiple action types including CLICK, INPUT, SELECT, HOVER, and more
  • Processes images under a configurable visual-token budget (min_pixels = 256x28x28 up to max_pixels = 1344x28x28)
  • Implements coordinate-based interface interaction
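
As a sketch of what loading looks like in practice, the snippet below uses the standard transformers Qwen2-VL classes; the processor pixel bounds mirror the budget above, while the dtype and device placement are illustrative assumptions rather than requirements.

```python
# Minimal loading sketch, assuming the standard transformers Qwen2-VL classes.
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# BF16 matches the tensor type listed above; device_map is illustrative.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "showlab/ShowUI-2B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Visual-token budget from the model card: 256x28x28 up to 1344x28x28.
processor = AutoProcessor.from_pretrained(
    "showlab/ShowUI-2B",
    min_pixels=256 * 28 * 28,
    max_pixels=1344 * 28 * 28,
)
```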

Core Capabilities

  • UI Grounding: Precise element location identification
  • Multi-modal Navigation: Combines vision and language for interface navigation
  • Action Generation: Produces contextually appropriate interface actions
  • Cross-platform Support: Works with both web and mobile interfaces
  • Coordinate System: Uses relative coordinates (0-1 scale) for precise positioning; see the conversion sketch below
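
To make the coordinate convention concrete, here is a minimal sketch (not from the model card) that maps a relative [x, y] prediction onto a screenshot of known pixel size; the screen dimensions and example coordinates are assumed values:

```python
# Sketch: convert the model's relative (0-1) coordinates to absolute pixels.
def to_absolute(rel_xy, screen_w, screen_h):
    """Map a relative [x, y] pair onto a screen_w x screen_h screenshot."""
    x, y = rel_xy
    return round(x * screen_w), round(y * screen_h)

# Example: a predicted click target at [0.49, 0.42] on a 1920x1080 screenshot.
print(to_absolute([0.49, 0.42], 1920, 1080))  # (941, 454)
```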

Frequently Asked Questions

Q: What makes this model unique?

ShowUI-2B stands out for its specialized focus on GUI interaction, combining vision-language understanding with precise action generation in a lightweight 2B parameter package. Its ability to process screen contents and generate coordinate-based actions makes it particularly suitable for automated interface interaction.
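
For illustration, here is a hedged sketch of consuming such an action on the agent side; the dictionary-style output format shown (action/value/position keys) is an assumed convention for this example, not a documented schema.

```python
# Sketch: parse a navigation-style action string into a usable Python dict.
# The exact key names and string layout are assumptions for illustration.
import ast

raw = "{'action': 'CLICK', 'value': None, 'position': [0.49, 0.42]}"
action = ast.literal_eval(raw)  # safely parse the Python-literal dict

if action["action"] == "CLICK" and action["position"] is not None:
    x, y = action["position"]  # relative 0-1 coordinates
    print(f"click at relative ({x}, {y})")
```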

Q: What are the recommended use cases?

The model is ideal for automated GUI testing, interface navigation assistance, and developing AI-powered interface agents. It's particularly useful for web automation, mobile app testing, and creating assistive technology for interface interaction.
