Proxy-lite-3b
Property | Value |
---|---|
Model Type | Vision-Language Model |
Parameters | 3 Billion |
Base Model | Qwen2.5-VL-3B-Instruct |
License | CC-BY-NC-4.0 |
Developer | Convergence AI |
What is proxy-lite-3b?
Proxy-lite-3b is a specialized vision-language model designed for automated web browsing tasks. As a lightweight version of Proxy, it combines visual understanding with language processing capabilities to navigate and interact with web interfaces effectively. The model achieved an impressive 72.4% success rate on the WebVoyager benchmark, leading all open-weights models in this category.
Implementation Details
The model is built on the Qwen2.5-VL-3B-Instruct architecture and implements a sophisticated context-window management system that preserves task awareness while optimizing image token usage. It can be deployed using vLLM and includes specialized tool-parsing capabilities for web interaction.
- Supports both CLI and Streamlit interface implementations
- Features automatic tool choice capabilities
- Implements custom message history formatting for effective task tracking
- Uses OpenAI-compatible serialization for tool management
Core Capabilities
- Automated web navigation and interaction
- High performance across various websites (87.8% success on Allrecipes, 85% on GitHub)
- Visual-textual understanding of web interfaces
- Efficient context management for long-running tasks
- Integration with browser automation tools
Frequently Asked Questions
Q: What makes this model unique?
Proxy-lite-3b stands out for its specialized web automation capabilities while maintaining a relatively small parameter count. Its performance on the WebVoyager benchmark demonstrates its effectiveness in real-world web interaction tasks, making it particularly suitable for automated browsing applications.
Q: What are the recommended use cases?
The model is ideal for automated web navigation, content discovery, and routine web-based tasks. However, it should not be used for high-stakes applications, unauthorized data extraction, or interactions with untrusted websites. It's particularly effective for tasks like market research, content aggregation, and web interface testing.