proxy-lite-3b

Maintained By
convergence-ai

Proxy-lite-3b

PropertyValue
Model TypeVision-Language Model
Parameters3 Billion
Base ModelQwen2.5-VL-3B-Instruct
LicenseCC-BY-NC-4.0
DeveloperConvergence AI

What is proxy-lite-3b?

Proxy-lite-3b is a specialized vision-language model designed for automated web browsing tasks. As a lightweight version of Proxy, it combines visual understanding with language processing capabilities to navigate and interact with web interfaces effectively. The model achieved an impressive 72.4% success rate on the WebVoyager benchmark, leading all open-weights models in this category.

Implementation Details

The model is built on the Qwen2.5-VL-3B-Instruct architecture and implements a sophisticated context-window management system that preserves task awareness while optimizing image token usage. It can be deployed using vLLM and includes specialized tool-parsing capabilities for web interaction.

  • Supports both CLI and Streamlit interface implementations
  • Features automatic tool choice capabilities
  • Implements custom message history formatting for effective task tracking
  • Uses OpenAI-compatible serialization for tool management

Core Capabilities

  • Automated web navigation and interaction
  • High performance across various websites (87.8% success on Allrecipes, 85% on GitHub)
  • Visual-textual understanding of web interfaces
  • Efficient context management for long-running tasks
  • Integration with browser automation tools

Frequently Asked Questions

Q: What makes this model unique?

Proxy-lite-3b stands out for its specialized web automation capabilities while maintaining a relatively small parameter count. Its performance on the WebVoyager benchmark demonstrates its effectiveness in real-world web interaction tasks, making it particularly suitable for automated browsing applications.

Q: What are the recommended use cases?

The model is ideal for automated web navigation, content discovery, and routine web-based tasks. However, it should not be used for high-stakes applications, unauthorized data extraction, or interactions with untrusted websites. It's particularly effective for tasks like market research, content aggregation, and web interface testing.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.