proxy-lite-3b

proxy-lite-3b

convergence-ai

A 3B parameter vision-language model designed for web browsing automation, built on Qwen2.5-VL-3B-Instruct with 72.4% accuracy on WebVoyager benchmark.

PropertyValue
Model TypeVision-Language Model
Parameters3 Billion
Base ModelQwen2.5-VL-3B-Instruct
LicenseCC-BY-NC-4.0
DeveloperConvergence AI

What is proxy-lite-3b?

Proxy-lite-3b is a specialized vision-language model designed for automated web browsing tasks. As a lightweight version of Proxy, it combines visual understanding with language processing capabilities to navigate and interact with web interfaces effectively. The model achieved an impressive 72.4% success rate on the WebVoyager benchmark, leading all open-weights models in this category.

Implementation Details

The model is built on the Qwen2.5-VL-3B-Instruct architecture and implements a sophisticated context-window management system that preserves task awareness while optimizing image token usage. It can be deployed using vLLM and includes specialized tool-parsing capabilities for web interaction.

  • Supports both CLI and Streamlit interface implementations
  • Features automatic tool choice capabilities
  • Implements custom message history formatting for effective task tracking
  • Uses OpenAI-compatible serialization for tool management

Core Capabilities

  • Automated web navigation and interaction
  • High performance across various websites (87.8% success on Allrecipes, 85% on GitHub)
  • Visual-textual understanding of web interfaces
  • Efficient context management for long-running tasks
  • Integration with browser automation tools

Frequently Asked Questions

Q: What makes this model unique?

Proxy-lite-3b stands out for its specialized web automation capabilities while maintaining a relatively small parameter count. Its performance on the WebVoyager benchmark demonstrates its effectiveness in real-world web interaction tasks, making it particularly suitable for automated browsing applications.

Q: What are the recommended use cases?

The model is ideal for automated web navigation, content discovery, and routine web-based tasks. However, it should not be used for high-stakes applications, unauthorized data extraction, or interactions with untrusted websites. It's particularly effective for tasks like market research, content aggregation, and web interface testing.

Socials
PromptLayer
Company
All services online
Location IconPromptLayer is located in the heart of New York City
PromptLayer © 2026