Llama-3-8B-Web
Property | Value |
---|---|
Parameter Count | 8.03B |
Model Type | Llama 3 |
License | Meta Llama 3 Community License |
Paper | arXiv:2402.05930 |
Tensor Type | BF16 |
What is Llama-3-8B-Web?
Llama-3-8B-Web is a language model fine-tuned for web navigation, built on Meta's Llama-3-8B-Instruct. Developed by McGill-NLP, it is designed to browse websites on a user's behalf while conversing with them, and it outperforms GPT-4V by 18% on the WebLINX benchmark.
Implementation Details
The model is fine-tuned on a curated subset of 24K web navigation and dialogue instances from the WebLINX dataset. It can be loaded with the Hugging Face Transformers library and integrated with web automation platforms such as Playwright and BrowserGym.
- Fine-tuned on validated web interaction data
- Supports multiple action types: click, text input, submit, and dialogue
- Optimized for real-world website navigation
- Compatible with major web automation frameworks
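To illustrate how the model's output could be wired into an automation framework such as Playwright, here is a minimal sketch. It assumes the model emits one call-style action per turn, for example `click(uid="...")`, `text_input(text="...", uid="...")`, `submit(uid="...")` or `say(speaker="navigator", utterance="...")`. These action names, the hypothetical `data-webtasks-id` selector scheme, and modeling "submit" as pressing Enter are all assumptions for illustration, not the model's documented interface. The `Page` protocol mirrors the `click`, `fill` and `press` methods of Playwright's synchronous `Page`, so a real Playwright page could be passed in; `RecordingPage` is a stand-in so the sketch runs without a browser.

```python
import ast
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Action:
    """A parsed navigator action, e.g. Action("click", {"uid": "a1"})."""
    name: str
    args: dict = field(default_factory=dict)


def parse_action(text: str) -> Action:
    """Parse one call-style action string such as 'click(uid="a1")'.

    Assumes keyword arguments only, with literal values (strings, numbers).
    """
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a call-style action: {text!r}")
    if node.args:
        raise ValueError("positional arguments are not supported")
    # literal_eval only accepts literal nodes, so no model output is ever executed.
    args = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return Action(node.func.id, args)


class Page(Protocol):
    """Subset of Playwright's synchronous Page API used below."""

    def click(self, selector: str) -> None: ...
    def fill(self, selector: str, value: str) -> None: ...
    def press(self, selector: str, key: str) -> None: ...


def selector_for(uid: str) -> str:
    # Hypothetical scheme: assumes elements carry a data-webtasks-id attribute.
    return f'[data-webtasks-id="{uid}"]'


def execute(action: Action, page: Page) -> Optional[str]:
    """Apply a browser action to the page; return the text of a dialogue turn."""
    if action.name == "click":
        page.click(selector_for(action.args["uid"]))
    elif action.name == "text_input":
        page.fill(selector_for(action.args["uid"]), action.args["text"])
    elif action.name == "submit":
        # Assumption: submitting is modeled as pressing Enter on the element.
        page.press(selector_for(action.args["uid"]), "Enter")
    elif action.name == "say":
        return action.args["utterance"]
    else:
        raise ValueError(f"unsupported action: {action.name}")
    return None


class RecordingPage:
    """Test double that records calls instead of driving a real browser."""

    def __init__(self) -> None:
        self.calls = []

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    def fill(self, selector: str, value: str) -> None:
        self.calls.append(("fill", selector, value))

    def press(self, selector: str, key: str) -> None:
        self.calls.append(("press", selector, key))


page = RecordingPage()
execute(parse_action('text_input(text="Montreal", uid="a1")'), page)
execute(parse_action('submit(uid="a1")'), page)
print(page.calls)
```

Parsing with `ast` rather than `eval` keeps untrusted model output from running as code, and separating parsing from execution lets the same parsed actions drive a real browser, a logger, or a test double.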
Core Capabilities
- 34.1% segment F1 score for link selection (vs GPT-4V's 18.9%)
- 27.1% IoU for element clicking accuracy (vs GPT-4V's 13.6%)
- 37.5% chr-F1 for response alignment (vs GPT-4V's 3.1%)
- Effective performance across 150 different websites
- Handles complex tasks including booking, shopping, and spreadsheet manipulation
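The scores above are overlap-based metrics. The sketch below shows two such measures in their generic form: intersection-over-union (IoU) between the bounding box of a predicted element and that of the reference element, and a simplified character n-gram F-score (chrF, with beta = 1 for the F1 variant) between a predicted and a reference utterance. It assumes boxes are given as `(x, y, width, height)`; it is illustrative only, and the official WebLINX evaluation code may tokenize, weight and aggregate these scores differently.

```python
from collections import Counter


def box_iou(a, b):
    """IoU of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping rectangle (0 if the boxes are disjoint).
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def char_ngrams(text, n):
    """Multiset of character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(hyp, ref, max_n=6, beta=1.0):
    """Simplified chrF: average n-gram precision/recall, then an F-beta score."""
    hyp, ref = hyp.replace(" ", ""), ref.replace(" ", "")
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:
            continue  # skip n-gram orders longer than either string
        overlap = sum((h & r).values())
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    rec = sum(recalls) / len(recalls)
    if p + rec == 0:
        return 0.0
    return (1 + beta**2) * p * rec / (beta**2 * p + rec)


print(box_iou((0, 0, 10, 10), (5, 0, 10, 10)))
print(chrf("hello there", "hello world"))
```

Both scores give partial credit: clicking an element that overlaps the correct one, or phrasing a reply slightly differently from the reference, still earns a nonzero score rather than a strict miss.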
Frequently Asked Questions
Q: What makes this model unique?
The model is optimized specifically for web navigation and outperforms larger models such as GPT-4V at understanding and acting on web interfaces, despite having only 8B parameters.
Q: What are the recommended use cases?
The model is well suited to building web browsing assistants, automated testing systems, and user interaction simulations. Target tasks include form filling, site navigation, and dialogue-based web interaction across domains such as e-commerce, booking systems, and content management.