Llama 3.3 70B Instruct AWQ
| Property | Value |
|---|---|
| Parameter Count | 70 Billion |
| Context Length | 128,000 tokens |
| Training Tokens | 15T+ |
| License | Llama 3.3 Community License Agreement |
| Release Date | December 6, 2024 |
| Supported Languages | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai |
What is llama-3.3-70b-instruct-awq?
Llama 3.3 70B Instruct AWQ is Meta's multilingual, instruction-tuned language model, quantized with AutoAWQ (Activation-aware Weight Quantization) for more efficient deployment. The quantized weights substantially reduce memory requirements while preserving most of the original model's quality. The model is optimized for dialogue and instruction-following tasks and performs strongly across standard benchmarks (see Core Capabilities below).
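As a minimal sketch of how such an AWQ checkpoint is typically loaded, the snippet below uses Hugging Face transformers, which can read AWQ-quantized weights when the autoawq package is installed. The repository id is a hypothetical placeholder, not confirmed by this card.

```python
# Minimal loading sketch. "meta-llama/Llama-3.3-70B-Instruct-AWQ" is a
# hypothetical repo id, not confirmed by this card; requires autoawq.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.3-70B-Instruct-AWQ"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The AWQ quantization config ships inside the checkpoint, so no extra
# quantization arguments are needed at load time.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard across available GPUs
    torch_dtype="auto",  # keep the dtype stored in the checkpoint
)
```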
Implementation Details
The model uses an optimized transformer architecture with Grouped-Query Attention (GQA) for improved inference scalability (a minimal sketch follows the feature list below). It was trained on a diverse mix of publicly available online data and fine-tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align its outputs with human preferences.
- Advanced AWQ quantization for efficient deployment
- 128k token context window
- Multilingual support for 8 languages
- Grouped-Query Attention architecture
- 15T+ training tokens
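Grouped-Query Attention shrinks the key/value cache by sharing each key/value head across a group of query heads. The PyTorch sketch below illustrates the idea with toy head counts and dimensions; these numbers are illustrative only and do not reflect Llama 3.3's actual configuration.

```python
import torch
import torch.nn.functional as F

# Illustrative Grouped-Query Attention: 8 query heads share 2 KV heads,
# so each KV head serves a group of 4 query heads. All sizes are toy values.
def grouped_query_attention(q, k, v, n_q_heads=8, n_kv_heads=2):
    # q: (batch, seq, n_q_heads * head_dim)
    # k, v: (batch, seq, n_kv_heads * head_dim)
    b, s, _ = q.shape
    head_dim = q.shape[-1] // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per KV head

    q = q.view(b, s, n_q_heads, head_dim).transpose(1, 2)
    k = k.view(b, s, n_kv_heads, head_dim).transpose(1, 2)
    v = v.view(b, s, n_kv_heads, head_dim).transpose(1, 2)

    # Expand each KV head across its query group; the KV cache only ever
    # stores the n_kv_heads projections, which is where the savings come from.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    attn = F.scaled_dot_product_attention(q, k, v)  # (b, heads, s, head_dim)
    return attn.transpose(1, 2).reshape(b, s, n_q_heads * head_dim)

# Toy usage: batch of 1, sequence of 16 tokens, head_dim = 64.
q = torch.randn(1, 16, 8 * 64)
k = torch.randn(1, 16, 2 * 64)
v = torch.randn(1, 16, 2 * 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 16, 512])
```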
Core Capabilities
- Strong performance on MMLU (86.0% accuracy)
- Exceptional code generation (88.4% pass@1 on HumanEval)
- Advanced mathematical reasoning (77.0% on MATH CoT)
- High multilingual proficiency (91.1% on MGSM)
- Improved steerability (92.1% on IFEval; see the prompting sketch after this list)
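As a hedged illustration of instruction-following use, the snippet below applies the tokenizer's chat template to a system-plus-user conversation. It assumes the `model` and `tokenizer` from the loading sketch earlier in this card.

```python
# Continues the loading sketch above; assumes `model` and `tokenizer`
# refer to the AWQ checkpoint and its tokenizer.
messages = [
    {"role": "system", "content": "You are a concise coding assistant."},
    {"role": "user", "content": "Write a Python one-liner that reverses a string."},
]

# Llama 3.3 Instruct ships a chat template, so apply_chat_template
# produces the correctly formatted prompt tokens.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```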
Frequently Asked Questions
Q: What makes this model unique?
This model stands out due to its combination of large-scale parameters (70B), extensive context length (128k), and efficient AWQ quantization. It shows remarkable improvements in code generation and mathematical reasoning compared to previous versions, while maintaining strong multilingual capabilities.
Q: What are the recommended use cases?
The model is particularly well-suited for multilingual applications, complex coding tasks, mathematical problem-solving, and general dialogue applications. Its extensive context length makes it ideal for processing and analyzing longer documents, while its instruction-following capabilities make it valuable for task-specific applications.
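For long-document workloads, a serving engine such as vLLM can load AWQ checkpoints directly. The sketch below is an assumption-laden example: the repo id is the same hypothetical placeholder as above, and `max_model_len` is capped well below the full 128k window to fit typical GPU memory.

```python
# Long-context serving sketch with vLLM; the repo id is hypothetical, and
# max_model_len may need tuning for the available GPU memory.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct-AWQ",  # hypothetical repo id
    quantization="awq",      # load the AWQ-quantized weights
    max_model_len=32768,     # raise toward 128k as memory allows
    tensor_parallel_size=4,  # shard the 70B model across 4 GPUs
)

params = SamplingParams(temperature=0.0, max_tokens=256)
long_document = "..."  # e.g. the full text of a report to analyze
prompt = f"Summarize the key findings of the following document:\n\n{long_document}"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```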