# StableLM Zephyr 3B
| Property | Value |
|---|---|
| Parameter Count | 2.8B |
| License | StabilityAI Community License |
| Architecture | Transformer Decoder |
| Training Method | Direct Preference Optimization (DPO) |
| Paper | DPO Paper |
## What is StableLM Zephyr 3B?
StableLM Zephyr 3B is a 2.8B-parameter language model from Stability AI, designed for instruction-following and conversational tasks. Built on a transformer decoder architecture and aligned with Direct Preference Optimization (DPO), it shows that a comparatively small model can remain highly capable on chat and instruction workloads.
## Implementation Details
The model was trained on a diverse collection of datasets including UltraChat, MetaMathQA, and WizardLM instruction data. It uses a dedicated chat template built around `<|user|>`, `<|assistant|>`, and `<|endoftext|>` markers, and achieves strong benchmark scores, including 6.64 on MT-Bench and a 76% win rate on AlpacaEval.
- Trained across 8 nodes, each with 8 A100 80GB GPUs
- Trained in BF16 precision
- Developed with safety evaluations and responsible-AI guidelines
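The chat template mentioned above can be sketched as a small formatting function. The `<|user|>`, `<|assistant|>`, and `<|endoftext|>` markers follow the format published on the model card; the function name and structure here are illustrative, not part of an official API.

```python
# Minimal sketch of the StableLM Zephyr chat prompt format.
# Each turn is wrapped as <|role|>\n{content}<|endoftext|>\n, and the
# prompt ends with an open <|assistant|> tag so the model continues there.

def format_chat(messages: list[dict]) -> str:
    """Render a list of {"role", "content"} turns into a Zephyr-style prompt."""
    prompt = ""
    for msg in messages:
        prompt += f"<|{msg['role']}|>\n{msg['content']}<|endoftext|>\n"
    # Leave the assistant tag open: generation starts after this line.
    prompt += "<|assistant|>\n"
    return prompt

print(format_chat([{"role": "user", "content": "List two colors."}]))
```

In practice the same result can be obtained from the tokenizer's built-in chat template, but spelling it out makes the wire format explicit.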
## Core Capabilities
- Strong performance on mathematical reasoning (42.15% on GSM8k)
- High accuracy on common sense tasks (74.16% on HellaSwag)
- Effective at truthful QA (46.49% on TruthfulQA)
- Robust performance on general knowledge (46.17% on MMLU)
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for achieving impressive performance metrics despite its relatively small size of 2.8B parameters, making it more accessible for deployment while maintaining strong capabilities across various tasks.
Q: What are the recommended use cases?
The model is ideal for conversational AI applications, instruction-following tasks, and general text generation. It's particularly suitable for developers looking to fine-tune for specific applications while maintaining reasonable computational requirements.
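When serving the model in such applications, one practical detail is that raw generations can run past the `<|endoftext|>` stop marker used by the chat format. A minimal post-processing helper (the function name is hypothetical) might look like:

```python
# Hypothetical deployment helper: trim a raw generation at the first
# occurrence of the stop token so only the assistant's reply is returned.

def trim_response(text: str, stop: str = "<|endoftext|>") -> str:
    """Return generated text up to (but not including) the stop token."""
    idx = text.find(stop)
    return text if idx == -1 else text[:idx]

print(trim_response("Blue and green.<|endoftext|>\n<|user|>\nThanks!"))
```

Most inference stacks can stop on the token directly via a stop-sequence option; this is a fallback for cases where the full decoded string is returned.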