# StableLM Zephyr 3B
| Property | Value |
|---|---|
| Parameter Count | 2.8B |
| License | StabilityAI Community License |
| Architecture | Transformer Decoder |
| Training Method | Direct Preference Optimization (DPO) |
| Paper | DPO Paper |
## What is StableLM Zephyr 3B?
StableLM Zephyr 3B is a 2.8B-parameter language model from Stability AI, designed for instruction-following and conversational tasks. Built on a transformer decoder architecture and aligned with Direct Preference Optimization (DPO), it shows that a comparatively small model can remain highly capable on chat and instruction workloads.
## Implementation Details
The model was trained on a diverse collection of datasets including UltraChat, MetaMathQA, and WizardLM instruction data. It uses a dedicated chat template built around `<|user|>`, `<|assistant|>`, and `<|endoftext|>` markers, and achieves strong benchmark scores, including 6.64 on MT-Bench and a 76% win rate on AlpacaEval.
- Trained across 8 nodes, each with 8 A100 80GB GPUs
- Trained in BF16 precision
- Developed with safety evaluations and responsible-AI guidelines
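The chat template mentioned above can be sketched as a small formatting function. The `<|user|>`, `<|assistant|>`, and `<|endoftext|>` markers follow the format published on the model card; the function name and structure here are illustrative, not part of an official API.

```python
# Minimal sketch of the StableLM Zephyr chat prompt format.
# Each turn is wrapped as <|role|>\n{content}<|endoftext|>\n, and the
# prompt ends with an open <|assistant|> tag so the model continues there.

def format_chat(messages: list[dict]) -> str:
    """Render a list of {"role", "content"} turns into a Zephyr-style prompt."""
    prompt = ""
    for msg in messages:
        prompt += f"<|{msg['role']}|>\n{msg['content']}<|endoftext|>\n"
    # Leave the assistant tag open: generation starts after this line.
    prompt += "<|assistant|>\n"
    return prompt

print(format_chat([{"role": "user", "content": "List two colors."}]))
```

In practice the same result can be obtained from the tokenizer's built-in chat template, but spelling it out makes the wire format explicit.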
## Core Capabilities
- Strong performance on mathematical reasoning (42.15% on GSM8k)
- High accuracy on common sense tasks (74.16% on HellaSwag)
- Effective at truthful QA (46.49% on TruthfulQA)
- Robust performance on general knowledge (46.17% on MMLU)
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for achieving impressive performance metrics despite its relatively small size of 2.8B parameters, making it more accessible for deployment while maintaining strong capabilities across various tasks.
Q: What are the recommended use cases?
The model is ideal for conversational AI applications, instruction-following tasks, and general text generation. It's particularly suitable for developers looking to fine-tune for specific applications while maintaining reasonable computational requirements.
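When serving the model in such applications, one practical detail is that raw generations can run past the `<|endoftext|>` stop marker used by the chat format. A minimal post-processing helper (the function name is hypothetical) might look like:

```python
# Hypothetical deployment helper: trim a raw generation at the first
# occurrence of the stop token so only the assistant's reply is returned.

def trim_response(text: str, stop: str = "<|endoftext|>") -> str:
    """Return generated text up to (but not including) the stop token."""
    idx = text.find(stop)
    return text if idx == -1 else text[:idx]

print(trim_response("Blue and green.<|endoftext|>\n<|user|>\nThanks!"))
```

Most inference stacks can stop on the token directly via a stop-sequence option; this is a fallback for cases where the full decoded string is returned.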