Microsoft Phi-2
| Property | Value |
|---|---|
| Parameter Count | 2.78B |
| Model Type | Transformer-based language model |
| Training Data | 250B-token dataset (1.4T training tokens) |
| License | Microsoft Research License |
| Training Infrastructure | 96x A100-80GB GPUs, 14 days of training |
What is Phi-2?
Phi-2 is Microsoft's research-focused language model. With 2.7 billion parameters, it demonstrates that efficient, smaller-scale models can compete with much larger counterparts on tasks requiring common sense, language understanding, and logical reasoning.
Implementation Details
Built on PyTorch with DeepSpeed and flash-attention (>2.0.0), Phi-2 was trained on 1.4T tokens combining NLP synthetic data created with GPT-3.5 and carefully filtered web data from Falcon RefinedWeb and SlimPajama, with GPT-4 used to assess the data for quality and safety.
- Architecture: Transformer-based model with next-word prediction
- Training Infrastructure: 96 A100-80G GPUs
- Training Duration: 14 days
- Framework: PyTorch with DeepSpeed optimization
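As a rough illustration of how Phi-2 can be loaded for local experimentation, the sketch below uses the Hugging Face transformers API with the microsoft/phi-2 checkpoint; the half-precision and flash-attention settings are assumptions for a single-GPU setup, not part of the published training recipe.

```python
# Minimal sketch (assumptions: recent transformers, a CUDA GPU, flash-attn installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                 # half precision to fit on one GPU
    attn_implementation="flash_attention_2",   # optional; omit to use the default attention
    device_map="auto",                         # place weights automatically
)
```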
Core Capabilities
- Question-Answering with high accuracy
- Natural chat interactions
- Python code generation
- Common sense reasoning
- Language understanding tasks
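As a concrete example of the question-answering capability, the snippet below is a hedged sketch of a single QA round trip, reusing the model and tokenizer loaded above; the "Instruct:/Output:" prompt template is an assumption based on common usage, not a fixed interface.

```python
# Hypothetical QA-style prompt; the exact template is an assumption.
prompt = "Instruct: Explain in one sentence why the sky appears blue.\nOutput:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```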
Frequently Asked Questions
Q: What makes this model unique?
Phi-2 stands out for achieving near state-of-the-art performance among models under 10B parameters, without using reinforcement learning from human feedback. It's specifically designed for research purposes with a focus on safety and educational value.
Q: What are the recommended use cases?
The model is best suited for research applications in QA format, chat format, and code generation, particularly in Python. It is not intended for production use; its primary purpose is research, especially exploring safety challenges and model controllability.
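For the Python code-generation use case, one common pattern is to prompt with a function signature and docstring and let the model complete the body; the fragment below is a sketch of that workflow under the same loading assumptions as above, with no guarantees about the quality of the completion.

```python
# Hypothetical code-completion prompt: Phi-2 continues the function body.
code_prompt = '''def is_palindrome(s: str) -> bool:
    """Return True if s reads the same forwards and backwards."""
'''

inputs = tokenizer(code_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```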