Microsoft Phi-2
| Property | Value |
|---|---|
| Parameter Count | 2.78B |
| Model Type | Transformer-based language model |
| Training Data | 250B-token dataset (1.4T training tokens) |
| License | Microsoft Research License |
| Training Infrastructure | 96x A100-80GB GPUs, 14 days of training |
What is Phi-2?
Phi-2 is Microsoft's research-focused language model. With 2.7 billion parameters, it demonstrates that efficient, smaller-scale models can compete with much larger counterparts on tasks requiring common sense, language understanding, and logical reasoning.
Implementation Details
Built on PyTorch with DeepSpeed and flash-attention (>2.0.0), Phi-2 was trained on 1.4T tokens combining NLP synthetic data created with GPT-3.5 and carefully filtered web data from Falcon RefinedWeb and SlimPajama, with GPT-4 used to assess the data for quality and safety.
- Architecture: Transformer-based model with next-word prediction
- Training Infrastructure: 96 A100-80G GPUs
- Training Duration: 14 days
- Framework: PyTorch with DeepSpeed optimization
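As a rough illustration of how Phi-2 can be loaded for local experimentation, the sketch below uses the Hugging Face transformers API with the microsoft/phi-2 checkpoint; the half-precision and flash-attention settings are assumptions for a single-GPU setup, not part of the published training recipe.

```python
# Minimal sketch (assumptions: recent transformers, a CUDA GPU, flash-attn installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                 # half precision to fit on one GPU
    attn_implementation="flash_attention_2",   # optional; omit to use the default attention
    device_map="auto",                         # place weights automatically
)
```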
Core Capabilities
- Question-Answering with high accuracy
- Natural chat interactions
- Python code generation
- Common sense reasoning
- Language understanding tasks
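As a concrete example of the question-answering capability, the snippet below is a hedged sketch of a single QA round trip, reusing the model and tokenizer loaded above; the "Instruct:/Output:" prompt template is an assumption based on common usage, not a fixed interface.

```python
# Hypothetical QA-style prompt; the exact template is an assumption.
prompt = "Instruct: Explain in one sentence why the sky appears blue.\nOutput:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```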
Frequently Asked Questions
Q: What makes this model unique?
Phi-2 stands out for achieving near state-of-the-art performance among models under 10B parameters, without using reinforcement learning from human feedback. It's specifically designed for research purposes with a focus on safety and educational value.
Q: What are the recommended use cases?
The model is best suited for research applications in QA format, chat format, and code generation, particularly in Python. It is not intended for production use; its primary purpose is research, especially exploring safety challenges and model controllability.
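For the Python code-generation use case, one common pattern is to prompt with a function signature and docstring and let the model complete the body; the fragment below is a sketch of that workflow under the same loading assumptions as above, with no guarantees about the quality of the completion.

```python
# Hypothetical code-completion prompt: Phi-2 continues the function body.
code_prompt = '''def is_palindrome(s: str) -> bool:
    """Return True if s reads the same forwards and backwards."""
'''

inputs = tokenizer(code_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```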