Phi-1.5
| Property | Value |
|---|---|
| Parameter Count | 1.3 billion |
| Training Tokens | 150B |
| License | MIT |
| Paper | phi-1.5 technical report (arXiv:2309.05463) |
| Training Infrastructure | 32× A100-40G GPUs |
What is Phi-1.5?
Phi-1.5 is a Transformer-based language model from Microsoft with 1.3 billion parameters. Among models with fewer than 10 billion parameters, it achieves near state-of-the-art performance on benchmarks for common sense, language understanding, and logical reasoning. Notably, it was trained without instruction fine-tuning or reinforcement learning from human feedback (RLHF), which makes it a useful base model for safety research.
Implementation Details
The model was trained on roughly 30B tokens of carefully curated data, seen multiple times for about 150B training tokens in total. Training used FP16 precision together with DeepSpeed and Flash-Attention, and completed in 8 days on 32 A100-40G GPUs. A minimal loading sketch follows the list below.
- Trained using PyTorch and DeepSpeed frameworks
- Implements Flash-Attention for improved efficiency
- Excludes generic web-crawl data for enhanced safety
- Optimized for QA, chat, and code generation formats
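To make the setup concrete, here is a minimal loading sketch. It assumes the Hugging Face transformers library and the microsoft/phi-1_5 checkpoint; the FP16 dtype mirrors the training precision noted above, while `device_map="auto"` is a convenience assumption rather than part of the official recipe.

```python
# Minimal loading sketch (assumes `torch`, `transformers`, and `accelerate`
# are installed, and the `microsoft/phi-1_5` checkpoint on the Hugging Face Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1_5",
    torch_dtype=torch.float16,  # matches the FP16 training precision
    device_map="auto",          # place weights on available GPU(s), if any
)
# Note: older transformers releases may additionally need trust_remote_code=True.
```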
Core Capabilities
- Poetry and creative writing
- Email drafting and text summarization
- Python code generation
- Story creation and text completion
- Question-answering in structured formats
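Because Phi-1.5 is a base model with no instruction tuning, it responds best to completion-style prompts. The sketch below, continuing from the loading example above, shows one plausible QA-style prompt; the exact wording and the "Answer:" cue are illustrative, not an official format.

```python
# QA-style completion (reuses `tokenizer` and `model` from the loading
# sketch above; the prompt pattern is an illustrative assumption).
prompt = "What causes the seasons to change?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```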
Frequently Asked Questions
Q: What makes this model unique?
A: Phi-1.5 stands out for strong performance despite its small size, and for a training approach that excludes generic web-crawl data and RLHF, which makes it valuable for safety research.
Q: What are the recommended use cases?
A: The model excels in QA-format interactions, chat-style conversations, and code generation tasks. It's particularly suitable for research purposes and as a starting point for exploring AI safety challenges.
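For code generation, one common pattern with completion-style code models (illustrative here, not prescribed by the model card) is to prompt with a function signature and docstring and let the model complete the body:

```python
# Code-completion prompt (reuses `tokenizer` and `model` from the loading
# sketch above; the function name and docstring are hypothetical examples).
code_prompt = '''def is_palindrome(s: str) -> bool:
    """Return True if `s` reads the same forwards and backwards."""
'''
inputs = tokenizer(code_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```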