Phi-1.5
| Property | Value |
|---|---|
| Parameter Count | 1.3 billion |
| Training Tokens | 150B |
| License | MIT |
| Paper | phi-1.5 technical report (arXiv:2309.05463) |
| Training Infrastructure | 32× A100-40G GPUs |
What is Phi-1.5?
Phi-1.5 is a Transformer-based language model from Microsoft with 1.3 billion parameters. Among models with fewer than 10 billion parameters, it achieves near state-of-the-art performance on benchmarks for common sense, language understanding, and logical reasoning. Notably, it was trained without instruction fine-tuning or reinforcement learning from human feedback (RLHF), which makes it a useful base model for safety research.
Implementation Details
The model was trained on roughly 30B tokens of carefully curated data, seen multiple times for about 150B training tokens in total. Training used FP16 precision together with DeepSpeed and Flash-Attention, and completed in 8 days on 32 A100-40G GPUs. A minimal loading sketch follows the list below.
- Trained using PyTorch and DeepSpeed frameworks
- Implements Flash-Attention for improved efficiency
- Excludes generic web-crawl data for enhanced safety
- Optimized for QA, chat, and code generation formats
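To make the setup concrete, here is a minimal loading sketch. It assumes the Hugging Face transformers library and the microsoft/phi-1_5 checkpoint; the FP16 dtype mirrors the training precision noted above, while `device_map="auto"` is a convenience assumption rather than part of the official recipe.

```python
# Minimal loading sketch (assumes `torch`, `transformers`, and `accelerate`
# are installed, and the `microsoft/phi-1_5` checkpoint on the Hugging Face Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1_5",
    torch_dtype=torch.float16,  # matches the FP16 training precision
    device_map="auto",          # place weights on available GPU(s), if any
)
# Note: older transformers releases may additionally need trust_remote_code=True.
```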
Core Capabilities
- Poetry and creative writing
- Email drafting and text summarization
- Python code generation
- Story creation and text completion
- Question-answering in structured formats
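Because Phi-1.5 is a base model with no instruction tuning, it responds best to completion-style prompts. The sketch below, continuing from the loading example above, shows one plausible QA-style prompt; the exact wording and the "Answer:" cue are illustrative, not an official format.

```python
# QA-style completion (reuses `tokenizer` and `model` from the loading
# sketch above; the prompt pattern is an illustrative assumption).
prompt = "What causes the seasons to change?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```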
Frequently Asked Questions
Q: What makes this model unique?
A: Phi-1.5 stands out for strong performance despite its small size, and for a training approach that excludes generic web-crawl data and RLHF, which makes it valuable for safety research.
Q: What are the recommended use cases?
A: The model excels in QA-format interactions, chat-style conversations, and code generation tasks. It's particularly suitable for research purposes and as a starting point for exploring AI safety challenges.
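For code generation, one common pattern with completion-style code models (illustrative here, not prescribed by the model card) is to prompt with a function signature and docstring and let the model complete the body:

```python
# Code-completion prompt (reuses `tokenizer` and `model` from the loading
# sketch above; the function name and docstring are hypothetical examples).
code_prompt = '''def is_palindrome(s: str) -> bool:
    """Return True if `s` reads the same forwards and backwards."""
'''
inputs = tokenizer(code_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```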