Phi-1
| Property | Value |
|---|---|
| Parameter Count | 1.3B |
| Training Tokens | 54B |
| License | MIT |
| Paper | Textbooks Are All You Need |
| Training Infrastructure | 8× A100 GPUs, 6 days of training |
What is Phi-1?
Phi-1 is Microsoft's specialized Python coding language model, built on a compact 1.3B-parameter architecture. Despite its small size, it attains 50.6% pass@1 on the HumanEval Python coding benchmark. The model was trained on a carefully curated mix of filtered Python code from The Stack v1.2, Q&A content from StackOverflow, competition code, and synthetic Python textbooks and exercises generated with GPT-3.5.
Implementation Details
The model uses a Transformer-based architecture trained in FP16 precision with PyTorch, DeepSpeed, and Flash-Attention, using a standard next-word-prediction objective. Training consumed 54B tokens over 6 days on 8 A100 GPUs.
- Integrated with the transformers library (v4.37.0+); see the loading sketch after this list
- Specialized for Python code generation
- Optimized for basic coding tasks and documentation
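A minimal loading-and-generation sketch, assuming transformers v4.37.0+ with PyTorch and the accelerate package installed; the prompt and generation settings here are illustrative, not the model card's official example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load phi-1 in FP16, matching the precision it was trained in.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1",
    torch_dtype=torch.float16,  # use torch.float32 on CPU-only setups
    device_map="auto",          # requires the `accelerate` package
)

# Illustrative prompt: a bare function signature for the model to complete.
prompt = "def is_palindrome(s: str) -> bool:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding is usually sufficient for short code completions.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```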
Core Capabilities
- Python code generation from docstring descriptions (see the example after this list)
- Understanding and completing Python functions
- Basic package integration (typing, math, random, collections, datetime, itertools)
- Code documentation and explanation
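For docstring-driven generation, the usual pattern is to prompt with a function signature and docstring and let the model write the body. A hypothetical example, reusing the `model` and `tokenizer` from the sketch above:

```python
# Hypothetical docstring prompt; phi-1 is expected to complete the body.
prompt = '''def mean_absolute_deviation(numbers: list[float]) -> float:
    """Return the mean absolute deviation of the input numbers
    around their arithmetic mean.
    """
'''
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As noted in the FAQ below, completions should still be reviewed for correctness and security before use.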
Frequently Asked Questions
Q: What makes this model unique?
Phi-1 stands out for achieving high performance on Python coding tasks despite its relatively small size (1.3B parameters). It demonstrates that carefully curated training data and specialized focus can lead to impressive results without requiring massive model scale.
Q: What are the recommended use cases?
The model is best suited for Python coding tasks, particularly those involving basic programming concepts and standard library packages. It excels at generating code from docstrings and completing Python functions, though users should review generated code for accuracy and security.