Phi-1
| Property | Value |
|---|---|
| Parameter Count | 1.3B |
| Training Tokens | 54B |
| License | MIT |
| Paper | Textbooks Are All You Need |
| Training Infrastructure | 8× A100 GPUs, 6 days of training |
What is Phi-1?
Phi-1 is Microsoft's specialized Python coding language model, built on a compact 1.3B-parameter architecture. Despite its small size, it attains 50.6% pass@1 on the HumanEval Python coding benchmark. The model was trained on a carefully curated mix of filtered Python code from The Stack v1.2, Q&A content from StackOverflow, competition code, and synthetic Python textbooks and exercises generated with GPT-3.5.
Implementation Details
The model uses a Transformer-based architecture trained in FP16 precision with PyTorch, DeepSpeed, and Flash-Attention, using a standard next-word-prediction objective. Training consumed 54B tokens over 6 days on 8 A100 GPUs.
- Integrated with the transformers library (v4.37.0+); see the loading sketch after this list
- Specialized for Python code generation
- Optimized for basic coding tasks and documentation
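A minimal loading-and-generation sketch, assuming transformers v4.37.0+ with PyTorch and the accelerate package installed; the prompt and generation settings here are illustrative, not the model card's official example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load phi-1 in FP16, matching the precision it was trained in.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1",
    torch_dtype=torch.float16,  # use torch.float32 on CPU-only setups
    device_map="auto",          # requires the `accelerate` package
)

# Illustrative prompt: a bare function signature for the model to complete.
prompt = "def is_palindrome(s: str) -> bool:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding is usually sufficient for short code completions.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```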
Core Capabilities
- Python code generation from docstring descriptions (see the example after this list)
- Understanding and completing Python functions
- Basic package integration (typing, math, random, collections, datetime, itertools)
- Code documentation and explanation
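For docstring-driven generation, the usual pattern is to prompt with a function signature and docstring and let the model write the body. A hypothetical example, reusing the `model` and `tokenizer` from the sketch above:

```python
# Hypothetical docstring prompt; phi-1 is expected to complete the body.
prompt = '''def mean_absolute_deviation(numbers: list[float]) -> float:
    """Return the mean absolute deviation of the input numbers
    around their arithmetic mean.
    """
'''
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As noted in the FAQ below, completions should still be reviewed for correctness and security before use.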
Frequently Asked Questions
Q: What makes this model unique?
Phi-1 stands out for achieving high performance on Python coding tasks despite its relatively small size (1.3B parameters). It demonstrates that carefully curated training data and specialized focus can lead to impressive results without requiring massive model scale.
Q: What are the recommended use cases?
The model is best suited for Python coding tasks, particularly those involving basic programming concepts and standard library packages. It excels at generating code from docstrings and completing Python functions, though users should review generated code for accuracy and security.