# StarCoder2-7B
| Property | Value |
|---|---|
| Parameter Count | 7.17B |
| License | BigCode OpenRAIL-M |
| Paper | View Paper |
| Training Data | The Stack v2 (3.5+ trillion tokens) |
| Context Window | 16,384 tokens |
## What is StarCoder2-7B?
StarCoder2-7B is a state-of-the-art code generation model trained on 17 programming languages from The Stack v2 dataset. It uses Grouped Query Attention together with a sliding window attention mechanism with a 4,096-token window inside a 16,384-token context.
## Implementation Details
The model is a Transformer decoder with grouped-query attention, trained with a Fill-in-the-Middle objective. Training was conducted on 432 H100 GPUs, processing over 3.5 trillion tokens across 1 million training steps.
- BFloat16 precision for optimal performance
- 16,384 token context window
- Sliding window attention of 4,096 tokens
- Trained using the nanotron framework
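
The snippet below is a minimal usage sketch: loading the checkpoint in BFloat16 with the Hugging Face `transformers` library and generating a completion. The model id `bigcode/starcoder2-7b` and the `device_map="auto"` placement (which requires `accelerate`) are assumptions about a typical GPU setup rather than part of this card.

```python
# Minimal loading-and-completion sketch (assumes a GPU with enough memory
# for ~7B bfloat16 weights and the model id bigcode/starcoder2-7b).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-7b"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # matches the BFloat16 precision noted above
    device_map="auto",           # spread layers across available devices
)

# Plain prefix completion: the model continues the code context.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```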
## Core Capabilities
- Code generation across 17 programming languages
- Context-aware code completion
- Support for multiple deployment options (CPU/GPU)
- Quantization support (8-bit and 4-bit precision; see the sketch after this list)
- Memory-efficient operation with various precision options
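
For the quantized paths, the sketch below uses `bitsandbytes` through the `transformers` `BitsAndBytesConfig` interface. It illustrates 4-bit loading; `load_in_8bit=True` is the analogous 8-bit option. The exact memory footprint depends on hardware and library versions, so treat this as an assumed setup rather than a prescribed one.

```python
# 4-bit quantized loading sketch via bitsandbytes (8-bit is analogous with
# load_in_8bit=True). Assumes the same model id as in the earlier sketch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "bigcode/starcoder2-7b"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # store weights in 4-bit, compute in bf16
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=quant_config,
    device_map="auto",
)
```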
## Frequently Asked Questions
### Q: What makes this model unique?
StarCoder2-7B stands out for its extensive training on permissively licensed code and its advanced attention mechanisms, making it particularly effective for code generation tasks while maintaining reasonable computational requirements.
### Q: What are the recommended use cases?
The model excels at code completion and generation tasks but is not designed for instruction-following. It's best suited for direct code generation based on context rather than natural language commands.
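
As an illustration of context-driven prompting, the sketch below builds a Fill-in-the-Middle prompt and reuses the `tokenizer` and `model` objects from the loading example above. The sentinel token names are an assumption based on the StarCoder family convention; check them against `tokenizer.special_tokens_map` before relying on them.

```python
# Fill-in-the-Middle sketch: the model generates the span between a given
# prefix and suffix. Sentinel token names are assumed from the StarCoder
# family convention; verify them via tokenizer.special_tokens_map.
prefix = "def remove_duplicates(items):\n    "
suffix = "\n    return result\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```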