StarCoder2-15B
| Property | Value |
|---|---|
| Parameter Count | 15 billion |
| Training Data | The Stack v2 (600+ programming languages) |
| Context Window | 16,384 tokens |
| License | BigCode OpenRAIL-M |
| Paper | Link to Paper |
What is StarCoder2-15B?
StarCoder2-15B is a state-of-the-art code generation model for AI-assisted programming. Trained on over 4 trillion tokens spanning 600+ programming languages, it combines Grouped Query Attention with a 4,096-token sliding window attention mechanism inside a 16,384-token context. The model was developed using NVIDIA's NeMo Framework and trained on the NVIDIA Eos Supercomputer.
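Since the model ships as a standard causal language model checkpoint, a plain completion call is enough to try it out. The following is a minimal sketch using the Hugging Face transformers API and the bigcode/starcoder2-15b checkpoint; the prompt and generation settings are illustrative assumptions, not recommendations from the model card.

```python
# Minimal code-completion sketch for StarCoder2-15B.
# Assumes transformers + torch are installed and a GPU with enough memory
# to hold the bfloat16 weights is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 training precision
    device_map="auto",           # place weights on the available GPU(s)
)

# The model is not instruction-tuned: prompt it with code, not natural-language requests.
prompt = "def fibonacci(n):\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```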
Implementation Details
The model leverages several cutting-edge technical innovations:
- Fill-in-the-Middle (FIM) training objective for improved code understanding
- 16,384 token context window with 4,096 token sliding window attention
- Trained with bfloat16 precision on 1024 H100 GPUs
- Supports multiple precision options, including 8-bit and 4-bit quantization (see the loading sketch after this list)
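As a concrete illustration of the precision options above, the sketch below loads the checkpoint in 4-bit, assuming the Hugging Face transformers, accelerate, and bitsandbytes stack and the bigcode/starcoder2-15b checkpoint. The specific quantization settings are illustrative assumptions to be tuned to your hardware, not official recommendations.

```python
# 4-bit quantized loading sketch (assumes transformers, accelerate, and bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "bigcode/starcoder2-15b"

# Illustrative 4-bit settings; adjust to your memory and accuracy needs.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep compute in bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=quant_config,
    device_map="auto",
)

# For 8-bit loading instead, swap in BitsAndBytesConfig(load_in_8bit=True).
```

Dropping the quantization config and passing torch_dtype=torch.bfloat16, as in the earlier sketch, gives the full-precision setup matching the training regime.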
Core Capabilities
- Achieves 46.3% pass@1 on HumanEval benchmark
- 33.8% pass@1 on DS-1000 dataset
- 65.1% accuracy on GSM8K (PAL)
- 74.08% edit-similarity on RepoBench-v1.1
- Supports both CPU and GPU inference with various optimization options
Frequently Asked Questions
Q: What makes this model unique?
StarCoder2-15B stands out for the scale of its training data, its attention design, and its strong performance across a wide range of programming languages. Notably, it is not an instruction-tuned model; it focuses on pure code generation.
Q: What are the recommended use cases?
The model excels at code generation when given appropriate context, and is best suited for code completion, generation, and understanding tasks across hundreds of programming languages. Because it is not instruction-tuned, it responds best to code-shaped prompts rather than natural-language requests.
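Because the model was trained with a Fill-in-the-Middle objective (noted under Implementation Details), infilling prompts are a natural fit for these completion use cases. The sketch below assumes the StarCoder-family FIM sentinel tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`); verify them against the tokenizer's special tokens before relying on this exact format.

```python
# Fill-in-the-Middle prompting sketch: complete the body of a function given the
# code before and after the gap. Sentinel token names are assumed to follow the
# StarCoder family convention; check the tokenizer's special tokens to confirm.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

# Code before and after the gap we want the model to fill.
prefix = 'def remove_non_ascii(s: str) -> str:\n    """Remove non-ASCII characters."""\n    '
suffix = "\n    return result\n"

fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(fim_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Everything generated after the prompt is the proposed middle section.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```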