StarCoder2-15B
| Property | Value |
|---|---|
| Parameter Count | 15 billion |
| Training Data | The Stack v2 (600+ programming languages) |
| Context Window | 16,384 tokens |
| License | BigCode OpenRAIL-M |
| Paper | Link to Paper |
What is StarCoder2-15B?
StarCoder2-15B is a state-of-the-art code generation model for AI-assisted programming. Trained on over 4 trillion tokens spanning 600+ programming languages, it combines Grouped Query Attention with a 4,096-token sliding window attention mechanism inside a 16,384-token context. The model was developed using NVIDIA's NeMo Framework and trained on the NVIDIA Eos Supercomputer.
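Since the model ships as a standard causal language model checkpoint, a plain completion call is enough to try it out. The following is a minimal sketch using the Hugging Face transformers API and the bigcode/starcoder2-15b checkpoint; the prompt and generation settings are illustrative assumptions, not recommendations from the model card.

```python
# Minimal code-completion sketch for StarCoder2-15B.
# Assumes transformers + torch are installed and a GPU with enough memory
# to hold the bfloat16 weights is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 training precision
    device_map="auto",           # place weights on the available GPU(s)
)

# The model is not instruction-tuned: prompt it with code, not natural-language requests.
prompt = "def fibonacci(n):\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```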
Implementation Details
The model leverages several cutting-edge technical innovations:
- Fill-in-the-Middle (FIM) training objective for improved code understanding
- 16,384 token context window with 4,096 token sliding window attention
- Trained with bfloat16 precision on 1024 H100 GPUs
- Supports multiple precision options, including 8-bit and 4-bit quantization (see the loading sketch after this list)
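As a concrete illustration of the precision options above, the sketch below loads the checkpoint in 4-bit, assuming the Hugging Face transformers, accelerate, and bitsandbytes stack and the bigcode/starcoder2-15b checkpoint. The specific quantization settings are illustrative assumptions to be tuned to your hardware, not official recommendations.

```python
# 4-bit quantized loading sketch (assumes transformers, accelerate, and bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "bigcode/starcoder2-15b"

# Illustrative 4-bit settings; adjust to your memory and accuracy needs.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep compute in bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=quant_config,
    device_map="auto",
)

# For 8-bit loading instead, swap in BitsAndBytesConfig(load_in_8bit=True).
```

Dropping the quantization config and passing torch_dtype=torch.bfloat16, as in the earlier sketch, gives the full-precision setup matching the training regime.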
Core Capabilities
- Achieves 46.3% pass@1 on HumanEval benchmark
- 33.8% pass@1 on DS-1000 dataset
- 65.1% accuracy on GSM8K (PAL)
- 74.08% edit-similarity on RepoBench-v1.1
- Supports both CPU and GPU inference with various optimization options
Frequently Asked Questions
Q: What makes this model unique?
StarCoder2-15B stands out for the scale of its training data, its attention design, and its strong performance across a wide range of programming languages. Notably, it is not an instruction-tuned model; it focuses on pure code generation.
Q: What are the recommended use cases?
The model excels at code generation when given appropriate context, and is best suited for code completion, generation, and understanding tasks across hundreds of programming languages. Because it is not instruction-tuned, it responds best to code-shaped prompts rather than natural-language requests.
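Because the model was trained with a Fill-in-the-Middle objective (noted under Implementation Details), infilling prompts are a natural fit for these completion use cases. The sketch below assumes the StarCoder-family FIM sentinel tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`); verify them against the tokenizer's special tokens before relying on this exact format.

```python
# Fill-in-the-Middle prompting sketch: complete the body of a function given the
# code before and after the gap. Sentinel token names are assumed to follow the
# StarCoder family convention; check the tokenizer's special tokens to confirm.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

# Code before and after the gap we want the model to fill.
prefix = 'def remove_non_ascii(s: str) -> str:\n    """Remove non-ASCII characters."""\n    '
suffix = "\n    return result\n"

fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(fim_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Everything generated after the prompt is the proposed middle section.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```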