StarCoder2-3B
| Property | Value |
|---|---|
| Parameter Count | 3.03B |
| License | BigCode OpenRAIL-M |
| Paper | [StarCoder 2 and The Stack v2 (arXiv:2402.19173)](https://arxiv.org/abs/2402.19173) |
| Training Data | The Stack v2 (17 programming languages) |
| Context Window | 16,384 tokens |
What is StarCoder2-3B?
StarCoder2-3B is a state-of-the-art code generation model trained on over 3 trillion tokens from The Stack v2 dataset. It represents a significant advancement in AI-powered code generation, using Grouped Query Attention and a 4,096-token sliding window attention mechanism within its 16,384-token context window.
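For orientation, here is a minimal left-to-right completion sketch using the Hugging Face Transformers library and the public bigcode/starcoder2-3b checkpoint. The prompt is illustrative, and generation settings are deliberately left at their defaults.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16  # matches the training precision
).to(device)

# The model continues whatever code it is given.
prompt = "def print_hello_world():"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```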
Implementation Details
The model was trained with a Fill-in-the-Middle (FIM) objective in bfloat16 precision on 160 A100 GPUs. It supports multiple deployment options, from full precision down to 4-bit quantization for efficient inference; a loading sketch follows the list below.
- Transformer decoder architecture with grouped-query attention
- Trained for 1.2 million steps on filtered, permissively licensed code
- Supports multiple precision options (FP32, BF16, 8-bit, 4-bit)
- Memory-efficient deployment options available
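As a rough illustration of the precision options above, this sketch loads the bigcode/starcoder2-3b checkpoint at each precision with Hugging Face Transformers. The 4-bit path assumes the bitsandbytes package is installed, and device_map="auto" assumes accelerate is available.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

checkpoint = "bigcode/starcoder2-3b"

# Option 1: full precision (FP32) -- simplest call, largest memory footprint.
# model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Option 2: bfloat16 -- the precision the model was trained in, roughly half the memory.
# model = AutoModelForCausalLM.from_pretrained(
#     checkpoint, torch_dtype=torch.bfloat16, device_map="auto")

# Option 3: 4-bit quantization via bitsandbytes -- smallest footprint.
# (8-bit is analogous: BitsAndBytesConfig(load_in_8bit=True).)
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, quantization_config=quantization_config, device_map="auto"
)
```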
Core Capabilities
- Code completion and generation across 17 programming languages, both left-to-right and fill-in-the-middle (see the sketch after this list)
- Long context understanding with 16K token window
- Resource-efficient inference through various quantization options
- Built-in support for attribution tracking
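Because the model was trained with a Fill-in-the-Middle objective, it can infill a gap given a prefix and a suffix. The sketch below assumes StarCoder-style sentinel tokens (<fim_prefix>, <fim_suffix>, <fim_middle>); verify the exact strings against the tokenizer's special tokens before relying on them.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16
).to(device)

# ASSUMPTION: StarCoder-style FIM sentinels; confirm via
# tokenizer.additional_special_tokens before depending on these strings.
prefix = "def fibonacci(n):\n    "
suffix = "\n    return fibonacci(n - 1) + fibonacci(n - 2)"
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(fim_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=32)
# The infilled middle is generated after the <fim_middle> token.
print(tokenizer.decode(outputs[0]))
```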
Frequently Asked Questions
Q: What makes this model unique?
StarCoder2-3B pairs grouped-query and sliding-window attention with a 16K-token context window in a compact 3B-parameter package, making it particularly effective for code generation while keeping resource requirements reasonable.
Q: What are the recommended use cases?
The model excels at code completion and generation but is not instruction-tuned. It works best when given code context to continue rather than natural language commands, as the contrast below illustrates.
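To make that distinction concrete, here is a hypothetical pair of prompts; the function is illustrative only.

```python
# Instruction-style prompt: this model is NOT tuned to follow commands like this.
instruction_prompt = "Write a Python function that checks whether a number is prime."

# Context-style prompt: give the model code to continue instead.
completion_prompt = (
    "def is_prime(n: int) -> bool:\n"
    '    """Return True if n is a prime number."""\n'
)
```

Feeding completion_prompt to model.generate as in the earlier sketches typically yields a function body, whereas the instruction prompt tends to be continued as if it were ordinary text rather than obeyed.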