StarCoder2-15B GPTQ

Property	Value
Parameter Count	15 Billion
License	BigCode OpenRAIL-M
Context Window	16,384 tokens
Training Data	The Stack v2 (600+ programming languages)

What is StarCoder2-15B GPTQ?

StarCoder2-15B GPTQ is a GPTQ-quantized version of the StarCoder2-15B code generation model. Quantization stores the weights at lower precision, which reduces the memory needed to run the model while aiming to preserve the quality of its code generation. The base model was trained on over 4 trillion tokens covering 600+ programming languages.

Implementation Details

The model uses Grouped Query Attention and sliding window attention with a 4,096-token window inside its 16,384-token context window. The base model was trained with the Fill-in-the-Middle objective on NVIDIA's Eos Supercomputer using 1,024 H100 GPUs.
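To make the sliding window idea concrete, the following minimal sketch builds a causal attention mask in which each position can see only the most recent positions. It is an illustration under stated assumptions, not the model's actual implementation: the function name is hypothetical, the window is assumed to include the current token, and toy sizes stand in for the real values (a 4,096-token window within 16,384 tokens).

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j."""
    mask = []
    for i in range(seq_len):
        # Causal: j <= i. Windowed: j is one of the last `window` positions,
        # counting position i itself (an assumption made for this sketch).
        row = [(j <= i) and (i - j < window) for j in range(seq_len)]
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Toy sizes for readability; "x" marks an allowed (query, key) pair.
    for row in sliding_window_causal_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

Each row contains at most `window` allowed positions, so the per-token attention cost stays bounded by the window size rather than growing with the full sequence length.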

GPTQ quantization to reduce memory use at inference time
Base model trained with the NVIDIA NeMo™ Framework
Supports both 4-bit and 8-bit precision options
Uses sliding window attention (4,096 tokens) to bound the cost of attention over long inputs
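As a rough guide to what the precision options mean in practice, the arithmetic below estimates weight storage for a 15-billion-parameter model at different bit widths. It is a back-of-the-envelope sketch: it assumes every parameter is stored at the stated width and ignores quantization metadata (scales and zero points), activations, and the attention cache, so real memory use will be higher. The helper name is hypothetical.

```python
PARAM_COUNT = 15_000_000_000  # 15 billion parameters, as listed for this model


def weight_memory_gb(param_count: int, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return param_count * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # 16-bit is included as an unquantized baseline for comparison.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_memory_gb(PARAM_COUNT, bits):.1f} GB")
```

Under these assumptions the weights alone take about 30 GB at 16-bit, 15 GB at 8-bit, and 7.5 GB at 4-bit.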

Core Capabilities

Code generation across 600+ programming languages
Extended context window of 16,384 tokens
Efficient memory usage through quantization
Support for multiple deployment options (CPU/GPU/multi-GPU)

Frequently Asked Questions

Q: What makes this model unique?

The model pairs a 15B-parameter code model with GPTQ quantization, which lowers the memory needed for deployment while aiming to retain most of the base model's code generation quality. It is also distinguished by its training on The Stack v2 dataset and by its use of Grouped Query Attention and sliding window attention.

Q: What are the recommended use cases?

The model excels at code generation tasks but is not designed as an instruction-following model. It's best suited for code completion, generation, and understanding tasks when provided with appropriate context rather than natural language commands.
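Because the model continues code rather than following instructions, prompts work best when they look like code. The sketch below shows two prompt shapes: a plain left-to-right completion and a Fill-in-the-Middle prompt. The sentinel strings and their prefix-suffix-middle ordering are assumptions based on the StarCoder family's conventions and should be checked against the tokenizer's special tokens before use; the helper names are hypothetical.

```python
# Assumed sentinel tokens; verify against the tokenizer's special tokens before use.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_completion_prompt(code_so_far: str) -> str:
    """Plain completion: the prompt is simply the code the model should continue."""
    return code_so_far


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Fill-in-the-Middle: ask the model for the text between prefix and suffix."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    prefix = "def mean(values):\n    total = "
    suffix = "\n    return total / len(values)\n"
    print(build_fim_prompt(prefix, suffix))
```

A prompt such as "Write a function that computes the mean" is a natural language command; giving the model a function signature, a docstring, or the surrounding code as shown here is closer to the context it was trained on.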