Refact-1.6B-fim
| Property | Value |
|---|---|
| Parameter Count | 1.6B |
| Training Tokens | 1.2T pretraining + 40B finetuning |
| Context Length | 4096 tokens |
| License | BigScience OpenRAIL-M |
| Architecture | LLaMA-like with multi-query attention |
What is Refact-1.6B-fim?
Refact-1.6B-fim is a code completion model that delivers strong performance for its modest size. With a 32% pass@1 rate on HumanEval, it outperforms larger models such as Replit-3B and rivals models nearly ten times its size. The model specializes in Fill-in-the-Middle (FIM) completion and also includes chat functionality.
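A FIM prompt interleaves the code before and after the cursor with special sentinel tokens, and the model generates the missing middle. A minimal sketch of assembling such a prompt, assuming StarCoder-style sentinels (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`); verify the exact token strings against the model's tokenizer:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a Fill-in-the-Middle prompt.

    The model is asked to generate the code that belongs between
    `prefix` (text before the cursor) and `suffix` (text after it).
    Sentinel token names are an assumption here -- check the tokenizer.
    """
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# Example: ask the model to fill in a docstring body.
prompt = build_fim_prompt(
    'def print_hello_world():\n    """',
    '\n    print("Hello world!")',
)
```

The string returned here would be tokenized and passed to the model's generate call; generation stops when the model emits its end-of-completion token.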
Implementation Details
The model employs several notable architectural choices: ALiBi-based attention in place of positional embeddings, LayerNorm instead of RMSNorm, and multi-query attention. It was trained on a carefully curated dataset with a 50:50 split between code and text, focusing exclusively on English-language content and computer science-related topics.
- Trained on 64 NVIDIA A5000 GPUs over 28 days
- Uses bfloat16 precision for efficient inference
- Implements flash attention and early dropout for better performance
- Supports multiple programming languages
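To make the ALiBi choice above concrete: instead of adding positional embeddings to the input, each attention head adds a linear penalty proportional to the query-key distance, with per-head slopes forming a geometric sequence. This is a generic sketch of the technique (following the standard power-of-two-heads recipe), not the model's actual implementation:

```python
def alibi_slopes(n_heads: int) -> list[float]:
    """Per-head slopes for a power-of-two head count: 2^(-8/n), 2^(-16/n), ..."""
    return [2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)]

def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    """Causal bias added to attention scores: -slope * (query_pos - key_pos)."""
    return [[-slope * (q - k) for k in range(q + 1)] for q in range(seq_len)]

slopes = alibi_slopes(8)       # [0.5, 0.25, ..., 2**-8]
bias = alibi_bias(slopes[0], 3)  # row q holds penalties for keys 0..q
```

Because the bias depends only on relative distance, ALiBi extrapolates more gracefully to sequence lengths beyond those seen in training than learned positional embeddings do.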
Core Capabilities
- Code completion with Fill-in-the-Middle functionality
- Multi-language support with strong performance across Python, JavaScript, Java, and more
- Chat-based instruction following with 38.4% pass@1 on HumanEval
- 4096 token context window for handling larger code segments
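Since the context window caps prompt plus completion at 4096 tokens, a caller typically reserves room for generation when sizing the FIM prefix and suffix. A minimal illustration of that budgeting arithmetic (in practice, token counts must come from the model's tokenizer, not character counts):

```python
CONTEXT_LENGTH = 4096  # the model's context window, per the table above

def prompt_budget(max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Tokens left for the prompt after reserving room for the completion."""
    if max_new_tokens >= context_length:
        raise ValueError("max_new_tokens must be smaller than the context window")
    return context_length - max_new_tokens

print(prompt_budget(256))  # 3840 tokens available for prefix + suffix
```

If a file exceeds this budget, a common strategy is to truncate the prefix from the left and the suffix from the right, keeping the code nearest the cursor.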
Frequently Asked Questions
Q: What makes this model unique?
The model's ability to achieve performance comparable to much larger models while maintaining a relatively small size of 1.6B parameters makes it particularly practical for IDE integration and real-world applications. Its combination of FIM and chat capabilities in a single model is also notable.
Q: What are the recommended use cases?
The model excels at code completion in IDEs, multi-language programming support, and can be used for code-related chat interactions. It's particularly effective for real-time code suggestions due to its efficient size and strong performance.