DeepCoder-1.5B-Preview
| Property | Value |
| --- | --- |
| Base Model | DeepSeek-R1-Distilled-Qwen-1.5B |
| Training Approach | GRPO+ with Iterative Context Lengthening |
| License | MIT License |
| Model URL | huggingface.co/agentica-org/DeepCoder-1.5B-Preview |
What is DeepCoder-1.5B-Preview?
DeepCoder-1.5B-Preview is an advanced code reasoning language model that leverages distributed reinforcement learning to enhance code generation capabilities. Fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B, it demonstrates significant improvements in coding benchmarks, achieving 25.1% on LiveCodeBench (v5) and 73.0% on HumanEval+.
Implementation Details
The model employs an enhanced version of GRPO (GRPO+) combined with iterative context lengthening. The training dataset comprises 24K unique problem-test pairs drawn from Taco-Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench v5. Key modifications in GRPO+ include:
- Offline Difficulty Filtering for stable training
- Removal of entropy and KL loss components
- Overlong Filtering for preserving long-context reasoning
- Modified clip high bounds for improved exploration
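The clip-high modification can be illustrated with a small sketch. This is not the released training code; it shows a PPO-style clipped surrogate where the upper clip bound is raised above the lower one to encourage exploration, and where no entropy or KL term appears, consistent with their removal in GRPO+. The epsilon values here are illustrative assumptions, not published hyperparameters.

```python
def grpo_plus_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Per-token clipped surrogate with an asymmetric clip range.

    ratio:     pi_new(token) / pi_old(token), the importance ratio
    advantage: group-normalized advantage for the token's trajectory
    eps_low/eps_high: clip bounds; eps_high > eps_low widens the room
    for upweighting tokens with positive advantage (illustrative values).
    """
    # Clip the ratio into [1 - eps_low, 1 + eps_high] rather than a
    # symmetric interval, then take the pessimistic (min) objective.
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


# With a symmetric bound (eps 0.2) a ratio of 1.5 would be capped at 1.2;
# the raised upper bound lets the positive-advantage update reach 1.28.
print(grpo_plus_surrogate(1.5, 1.0))   # capped at 1 + eps_high
print(grpo_plus_surrogate(0.5, -1.0))  # floored at 1 - eps_low
```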
Core Capabilities
- Context length handling up to 64K
- Codeforces Rating: 963 (28.5th percentile)
- Superior performance compared to base model across multiple benchmarks
- Compatible with various serving systems including vLLM, HuggingFace TGI, SGLang, and TensorRT-LLM
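As one example of the serving options above, the model can be launched with vLLM's OpenAI-compatible server; the command below is a minimal launch fragment, assuming vLLM is installed, with the context window set to the 64K limit noted earlier.

```shell
# Serve DeepCoder-1.5B-Preview via vLLM's OpenAI-compatible API,
# allowing sequences up to the model's 64K context length.
vllm serve agentica-org/DeepCoder-1.5B-Preview \
    --max-model-len 65536
```

Clients can then send requests to the standard `/v1/chat/completions` endpoint exposed by the server.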
Frequently Asked Questions
Q: What makes this model unique?
The model's unique GRPO+ training approach and iterative context lengthening enable superior code generation capabilities while maintaining stability during training. It successfully eliminates common training issues like entropy collapse while achieving strong performance on coding benchmarks.
Q: What are the recommended use cases?
DeepCoder-1.5B-Preview is particularly suited for code generation tasks, problem-solving in competitive programming scenarios, and handling long-context coding challenges. It's ideal for developers needing assistance with complex coding tasks while working within extended context windows.