DeepCoder-1.5B-Preview
| Property | Value |
| --- | --- |
| Base Model | DeepSeek-R1-Distilled-Qwen-1.5B |
| Training Approach | GRPO+ with Iterative Context Lengthening |
| License | MIT License |
| Model URL | huggingface.co/agentica-org/DeepCoder-1.5B-Preview |
What is DeepCoder-1.5B-Preview?
DeepCoder-1.5B-Preview is an advanced code reasoning language model that leverages distributed reinforcement learning to enhance code generation capabilities. Fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B, it demonstrates significant improvements in coding benchmarks, achieving 25.1% on LiveCodeBench (v5) and 73.0% on HumanEval+.
Implementation Details
The model employs an enhanced version of GRPO (GRPO+) combined with iterative context lengthening. The training dataset comprises 24K unique problem-test pairs drawn from Taco-Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench v5. Key modifications in GRPO+ include:
- Offline Difficulty Filtering for stable training
- Removal of entropy and KL loss components
- Overlong Filtering for preserving long-context reasoning
- Modified clip high bounds for improved exploration
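The clip-high modification can be illustrated with a small sketch. This is not the released training code; it shows a PPO-style clipped surrogate where the upper clip bound is raised above the lower one to encourage exploration, and where no entropy or KL term appears, consistent with their removal in GRPO+. The epsilon values here are illustrative assumptions, not published hyperparameters.

```python
def grpo_plus_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Per-token clipped surrogate with an asymmetric clip range.

    ratio:     pi_new(token) / pi_old(token), the importance ratio
    advantage: group-normalized advantage for the token's trajectory
    eps_low/eps_high: clip bounds; eps_high > eps_low widens the room
    for upweighting tokens with positive advantage (illustrative values).
    """
    # Clip the ratio into [1 - eps_low, 1 + eps_high] rather than a
    # symmetric interval, then take the pessimistic (min) objective.
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


# With a symmetric bound (eps 0.2) a ratio of 1.5 would be capped at 1.2;
# the raised upper bound lets the positive-advantage update reach 1.28.
print(grpo_plus_surrogate(1.5, 1.0))   # capped at 1 + eps_high
print(grpo_plus_surrogate(0.5, -1.0))  # floored at 1 - eps_low
```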
Core Capabilities
- Context length handling up to 64K
- Codeforces Rating: 963 (28.5th percentile)
- Superior performance compared to base model across multiple benchmarks
- Compatible with various serving systems including vLLM, HuggingFace TGI, SGLang, and TensorRT-LLM
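As one example of the serving options above, the model can be launched with vLLM's OpenAI-compatible server; the command below is a minimal launch fragment, assuming vLLM is installed, with the context window set to the 64K limit noted earlier.

```shell
# Serve DeepCoder-1.5B-Preview via vLLM's OpenAI-compatible API,
# allowing sequences up to the model's 64K context length.
vllm serve agentica-org/DeepCoder-1.5B-Preview \
    --max-model-len 65536
```

Clients can then send requests to the standard `/v1/chat/completions` endpoint exposed by the server.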
Frequently Asked Questions
Q: What makes this model unique?
The model's unique GRPO+ training approach and iterative context lengthening enable superior code generation capabilities while maintaining stability during training. It successfully eliminates common training issues like entropy collapse while achieving strong performance on coding benchmarks.
Q: What are the recommended use cases?
DeepCoder-1.5B-Preview is particularly suited for code generation tasks, problem-solving in competitive programming scenarios, and handling long-context coding challenges. It's ideal for developers needing assistance with complex coding tasks while working within extended context windows.