DeepCoder-14B-Preview
| Property | Value |
|---|---|
| Parameter Count | 14 Billion |
| Base Model | DeepSeek-R1-Distill-Qwen-14B |
| License | MIT License |
| LiveCodeBench Score | 60.6% Pass@1 |
| Model URL | https://huggingface.co/agentica-org/DeepCoder-14B-Preview |
What is DeepCoder-14B-Preview?
DeepCoder-14B-Preview is a code reasoning language model fine-tuned with distributed reinforcement learning. At only 14B parameters, it achieves performance comparable to OpenAI's o3-mini, and it handles long contexts of up to 64K tokens.
Implementation Details
The model employs an enhanced version of GRPO (GRPO+) combined with iterative context lengthening. The training dataset comprises 24K unique problem-test pairs from sources including Taco-Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench v5. Key training modifications (a simplified sketch of the advantage step appears after this list):
- Implements offline difficulty filtering for stable training
- Removes entropy and KL loss for improved stability
- Utilizes overlong filtering to preserve long-context reasoning
- Features iterative context lengthening from 16K to 32K to 64K tokens
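To make the group-relative advantage and overlong-filtering ideas concrete, here is a minimal, illustrative Python sketch. It is not the released training code; the `group_advantages` helper, and the simplification that truncated rollouts simply receive zero advantage, are assumptions made for illustration.

```python
# Illustrative sketch only (not the released training code): GRPO-style
# group-relative advantages with a simplified form of overlong filtering,
# where rollouts truncated at the context limit receive zero advantage.
from statistics import mean, pstdev

def group_advantages(rewards, truncated, eps=1e-6):
    """Normalize each rollout's reward against its sampling group,
    then mask rollouts that were cut off by the context limit."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [
        0.0 if cut else (r - mu) / (sigma + eps)
        for r, cut in zip(rewards, truncated)
    ]

# Four rollouts for one problem; the last one hit the context limit.
print(group_advantages([1.0, 0.0, 1.0, 0.0], [False, False, False, True]))
```

Because GRPO+ as described above drops the entropy and KL terms, the policy update is driven by these group-normalized, clipped advantages alone.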
Core Capabilities
- 60.6% Pass@1 accuracy on LiveCodeBench v5
- Codeforces rating of 1936 (95.3rd percentile)
- 92.6% success rate on HumanEval+
- Supports context lengths up to 64K tokens
- Compatible with major inference systems including vLLM, HuggingFace TGI, SGLang, and TensorRT-LLM
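For example, the model can be served locally with vLLM. The snippet below is a minimal sketch assuming a GPU with enough memory for the full 64K context; the prompt and the reduced `max_tokens` value are placeholders, not official settings.

```python
# Minimal vLLM sketch (assumed hardware; shrink max_model_len if memory-bound).
from vllm import LLM, SamplingParams

llm = LLM(
    model="agentica-org/DeepCoder-14B-Preview",
    max_model_len=65536,  # the model supports contexts up to 64K tokens
)
params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    max_tokens=8192,  # raise toward 64000 for long reasoning traces
)
outputs = llm.generate(
    ["Write a Python function that returns the longest palindromic substring."],
    params,
)
print(outputs[0].outputs[0].text)
```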
Frequently Asked Questions
Q: What makes this model unique?
DeepCoder-14B-Preview stands out for achieving performance comparable to proprietary models while being fully open-source. Its improved GRPO+ training methodology and context length capabilities make it particularly effective for complex coding tasks.
Q: What are the recommended use cases?
The model excels at code generation and reasoning tasks; the recommended sampling settings are a temperature of 0.6, top_p of 0.95, and max_tokens of 64000. It is designed to take instructions directly in the user turn, without a system prompt.
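As a sketch of these settings with Hugging Face Transformers (the instruction text is illustrative, and `max_new_tokens` is trimmed from the recommended 64000 ceiling to keep the demo short):

```python
# Hedged example: user-turn-only prompt (no system message) with the
# recommended sampling settings from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepCoder-14B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Direct instruction in the user turn only -- no system prompt.
messages = [{"role": "user", "content": "Implement binary search over a sorted list in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=4096,  # raise toward 64000 for full reasoning traces
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```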