DeepScaleR-1.5B-Preview
| Property | Value |
|---|---|
| Parameters | 1.5B |
| Base Model | DeepSeek-R1-Distill-Qwen-1.5B |
| License | MIT |
| Model URL | https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview |
What is DeepScaleR-1.5B-Preview?
DeepScaleR-1.5B-Preview is a 1.5B-parameter language model fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B with distributed reinforcement learning. Despite its small size, it surpasses OpenAI's o1-preview on mathematical reasoning, reaching 43.1% Pass@1 accuracy on AIME 2024.
Implementation Details
The model is trained with Group Relative Policy Optimization (GRPO), a variant of PPO that replaces the learned value baseline with advantages normalized within each group of sampled responses and regularizes against a reference policy via a KL-divergence penalty. Training uses iterative context lengthening, starting at an 8K-token context and scaling up to 24K tokens on a distributed setup of A100-80GB GPUs; a sketch of the reward and advantage computation follows the list below.
- Training dataset: 40,000 unique problem-answer pairs from AIME, AMC, Omni-MATH, and Still datasets
- Progressive context scaling: 8K → 16K → 24K tokens
- Binary reward function: 1 for a correct final answer, 0 for an incorrect answer or improperly formatted output
- Distributed training across 8-32 A100-80GB GPUs
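The sketch below illustrates the two ingredients referenced above: the binary outcome reward and GRPO's group-normalized advantage. It is an assumed reconstruction for clarity, not the authors' training code; in particular, the `extract_final_answer` parser is a hypothetical helper.

```python
# Illustrative sketch of the reward and advantage computation described above
# (assumed reconstruction, not the authors' training code).
import math
import re
from typing import List, Optional

def extract_final_answer(completion: str) -> Optional[str]:
    """Hypothetical parser: pull the last \\boxed{...} expression from a completion."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1] if matches else None

def binary_reward(completion: str, gold_answer: str) -> float:
    """1 for a correct final answer; 0 for an incorrect answer or unparsable formatting."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted.strip() == gold_answer.strip() else 0.0

def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: normalize each sampled completion's reward by the
    mean and standard deviation of its group, instead of using a learned value critic."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled solutions to one problem, two of which are correct.
group = [r"The answer is \boxed{5050}", r"... \boxed{5050}", r"... \boxed{5049}", "no boxed answer"]
rewards = [binary_reward(c, "5050") for c in group]
print(rewards)                   # [1.0, 1.0, 0.0, 0.0]
print(grpo_advantages(rewards))  # positive for correct samples, negative for incorrect
```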
Core Capabilities
- 43.1% Pass@1 accuracy on AIME 2024
- 87.8% accuracy on MATH 500
- 73.6% accuracy on AMC 2023
- Compatible with vLLM, HuggingFace TGI, SGLang, and TensorRT-LLM
- Supports the OpenAI Chat Completions API format (see the example below)
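A minimal serving sketch, assuming a local vLLM OpenAI-compatible endpoint at the default port; the prompt and sampling parameters are illustrative assumptions, not official recommendations.

```python
# Query the model through the OpenAI Chat Completions API format listed above.
# Assumes a local OpenAI-compatible server, e.g. started with:
#   vllm serve agentica-org/DeepScaleR-1.5B-Preview --max-model-len 24576
# The endpoint URL and sampling parameters are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="agentica-org/DeepScaleR-1.5B-Preview",
    messages=[{"role": "user", "content": "Find the remainder when 7^2024 is divided by 100."}],
    temperature=0.6,
    max_tokens=8192,
)
print(response.choices[0].message.content)
```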
Frequently Asked Questions
Q: What makes this model unique?
DeepScaleR-1.5B-Preview achieves state-of-the-art results on mathematical reasoning benchmarks with far fewer parameters than competing models, demonstrating that RL fine-tuning combined with iterative context lengthening can unlock strong reasoning ability in small models.
Q: What are the recommended use cases?
The model excels at mathematical problem solving, particularly competition-style problems such as AIME and AMC, and is well suited to applications that require advanced mathematical reasoning in resource-constrained environments.
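For local experimentation in such environments, a minimal loading sketch with Hugging Face transformers; the prompt and generation settings are illustrative assumptions, not official recommendations.

```python
# Minimal local-inference sketch using Hugging Face transformers.
# The repository ID comes from the model URL above; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepScaleR-1.5B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```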