DeepScaleR-1.5B-Preview

Maintained By
agentica-org


Property      Value
Parameters    1.5B
Base Model    DeepSeek-R1-Distilled-Qwen-1.5B
License       MIT
Model URL     https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview

What is DeepScaleR-1.5B-Preview?

DeepScaleR-1.5B-Preview demonstrates efficient language model scaling, achieving strong performance with just 1.5B parameters. Fine-tuned using distributed reinforcement learning, it notably surpasses OpenAI's o1-preview on mathematical reasoning tasks, reaching 43.1% Pass@1 accuracy on AIME 2024.

Implementation Details

The model employs an innovative training approach using Group Relative Policy Optimization (GRPO), extending PPO with normalized advantage functions and KL divergence regularization. The training process involves iterative context lengthening, starting from 8K and scaling up to 24K tokens, utilizing a distributed setup of A100-80GB GPUs.

  • Training dataset: 40,000 unique problem-answer pairs from AIME, AMC, Omni-MATH, and Still datasets
  • Progressive context scaling: 8K → 16K → 24K tokens
  • Binary reward function: 1 for correct answers, 0 for incorrect/improper formatting
  • Distributed training across 8-32 A100-80GB GPUs
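The two training ingredients above can be sketched together: the binary reward is 1 only for a correct, properly formatted answer, and GRPO replaces PPO's learned value baseline with a group-relative one, normalizing each sampled completion's reward against the other completions drawn for the same prompt. This is a minimal illustrative sketch, not the authors' implementation; the exact answer-checking and formatting logic are assumptions.

```python
import numpy as np

def binary_reward(answer: str, reference: str) -> float:
    """1.0 for a correct answer, 0.0 for incorrect or improperly
    formatted output (simplified: real checkers parse boxed answers)."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantage: normalize each completion's reward
    against the mean and std of its sampling group (no value network)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions sampled for one problem, 2 of them correct.
rewards = np.array([binary_reward(a, "42") for a in ["42", "17", "42", "n/a"]])
adv = grpo_advantages(rewards)
# Correct completions receive positive advantage, incorrect negative.
```

These per-completion advantages then weight the clipped PPO-style policy gradient, with a KL divergence penalty keeping the policy close to the base model.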

Core Capabilities

  • 43.1% Pass@1 accuracy on AIME 2024
  • 87.8% accuracy on MATH 500
  • 73.6% accuracy on AMC 2023
  • Compatible with vLLM, HuggingFace TGI, SGLang, and TensorRT-LLM
  • Supports OpenAI Chat Completions API format
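Since the model supports the OpenAI Chat Completions format, a request can be sent to any compatible server (for example, one launched locally with `vllm serve agentica-org/DeepScaleR-1.5B-Preview`). The sketch below only builds the request payload; the endpoint URL, sampling settings, and token budget are assumptions, chosen generously because the model emits long chains of thought.

```python
import json

# Assumption: a locally running OpenAI-compatible server (vLLM's default port).
BASE_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "agentica-org/DeepScaleR-1.5B-Preview",
    "messages": [
        {"role": "user",
         "content": "Find the remainder when 7^100 is divided by 13."}
    ],
    "temperature": 0.6,        # assumed sampling setting
    "max_tokens": 8192,        # long budget for extended reasoning traces
}

body = json.dumps(payload)  # POST this body to BASE_URL with any HTTP client
```

The same payload works unchanged against TGI, SGLang, or TensorRT-LLM deployments that expose the Chat Completions route.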

Frequently Asked Questions

Q: What makes this model unique?

DeepScaleR-1.5B-Preview achieves state-of-the-art performance on mathematical reasoning tasks with significantly fewer parameters than competitors, demonstrating the effectiveness of its innovative RL-based training approach and iterative context lengthening strategy.

Q: What are the recommended use cases?

The model excels at mathematical problem-solving, particularly in competition settings like AIME and AMC. It is well suited to applications that require advanced mathematical reasoning in resource-constrained environments.
