DeepScaleR-1.5B-Preview
| Property | Value |
|---|---|
| Parameters | 1.5B |
| Base Model | DeepSeek-R1-Distill-Qwen-1.5B |
| License | MIT |
| Model URL | https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview |
What is DeepScaleR-1.5B-Preview?
DeepScaleR-1.5B-Preview is a 1.5B-parameter language model fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B with distributed reinforcement learning. Despite its small size, it surpasses OpenAI's o1-preview on mathematical reasoning, reaching 43.1% Pass@1 accuracy on AIME 2024.
Implementation Details
The model is trained with Group Relative Policy Optimization (GRPO), a variant of PPO that replaces the learned value baseline with advantages normalized within each group of sampled responses and regularizes against a reference policy via a KL-divergence penalty. Training uses iterative context lengthening, starting at an 8K-token context and scaling up to 24K tokens on a distributed setup of A100-80GB GPUs; a sketch of the reward and advantage computation follows the list below.
- Training dataset: 40,000 unique problem-answer pairs from AIME, AMC, Omni-MATH, and Still datasets
- Progressive context scaling: 8K → 16K → 24K tokens
- Binary reward function: 1 for a correct final answer, 0 for an incorrect answer or improperly formatted output
- Distributed training across 8-32 A100-80GB GPUs
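The sketch below illustrates the two ingredients referenced above: the binary outcome reward and GRPO's group-normalized advantage. It is an assumed reconstruction for clarity, not the authors' training code; in particular, the `extract_final_answer` parser is a hypothetical helper.

```python
# Illustrative sketch of the reward and advantage computation described above
# (assumed reconstruction, not the authors' training code).
import math
import re
from typing import List, Optional

def extract_final_answer(completion: str) -> Optional[str]:
    """Hypothetical parser: pull the last \\boxed{...} expression from a completion."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1] if matches else None

def binary_reward(completion: str, gold_answer: str) -> float:
    """1 for a correct final answer; 0 for an incorrect answer or unparsable formatting."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted.strip() == gold_answer.strip() else 0.0

def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: normalize each sampled completion's reward by the
    mean and standard deviation of its group, instead of using a learned value critic."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled solutions to one problem, two of which are correct.
group = [r"The answer is \boxed{5050}", r"... \boxed{5050}", r"... \boxed{5049}", "no boxed answer"]
rewards = [binary_reward(c, "5050") for c in group]
print(rewards)                   # [1.0, 1.0, 0.0, 0.0]
print(grpo_advantages(rewards))  # positive for correct samples, negative for incorrect
```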
Core Capabilities
- 43.1% Pass@1 accuracy on AIME 2024
- 87.8% accuracy on MATH 500
- 73.6% accuracy on AMC 2023
- Compatible with vLLM, HuggingFace TGI, SGLang, and TensorRT-LLM
- Supports the OpenAI Chat Completions API format (see the example below)
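A minimal serving sketch, assuming a local vLLM OpenAI-compatible endpoint at the default port; the prompt and sampling parameters are illustrative assumptions, not official recommendations.

```python
# Query the model through the OpenAI Chat Completions API format listed above.
# Assumes a local OpenAI-compatible server, e.g. started with:
#   vllm serve agentica-org/DeepScaleR-1.5B-Preview --max-model-len 24576
# The endpoint URL and sampling parameters are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="agentica-org/DeepScaleR-1.5B-Preview",
    messages=[{"role": "user", "content": "Find the remainder when 7^2024 is divided by 100."}],
    temperature=0.6,
    max_tokens=8192,
)
print(response.choices[0].message.content)
```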
Frequently Asked Questions
Q: What makes this model unique?
DeepScaleR-1.5B-Preview achieves state-of-the-art results on mathematical reasoning benchmarks with far fewer parameters than competing models, demonstrating that RL fine-tuning combined with iterative context lengthening can unlock strong reasoning ability in small models.
Q: What are the recommended use cases?
The model excels at mathematical problem solving, particularly competition-style problems such as AIME and AMC, and is well suited to applications that require advanced mathematical reasoning in resource-constrained environments.
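For local experimentation in such environments, a minimal loading sketch with Hugging Face transformers; the prompt and generation settings are illustrative assumptions, not official recommendations.

```python
# Minimal local-inference sketch using Hugging Face transformers.
# The repository ID comes from the model URL above; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepScaleR-1.5B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```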