OREAL-DeepSeek-R1-Distill-Qwen-7B

Maintained By
internlm

OREAL-DeepSeek-R1-Distill-Qwen-7B

PropertyValue
Parameter Count7 Billion
Model TypeMathematical Reasoning Model
Authorinternlm
PaperarXiv:2502.06781
Model LinkHugging Face

What is OREAL-DeepSeek-R1-Distill-Qwen-7B?

OREAL-DeepSeek-R1-Distill-Qwen-7B is a state-of-the-art mathematical reasoning model that leverages the Outcome REwArd-based reinforcement Learning (OREAL) framework. This model achieves remarkable 94.0% pass@1 accuracy on MATH-500, matching the performance of previous 32B models while using significantly fewer parameters.

Implementation Details

The model implements a novel RL framework designed specifically for tasks with binary outcome rewards. It utilizes best-of-N (BoN) sampling for behavior cloning and incorporates an on-policy token-level reward model to identify key tokens in reasoning trajectories.

  • Advanced reward reshaping mechanism for negative samples
  • Specialized system prompt for mathematical reasoning
  • Integration with existing chat templates for easy deployment
  • Support for multiple mathematical benchmarks

Core Capabilities

  • 94.0% accuracy on MATH-500 benchmark
  • 50.0% accuracy on AIME tests
  • 65.6% performance on LiveMathBench
  • 66.1% accuracy on OlympiadBench
  • Systematic approach to mathematical problem-solving

Frequently Asked Questions

Q: What makes this model unique?

The model's unique strength lies in its OREAL framework, which enables it to achieve 32B-model-level performance with only 7B parameters. It also features a sophisticated system prompt that guides systematic mathematical thinking and rigorous reasoning.

Q: What are the recommended use cases?

The model excels in mathematical competition problems, complex mathematical reasoning tasks, and educational applications requiring detailed step-by-step problem solving. It's particularly effective for tasks requiring deep mathematical understanding and systematic approach to problem-solving.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.