Open-RS3

Maintained By
knoveleng

Open-RS3

PropertyValue
Base ModelDeepSeek-R1-Distill-Qwen-1.5B
Parameter Count1.5B
PaperarXiv:2503.16219
Model URLHugging Face

What is Open-RS3?

Open-RS3 is an enhanced version of the DeepSeek-R1-Distill-Qwen-1.5B language model, specifically optimized for mathematical reasoning through reinforcement learning. This model represents a significant advancement in achieving strong reasoning capabilities with relatively small parameter counts, demonstrating that effective mathematical reasoning doesn't always require massive models.

Implementation Details

The model was trained using an efficient reinforcement learning approach on 4 A40 GPUs, completing training in under 24 hours at a cost of approximately $42. The training process utilized 7,000 samples, generating 42,000 total outputs, making it a highly cost-effective solution compared to traditional approaches.

  • Achieves 56.3% average score across benchmarks
  • 80% accuracy on AMC23 mathematics tests
  • 46.7% accuracy on AIME24, surpassing o1-preview's 44.6%
  • Competitive performance on MATH-500 benchmark

Core Capabilities

  • Advanced mathematical reasoning and problem-solving
  • Efficient performance on standardized mathematics tests
  • Cost-effective training approach for resource-constrained environments
  • Improved reasoning capabilities compared to baseline models

Frequently Asked Questions

Q: What makes this model unique?

Open-RS3 stands out for achieving impressive mathematical reasoning capabilities with a relatively small 1.5B parameter count, demonstrating that effective reasoning can be achieved through efficient RL training rather than just scaling model size.

Q: What are the recommended use cases?

The model is particularly well-suited for mathematical problem-solving applications, educational tools, and scenarios requiring advanced reasoning capabilities within resource-constrained environments.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.