Open-RS3

Property	Value
Base Model	DeepSeek-R1-Distill-Qwen-1.5B
Parameter Count	1.5B
Paper	arXiv:2503.16219
Model URL	Hugging Face

What is Open-RS3?

Open-RS3 is a version of the DeepSeek-R1-Distill-Qwen-1.5B language model that has been further trained with reinforcement learning (RL) for mathematical reasoning. It shows that a model with only 1.5B parameters can reach strong results on mathematics benchmarks, so effective mathematical reasoning does not always require a massive model.

Implementation Details

The model was trained with reinforcement learning on 4 A40 GPUs. Training finished in under 24 hours at a cost of approximately $42. It used 7,000 samples and generated 42,000 outputs in total, a small budget compared with large-scale RL training runs.
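The training figures above imply a few derived quantities that are easy to check by hand. The short Python sketch below works them out. The variable names are illustrative, and the GPU-hour rate is only a lower bound because the text says "under 24 hours" and "approximately $42" rather than giving exact values.

```python
# Figures quoted in the text above.
samples = 7_000
total_outputs = 42_000
gpus = 4
max_hours = 24           # "under 24 hours", so this is an upper bound
total_cost_usd = 42.0    # "approximately $42"

# Average number of outputs generated per training sample.
outputs_per_sample = total_outputs / samples

# Upper bound on GPU-hours, and the hourly rate that bound implies.
max_gpu_hours = gpus * max_hours
min_usd_per_gpu_hour = total_cost_usd / max_gpu_hours

# Approximate cost per 1,000 generated outputs.
usd_per_1k_outputs = total_cost_usd * 1000 / total_outputs

print(f"outputs per sample:    {outputs_per_sample:.0f}")
print(f"GPU-hours (at most):   {max_gpu_hours}")
print(f"USD per GPU-hour (>=): {min_usd_per_gpu_hour:.4f}")
print(f"USD per 1k outputs:    {usd_per_1k_outputs:.2f}")
```

These are back-of-the-envelope values derived only from the numbers stated here, not figures reported by the model's authors.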

Reported results:
56.3% average score across benchmarks
80% accuracy on AMC23
46.7% accuracy on AIME24, surpassing o1-preview's 44.6%
Competitive performance on the MATH-500 benchmark

Core Capabilities

Multi-step mathematical reasoning and problem-solving
Strong results on competition-style mathematics benchmarks (AMC23, AIME24, MATH-500)
Low-cost training approach suited to resource-constrained environments
Improved reasoning compared to its base model, DeepSeek-R1-Distill-Qwen-1.5B

Frequently Asked Questions

Q: What makes this model unique?

Open-RS3 reaches strong mathematical reasoning performance with only 1.5B parameters. This shows that reasoning can be improved through efficient RL training rather than only by scaling up model size.

Q: What are the recommended use cases?

The model is well suited to mathematical problem-solving applications, educational tools, and other reasoning tasks that must run in resource-constrained environments.
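For a math problem-solving application, the surrounding code usually has to build a prompt and pull the final answer out of the model's reasoning text. The sketch below shows one way to do that in Python. It assumes the prompt wording commonly recommended for DeepSeek-R1-style models (asking for the final answer inside \boxed{}). This page does not specify a prompt format for Open-RS3, so treat that wording, and the helper names build_prompt and extract_boxed, as illustrative assumptions. The model call itself is left out.

```python
from typing import Optional

BOXED = "\\boxed{"


def build_prompt(problem: str) -> str:
    """Append an instruction asking for a boxed final answer (assumed format)."""
    return (
        f"{problem}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Braces are counted so that nested LaTeX such as \\frac{1}{2} is kept whole.
    """
    start = text.rfind(BOXED)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(BOXED):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces: no complete boxed answer found


# Example with a hand-written stand-in for model output.
sample_output = "Adding the fractions gives \\boxed{\\frac{1}{2}}."
print(extract_boxed(sample_output))
```

Taking the last boxed expression is a deliberate choice here: reasoning models sometimes box intermediate values before stating the final one.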
