# s1-32B
| Property | Value |
|---|---|
| Base Model | Qwen2.5-32B-Instruct |
| Training Data | 1,000 examples |
| Paper | arXiv:2501.19393 |
| Repository | HuggingFace |
## What is s1-32B?
s1-32B is a reasoning model from the simplescaling team, created by fine-tuning Qwen2.5-32B-Instruct on a remarkably small dataset of just 1,000 examples. The model performs strongly on mathematical reasoning tasks and implements test-time scaling through a technique called budget forcing.
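The model can be loaded like any other Qwen2.5-based chat model. The snippet below is a minimal sketch using the Hugging Face `transformers` library; the repository id `simplescaling/s1-32B`, the prompt, and the generation settings are assumptions for illustration, not taken from the original card.

```python
# Minimal sketch (assumptions: repo id "simplescaling/s1-32B", chat-style prompting).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "simplescaling/s1-32B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard across available GPUs
)

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```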
## Implementation Details
The model uses a budget forcing technique during inference: when the model tries to end its thinking segment, the end-of-thinking token is ignored and the string "Wait" is appended, up to four times, pushing the model to keep reasoning. This approach has shown particularly strong results on mathematical reasoning benchmarks; a sketch of the idea follows the list below.
- Built on Qwen2.5-32B-Instruct architecture
- Trained on a carefully curated set of 1,000 examples
- Implements budget forcing for improved reasoning
- Matches performance of o1-preview on several benchmarks
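The sketch below illustrates the budget forcing idea described above. It is not the authors' implementation: the end-of-thinking delimiter string, the repository id, and the generation settings are assumptions and would need to be matched to the checkpoint's actual chat template.

```python
# Hedged sketch of budget forcing; assumptions are marked in the comments.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "simplescaling/s1-32B"        # assumed repo id
END_OF_THINKING = "<|im_start|>answer"   # assumed delimiter that ends the thinking trace
MAX_WAITS = 4                            # append "Wait" up to four times

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

def generate_with_budget_forcing(prompt: str, max_new_tokens: int = 4096) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

    for _ in range(MAX_WAITS):
        out = model.generate(ids, max_new_tokens=max_new_tokens)
        text = tokenizer.decode(out[0], skip_special_tokens=False)
        if END_OF_THINKING not in text:
            break  # the model never tried to stop thinking, so there is nothing to suppress
        # Cut the generation off at the end-of-thinking delimiter and append "Wait",
        # forcing the model to keep reasoning instead of moving on to its answer.
        text = text.split(END_OF_THINKING)[0] + "Wait"
        ids = tokenizer(text, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)

    # Final pass: let the model close its thinking segment and produce the answer.
    out = model.generate(ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Truncating at the delimiter and re-encoding is the simplest way to express the idea; a more efficient implementation would use a stopping criterion or logit manipulation so generation does not restart from scratch on each iteration.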
## Core Capabilities
- Strong performance on MATH500 (93.0%)
- Competitive results on AIME2024 (56.7%)
- Solid showing on GPQA-Diamond (59.6%)
- Efficient fine-tuning with minimal data
## Frequently Asked Questions
Q: What makes this model unique?
The model achieves remarkable performance despite being trained on just 1,000 examples, demonstrating efficient learning and effective test-time scaling through budget forcing techniques.
Q: What are the recommended use cases?
The model excels in mathematical reasoning tasks and is particularly suited for applications requiring complex problem-solving capabilities. However, users are recommended to consider using its successor s1.1 for better overall performance.