s1-32B

by simplescaling

A reasoning model finetuned from Qwen2.5-32B-Instruct using just 1,000 examples, featuring test-time scaling and competitive performance on math tasks.

  • Base Model: Qwen2.5-32B-Instruct
  • Training Data: 1,000 examples
  • Paper: arXiv:2501.19393
  • Repository: HuggingFace

What is s1-32B?

s1-32B is a specialized reasoning model developed by SimpleScaling, created through fine-tuning Qwen2.5-32B-Instruct on a remarkably small dataset of just 1,000 examples. The model demonstrates impressive performance on mathematical reasoning tasks and implements an innovative test-time scaling approach through budget forcing.

Implementation Details

The model uses a budget forcing technique during inference: when the model emits its end-of-thinking token, that token is suppressed and the string "Wait" is appended instead, up to four times, prompting the model to continue reasoning. This approach has shown particularly strong results on mathematical reasoning benchmarks.

  • Built on Qwen2.5-32B-Instruct architecture
  • Trained on carefully curated 1,000 examples
  • Implements budget forcing for improved reasoning
  • Matches performance of o1-preview on several benchmarks
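The budget forcing loop described above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the `END_OF_THINKING` token string, the `budget_force` helper, and the `generate` callable are all hypothetical names, and a toy generator stands in for the real model.

```python
# Hypothetical sketch of budget forcing. When the model emits its
# end-of-thinking token, we strip it and append "Wait" to force further
# reasoning, up to a fixed budget of continuations.

END_OF_THINKING = "<|end_think|>"  # placeholder token string (assumption)

def budget_force(generate, prompt, max_waits=4):
    """Call `generate` on the growing context, replacing the
    end-of-thinking token with ' Wait' up to `max_waits` times
    before allowing the model to stop."""
    text = prompt
    waits = 0
    while True:
        text += generate(text)  # continue generation from current context
        if text.endswith(END_OF_THINKING) and waits < max_waits:
            # Suppress the stop token and nudge the model to keep thinking.
            text = text[: -len(END_OF_THINKING)] + " Wait"
            waits += 1
        else:
            return text

# Toy stand-in for a model: produces one reasoning step, then tries to stop.
def toy_generate(context):
    return " step" + END_OF_THINKING

result = budget_force(toy_generate, "Q:", max_waits=2)
```

With `max_waits=2`, the toy run appends "Wait" twice before the final end-of-thinking token is allowed through, mirroring how the real model is granted extra thinking budget at test time.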

Core Capabilities

  • Strong performance on MATH500 (93.0%)
  • Competitive results on AIME2024 (56.7%)
  • Solid showing on GPQA-Diamond (59.6%)
  • Efficient fine-tuning with minimal data

Frequently Asked Questions

Q: What makes this model unique?

The model achieves remarkable performance despite being trained on just 1,000 examples, demonstrating efficient learning and effective test-time scaling through budget forcing techniques.

Q: What are the recommended use cases?

The model excels at mathematical reasoning tasks and is particularly suited to applications requiring complex problem-solving. However, users should consider its successor, s1.1, for better overall performance.
