s1-32B

by simplescaling

A reasoning model finetuned from Qwen2.5-32B-Instruct using just 1,000 examples, featuring test-time scaling and competitive performance on math tasks.

  • Base Model: Qwen2.5-32B-Instruct
  • Training Data: 1,000 examples
  • Paper: arXiv:2501.19393
  • Repository: HuggingFace

What is s1-32B?

s1-32B is a specialized reasoning model developed by SimpleScaling, created through fine-tuning Qwen2.5-32B-Instruct on a remarkably small dataset of just 1,000 examples. The model demonstrates impressive performance on mathematical reasoning tasks and implements an innovative test-time scaling approach through budget forcing.

Implementation Details

The model uses a budget forcing technique during inference: when the model emits its end-of-thinking token, that token is suppressed and the string "Wait" is appended instead, up to four times, prompting the model to continue reasoning. This approach has shown particularly strong results on mathematical reasoning benchmarks.

  • Built on Qwen2.5-32B-Instruct architecture
  • Trained on carefully curated 1,000 examples
  • Implements budget forcing for improved reasoning
  • Matches performance of o1-preview on several benchmarks
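The budget forcing loop described above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the `END_OF_THINKING` token string, the `budget_force` helper, and the `generate` callable are all hypothetical names, and a toy generator stands in for the real model.

```python
# Hypothetical sketch of budget forcing. When the model emits its
# end-of-thinking token, we strip it and append "Wait" to force further
# reasoning, up to a fixed budget of continuations.

END_OF_THINKING = "<|end_think|>"  # placeholder token string (assumption)

def budget_force(generate, prompt, max_waits=4):
    """Call `generate` on the growing context, replacing the
    end-of-thinking token with ' Wait' up to `max_waits` times
    before allowing the model to stop."""
    text = prompt
    waits = 0
    while True:
        text += generate(text)  # continue generation from current context
        if text.endswith(END_OF_THINKING) and waits < max_waits:
            # Suppress the stop token and nudge the model to keep thinking.
            text = text[: -len(END_OF_THINKING)] + " Wait"
            waits += 1
        else:
            return text

# Toy stand-in for a model: produces one reasoning step, then tries to stop.
def toy_generate(context):
    return " step" + END_OF_THINKING

result = budget_force(toy_generate, "Q:", max_waits=2)
```

With `max_waits=2`, the toy run appends "Wait" twice before the final end-of-thinking token is allowed through, mirroring how the real model is granted extra thinking budget at test time.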

Core Capabilities

  • Strong performance on MATH500 (93.0%)
  • Competitive results on AIME2024 (56.7%)
  • Solid showing on GPQA-Diamond (59.6%)
  • Efficient fine-tuning with minimal data

Frequently Asked Questions

Q: What makes this model unique?

The model achieves remarkable performance despite being trained on just 1,000 examples, demonstrating efficient learning and effective test-time scaling through budget forcing techniques.

Q: What are the recommended use cases?

The model excels at mathematical reasoning tasks and is particularly suited to applications requiring complex problem-solving. However, users should consider its successor, s1.1, for better overall performance.
