s1-32B

Maintained By: simplescaling

Base Model: Qwen2.5-32B-Instruct
Training Data: 1,000 examples
Paper: arXiv:2501.19393
Repository: HuggingFace

What is s1-32B?

s1-32B is a specialized reasoning model developed by SimpleScaling, created through fine-tuning Qwen2.5-32B-Instruct on a remarkably small dataset of just 1,000 examples. The model demonstrates impressive performance on mathematical reasoning tasks and implements an innovative test-time scaling approach through budget forcing.

Implementation Details

During inference the model relies on a budget forcing technique: whenever it tries to end its reasoning, the end-of-thinking delimiter is suppressed and the string "Wait" is appended (up to four times in the reported setup), pushing the model to continue reasoning before producing an answer. This approach has shown particularly strong results on mathematical reasoning benchmarks; a minimal sketch follows the list below.

  • Built on Qwen2.5-32B-Instruct architecture
  • Trained on a carefully curated set of 1,000 examples
  • Implements budget forcing for improved reasoning
  • Matches performance of o1-preview on several benchmarks
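
The sketch below shows one way budget forcing could be wrapped around a standard Hugging Face generation loop. It is a minimal, illustrative example, not the authors' implementation: the end-of-thinking marker ("<|im_start|>answer"), the reliance on the tokenizer's chat template, and the decoding settings are assumptions based on the Qwen-style format the model inherits, so verify them against the simplescaling/s1-32B tokenizer before use.

```python
# Minimal budget-forcing sketch (assumptions noted in comments).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "simplescaling/s1-32B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

question = "How many positive divisors does 360 have?"
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

MAX_WAITS = 4                      # the card reports appending "Wait" up to four times
END_THINK = "<|im_start|>answer"   # assumed end-of-thinking marker; check the tokenizer config

text = prompt
for i in range(MAX_WAITS + 1):
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False
    )
    if END_THINK not in completion or i == MAX_WAITS:
        text += completion          # reasoning finished (or budget exhausted): accept as-is
        break
    # Budget forcing: drop the end-of-thinking marker and append "Wait"
    # so the model keeps reasoning instead of moving on to the answer.
    text += completion.split(END_THINK)[0] + "Wait"

print(text)
```

Re-encoding the growing transcript on every pass keeps the sketch short but is wasteful; a serving framework with stop-string handling and KV-cache reuse would be the practical choice.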

Core Capabilities

  • Strong performance on MATH500 (93.0%)
  • Competitive results on AIME2024 (56.7%)
  • Solid showing on GPQA-Diamond (59.6%)
  • Efficient fine-tuning with minimal data

Frequently Asked Questions

Q: What makes this model unique?

The model achieves remarkable performance despite being trained on just 1,000 examples, demonstrating efficient learning and effective test-time scaling through budget forcing techniques.

Q: What are the recommended use cases?

The model excels at mathematical reasoning and is particularly suited to applications that require complex problem-solving. That said, users may want to consider its successor, s1.1, which offers better overall performance.
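
For straightforward use without budget forcing, the model can be queried like any other chat model. The snippet below is a hedged sketch using the transformers text-generation pipeline; the prompt and generation settings are illustrative only.

```python
# Basic chat-style inference sketch; prompt and settings are arbitrary examples.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="simplescaling/s1-32B",
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 25."}]
result = generator(messages, max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])
```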
