steiner-32b-preview

Steiner-32b is an experimental reasoning model that autonomously explores multiple reasoning paths, trained via reinforcement learning on synthetic data.

Property      Value
Author        Yichao 'Peak' Ji
Base Model    Qwen2.5-32B
Model Type    Reasoning-focused LLM
Paper         Research Blog Post

What is steiner-32b-preview?

Steiner-32b-preview is an experimental language model designed to explore multiple reasoning paths autonomously. Inspired by OpenAI's o1, it represents a significant attempt to develop AI systems capable of self-guided reasoning and verification. The model is trained using reinforcement learning on synthetic data, enabling it to traverse reasoning paths in an autoregressive manner and perform self-verification or backtracking when necessary.

Implementation Details

The model is built on Qwen2.5-32B architecture and is specifically optimized for zero-shot reasoning tasks without requiring Chain of Thought (CoT) prompting. It features specialized logits processing for reasoning tokens and is compatible with standard inference services, particularly vLLM.

  • Trained on 90% English, 10% Chinese data composition
  • Requires specific inference parameters: skip_special_tokens=false, spaces_between_special_tokens=false
  • Achieves notable performance on scientific reasoning tasks (53.54% average on GPQA Diamond)
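The inference requirements above can be sketched as a request payload for an OpenAI-compatible vLLM server. This is a minimal illustration, not an official client: the endpoint, prompt, and temperature are assumptions; only the two special-token flags come from the model card.

```python
import json

# Illustrative payload for a vLLM server hosting the model.
# The model card requires these two flags so that the special
# reasoning tokens survive decoding (values are lowercase booleans
# in JSON, matching the documented skip_special_tokens=false form).
payload = {
    "model": "peakji/steiner-32b-preview",
    "messages": [
        {
            "role": "user",
            # Single-turn prompt: the model targets zero-shot
            # reasoning without explicit CoT instructions.
            "content": "How many prime numbers are there below 30?",
        }
    ],
    "temperature": 0.6,  # assumed value for illustration
    "skip_special_tokens": False,
    "spaces_between_special_tokens": False,
}

# Serialized body, ready to POST to /v1/chat/completions.
request_body = json.dumps(payload)
```

Because the model is tuned for single-turn use, the `messages` list here deliberately contains one user turn rather than a running conversation.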

Core Capabilities

  • Autonomous exploration of multiple reasoning paths
  • Self-verification and backtracking capabilities
  • Zero-shot reasoning without CoT prompting
  • Strong performance in specific scientific domains (e.g., 76% in Quantum Mechanics, 80% in Molecular Biology)

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to autonomously explore multiple reasoning paths and perform self-verification without requiring explicit Chain of Thought prompting sets it apart from traditional language models. It represents an attempt to replicate OpenAI o1's capabilities in open-source form.

Q: What are the recommended use cases?

The model excels at single-turn reasoning tasks, particularly in scientific domains, and is best suited to problems that demand deep reasoning and self-verification. It is not recommended for multi-turn dialogue or conversation-heavy scenarios.
