Skywork-OR1-32B-Preview

Maintained By: Skywork


  • Parameter Count: 32B
  • Model Type: Large Language Model (Reasoning)
  • Base Model: DeepSeek-R1-Distill-Qwen-32B
  • Links: GitHub, Hugging Face Repository

What is Skywork-OR1-32B-Preview?

Skywork-OR1-32B-Preview is a state-of-the-art reasoning model that is part of the Skywork Open Reasoner series. It's specifically designed to excel at mathematical and coding tasks, achieving performance comparable to much larger models like the 671B-parameter DeepSeek-R1. The model demonstrates exceptional capabilities in mathematical reasoning with an AIME24 score of 79.7 and AIME25 score of 69.0, while also performing strongly on coding tasks with a LiveCodeBench score of 63.9.

Implementation Details

The model is trained using a multi-stage pipeline built around a customized version of GRPO (Group Relative Policy Optimization). The training process applies both offline and online difficulty-based filtering and rejection sampling, combined with adaptive entropy control for improved exploration and stability; a short sketch of the group-relative update appears after the list below. The training data consists of 110K carefully curated math problems and 14K coding questions, all subjected to rigorous quality assessment.

  • Custom GRPO implementation with advanced filtering mechanisms
  • Multi-stage training pipeline with adaptive entropy control
  • Trained on carefully curated and verified datasets
  • Built on top of DeepSeek-R1-Distill-Qwen-32B architecture
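The exact training code is not reproduced here, but the group-relative advantage at the heart of GRPO can be illustrated with a minimal sketch. The function name and reward values below are hypothetical; only the normalization scheme, in which rewards are standardized within each group of responses sampled for the same prompt, reflects the GRPO idea described above.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Compute GRPO-style advantages for one prompt.

    `rewards` holds the scalar reward of each of the G responses sampled
    for the same prompt; each response's advantage is its reward
    standardized against the group mean and standard deviation, so no
    learned value function (critic) is required.
    """
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + eps)

# Hypothetical example: 4 responses to one math problem, rewarded 1.0 if
# the final answer is verified correct and 0.0 otherwise.
rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # positive for correct, negative for incorrect
```

Note that prompts whose sampled responses are all correct or all incorrect yield near-zero advantages and thus no learning signal, which is one motivation for the difficulty-based filtering mentioned above.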

Core Capabilities

  • Strong mathematical reasoning (79.7 on AIME24, 69.0 on AIME25)
  • Advanced coding capabilities (63.9 on LiveCodeBench)
  • Consistent performance across multiple attempts (measured using Avg@K metric)
  • Competitive performance with models 20x larger in parameter count

Frequently Asked Questions

Q: What makes this model unique?

The model achieves unprecedented performance for its size, matching the capabilities of models with significantly more parameters. It uses a novel evaluation approach (Avg@K) that better reflects real-world performance and stability.
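The document does not define Avg@K precisely; the sketch below assumes the common reading, namely accuracy averaged over K independently sampled attempts per problem, and the data values are hypothetical.

```python
def avg_at_k(attempt_correct: list[list[bool]]) -> float:
    """Average accuracy over K sampled attempts per problem.

    `attempt_correct[i][j]` is True if the j-th sampled attempt on
    problem i was correct. Unlike Pass@K, which credits a problem if
    any attempt succeeds, Avg@K averages over every attempt, so it
    also reflects consistency across samples.
    """
    per_problem = [sum(attempts) / len(attempts) for attempts in attempt_correct]
    return sum(per_problem) / len(per_problem)

# Hypothetical example: 2 problems, K = 4 attempts each.
print(avg_at_k([[True, True, False, True], [False, True, True, True]]))  # 0.75
```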

Q: What are the recommended use cases?

The model is particularly well-suited for mathematical problem-solving, algorithmic reasoning, and coding tasks. It's designed for applications requiring robust reasoning capabilities in these domains.
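For reference, a minimal inference sketch using the Hugging Face transformers library is shown below. The model identifier Skywork/Skywork-OR1-32B-Preview, the prompt, and the generation settings are assumptions; consult the official repository for the recommended chat template and sampling parameters.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-OR1-32B-Preview"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Hypothetical math prompt; a 32B model requires substantial GPU memory.
messages = [{"role": "user", "content": "Prove that the sum of two odd integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```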
