Skywork-OR1-32B-Preview

Maintained By: Skywork


  • Parameter Count: 32B
  • Model Type: Large Language Model (Reasoning)
  • Base Model: DeepSeek-R1-Distill-Qwen-32B
  • Links: GitHub, Hugging Face Repository

What is Skywork-OR1-32B-Preview?

Skywork-OR1-32B-Preview is a state-of-the-art reasoning model that is part of the Skywork Open Reasoner series. It's specifically designed to excel at mathematical and coding tasks, achieving performance comparable to much larger models like the 671B-parameter DeepSeek-R1. The model demonstrates exceptional capabilities in mathematical reasoning with an AIME24 score of 79.7 and AIME25 score of 69.0, while also performing strongly on coding tasks with a LiveCodeBench score of 63.9.

Implementation Details

The model is trained using a multi-stage pipeline built around a customized version of GRPO (Group Relative Policy Optimization). The training process applies both offline and online difficulty-based filtering and rejection sampling, combined with adaptive entropy control for improved exploration and stability; a short sketch of the group-relative update appears after the list below. The training data consists of 110K carefully curated math problems and 14K coding questions, all subjected to rigorous quality assessment.

  • Custom GRPO implementation with advanced filtering mechanisms
  • Multi-stage training pipeline with adaptive entropy control
  • Trained on carefully curated and verified datasets
  • Built on top of DeepSeek-R1-Distill-Qwen-32B architecture
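The exact training code is not reproduced here, but the group-relative advantage at the heart of GRPO can be illustrated with a minimal sketch. The function name and reward values below are hypothetical; only the normalization scheme, in which rewards are standardized within each group of responses sampled for the same prompt, reflects the GRPO idea described above.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Compute GRPO-style advantages for one prompt.

    `rewards` holds the scalar reward of each of the G responses sampled
    for the same prompt; each response's advantage is its reward
    standardized against the group mean and standard deviation, so no
    learned value function (critic) is required.
    """
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + eps)

# Hypothetical example: 4 responses to one math problem, rewarded 1.0 if
# the final answer is verified correct and 0.0 otherwise.
rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # positive for correct, negative for incorrect
```

Note that prompts whose sampled responses are all correct or all incorrect yield near-zero advantages and thus no learning signal, which is one motivation for the difficulty-based filtering mentioned above.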

Core Capabilities

  • Strong mathematical reasoning (79.7 on AIME24, 69.0 on AIME25)
  • Advanced coding capabilities (63.9 on LiveCodeBench)
  • Consistent performance across multiple attempts (measured using Avg@K metric)
  • Competitive performance with models 20x larger in parameter count

Frequently Asked Questions

Q: What makes this model unique?

The model achieves unprecedented performance for its size, matching the capabilities of models with significantly more parameters. It uses a novel evaluation approach (Avg@K) that better reflects real-world performance and stability.
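The document does not define Avg@K precisely; the sketch below assumes the common reading, namely accuracy averaged over K independently sampled attempts per problem, and the data values are hypothetical.

```python
def avg_at_k(attempt_correct: list[list[bool]]) -> float:
    """Average accuracy over K sampled attempts per problem.

    `attempt_correct[i][j]` is True if the j-th sampled attempt on
    problem i was correct. Unlike Pass@K, which credits a problem if
    any attempt succeeds, Avg@K averages over every attempt, so it
    also reflects consistency across samples.
    """
    per_problem = [sum(attempts) / len(attempts) for attempts in attempt_correct]
    return sum(per_problem) / len(per_problem)

# Hypothetical example: 2 problems, K = 4 attempts each.
print(avg_at_k([[True, True, False, True], [False, True, True, True]]))  # 0.75
```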

Q: What are the recommended use cases?

The model is particularly well-suited for mathematical problem-solving, algorithmic reasoning, and coding tasks. It's designed for applications requiring robust reasoning capabilities in these domains.
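For reference, a minimal inference sketch using the Hugging Face transformers library is shown below. The model identifier Skywork/Skywork-OR1-32B-Preview, the prompt, and the generation settings are assumptions; consult the official repository for the recommended chat template and sampling parameters.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-OR1-32B-Preview"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Hypothetical math prompt; a 32B model requires substantial GPU memory.
messages = [{"role": "user", "content": "Prove that the sum of two odd integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```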
