Skywork-o1-Open-PRM-Qwen-2.5-1.5B
| Property | Value |
|---|---|
| Base Model | Qwen2.5-Math-1.5B-Instruct |
| License | Skywork Community License |
| Primary Use | Text Classification / Reward Modeling |
| Framework | PyTorch |
What is Skywork-o1-Open-PRM-Qwen-2.5-1.5B?
Skywork-o1-Open-PRM-Qwen-2.5-1.5B is a specialized Process Reward Model (PRM) developed by Skywork, designed to enhance reasoning by assigning incremental process rewards, i.e., a reward for each intermediate reasoning step rather than only the final answer. It is part of the Skywork o1 Open model series and is optimized for evaluating mathematical and coding tasks.
Implementation Details
Built on the Qwen2.5-Math-1.5B-Instruct architecture, this model evaluates reasoning processes step by step rather than scoring only final outputs. It demonstrates strong performance across mathematical benchmarks, including GSM8K, MATH, and GaoKao, as well as coding benchmarks such as MBPP and HumanEval.
- Implements a Best-of-N@64 sampling strategy for best performance: the PRM re-ranks 64 sampled candidate solutions (a re-ranking sketch follows the Core Capabilities list)
- Supports both mathematical and code evaluation tasks
- Compatible with vLLM server deployment for efficient inference; a step-wise scoring sketch follows this list
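The snippet below is a minimal sketch of that step-wise scoring, assuming the checkpoint can be loaded through Hugging Face transformers with a sequence-classification-style reward head and that reasoning steps are newline-delimited. The loading class, step delimiter, and sigmoid pooling are illustrative assumptions; consult the official Skywork o1 Open repository for the supported interface.

```python
# Hypothetical sketch: score each reasoning step of a candidate solution.
# The loading class, step delimiter, and pooling are assumptions for
# illustration, not the documented Skywork API.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "Skywork/Skywork-o1-Open-PRM-Qwen-2.5-1.5B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype=torch.bfloat16
)
model.eval()


def score_steps(problem: str, solution: str) -> list[float]:
    """Return one assumed scalar reward per newline-delimited reasoning step."""
    rewards: list[float] = []
    prefix = problem
    for step in (s for s in solution.split("\n") if s.strip()):
        prefix = prefix + "\n" + step              # growing partial solution
        inputs = tokenizer(prefix, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits        # assumed shape: (1, 1)
        rewards.append(torch.sigmoid(logits)[0, 0].item())
    return rewards
```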
Core Capabilities
- Mathematical reasoning evaluation across multiple difficulty levels
- Step-wise assessment of problem-solving processes
- Strong performance on competition-level datasets
- Code evaluation capabilities on benchmarks like MBPP and HumanEval
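Building on the sketch above, the following example shows how Best-of-N@64 re-ranking could use those step rewards: score every sampled candidate, aggregate its step rewards, and keep the highest-scoring one. The `score_steps` helper and the minimum-reward aggregation are illustrative assumptions rather than Skywork's documented pipeline.

```python
# Hypothetical Best-of-N re-ranking on top of the step-wise rewards above.
def best_of_n(problem: str, candidates: list[str]) -> str:
    """Pick the candidate whose weakest step gets the highest PRM reward."""

    def aggregate(rewards: list[float]) -> float:
        # Minimum step reward is one common aggregation; mean or last-step
        # rewards are alternatives. The choice here is illustrative.
        return min(rewards) if rewards else float("-inf")

    return max(candidates, key=lambda c: aggregate(score_steps(problem, c)))


# Assumed workflow: sample N=64 solutions from any policy model (for example
# through a vLLM server), then let the PRM pick one.
# best = best_of_n(problem_text, sampled_solutions)
```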
Frequently Asked Questions
Q: What makes this model unique?
Unlike outcome reward models that score only the final answer, this model evaluates the reasoning process incrementally, assigning a reward to each intermediate step. This enables a more nuanced assessment of problem-solving approaches.
Q: What are the recommended use cases?
The model excels at evaluating mathematical problem-solving and code generation tasks. It's particularly useful for educational applications, automated assessment systems, and AI model training where step-by-step reasoning evaluation is crucial.