Skywork-Reward-Gemma-2-27B-v0.2

Skywork

A 27B-parameter reward model built on Gemma-2-27b-it, with a reported overall RewardBench score of 94.3.

Property	Value
Parameter Count	27.2B
Model Type	Text Classification
Architecture	Gemma-2 (built on Gemma-2-27b-it)
Paper	Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
License	Skywork Community License

What is Skywork-Reward-Gemma-2-27B-v0.2?

Skywork-Reward-Gemma-2-27B-v0.2 is a reward model built on Google's Gemma-2-27b-it. It assigns a score to a text response and was trained on a curated dataset of 80K preference pairs. According to the model card, it ranked first on the RewardBench leaderboard at the time the card was written, with an overall score of 94.3.
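To illustrate how two reward scores can be turned into a preference, here is a minimal sketch that assumes a Bradley-Terry style interpretation, where the probability that one response is preferred depends only on the difference between the two scores. This page does not confirm that interpretation for this model, and the function name `preference_probability` and the example scores are hypothetical.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Probability that response A is preferred over response B.

    Assumes a Bradley-Terry model: sigmoid of the score difference.
    """
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


# Hypothetical reward scores for two responses to the same prompt.
p = preference_probability(2.0, -1.0)
print(f"P(A preferred over B) = {p:.3f}")
```

Equal scores give a probability of 0.5, and the probability moves toward 1 as the gap in favor of the first response grows.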

Implementation Details

The model runs in BF16 precision. For best results, load it with either the flash_attention_2 or the eager attention implementation. It is trained on the Skywork Reward Data Collection, which draws on sources including HelpSteer2, OffsetBias, WildGuard, and the Magpie DPO series.

Specialized scoring mechanism for preference evaluation
Optimized for complex scenarios including mathematics, coding, and safety
Implements advanced data curation techniques for balanced domain coverage
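The loading requirements described above can be sketched as follows. This is a minimal sketch, not the authors' reference code. It assumes the Hugging Face transformers, torch, and accelerate packages are installed, and that the model is published under the Hub id Skywork/Skywork-Reward-Gemma-2-27B-v0.2 (an assumption, not stated on this page). The helper names build_load_kwargs, load_reward_model, and score_conversation are hypothetical, as is the choice of num_labels=1.

```python
MODEL_ID = "Skywork/Skywork-Reward-Gemma-2-27B-v0.2"  # assumed Hub id
SUPPORTED_ATTENTION = ("flash_attention_2", "eager")


def build_load_kwargs(attn_implementation="flash_attention_2"):
    """Return from_pretrained keyword arguments for BF16 loading."""
    if attn_implementation not in SUPPORTED_ATTENTION:
        raise ValueError(
            f"expected one of {SUPPORTED_ATTENTION}, got {attn_implementation!r}"
        )
    return {
        "torch_dtype": "bfloat16",  # resolved to torch.bfloat16 at load time
        "attn_implementation": attn_implementation,
        "num_labels": 1,  # assumed: one scalar reward per input
    }


def load_reward_model(model_id=MODEL_ID, attn_implementation="flash_attention_2"):
    """Load model and tokenizer; heavy imports are deferred until called."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    kwargs = build_load_kwargs(attn_implementation)
    kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, device_map="auto", **kwargs
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer


def score_conversation(model, tokenizer, conversation):
    """Score a chat given as a list of {"role": ..., "content": ...} dicts."""
    import torch

    inputs = tokenizer.apply_chat_template(
        conversation, tokenize=True, return_tensors="pt", return_dict=True
    ).to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0][0].item()
```

Calling load_reward_model() downloads the full 27B checkpoint, so the sketch keeps that step inside a function instead of running it at import time. A conversation would be passed as a user message followed by the assistant response to be scored.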

Core Capabilities

Chat evaluation: 96.1% accuracy on RewardBench Chat
Reasoning: 98.1% accuracy on RewardBench Reasoning
Safety evaluation: 93.0% accuracy on RewardBench Safety
Challenging conversational scenarios: 89.9% accuracy on RewardBench Chat Hard
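As a quick consistency check, the four category figures above can be compared with the overall score of 94.3 quoted earlier. The sketch below assumes the overall score is the unweighted mean of the four categories, which this page does not state explicitly.

```python
# Category accuracies as listed in this document.
category_scores = {
    "Chat": 96.1,
    "Chat Hard": 89.9,
    "Safety": 93.0,
    "Reasoning": 98.1,
}

# Assumption: the overall score is the unweighted mean of the categories.
overall = sum(category_scores.values()) / len(category_scores)
print(f"Unweighted mean: {overall:.1f}")
```

Under that assumption the mean rounds to 94.3, which agrees with the reported overall score.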

Frequently Asked Questions

Q: What makes this model unique?

The model achieves state-of-the-art performance using only 80K curated training pairs, which suggests that a small, well-curated dataset can outperform larger but less refined ones. Its performance is also balanced across the evaluation domains.

Q: What are the recommended use cases?

The model is ideal for evaluating AI-generated responses, particularly in scenarios requiring complex reasoning, safety assessment, and quality judgment of conversational outputs. It's especially useful for researchers and developers working on AI alignment and quality assessment.
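One common way to apply a reward model to response evaluation is best-of-N selection: score several candidate responses to the same prompt and keep the highest-scoring one. The sketch below illustrates that pattern only. The names pick_best and toy_score are hypothetical, and toy_score is a placeholder word-overlap heuristic standing in for a real reward model call so that the example runs without loading any model.

```python
from typing import Callable, Sequence, Tuple


def pick_best(
    prompt: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> Tuple[float, str]:
    """Return (score, response) for the highest-scoring candidate."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    scored = [(score_fn(prompt, response), response) for response in candidates]
    return max(scored, key=lambda pair: pair[0])


def toy_score(prompt: str, response: str) -> float:
    """Placeholder scorer: counts response words that also occur in the prompt."""
    prompt_words = set(prompt.lower().split())
    return float(sum(word in prompt_words for word in response.lower().split()))


prompt = "What is the capital of France?"
candidates = ["I do not know.", "The capital of France is Paris."]
best_score, best_response = pick_best(prompt, candidates, toy_score)
print(best_response)
```

In practice score_fn would wrap a call to the reward model, taking the prompt and a candidate response and returning the model's score for that pair.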