Skywork-Reward-Gemma-2-27B-v0.2
Property | Value |
---|---|
Parameter Count | 27.2B |
Model Type | Text Classification |
Architecture | Gemma-2 Base |
Paper | Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs |
License | Skywork Community License |
What is Skywork-Reward-Gemma-2-27B-v0.2?
Skywork-Reward-Gemma-2-27B-v0.2 is a state-of-the-art reward model built on Google's Gemma-2-27b-it architecture. It's designed to evaluate and score text responses, trained on a carefully curated dataset of 80K high-quality preference pairs. The model currently ranks first on the RewardBench leaderboard with a remarkable score of 94.3.
Implementation Details
The model utilizes BF16 precision and requires either flash_attention_2 or eager implementation for optimal performance. It's trained on the Skywork Reward Data Collection, which includes data from multiple high-quality sources like HelpSteer2, OffsetBias, WildGuard, and the Magpie DPO series.
- Specialized scoring mechanism for preference evaluation
- Optimized for complex scenarios including mathematics, coding, and safety
- Implements advanced data curation techniques for balanced domain coverage
Core Capabilities
- Superior performance in chat evaluation (96.1% accuracy)
- Excellent reasoning capabilities (98.1% accuracy)
- Strong safety evaluation metrics (93.0% accuracy)
- Robust handling of challenging conversational scenarios (89.9% accuracy in Chat Hard)
Frequently Asked Questions
Q: What makes this model unique?
The model achieves state-of-the-art performance using only 80K carefully curated training pairs, demonstrating that high-quality data curation can outperform larger but less refined datasets. It's particularly notable for its balanced performance across different evaluation domains.
Q: What are the recommended use cases?
The model is ideal for evaluating AI-generated responses, particularly in scenarios requiring complex reasoning, safety assessment, and quality judgment of conversational outputs. It's especially useful for researchers and developers working on AI alignment and quality assessment.