Skywork-Reward-Gemma-2-27B-v0.2

Skywork-Reward-Gemma-2-27B-v0.2

Skywork

A state-of-the-art 27B parameter reward model built on Gemma-2-27b-it, achieving top performance on RewardBench with advanced preference learning capabilities.

PropertyValue
Parameter Count27.2B
Model TypeText Classification
ArchitectureGemma-2 Base
PaperSkywork-Reward: Bag of Tricks for Reward Modeling in LLMs
LicenseSkywork Community License

What is Skywork-Reward-Gemma-2-27B-v0.2?

Skywork-Reward-Gemma-2-27B-v0.2 is a state-of-the-art reward model built on Google's Gemma-2-27b-it architecture. It's designed to evaluate and score text responses, trained on a carefully curated dataset of 80K high-quality preference pairs. The model currently ranks first on the RewardBench leaderboard with a remarkable score of 94.3.

Implementation Details

The model utilizes BF16 precision and requires either flash_attention_2 or eager implementation for optimal performance. It's trained on the Skywork Reward Data Collection, which includes data from multiple high-quality sources like HelpSteer2, OffsetBias, WildGuard, and the Magpie DPO series.

  • Specialized scoring mechanism for preference evaluation
  • Optimized for complex scenarios including mathematics, coding, and safety
  • Implements advanced data curation techniques for balanced domain coverage

Core Capabilities

  • Superior performance in chat evaluation (96.1% accuracy)
  • Excellent reasoning capabilities (98.1% accuracy)
  • Strong safety evaluation metrics (93.0% accuracy)
  • Robust handling of challenging conversational scenarios (89.9% accuracy in Chat Hard)

Frequently Asked Questions

Q: What makes this model unique?

The model achieves state-of-the-art performance using only 80K carefully curated training pairs, demonstrating that high-quality data curation can outperform larger but less refined datasets. It's particularly notable for its balanced performance across different evaluation domains.

Q: What are the recommended use cases?

The model is ideal for evaluating AI-generated responses, particularly in scenarios requiring complex reasoning, safety assessment, and quality judgment of conversational outputs. It's especially useful for researchers and developers working on AI alignment and quality assessment.

Socials
PromptLayer
Company
All services online
Location IconPromptLayer is located in the heart of New York City
PromptLayer © 2026