Starling-RM-34B

Maintained by: Nexusflow


| Property | Value |
|----------|-------|
| Base Model | Yi-34B-Chat |
| Model Type | Reward Model for RLHF |
| License | Apache-2.0 |
| Authors | Nexusflow (Berkeley Team) |
| Blog | Starling Project |

What is Starling-RM-34B?

Starling-RM-34B is an advanced reward model developed by Nexusflow, built upon Yi-34B-Chat. It's designed to evaluate and score language model outputs based on helpfulness and safety. The model employs a unique architecture where the final layer of Yi-34B-Chat is replaced with a linear layer that produces scalar values for prompt-response pairs.
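
A minimal sketch of that architecture, assuming a standard transformers backbone with right-padded inputs; the class name, pooling logic, and loading details below are illustrative assumptions, not the official Starling code:

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class RewardModel(nn.Module):
    def __init__(self, base_model_name: str):
        super().__init__()
        # Backbone transformer (hypothetically the Yi-34B-Chat trunk).
        self.backbone = AutoModel.from_pretrained(base_model_name)
        # The base model's LM head is dropped; a single linear layer
        # maps the final hidden state to one scalar reward.
        self.reward_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Pool the hidden state of the last non-padding token
        # (assumes right padding).
        last = attention_mask.sum(dim=1) - 1
        pooled = hidden[torch.arange(hidden.size(0)), last]
        return self.reward_head(pooled).squeeze(-1)  # one scalar per pair
```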

Implementation Details

The model is trained using the berkeley-nest/Nectar dataset with K-wise maximum likelihood estimation. It outputs scalar reward scores for any given prompt and response pair, where higher scores indicate more helpful and less harmful responses. The implementation includes specialized handling of padding tokens and attention masks for accurate reward calculation.
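
As a concrete illustration, K-wise maximum likelihood over a ranking of K responses is commonly formulated as a Plackett-Luce objective; the sketch below assumes that formulation and is not taken from the official training code:

```python
import torch


def k_wise_mle_loss(rewards: torch.Tensor) -> torch.Tensor:
    """Plackett-Luce negative log-likelihood.

    rewards: (batch, K) scalar scores for K responses per prompt,
    ordered from most to least preferred.
    """
    batch, K = rewards.shape
    loss = rewards.new_zeros(batch)
    for i in range(K - 1):
        # Log-probability that the rank-i response scores higher
        # than every response ranked below it.
        loss = loss - torch.log_softmax(rewards[:, i:], dim=1)[:, 0]
    return loss.mean()
```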

Reported evaluation results:

  • 80.7% accuracy on human preference benchmarks
  • 71.2% accuracy on truth preference metrics
  • 78.2% accuracy on safety preference evaluation
  • 76.7% overall average accuracy

Core Capabilities

  • Response Quality Assessment: Evaluates the helpfulness and safety of language model outputs
  • Preference Learning: Trained on GPT-4 preferences through the Nectar dataset
  • Scalable Architecture: Supports batch processing with customizable batch sizes (see the sketch after this list)
  • Comprehensive Evaluation: Handles sequences up to 2048 tokens
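
The sketch below illustrates batched scoring with padding and attention-mask handling, reusing the hypothetical RewardModel class from the earlier snippet; the plain prompt-response concatenation is a placeholder for whatever chat template the model actually expects:

```python
import torch


def score_batch(model, tokenizer, prompts, responses,
                batch_size=4, max_len=2048):
    """Score prompt-response pairs in batches; returns a list of floats."""
    scores = []
    # Naive concatenation; the real model likely expects a chat template.
    pairs = [p + r for p, r in zip(prompts, responses)]
    for i in range(0, len(pairs), batch_size):
        enc = tokenizer(
            pairs[i:i + batch_size],
            padding=True,        # assumes the tokenizer has a pad token
            truncation=True,     # respect the 2048-token context window
            max_length=max_len,
            return_tensors="pt",
        )
        with torch.no_grad():
            # The attention mask keeps padding tokens out of the reward.
            scores.extend(model(enc.input_ids, enc.attention_mask).tolist())
    return scores
```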

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to provide scalar rewards for response quality, trained specifically on high-quality preference data from GPT-4. It shows significant improvements over its 7B-parameter predecessor across all evaluation metrics.

Q: What are the recommended use cases?

This model is ideal for RLHF pipelines, response quality evaluation, and safety assessment of language model outputs. It's particularly useful for researchers and developers working on improving language model behavior through reinforcement learning.
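
For example, a reward model like this is often used for best-of-n selection; the helper below builds on the hypothetical score_batch function above and is purely illustrative:

```python
def best_of_n(model, tokenizer, prompt, candidates):
    """Return the candidate response the reward model scores highest."""
    scores = score_batch(model, tokenizer,
                         [prompt] * len(candidates), candidates)
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```

In a full RLHF pipeline, the same scalar scores would instead feed a policy-optimization step such as PPO.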
