Starling-RM-34B

Maintained by: Nexusflow


| Property | Value |
|----------|-------|
| Base Model | Yi-34B-Chat |
| Model Type | Reward Model for RLHF |
| License | Apache-2.0 |
| Authors | Nexusflow (Berkeley Team) |
| Blog | Starling Project |

What is Starling-RM-34B?

Starling-RM-34B is an advanced reward model developed by Nexusflow, built upon Yi-34B-Chat. It's designed to evaluate and score language model outputs based on helpfulness and safety. The model employs a unique architecture where the final layer of Yi-34B-Chat is replaced with a linear layer that produces scalar values for prompt-response pairs.
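
A minimal sketch of that architecture, assuming a standard transformers backbone with right-padded inputs; the class name, pooling logic, and loading details below are illustrative assumptions, not the official Starling code:

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class RewardModel(nn.Module):
    def __init__(self, base_model_name: str):
        super().__init__()
        # Backbone transformer (hypothetically the Yi-34B-Chat trunk).
        self.backbone = AutoModel.from_pretrained(base_model_name)
        # The base model's LM head is dropped; a single linear layer
        # maps the final hidden state to one scalar reward.
        self.reward_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Pool the hidden state of the last non-padding token
        # (assumes right padding).
        last = attention_mask.sum(dim=1) - 1
        pooled = hidden[torch.arange(hidden.size(0)), last]
        return self.reward_head(pooled).squeeze(-1)  # one scalar per pair
```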

Implementation Details

The model is trained using the berkeley-nest/Nectar dataset with K-wise maximum likelihood estimation. It outputs scalar reward scores for any given prompt and response pair, where higher scores indicate more helpful and less harmful responses. The implementation includes specialized handling of padding tokens and attention masks for accurate reward calculation.
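
As a concrete illustration, K-wise maximum likelihood over a ranking of K responses is commonly formulated as a Plackett-Luce objective; the sketch below assumes that formulation and is not taken from the official training code:

```python
import torch


def k_wise_mle_loss(rewards: torch.Tensor) -> torch.Tensor:
    """Plackett-Luce negative log-likelihood.

    rewards: (batch, K) scalar scores for K responses per prompt,
    ordered from most to least preferred.
    """
    batch, K = rewards.shape
    loss = rewards.new_zeros(batch)
    for i in range(K - 1):
        # Log-probability that the rank-i response scores higher
        # than every response ranked below it.
        loss = loss - torch.log_softmax(rewards[:, i:], dim=1)[:, 0]
    return loss.mean()
```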

Reported evaluation results:

  • 80.7% accuracy on human preference benchmarks
  • 71.2% accuracy on truth preference metrics
  • 78.2% accuracy on safety preference evaluation
  • 76.7% overall average accuracy

Core Capabilities

  • Response Quality Assessment: Evaluates the helpfulness and safety of language model outputs
  • Preference Learning: Trained on GPT-4 preferences through the Nectar dataset
  • Scalable Architecture: Supports batch processing with customizable batch sizes (see the sketch after this list)
  • Comprehensive Evaluation: Handles sequences up to 2048 tokens
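
The sketch below illustrates batched scoring with padding and attention-mask handling, reusing the hypothetical RewardModel class from the earlier snippet; the plain prompt-response concatenation is a placeholder for whatever chat template the model actually expects:

```python
import torch


def score_batch(model, tokenizer, prompts, responses,
                batch_size=4, max_len=2048):
    """Score prompt-response pairs in batches; returns a list of floats."""
    scores = []
    # Naive concatenation; the real model likely expects a chat template.
    pairs = [p + r for p, r in zip(prompts, responses)]
    for i in range(0, len(pairs), batch_size):
        enc = tokenizer(
            pairs[i:i + batch_size],
            padding=True,        # assumes the tokenizer has a pad token
            truncation=True,     # respect the 2048-token context window
            max_length=max_len,
            return_tensors="pt",
        )
        with torch.no_grad():
            # The attention mask keeps padding tokens out of the reward.
            scores.extend(model(enc.input_ids, enc.attention_mask).tolist())
    return scores
```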

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to provide scalar rewards for response quality, trained specifically on high-quality preference data from GPT-4. It shows significant improvements over its 7B-parameter predecessor across all evaluation metrics.

Q: What are the recommended use cases?

This model is ideal for RLHF pipelines, response quality evaluation, and safety assessment of language model outputs. It's particularly useful for researchers and developers working on improving language model behavior through reinforcement learning.
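
For example, a reward model like this is often used for best-of-n selection; the helper below builds on the hypothetical score_batch function above and is purely illustrative:

```python
def best_of_n(model, tokenizer, prompt, candidates):
    """Return the candidate response the reward model scores highest."""
    scores = score_batch(model, tokenizer,
                         [prompt] * len(candidates), candidates)
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```

In a full RLHF pipeline, the same scalar scores would instead feed a policy-optimization step such as PPO.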
