UltraRM-13b

openbmb

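UltraRM-13b is a state-of-the-art open-source reward model built on LLaMA2-13B. Used to rerank best-of-16 samples from UltraLM-13B-v2.0, it is reported to yield a 92.30% win rate against text-davinci-003 on the AlpacaEval benchmark.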

Property	Value
Base Model	LLaMA2-13B
License	MIT
Paper	UltraFeedback Paper
Framework	PyTorch, Transformers

What is UltraRM-13b?

UltraRM-13b is a state-of-the-art reward model developed by OpenBMB, built on the LLaMA2-13B architecture. It is trained on the UltraFeedback dataset together with a mixture of other feedback datasets: Anthropic HH-RLHF, Stanford SHP, and summarization feedback data. The 92.30% AlpacaEval win rate against text-davinci-003 associated with the model is reported for responses selected with it (best-of-16 sampling from UltraLM-13B-v2.0), since a reward model scores responses rather than generating them.

Implementation Details

The model implements a regression head on top of the LLaMA architecture to provide reward scores for text completions. It's designed to evaluate the quality of AI-generated responses and can be easily integrated into reinforcement learning pipelines.

Built on LLaMA2-13B architecture
Trained on UltraFeedback and multiple high-quality feedback datasets
Uses a custom reward-model class (LLaMA backbone plus regression head)
Provides scalar reward scores for text evaluation
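
The regression-head design described above can be sketched roughly as follows. This is a minimal illustration, not the released implementation: `TinyRewardModel`, the embedding "backbone", and the vocabulary and hidden sizes are hypothetical stand-ins for the LLaMA2-13B transformer, and reading the reward at the last non-padding token (with right padding) is an assumption made here for the sketch.

```python
import torch
import torch.nn as nn


class TinyRewardModel(nn.Module):
    """Toy stand-in: a small 'backbone' plus a scalar regression head."""

    def __init__(self, vocab_size: int = 100, hidden_size: int = 16):
        super().__init__()
        # Hypothetical backbone; the real model uses the LLaMA2-13B transformer here.
        self.backbone = nn.Embedding(vocab_size, hidden_size)
        # Regression head: maps each hidden state to a single scalar.
        self.regression_head = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        hidden_states = self.backbone(input_ids)                   # (batch, seq, hidden)
        rewards = self.regression_head(hidden_states).squeeze(-1)  # (batch, seq)
        # Index of the last non-padding token per sequence (assumes right padding).
        last_index = attention_mask.sum(dim=1, keepdim=True) - 1   # (batch, 1)
        # One scalar reward per sequence.
        return torch.gather(rewards, 1, last_index).squeeze(-1)    # (batch,)


torch.manual_seed(0)
model = TinyRewardModel()
input_ids = torch.tensor([[5, 8, 13, 0], [7, 21, 34, 55]])      # first row is padded
attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 1, 1]])
scores = model(input_ids, attention_mask)
print(scores.shape)  # one reward per input sequence
```

The point of the sketch is the shape of the computation: the backbone produces per-token hidden states, the linear head reduces each to a scalar, and a single position is read out as the reward for the whole sequence.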

Core Capabilities

State-of-the-art performance in preference evaluation
Effective text quality assessment
Works with the Hugging Face Transformers library
Supports both direct reward computation and comparative evaluation
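
Comparative evaluation from scalar rewards might look like the sketch below. It assumes the common Bradley-Terry reading of reward differences (the preference probability is the sigmoid of the difference); the source does not state that this is how UltraRM's scores should be interpreted, and the reward values used are made up for illustration.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry style probability that response A is preferred over B."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def preferred(reward_a: float, reward_b: float) -> str:
    """Return 'A', 'B', or 'tie' based on which reward is higher."""
    if reward_a > reward_b:
        return "A"
    if reward_b > reward_a:
        return "B"
    return "tie"


# Hypothetical rewards for two responses to the same prompt.
reward_a, reward_b = 1.7, -0.4
winner = preferred(reward_a, reward_b)
prob_a = preference_probability(reward_a, reward_b)
print(winner, round(prob_a, 3))
```

Direct reward computation uses each score on its own, while the comparative form only depends on the difference between two scores for the same prompt.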

Frequently Asked Questions

Q: What makes this model unique?

UltraRM-13b stands out for its reward-modeling performance, which comes from training on a diverse mixture of feedback datasets. It is reported as state of the art among open-source reward models for evaluating text quality.

Q: What are the recommended use cases?

The model is designed primarily to evaluate the quality of language model outputs. Typical uses are reinforcement learning from human feedback (RLHF), quality assessment of generated text, and model comparison studies. It is particularly useful in research and development of better language models.
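
One common way to apply a reward model to these use cases is best-of-n selection: generate several candidate responses and keep the one with the highest reward. The sketch below shows only the selection step. `best_of_n` and `toy_score` are hypothetical names, and `toy_score` is a placeholder (it simply counts words) standing in for a call to an actual reward model.

```python
from typing import Callable, Sequence


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate with the highest score for the given prompt."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return max(candidates, key=lambda response: score_fn(prompt, response))


def toy_score(prompt: str, response: str) -> float:
    """Placeholder scorer (word count); a real setup would call a reward model."""
    return float(len(response.split()))


prompt = "Explain what a reward model does."
candidates = [
    "It scores text.",
    "It assigns a scalar score to a response so responses can be ranked.",
    "No idea.",
]
best = best_of_n(prompt, candidates, toy_score)
print(best)
```

Swapping `toy_score` for a function that returns the reward model's scalar output would turn this into a reward-guided reranker; the same scores can also serve as the reward signal in an RLHF training loop.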