UltraRM-13b

Maintained By
openbmb


Property | Value
Base Model | LLaMA2-13B
License | MIT
Paper | UltraFeedback paper
Framework | PyTorch, Transformers

What is UltraRM-13b?

UltraRM-13b is a reward model developed by OpenBMB and built on the LLaMA2-13B architecture. It was trained on the UltraFeedback dataset together with other high-quality feedback datasets: Anthropic HH-RLHF, Stanford SHP, and summarization feedback data. Because a reward model scores responses rather than generating them, its headline AlpacaEval result is indirect: when UltraRM was used to select the best of several sampled responses from a chat model, that model reached a reported 92.30% win rate against text-davinci-003.

Implementation Details

The model places a regression head on top of the LLaMA backbone, mapping a prompt-and-response sequence to a single scalar reward. Higher scores mark responses the model predicts people would prefer, so the scores can serve as the reward signal in reinforcement learning pipelines or as a standalone quality metric for AI-generated text.

  • Built on the LLaMA2-13B architecture
  • Trained on UltraFeedback and several other high-quality feedback datasets
  • Replaces the language-modeling head with a regression head for reward scoring
  • Produces one scalar reward score per prompt-response pair
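The scoring step can be illustrated with a toy, dependency-free sketch. It assumes a common reward-model design: a linear head (written here without a bias term) maps each token's final hidden state to a scalar, and the sequence score is read at the last non-padding token. The hidden states, weights, and function names are hypothetical stand-ins, not UltraRM's actual code or weights.

```python
from typing import List, Sequence


def linear_head(hidden: Sequence[float], weight: Sequence[float]) -> float:
    """Project one token's hidden state to a scalar (a 1-output linear layer)."""
    return sum(h * w for h, w in zip(hidden, weight))


def reward_from_hidden_states(
    hidden_states: List[List[float]],  # one hidden vector per token position
    attention_mask: List[int],         # 1 = real token, 0 = padding (assumed right-padded)
    weight: Sequence[float],           # hypothetical head weights
) -> float:
    """Score a sequence by applying the head at its last non-padding token."""
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have the same length")
    real_positions = [i for i, m in enumerate(attention_mask) if m == 1]
    if not real_positions:
        raise ValueError("attention_mask contains no real tokens")
    last = real_positions[-1]
    return linear_head(hidden_states[last], weight)


# Toy example: 4 positions, hidden size 3, the final position is padding.
hidden = [[0.1, 0.2, 0.3], [0.0, 1.0, 0.0], [2.0, 0.5, -1.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
w = [0.5, -1.0, 0.25]
print(reward_from_hidden_states(hidden, mask, w))  # → 0.25
```

Reading the score at the final real token lets the head see the whole prompt and response through the backbone's causal attention, which is why padding must be skipped.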

Core Capabilities

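  • Reported state-of-the-art accuracy among open-source reward models on public human-preference test sets at release
  • Scores the quality of generated text
  • Runs with PyTorch and the Hugging Face Transformers library
  • Supports both absolute scoring of a single response and pairwise comparison of responses to the same prompt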
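Pairwise comparison reduces to comparing two scalar scores. A common convention, assumed here rather than taken from the UltraRM documentation, is the Bradley-Terry model: the probability that response A is preferred over B is sigmoid(r_A - r_B), and -log of that probability is the pairwise ranking loss typically used to train reward models. The reward values in the usage lines are made up for illustration.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that A is preferred to B: sigmoid(r_a - r_b)."""
    d = reward_a - reward_b
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)  # numerically stable branch for negative differences
    return z / (1.0 + z)


def pairwise_ranking_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected), computed without overflow."""
    d = reward_chosen - reward_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


# Hypothetical scores for two candidate responses to the same prompt.
r_a, r_b = 1.8, 0.3
print(f"P(A preferred over B) = {preference_probability(r_a, r_b):.3f}")
print(f"loss if A is labeled better: {pairwise_ranking_loss(r_a, r_b):.3f}")
```

Only the difference between the two rewards matters, so absolute reward values from a model like this are best read relative to other responses for the same prompt.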

Frequently Asked Questions

Q: What makes this model unique?

UltraRM-13b stands out for its reward-modeling performance, which comes from training on a large and diverse mix of high-quality feedback data. At release it was reported to achieve state-of-the-art results among open-source reward models on public preference benchmarks.

Q: What are the recommended use cases?

The model is primarily designed for evaluating the quality of language model outputs, making it ideal for: reinforcement learning from human feedback (RLHF), quality assessment of generated text, and model comparison studies. It's particularly useful in research and development of better language models.
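For quality assessment and model comparison, the reward model is usually treated as a black-box scorer. The sketch below assumes a hypothetical `score(prompt, response) -> float` callable that would wrap UltraRM inference (not shown). It picks the best of several sampled responses and computes a reward-judged win rate between two systems, counting ties as half a win. The length-based `toy_score` is a placeholder for the real model and is purely illustrative.

```python
from typing import Callable, Sequence

Scorer = Callable[[str, str], float]  # hypothetical: (prompt, response) -> reward


def best_of_n(prompt: str, candidates: Sequence[str], score: Scorer) -> str:
    """Return the candidate with the highest reward (best-of-n reranking)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: score(prompt, c))


def reward_win_rate(
    prompts: Sequence[str],
    outputs_a: Sequence[str],
    outputs_b: Sequence[str],
    score: Scorer,
) -> float:
    """Fraction of prompts where system A outscores system B (ties count 0.5)."""
    if not prompts or not (len(prompts) == len(outputs_a) == len(outputs_b)):
        raise ValueError("inputs must be non-empty and of equal length")
    wins = 0.0
    for p, a, b in zip(prompts, outputs_a, outputs_b):
        ra, rb = score(p, a), score(p, b)
        wins += 1.0 if ra > rb else (0.5 if ra == rb else 0.0)
    return wins / len(prompts)


# Toy scorer for demonstration only: longer answers get a higher "reward".
def toy_score(prompt: str, response: str) -> float:
    return float(len(response))


print(best_of_n("Define entropy.", ["Disorder.", "A measure of uncertainty."], toy_score))
print(reward_win_rate(["q1", "q2"], ["long answer", "ab"], ["short", "ab"], toy_score))
```

Win rates computed this way inherit the reward model's own biases (for example, any preference for longer answers), so they complement rather than replace human or benchmark evaluation.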
