prometheus-13b-v1.0

Maintained By
prometheus-eval

Prometheus-13b-v1.0

PropertyValue
Base ModelLlama-2-13b-chat
LicenseApache 2.0
Paperarxiv:2310.08491
Primary UseText Evaluation & Feedback

What is prometheus-13b-v1.0?

Prometheus-13b-v1.0 is a specialized language model designed to serve as a cost-effective alternative to GPT-4 for evaluating other language models and providing detailed feedback. Built on Llama-2-Chat architecture, it has been fine-tuned on an extensive dataset of 100K feedback samples, making it particularly adept at evaluating long-form responses and providing nuanced assessments.

Implementation Details

The model employs a unique evaluation framework that requires four key components: an instruction, a response to evaluate, a score rubric, and a reference answer. It processes these inputs using the Llama-2-Chat conversation template and generates detailed feedback along with numerical scores.

  • Fine-tuned on the Feedback Collection dataset
  • Supports both CPU and GPU inference with various precision options (FP16, INT8)
  • Implements systematic scoring on a 1-5 scale
  • Capable of handling complex evaluation criteria

Core Capabilities

  • High-quality evaluation of language model outputs
  • Detailed feedback generation with specific scoring criteria
  • Comparable performance to GPT-4 on various benchmarks
  • Customizable evaluation criteria for different use cases
  • Suitable for RLHF (Reinforcement Learning from Human Feedback) as a reward model

Frequently Asked Questions

Q: What makes this model unique?

Prometheus-13b stands out for its specialized training on feedback evaluation, making it a powerful tool for assessing AI model outputs with performance comparable to GPT-4, but at a lower cost. Its ability to handle customized evaluation criteria sets it apart from general-purpose language models.

Q: What are the recommended use cases?

The model is ideal for evaluating language model outputs, providing detailed feedback for text generation tasks, serving as a reward model for RLHF, and conducting fine-grained evaluations with custom criteria such as child readability, cultural sensitivity, or creativity.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.