Prometheus-13b-v1.0
Property | Value |
---|---|
Base Model | Llama-2-13b-chat |
License | Apache 2.0 |
Paper | arxiv:2310.08491 |
Primary Use | Text Evaluation & Feedback |
What is prometheus-13b-v1.0?
Prometheus-13b-v1.0 is a specialized language model designed to serve as a cost-effective alternative to GPT-4 for evaluating other language models and providing detailed feedback. Built on Llama-2-Chat architecture, it has been fine-tuned on an extensive dataset of 100K feedback samples, making it particularly adept at evaluating long-form responses and providing nuanced assessments.
Implementation Details
The model employs a unique evaluation framework that requires four key components: an instruction, a response to evaluate, a score rubric, and a reference answer. It processes these inputs using the Llama-2-Chat conversation template and generates detailed feedback along with numerical scores.
- Fine-tuned on the Feedback Collection dataset
- Supports both CPU and GPU inference with various precision options (FP16, INT8)
- Implements systematic scoring on a 1-5 scale
- Capable of handling complex evaluation criteria
Core Capabilities
- High-quality evaluation of language model outputs
- Detailed feedback generation with specific scoring criteria
- Comparable performance to GPT-4 on various benchmarks
- Customizable evaluation criteria for different use cases
- Suitable for RLHF (Reinforcement Learning from Human Feedback) as a reward model
Frequently Asked Questions
Q: What makes this model unique?
Prometheus-13b stands out for its specialized training on feedback evaluation, making it a powerful tool for assessing AI model outputs with performance comparable to GPT-4, but at a lower cost. Its ability to handle customized evaluation criteria sets it apart from general-purpose language models.
Q: What are the recommended use cases?
The model is ideal for evaluating language model outputs, providing detailed feedback for text generation tasks, serving as a reward model for RLHF, and conducting fine-grained evaluations with custom criteria such as child readability, cultural sensitivity, or creativity.