Prometheus-13b-v1.0

Property	Value
Base Model	Llama-2-13b-chat
License	Apache 2.0
Paper	arxiv:2310.08491
Primary Use	Text Evaluation & Feedback

What is prometheus-13b-v1.0?

Prometheus-13b-v1.0 is a specialized language model designed to serve as a cost-effective alternative to GPT-4 for evaluating other language models and providing detailed feedback. Built on Llama-2-Chat architecture, it has been fine-tuned on an extensive dataset of 100K feedback samples, making it particularly adept at evaluating long-form responses and providing nuanced assessments.

Implementation Details

The model employs a unique evaluation framework that requires four key components: an instruction, a response to evaluate, a score rubric, and a reference answer. It processes these inputs using the Llama-2-Chat conversation template and generates detailed feedback along with numerical scores.

Fine-tuned on the Feedback Collection dataset
Supports both CPU and GPU inference with various precision options (FP16, INT8)
Implements systematic scoring on a 1-5 scale
Capable of handling complex evaluation criteria

Core Capabilities

High-quality evaluation of language model outputs
Detailed feedback generation with specific scoring criteria
Comparable performance to GPT-4 on various benchmarks
Customizable evaluation criteria for different use cases
Suitable for RLHF (Reinforcement Learning from Human Feedback) as a reward model

Frequently Asked Questions

Q: What makes this model unique?

Prometheus-13b stands out for its specialized training on feedback evaluation, making it a powerful tool for assessing AI model outputs with performance comparable to GPT-4, but at a lower cost. Its ability to handle customized evaluation criteria sets it apart from general-purpose language models.

Q: What are the recommended use cases?

The model is ideal for evaluating language model outputs, providing detailed feedback for text generation tasks, serving as a reward model for RLHF, and conducting fine-grained evaluations with custom criteria such as child readability, cultural sensitivity, or creativity.