prometheus-7b-v2.0

Maintained By
prometheus-eval

Parameter Count: 7.24B
Base Model: Mistral-7B-Instruct-v0.2
License: Apache 2.0
Paper: Link to Paper
Training Data: 100K Feedback Collection + 200K Preference Collection

What is prometheus-7b-v2.0?

Prometheus-7b-v2.0 is a language model designed specifically for evaluating the outputs of other language models. Built on Mistral-7B-Instruct-v0.2, it is positioned as a cost-effective alternative to GPT-4 for fine-grained evaluation tasks. It supports both absolute grading (direct assessment) and relative grading (pairwise ranking), combined in a single model through weight merging.
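As a rough illustration of what weight merging means, the sketch below linearly interpolates two toy checkpoints parameter by parameter. The merge method, the `merge_linear` helper, and the `alpha` value are assumptions made for illustration; this page does not state which merging scheme or coefficients were used.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def merge_linear(absolute: Weights, relative: Weights, alpha: float = 0.5) -> Weights:
    """Interpolate two checkpoints that share the same parameter names and shapes.

    alpha weights the absolute-grading checkpoint; (1 - alpha) weights the
    relative-grading one. Both the method and alpha are illustrative assumptions.
    """
    if absolute.keys() != relative.keys():
        raise ValueError("checkpoints must have identical parameter names")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    merged: Weights = {}
    for name, a_vals in absolute.items():
        b_vals = relative[name]
        if len(a_vals) != len(b_vals):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * a + (1.0 - alpha) * b for a, b in zip(a_vals, b_vals)]
    return merged


# Two toy "checkpoints" with a single two-element parameter each.
absolute_ckpt = {"layer.weight": [1.0, 3.0]}
relative_ckpt = {"layer.weight": [3.0, 5.0]}
merged_ckpt = merge_linear(absolute_ckpt, relative_ckpt, alpha=0.5)
print(merged_ckpt)  # {'layer.weight': [2.0, 4.0]}
```

Merging of this kind only makes sense when both checkpoints descend from the same base model, so that parameters with the same name are comparable.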

Implementation Details

The model is used through the Transformers library, with weights stored in the BF16 tensor type. It is trained on 100,000 feedback samples (Feedback Collection) and 200,000 preference pairs (Preference Collection).

  • Weight-merged model supporting both absolute and relative grading
  • Built on the Mistral-7B-Instruct-v0.2 base model
  • Separate prompt formats for the different evaluation types
  • Score rubrics that define the evaluation criteria
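The points above can be sketched as follows: a rubric with one description per score, a prompt builder for direct assessment, and a loader that requests BF16 weights. The prompt wording is an illustrative stand-in, not the official template, and the `Rubric`, `build_absolute_prompt`, and `load_judge` names are hypothetical. The Hub id is assumed from the maintainer and model name shown on this page.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed Hub id, composed from the maintainer and model name on this page.
MODEL_ID = "prometheus-eval/prometheus-7b-v2.0"


@dataclass
class Rubric:
    criteria: str
    score_descriptions: Dict[int, str]  # one description per score, 1 through 5

    def render(self) -> str:
        missing = [s for s in range(1, 6) if s not in self.score_descriptions]
        if missing:
            raise ValueError(f"rubric is missing descriptions for scores {missing}")
        lines = [f"[{self.criteria}]"]
        lines += [f"Score {s}: {self.score_descriptions[s]}" for s in range(1, 6)]
        return "\n".join(lines)


def build_absolute_prompt(instruction: str, response: str, rubric: Rubric) -> str:
    """Assemble an illustrative direct-assessment prompt (not the official template)."""
    return "\n\n".join([
        "Evaluate the response strictly against the score rubric. "
        "Write feedback, then finish with '[RESULT]' followed by an integer from 1 to 5.",
        f"Instruction:\n{instruction}",
        f"Response to evaluate:\n{response}",
        f"Score rubric:\n{rubric.render()}",
        "Feedback:",
    ])


def load_judge():
    """Load tokenizer and model in BF16.

    Imports are deferred because calling this downloads the full weights
    (7.24B parameters at 2 bytes each, roughly 14.5 GB).
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    return tokenizer, model


rubric = Rubric(
    criteria="Is the explanation technically accurate?",
    score_descriptions={
        1: "Mostly incorrect.",
        2: "Several significant errors.",
        3: "Partly correct with notable gaps.",
        4: "Correct with minor omissions.",
        5: "Fully correct and complete.",
    },
)
prompt = build_absolute_prompt("Explain what BF16 is.", "BF16 is a 16-bit float format.", rubric)
print(prompt)
```

Keeping the rubric as structured data makes it easy to reuse the same criteria across many responses and to reject incomplete rubrics before any model call is made.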

Core Capabilities

  • Direct assessment with 1-5 scoring system
  • Pairwise comparison between responses
  • Detailed feedback generation
  • Fine-grained evaluation based on specific criteria
  • Support for both instruction-following and response quality assessment
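Both grading modes produce free-text feedback plus a verdict, so downstream code typically needs to separate the two. The sketch below assumes the judge ends its output with a `[RESULT]` marker followed by a score from 1 to 5 (direct assessment) or by `A` or `B` (pairwise comparison). That output convention and the helper names are assumptions for illustration and should be checked against the prompt format actually used.

```python
from typing import Optional, Tuple

RESULT_MARKER = "[RESULT]"  # assumed output convention


def _split_result(output: str) -> Tuple[str, str]:
    """Split judge output into (feedback, raw verdict) at the last marker."""
    feedback, sep, verdict = output.rpartition(RESULT_MARKER)
    if not sep:  # marker absent: treat everything as feedback
        return output.strip(), ""
    return feedback.strip(), verdict.strip()


def parse_score(output: str) -> Tuple[str, Optional[int]]:
    """Direct assessment: return feedback and an integer score in 1..5, or None."""
    feedback, verdict = _split_result(output)
    score = int(verdict) if verdict in {"1", "2", "3", "4", "5"} else None
    return feedback, score


def parse_preference(output: str) -> Tuple[str, Optional[str]]:
    """Pairwise comparison: return feedback and the winner, 'A' or 'B', or None."""
    feedback, verdict = _split_result(output)
    return feedback, verdict if verdict in {"A", "B"} else None


print(parse_score("The answer is correct but terse. [RESULT] 4"))
# ('The answer is correct but terse.', 4)
print(parse_preference("Response B cites its sources. [RESULT] B"))
# ('Response B cites its sources.', 'B')
print(parse_score("No verdict was produced."))
# ('No verdict was produced.', None)
```

Returning `None` for an unparseable verdict, rather than raising, lets a batch evaluation continue and count failures separately.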

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is that it combines absolute and relative grading through weight merging, which has been shown to improve performance on both tasks. Because it is designed specifically for evaluating AI models, it offers a cost-effective alternative to GPT-4 for assessment tasks.

Q: What are the recommended use cases?

The model is well suited to evaluating language model outputs, conducting fine-grained assessments of AI responses, performing quality control in AI systems, and providing detailed feedback based on specific criteria. It is particularly useful in RLHF (Reinforcement Learning from Human Feedback) pipelines.
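One way an evaluator can feed an RLHF pipeline is by turning pairwise verdicts into preference records. The sketch below is a minimal illustration under stated assumptions: the `PairwiseJudge` signature, the `build_preference_pairs` helper, and the `prompt`/`chosen`/`rejected` field names are hypothetical, and `length_judge` is a trivial stand-in for a real call to the evaluator model.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical judge signature: (instruction, response_a, response_b) -> "A", "B", or None.
PairwiseJudge = Callable[[str, str, str], Optional[str]]


def build_preference_pairs(
    samples: List[Dict[str, str]], judge: PairwiseJudge
) -> List[Dict[str, str]]:
    """Convert pairwise verdicts into chosen/rejected records, skipping unclear ones."""
    pairs: List[Dict[str, str]] = []
    for sample in samples:
        a, b = sample["response_a"], sample["response_b"]
        verdict = judge(sample["instruction"], a, b)
        if verdict == "A":
            chosen, rejected = a, b
        elif verdict == "B":
            chosen, rejected = b, a
        else:
            continue  # no usable verdict: leave this sample out
        pairs.append({"prompt": sample["instruction"], "chosen": chosen, "rejected": rejected})
    return pairs


def length_judge(instruction: str, response_a: str, response_b: str) -> Optional[str]:
    """Stand-in judge that prefers the longer response; a real pipeline would query the model."""
    if len(response_a) == len(response_b):
        return None
    return "A" if len(response_a) > len(response_b) else "B"


samples = [
    {
        "instruction": "Define BF16.",
        "response_a": "A float format.",
        "response_b": "A 16-bit floating-point format with an 8-bit exponent.",
    },
    {"instruction": "Say hi.", "response_a": "Hi", "response_b": "Yo"},  # tie: skipped
]
pairs = build_preference_pairs(samples, length_judge)
print(len(pairs))           # 1
print(pairs[0]["chosen"])   # A 16-bit floating-point format with an 8-bit exponent.
```

Passing the judge in as a callable keeps the data-building logic testable without loading a 7B model.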
