# Prometheus-7b-v2.0
| Property | Value |
|---|---|
| Parameter Count | 7.24B |
| Base Model | Mistral-7B-Instruct-v0.2 |
| License | Apache 2.0 |
| Paper | Link to Paper |
| Training Data | 100K Feedback Collection + 200K Preference Collection |
## What is Prometheus-7b-v2.0?
Prometheus-7b-v2.0 is an advanced language model designed specifically for evaluating other AI models. Built on the Mistral-Instruct architecture, it serves as a cost-effective alternative to GPT-4 for fine-grained evaluation tasks. The model uniquely combines both absolute grading (direct assessment) and relative grading (pairwise ranking) capabilities through an innovative weight merging approach.
## Implementation Details
The model is implemented with the Hugging Face Transformers library and ships with BF16 (bfloat16) weights for efficient inference. It is trained on a dataset comprising 100,000 feedback samples and 200,000 preference pairs, enabling robust evaluation in both grading modes.
- Weight-merged architecture supporting both absolute and relative grading
- Built on Mistral-7B-Instruct-v0.2 foundation
- Specialized prompt formats for different evaluation types
- Comprehensive scoring rubrics implementation
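As an illustration of the "specialized prompt formats" mentioned above, the sketch below builds an absolute-grading (direct assessment) prompt. The `###`-delimited section headers and field names follow the prompt format as best understood here; treat the exact wording as an assumption and verify it against the official Prometheus prompt templates before use.

```python
# Sketch: constructing an absolute-grading (direct assessment) prompt for
# Prometheus-7b-v2.0. The template wording below is an assumption modeled
# on the published format; check the official repository for the exact text.

ABSOLUTE_PROMPT = """###Task Description:
An instruction (might include an Input inside it), a response to evaluate, \
a reference answer that gets a score of 5, and a score rubric representing \
the evaluation criteria are given.
1. Write a detailed feedback that assesses the quality of the response \
strictly based on the given score rubric, not evaluating in general.
2. After writing the feedback, write a score that is an integer between 1 and 5.
3. The output format should look as follows: "Feedback: (write a feedback \
for criteria) [RESULT] (an integer number between 1 and 5)"
4. Please do not generate any other opening, closing, or explanations.

###The instruction to evaluate:
{instruction}

###Response to evaluate:
{response}

###Reference Answer (Score 5):
{reference_answer}

###Score Rubrics:
{rubric}

###Feedback: """


def build_absolute_prompt(instruction: str, response: str,
                          reference_answer: str, rubric: str) -> str:
    """Fill the absolute-grading template with one evaluation instance."""
    return ABSOLUTE_PROMPT.format(
        instruction=instruction,
        response=response,
        reference_answer=reference_answer,
        rubric=rubric,
    )
```

The resulting string would then be wrapped in the Mistral chat template and passed to the model for generation.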
## Core Capabilities
- Direct assessment with 1-5 scoring system
- Pairwise comparison between responses
- Detailed feedback generation
- Fine-grained evaluation based on specific criteria
- Support for both instruction-following and response quality assessment
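Both capabilities share a common output convention: the model emits free-form feedback followed by a `[RESULT]` verdict, an integer from 1 to 5 for direct assessment or `A`/`B` for pairwise comparison. The parser below is an illustrative sketch of how such output might be split into feedback and score, not an official utility.

```python
import re

# Sketch: parsing Prometheus-style evaluator output of the form
# "...feedback text... [RESULT] <verdict>", where <verdict> is 1-5
# (absolute grading) or "A"/"B" (relative grading). The "[RESULT]"
# marker follows the documented output format; this regex-based
# parser is an assumption, not the official implementation.

RESULT_RE = re.compile(r"\[RESULT\]\s*([1-5]|A|B)\s*$")


def parse_judgement(output: str):
    """Split raw model output into (feedback, verdict).

    Returns verdict as an int for absolute grading, 'A'/'B' for
    relative grading, or None if no verdict marker is found.
    """
    match = RESULT_RE.search(output.strip())
    if match is None:
        return output.strip(), None  # malformed output: no verdict marker
    feedback = output[: match.start()].strip()
    verdict = match.group(1)
    return feedback, int(verdict) if verdict.isdigit() else verdict
```

For example, `parse_judgement("The response is clear and accurate. [RESULT] 4")` yields the feedback text together with the integer score 4.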
## Frequently Asked Questions
### Q: What makes this model unique?
The model's distinctive feature is its dual-capability design: absolute and relative grading are combined through weight merging, which has been shown to improve performance on both tasks. It is purpose-built for AI model evaluation, making it a cost-effective alternative to GPT-4 for assessment tasks.
### Q: What are the recommended use cases?
The model is ideal for evaluating language model outputs, conducting fine-grained assessments of AI responses, performing quality control in AI systems, and providing detailed feedback based on specific criteria. It's particularly useful in RLHF (Reinforcement Learning from Human Feedback) pipelines.