Notus-7b-v1
| Property | Value |
|---|---|
| Parameter Count | 7.24B |
| Model Type | GPT-like model, DPO fine-tuned |
| Base Model | zephyr-7b-sft-full |
| License | MIT |
| Training Data | UltraFeedback (binarized preferences) |
What is notus-7b-v1?
Notus-7b-v1 is a language model that builds on the Zephyr recipe, fine-tuned using Direct Preference Optimization (DPO) with a carefully curated version of the UltraFeedback dataset. It is a strong result in the 7B-parameter space, outperforming Zephyr-7B-beta and roughly matching Claude 2 on AlpacaEval.
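For background, DPO fine-tunes directly on preference pairs and skips the separate reward model that classic RLHF requires. A minimal PyTorch sketch of the DPO objective (illustrative only; the `beta` value is a generic default, not Notus's actual training setting):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a tensor of summed token log-probabilities for the
    chosen/rejected completions under the trained policy or the frozen
    reference (SFT) model.
    """
    # Implicit reward = beta * log-ratio of policy vs. reference probability.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the probability that the chosen completion beats the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```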
Implementation Details
The model takes a data-first approach, building its chosen/rejected pairs from binarized preference ratings rather than the traditional critique scores (sketched below). It was trained on 8× A100 40GB GPUs and uses the same prompt template as Zephyr-7b-beta for consistency and compatibility; an inference example follows the benchmark list.
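As a rough illustration, the binarization step might look like the following. The field names and the rating aggregation are hypothetical assumptions, not the exact Argilla preprocessing code:

```python
import random

def binarize(example):
    """Turn one UltraFeedback record into a (chosen, rejected) pair.

    Assumes each record has an "instruction" and at least two "completions",
    each carrying a dict of per-aspect numeric "ratings".
    """
    # Rank completions by their mean preference rating instead of the
    # single overall critique score.
    ranked = sorted(
        example["completions"],
        key=lambda c: sum(c["ratings"].values()) / len(c["ratings"]),
        reverse=True,
    )
    chosen = ranked[0]
    rejected = random.choice(ranked[1:])  # any lower-rated completion
    return {
        "prompt": example["instruction"],
        "chosen": chosen["response"],
        "rejected": rejected["response"],
    }
```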
- Achieves 91.42% win rate on AlpacaEval
- Scores 7.30 on MT-Bench
- Outperforms base model on multiple academic benchmarks
- Trained in BF16 precision
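Because Notus shares Zephyr's chat template, it works with the standard transformers chat-template API. A minimal sketch; the repo id argilla/notus-7b-v1 and the sampling settings are assumptions to check against the model card:

```python
import torch
from transformers import pipeline

# Load the model in BF16, matching the precision it was trained in.
pipe = pipeline(
    "text-generation",
    model="argilla/notus-7b-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one paragraph."},
]
# apply_chat_template renders the Zephyr-style <|system|>/<|user|>/<|assistant|> prompt.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```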
Core Capabilities
- Strong performance in conversational AI tasks
- Enhanced reasoning (64.59% on ARC Challenge)
- Improved truthfulness over the base model
- Strong general-knowledge performance (63.03% on MMLU)
Frequently Asked Questions
Q: What makes this model unique?
Its distinguishing feature is a data-centric approach to fine-tuning: chosen/rejected pairs are built from binarized preference ratings in UltraFeedback rather than the traditional critique scores, which translates into measurably better scores across multiple benchmarks.
Q: What are the recommended use cases?
The model excels at chat- and assistant-style interactions, making it a good fit for conversational AI, general-knowledge tasks, and reasoning-heavy workloads. It is particularly well-suited to applications that need both factual accuracy and a natural conversational flow.