Notus-7b-v1
| Property | Value |
|---|---|
| Parameter Count | 7.24B |
| Model Type | GPT-like model, DPO fine-tuned |
| Base Model | zephyr-7b-sft-full |
| License | MIT |
| Training Data | UltraFeedback (binarized preferences) |
What is notus-7b-v1?
Notus-7b-v1 is a language model that builds on the Zephyr recipe, fine-tuned using Direct Preference Optimization (DPO) with a carefully curated version of the UltraFeedback dataset. It is a strong result in the 7B-parameter space, outperforming Zephyr-7B-beta and roughly matching Claude 2 on AlpacaEval.
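For background, DPO fine-tunes directly on preference pairs and skips the separate reward model that classic RLHF requires. A minimal PyTorch sketch of the DPO objective (illustrative only; the `beta` value is a generic default, not Notus's actual training setting):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a tensor of summed token log-probabilities for the
    chosen/rejected completions under the trained policy or the frozen
    reference (SFT) model.
    """
    # Implicit reward = beta * log-ratio of policy vs. reference probability.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the probability that the chosen completion beats the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```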
Implementation Details
The model takes a data-first approach, building its chosen/rejected pairs from binarized preference ratings rather than the traditional critique scores (sketched below). It was trained on 8× A100 40GB GPUs and uses the same prompt template as Zephyr-7b-beta for consistency and compatibility; an inference example follows the benchmark list.
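As a rough illustration, the binarization step might look like the following. The field names and the rating aggregation are hypothetical assumptions, not the exact Argilla preprocessing code:

```python
import random

def binarize(example):
    """Turn one UltraFeedback record into a (chosen, rejected) pair.

    Assumes each record has an "instruction" and at least two "completions",
    each carrying a dict of per-aspect numeric "ratings".
    """
    # Rank completions by their mean preference rating instead of the
    # single overall critique score.
    ranked = sorted(
        example["completions"],
        key=lambda c: sum(c["ratings"].values()) / len(c["ratings"]),
        reverse=True,
    )
    chosen = ranked[0]
    rejected = random.choice(ranked[1:])  # any lower-rated completion
    return {
        "prompt": example["instruction"],
        "chosen": chosen["response"],
        "rejected": rejected["response"],
    }
```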
- Achieves 91.42% win rate on AlpacaEval
- Scores 7.30 on MT-Bench
- Outperforms base model on multiple academic benchmarks
- Trained in BF16 precision
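Because Notus shares Zephyr's chat template, it works with the standard transformers chat-template API. A minimal sketch; the repo id argilla/notus-7b-v1 and the sampling settings are assumptions to check against the model card:

```python
import torch
from transformers import pipeline

# Load the model in BF16, matching the precision it was trained in.
pipe = pipeline(
    "text-generation",
    model="argilla/notus-7b-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one paragraph."},
]
# apply_chat_template renders the Zephyr-style <|system|>/<|user|>/<|assistant|> prompt.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```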
Core Capabilities
- Strong performance in conversational AI tasks
- Enhanced reasoning (64.59% on ARC Challenge)
- Improved truthfulness over the base model
- Strong general-knowledge performance (63.03% on MMLU)
Frequently Asked Questions
Q: What makes this model unique?
Its distinguishing feature is a data-centric approach to fine-tuning: chosen/rejected pairs are built from binarized preference ratings in UltraFeedback rather than the traditional critique scores, which translates into measurably better scores across multiple benchmarks.
Q: What are the recommended use cases?
The model excels at chat- and assistant-style interactions, making it a good fit for conversational AI, general-knowledge tasks, and reasoning-heavy workloads. It is particularly well-suited to applications that need both factual accuracy and a natural conversational flow.