Nous-Hermes-2-Mixtral-8x7B-SFT
| Property | Value |
|---|---|
| Parameter Count | 46.7B |
| Model Type | Large Language Model |
| Architecture | Mixtral 8x7B MoE |
| License | Apache 2.0 |
| Training Data | 1M+ GPT-4 Generated Entries |
What is Nous-Hermes-2-Mixtral-8x7B-SFT?
Nous-Hermes-2-Mixtral-8x7B-SFT is a large language model from NousResearch built on the Mixtral 8x7B mixture-of-experts (MoE) architecture. It is the supervised fine-tuning (SFT) only release, trained on a dataset of over 1 million entries, primarily generated by GPT-4, along with other high-quality open datasets.
Implementation Details
The model uses the ChatML prompt format, enabling structured multi-turn dialogues with system-level prompting. Weights are released in BF16 precision, and multiple quantization options are available for different deployment scenarios; a minimal inference sketch follows the list below.
- Built on Mixtral 8x7B MoE architecture
- Supports ChatML prompt format for enhanced interaction
- Multiple quantization options available (GGUF, GPTQ, AWQ)
- Compatible with OpenAI-style endpoints
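Below is a minimal inference sketch using Hugging Face Transformers. It assumes the Hugging Face repo id `NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT`, a tokenizer that ships a ChatML chat template (which wraps each turn as `<|im_start|>role ... <|im_end|>`), and enough GPU memory for the BF16 weights; the prompt and generation settings are illustrative, not prescribed by the model card.

```python
# Minimal inference sketch with Hugging Face Transformers.
# The repo id and generation settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights, as noted above
    device_map="auto",           # spread the MoE layers across available devices
)

# ChatML-style multi-turn conversation with a system prompt.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain mixture-of-experts routing in two sentences."},
]

# apply_chat_template renders the messages into the ChatML format
# (<|im_start|>role ... <|im_end|>) that the model expects.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For the GGUF, GPTQ, or AWQ variants, the same conversation structure applies, but loading typically goes through the corresponding runtime or quantization backend (for example, llama.cpp for GGUF) rather than plain BF16 Transformers loading.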
Core Capabilities
- Strong performance on GPT4All benchmarks (75.36% average)
- Competitive results on AGIEval (44.89% average)
- Robust reasoning capabilities demonstrated in BigBench tests (48.69% average)
- Excellent at complex tasks including code generation and creative writing
Frequently Asked Questions
Q: What makes this model unique?
This model combines the Mixtral MoE architecture with extensive GPT-4 generated training data, yielding strong results across the benchmarks listed above. It is the SFT-only release, giving users a choice between this version and the companion DPO variant.
Q: What are the recommended use cases?
The model excels in diverse applications, including code generation, creative writing, data visualization, and complex reasoning tasks. Its ChatML support makes it particularly well suited to applications built around structured dialogue; a sketch of calling it through an OpenAI-compatible endpoint follows.
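Because the model is compatible with OpenAI-style endpoints, structured dialogues can also be driven through the standard OpenAI client. The following is a minimal sketch, assuming an OpenAI-compatible server (for example vLLM) is already serving the model locally; the base_url, api_key, and model name are illustrative placeholders.

```python
# Querying the model through an OpenAI-compatible endpoint.
# Assumes a local server (e.g., vLLM) is already serving the model;
# base_url, api_key, and the model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT",
    messages=[
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Write a Python function that merges two sorted lists."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Servers such as vLLM typically render these messages with the model's own chat template, so system-level instructions behave the same as in the direct Transformers example above.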