Nous-Hermes-2-Mixtral-8x7B-SFT
| Property | Value |
|---|---|
| Parameter Count | 46.7B |
| Model Type | Large Language Model |
| Architecture | Mixtral 8x7B MoE |
| License | Apache 2.0 |
| Training Data | 1M+ GPT-4 Generated Entries |
What is Nous-Hermes-2-Mixtral-8x7B-SFT?
Nous-Hermes-2-Mixtral-8x7B-SFT is a large language model from NousResearch built on the Mixtral 8x7B mixture-of-experts (MoE) architecture. It is the supervised fine-tuning (SFT) only release, trained on a dataset of over 1 million entries, primarily generated by GPT-4, along with other high-quality open datasets.
Implementation Details
The model uses the ChatML prompt format, enabling structured multi-turn dialogues with system-level prompting. Weights are released in BF16 precision, and multiple quantization options are available for different deployment scenarios; a minimal inference sketch follows the list below.
- Built on Mixtral 8x7B MoE architecture
- Supports ChatML prompt format for enhanced interaction
- Multiple quantization options available (GGUF, GPTQ, AWQ)
- Compatible with OpenAI-style endpoints
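Below is a minimal inference sketch using Hugging Face Transformers. It assumes the Hugging Face repo id `NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT`, a tokenizer that ships a ChatML chat template (which wraps each turn as `<|im_start|>role ... <|im_end|>`), and enough GPU memory for the BF16 weights; the prompt and generation settings are illustrative, not prescribed by the model card.

```python
# Minimal inference sketch with Hugging Face Transformers.
# The repo id and generation settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights, as noted above
    device_map="auto",           # spread the MoE layers across available devices
)

# ChatML-style multi-turn conversation with a system prompt.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain mixture-of-experts routing in two sentences."},
]

# apply_chat_template renders the messages into the ChatML format
# (<|im_start|>role ... <|im_end|>) that the model expects.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For the GGUF, GPTQ, or AWQ variants, the same conversation structure applies, but loading typically goes through the corresponding runtime or quantization backend (for example, llama.cpp for GGUF) rather than plain BF16 Transformers loading.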
Core Capabilities
- Strong performance on GPT4All benchmarks (75.36% average)
- Competitive results on AGIEval (44.89% average)
- Robust reasoning capabilities demonstrated in BigBench tests (48.69% average)
- Excellent at complex tasks including code generation and creative writing
Frequently Asked Questions
Q: What makes this model unique?
This model combines the Mixtral MoE architecture with extensive GPT-4 generated training data, yielding strong results across the benchmarks listed above. It is the SFT-only release, giving users a choice between this version and the companion DPO variant.
Q: What are the recommended use cases?
The model excels in diverse applications, including code generation, creative writing, data visualization, and complex reasoning tasks. Its ChatML support makes it particularly well suited to applications built around structured dialogue; a sketch of calling it through an OpenAI-compatible endpoint follows.
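Because the model is compatible with OpenAI-style endpoints, structured dialogues can also be driven through the standard OpenAI client. The following is a minimal sketch, assuming an OpenAI-compatible server (for example vLLM) is already serving the model locally; the base_url, api_key, and model name are illustrative placeholders.

```python
# Querying the model through an OpenAI-compatible endpoint.
# Assumes a local server (e.g., vLLM) is already serving the model;
# base_url, api_key, and the model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT",
    messages=[
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Write a Python function that merges two sorted lists."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Servers such as vLLM typically render these messages with the model's own chat template, so system-level instructions behave the same as in the direct Transformers example above.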