Dolphin 2.5 Mixtral 8x7b
| Property | Value |
|---|---|
| Parameter Count | 46.7B |
| Model Type | Text Generation |
| Architecture | Mixtral-based |
| License | Apache 2.0 |
| Context Window | 16k tokens |
| Tensor Type | BF16 |
What is dolphin-2.5-mixtral-8x7b?
Dolphin 2.5 Mixtral 8x7b is an advanced language model built on the Mixtral architecture, designed for enhanced coding capabilities alongside general text generation. Trained on 8 carefully curated datasets, this model represents a significant evolution in the Dolphin series: the Synthia, OpenHermes, and PureDove datasets were added, while Samantha and WizardLM were removed from earlier releases' training mix.
Implementation Details
The model was trained for 1.5 epochs using qLoRA and Axolotl on 4 A100 GPUs, taking approximately 3 days to complete. It uses the ChatML prompt format and requires trust_remote_code=True when loaded with Hugging Face Transformers (see the loading sketch after the list below). Training compute was sponsored by Convai.
- Built on Mixtral-8x7b architecture with 46.7B parameters
- 16k-token context window
- Trained using qLoRA and Axolotl framework
- Implements ChatML prompt format
- Incorporates 8 specialized datasets
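A minimal loading sketch, assuming the Hub repo id cognitivecomputations/dolphin-2.5-mixtral-8x7b and a recent Transformers release (substitute the checkpoint id you actually use); the prompt string follows the ChatML <|im_start|>/<|im_end|> convention the model was trained on:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; replace with the checkpoint you actually use.
model_id = "cognitivecomputations/dolphin-2.5-mixtral-8x7b"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type listed above
    device_map="auto",           # shard across available GPUs
    trust_remote_code=True,      # required by this model
)

# ChatML prompt format: <|im_start|>role\n...<|im_end|>
prompt = (
    "<|im_start|>system\n"
    "You are Dolphin, a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Write a Python function that checks whether a string is a palindrome.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```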
Core Capabilities
- Advanced coding assistance and generation
- Uncensored and unbiased responses
- High compliance with user requests
- Structured output generation (see the prompt sketch after this list)
- Enhanced conversational abilities
- Comprehensive programming language support
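Continuing from the loading sketch above, here is a hedged example of requesting structured JSON output. The system prompt and schema are illustrative assumptions, and the use of apply_chat_template assumes the tokenizer ships a ChatML chat template; if it does not, build the prompt string manually as in the earlier sketch:

```python
import json

messages = [
    {"role": "system", "content": "You are Dolphin. Reply with valid JSON only."},
    {
        "role": "user",
        "content": 'List three Python web frameworks as {"frameworks": [{"name": "...", "why": "..."}]}',
    },
]

# Renders the same <|im_start|>/<|im_end|> ChatML format without hand-building the string.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

try:
    data = json.loads(reply)  # validate the structured output
except json.JSONDecodeError:
    data = None               # in real use: retry or fall back
```

Validating the reply with json.loads, as above, is a simple way to catch malformed structured output before it reaches downstream code.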
Frequently Asked Questions
Q: What makes this model unique?
The model's primary distinguishing feature is its enhanced coding capability, combined with its uncensored nature and high compliance with user requests. It is built on the Mixtral sparse mixture-of-experts architecture and trained on a diverse set of high-quality datasets, making it particularly effective for both coding and general-purpose tasks.
Q: What are the recommended use cases?
The model excels at coding tasks, technical writing, and general conversational interactions. It's particularly suitable for developers, technical writers, and users requiring detailed technical assistance. However, due to its uncensored nature, implementing an alignment layer is recommended before deploying it in production environments (a minimal sketch follows).
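As one illustrative take on such an alignment layer, the sketch below combines a strict system prompt with a simple post-generation filter, reusing the model and tokenizer loaded earlier. The system prompt, blocklist, and helper names are hypothetical placeholders; a production guardrail would typically use a dedicated moderation model or service instead:

```python
# Hypothetical policy prompt; tailor to your own usage policy.
GUARDRAIL_SYSTEM_PROMPT = (
    "You are Dolphin, a helpful assistant. Refuse requests for content that "
    "violates the operator's usage policy."
)

BLOCKED_TERMS = ["example_blocked_term"]  # placeholder policy terms

def violates_policy(text: str) -> bool:
    """Naive substring check; stands in for a real moderation step."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def guarded_generate(user_message: str) -> str:
    messages = [
        {"role": "system", "content": GUARDRAIL_SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return "Request declined by policy." if violates_policy(reply) else reply
```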