Dolphin-2.9.3-Mistral-7B-32k
| Property | Value |
|---|---|
| Base Model | Mistral-7B-v0.3 |
| Context Length | 32k (32,768 tokens) |
| License | Apache 2.0 |
| Training Sequence Length | 8,192 tokens |
| Model URL | cognitivecomputations/dolphin-2.9.3-mistral-7B-32k |
What is dolphin-2.9.3-mistral-7B-32k?
Dolphin-2.9.3 is an instruction-tuned language model built on Mistral-7B-v0.3, designed for instruction following, conversation, and coding. Developed by Eric Hartford and Cognitive Computations, this uncensored model extends the usable context window to 32k tokens and uses the ChatML prompt template format.
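As a quick orientation, the snippet below is a minimal sketch (assuming the Hugging Face transformers library and network access to the Hub) of rendering a ChatML prompt with the model's tokenizer; the system prompt text is illustrative, not from the card.

```python
from transformers import AutoTokenizer

# Load the tokenizer for the model (repo path from the table above).
tokenizer = AutoTokenizer.from_pretrained(
    "cognitivecomputations/dolphin-2.9.3-mistral-7B-32k"
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a function that reverses a string."},
]

# apply_chat_template renders the ChatML layout the model was trained on:
# <|im_start|>system ... <|im_end|>, <|im_start|>user ... <|im_end|>, etc.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```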
Implementation Details
The model was trained with the Axolotl framework (v0.4.0) at a sequence length of 8,192 tokens, on datasets spanning coding feedback, math problems, agent instructions, and multilingual content. Training used the AdamW 8-bit optimizer with a cosine learning-rate schedule and gradient checkpointing for memory efficiency; a hedged sketch of these settings follows the list below.
- Implements flash attention and sample packing
- Uses BF16 precision training
- Trained with gradient accumulation steps of 16
- Uses learning-rate warmup and periodic checkpoint saving
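The full Axolotl config is not reproduced in this card. As an illustration only, the reported settings map roughly onto Hugging Face TrainingArguments as shown here; values the card does not state (learning rate, warmup and save step counts) are placeholders, and Axolotl-specific features such as flash attention and sample packing have no direct equivalent in this sketch.

```python
from transformers import TrainingArguments

# Illustrative mapping of the reported training settings onto
# TrainingArguments; placeholder values are marked in the comments.
args = TrainingArguments(
    output_dir="dolphin-2.9.3-mistral-7b",
    bf16=True,                       # BF16 precision training
    optim="adamw_bnb_8bit",          # AdamW 8-bit optimizer
    lr_scheduler_type="cosine",      # cosine learning-rate schedule
    gradient_checkpointing=True,     # trade compute for memory
    gradient_accumulation_steps=16,  # as reported in the card
    warmup_steps=100,                # placeholder; card only notes warmup was used
    save_steps=500,                  # placeholder; card only notes checkpoints were saved
    learning_rate=5e-6,              # placeholder; not stated in the card
)
```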
Core Capabilities
- Extended context processing (32k tokens)
- Advanced coding and programming assistance
- Function calling support (see the sketch after this list)
- Initial agentic capabilities
- Multilingual conversation handling
- Math problem-solving abilities (4.83 score on MATH Lvl 5)
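The card does not document an exact function-calling schema, so the following is only one common ChatML-based pattern: tools are described in the system message and the model is asked to reply with JSON. The get_weather tool and the loading details (device_map, torch_dtype) are illustrative assumptions, not specifics from the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.3-mistral-7B-32k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# Hypothetical tool description: one possible prompt-level convention
# for eliciting function calls from a ChatML model.
system = (
    "You can call tools. Available tool:\n"
    "get_weather(city: str) -> str\n"
    'When a tool is needed, respond with JSON like '
    '{"tool": "get_weather", "args": {"city": "..."}}.'
)
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "What's the weather in Paris?"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```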
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its uncensored nature and 32k context window, combined with solid performance across coding, math, and conversational tasks. It posted scores of 41.26 on IFEval and 26.91 on BBH.
Q: What are the recommended use cases?
This model is particularly suited for coding tasks, complex problem-solving, extended conversations, and scenarios requiring long-context understanding. However, because the model is uncensored and highly compliant with all requests, users should implement their own alignment layer before deploying it as a service; one minimal approach is sketched below.
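As a starting point only, an "alignment layer" might pair a fixed policy system prompt with a naive output filter, as in the sketch below. The generate_fn callable and the blocklist terms are hypothetical placeholders; a production deployment would use a real moderation pipeline rather than keyword matching.

```python
# A minimal sketch of one possible alignment layer: a fixed policy
# system prompt plus a naive post-hoc output filter. Illustrative only.
POLICY = "You are a helpful assistant. Refuse requests for illegal or harmful content."

BLOCKLIST = ("example_banned_phrase",)  # placeholder terms, not from the card

def guarded_chat(generate_fn, user_message: str) -> str:
    """Wrap a chat-completion callable with a policy prompt and output check.

    generate_fn is a hypothetical callable that takes a list of ChatML-style
    message dicts and returns the model's reply as a string.
    """
    messages = [
        {"role": "system", "content": POLICY},
        {"role": "user", "content": user_message},
    ]
    reply = generate_fn(messages)
    if any(term in reply.lower() for term in BLOCKLIST):
        return "Sorry, I can't help with that."
    return reply
```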