Dolphin 2.9.4 LLaMA 3.1 8B
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Base Model | Meta-LLaMA-3.1-8B |
| License | LLaMA 3.1 |
| Context Length | 128K |
| Training Sequence Length | 8192 |
What is dolphin-2.9.4-llama3.1-8b?
Dolphin 2.9.4 is a fine-tuned version of Meta's LLaMA 3.1 8B model, developed by Eric Hartford and Cognitive Computations. It targets instruction following and conversational use, and was trained on nine curated datasets that include specialized mathematical, coding, and agent-oriented data.
Implementation Details
The model uses the ChatML prompt template and was fine-tuned with a focus on instruction following and coding tasks. It retains the base model's 128K context window, although fine-tuning was performed at a sequence length of 8192. Training relied on techniques such as gradient checkpointing and flash attention; a minimal inference sketch using the ChatML template follows the list below.
- Trained with Axolotl version 0.4.1 in BF16 precision
- Uses flash attention for memory-efficient attention computation during training
- Cosine learning-rate schedule with warmup steps
- Gradient accumulation for stable effective batch sizes
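
To make the ChatML template concrete, here is a minimal inference sketch using the Hugging Face transformers library. The repository id and the system prompt are illustrative assumptions; the tokenizer's bundled chat template is expected to render the ChatML `<|im_start|>`/`<|im_end|>` markers.

```python
# Minimal inference sketch; the repo id below is assumed, verify before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.4-llama3.1-8b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
    device_map="auto",
)

# ChatML-style conversation; apply_chat_template renders the
# <|im_start|>role ... <|im_end|> markers from the tokenizer config.
messages = [
    {"role": "system", "content": "You are Dolphin, a helpful assistant."},
    {"role": "user", "content": "Write a Python one-liner that reverses a string."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Loading in BF16 mirrors the training precision; on smaller GPUs a quantized load would be the usual substitute.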
Core Capabilities
- Strong instruction following in multiple languages
- Advanced coding and mathematical problem solving
- Agentic behavior and function-calling support (a hedged sketch follows this list)
- Uncensored responses, with alignment and refusals filtered out of the training data
- Evaluated across a range of standard benchmarks
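
Function calling here is prompt-driven rather than a formal API, so the convention below (describing tools in the system prompt and parsing a JSON object out of the reply) is an assumption for illustration, not the model's documented protocol. The `get_weather` tool and its schema are hypothetical.

```python
# Hypothetical prompt-driven function-calling loop; the tool schema,
# system-prompt convention, and JSON reply format are all assumptions.
import json

TOOLS = [{
    "name": "get_weather",            # hypothetical tool
    "description": "Get current weather for a city.",
    "parameters": {"city": "string"},
}]

system_prompt = (
    "You can call these tools by replying with a single JSON object "
    'like {"tool": "<name>", "arguments": {...}}.\n'
    f"Tools: {json.dumps(TOOLS)}"
)

def maybe_tool_call(reply: str):
    """Return (name, arguments) if the reply parses as a tool call, else None."""
    try:
        obj = json.loads(reply)
        return obj["tool"], obj["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None

# Example: a reply of '{"tool": "get_weather", "arguments": {"city": "Paris"}}'
# would be routed to the matching Python function; any other text is
# treated as a normal assistant answer.
```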
Frequently Asked Questions
Q: What makes this model unique?
The model combines uncensored output, strong instruction following, and specialized training across several domains. It is also notable for its 128K context window and its performance on both conversational and technical tasks.
Q: What are the recommended use cases?
The model excels at coding, mathematical problem solving, instruction following, and general conversation. Because it is uncensored, users should implement their own alignment layer before deploying it as a service; one deliberately minimal illustration is sketched below.
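
As a deliberately simplistic illustration of such an alignment layer, the sketch below screens both the prompt and the completion against a placeholder policy before anything reaches the user. The blocklist and the `generate` callable are assumptions, not a production moderation system.

```python
# Illustrative pre/post filter; the blocklist and generate() are placeholders.
BLOCKED_TERMS = {"example-banned-term"}  # placeholder policy, not a real list

def is_allowed(text: str) -> bool:
    """Naive substring check against the deployment's blocklist."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def guarded_chat(prompt: str, generate) -> str:
    """Wrap an uncensored model behind a simple policy check.

    `generate` is any callable mapping a prompt to a completion,
    e.g. a closure around the transformers sketch shown earlier.
    """
    if not is_allowed(prompt):
        return "This request is not permitted by the deployment policy."
    reply = generate(prompt)
    return reply if is_allowed(reply) else "Response withheld by policy."
```

A real deployment would substitute a trained moderation model or a hosted safety API for the keyword check, but the wrapping pattern stays the same.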