Dolphin 2.9.4 LLaMA 3.1 8B
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Base Model | Meta-LLaMA-3.1-8B |
| License | LLaMA 3.1 |
| Context Length | 128K |
| Training Sequence Length | 8192 |
What is dolphin-2.9.4-llama3.1-8b?
Dolphin 2.9.4 is a fine-tuned version of Meta's LLaMA 3.1 8B model, developed by Eric Hartford and Cognitive Computations. It targets instruction following and conversational use, and was trained on nine curated datasets that include specialized mathematical, coding, and agent-oriented data.
Implementation Details
The model uses the ChatML prompt template and was fine-tuned with a focus on instruction following and coding tasks. It retains the base model's 128K context window, although fine-tuning was performed at a sequence length of 8192. Training relied on techniques such as gradient checkpointing and flash attention; a minimal inference sketch using the ChatML template follows the list below.
- Trained with Axolotl version 0.4.1 in BF16 precision
- Uses flash attention for memory-efficient attention computation during training
- Cosine learning-rate schedule with warmup steps
- Gradient accumulation for stable effective batch sizes
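
To make the ChatML template concrete, here is a minimal inference sketch using the Hugging Face transformers library. The repository id and the system prompt are illustrative assumptions; the tokenizer's bundled chat template is expected to render the ChatML `<|im_start|>`/`<|im_end|>` markers.

```python
# Minimal inference sketch; the repo id below is assumed, verify before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.4-llama3.1-8b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
    device_map="auto",
)

# ChatML-style conversation; apply_chat_template renders the
# <|im_start|>role ... <|im_end|> markers from the tokenizer config.
messages = [
    {"role": "system", "content": "You are Dolphin, a helpful assistant."},
    {"role": "user", "content": "Write a Python one-liner that reverses a string."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Loading in BF16 mirrors the training precision; on smaller GPUs a quantized load would be the usual substitute.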
Core Capabilities
- Strong instruction following in multiple languages
- Advanced coding and mathematical problem solving
- Agentic behavior and function-calling support (a hedged sketch follows this list)
- Uncensored responses, with alignment and refusals filtered out of the training data
- Evaluated across a range of standard benchmarks
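
Function calling here is prompt-driven rather than a formal API, so the convention below (describing tools in the system prompt and parsing a JSON object out of the reply) is an assumption for illustration, not the model's documented protocol. The `get_weather` tool and its schema are hypothetical.

```python
# Hypothetical prompt-driven function-calling loop; the tool schema,
# system-prompt convention, and JSON reply format are all assumptions.
import json

TOOLS = [{
    "name": "get_weather",            # hypothetical tool
    "description": "Get current weather for a city.",
    "parameters": {"city": "string"},
}]

system_prompt = (
    "You can call these tools by replying with a single JSON object "
    'like {"tool": "<name>", "arguments": {...}}.\n'
    f"Tools: {json.dumps(TOOLS)}"
)

def maybe_tool_call(reply: str):
    """Return (name, arguments) if the reply parses as a tool call, else None."""
    try:
        obj = json.loads(reply)
        return obj["tool"], obj["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None

# Example: a reply of '{"tool": "get_weather", "arguments": {"city": "Paris"}}'
# would be routed to the matching Python function; any other text is
# treated as a normal assistant answer.
```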
Frequently Asked Questions
Q: What makes this model unique?
The model combines uncensored output, strong instruction following, and specialized training across several domains. It is also notable for its 128K context window and its performance on both conversational and technical tasks.
Q: What are the recommended use cases?
The model excels at coding, mathematical problem solving, instruction following, and general conversation. Because it is uncensored, users should implement their own alignment layer before deploying it as a service; one deliberately minimal illustration is sketched below.
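
As a deliberately simplistic illustration of such an alignment layer, the sketch below screens both the prompt and the completion against a placeholder policy before anything reaches the user. The blocklist and the `generate` callable are assumptions, not a production moderation system.

```python
# Illustrative pre/post filter; the blocklist and generate() are placeholders.
BLOCKED_TERMS = {"example-banned-term"}  # placeholder policy, not a real list

def is_allowed(text: str) -> bool:
    """Naive substring check against the deployment's blocklist."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def guarded_chat(prompt: str, generate) -> str:
    """Wrap an uncensored model behind a simple policy check.

    `generate` is any callable mapping a prompt to a completion,
    e.g. a closure around the transformers sketch shown earlier.
    """
    if not is_allowed(prompt):
        return "This request is not permitted by the deployment policy."
    reply = generate(prompt)
    return reply if is_allowed(reply) else "Response withheld by policy."
```

A real deployment would substitute a trained moderation model or a hosted safety API for the keyword check, but the wrapping pattern stays the same.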