Einstein-v6.1-Llama3-8B
Property | Value |
---|---|
Parameter Count | 8.03B |
Base Model | Meta-Llama-3-8B |
License | Other |
Training Hardware | 8xRTX3090 + 1xRTXA6000 |
What is Einstein-v6.1-Llama3-8B?
Einstein-v6.1-Llama3-8B is a specialized language model fine-tuned from Meta's Llama-3-8B architecture, specifically optimized for STEM-related tasks and scientific reasoning. The model demonstrates impressive capabilities across various benchmarks, including a 66.19% accuracy on MMLU and 66.11% on GSM8k mathematical reasoning tasks.
Implementation Details
The model was trained using the Axolotl framework with a combination of 38 carefully curated datasets. It employs the ChatML prompt template format and utilizes advanced training techniques including gradient checkpointing and flash attention for optimal performance.
- Training utilized BF16 precision with sample packing
- Implemented with cosine learning rate scheduler
- Trained for 2 epochs with 2026 total steps
- Uses flash attention and gradient checkpointing for efficiency
Core Capabilities
- Strong performance in scientific reasoning (62.46% on AI2 Reasoning Challenge)
- Exceptional results in general knowledge (82.41% on HellaSwag)
- Advanced mathematical problem-solving (66.11% on GSM8k)
- Robust truthfulness evaluation (55.1% on TruthfulQA)
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its specialized training on STEM-focused datasets, making it particularly effective for scientific and mathematical reasoning tasks while maintaining strong general-purpose capabilities.
Q: What are the recommended use cases?
This model is ideal for scientific research, educational applications, mathematical problem-solving, and general knowledge tasks requiring precise technical understanding.