# llama-3-8b-gpt-4o-ru1.0-gguf
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | LLaMA3 |
| Base Model | meta-llama/Meta-Llama-3-8B-Instruct |
| Downloads | 73,877 |
## What is llama-3-8b-gpt-4o-ru1.0-gguf?
This is a specialized version of the LLaMA-3 8B model, fine-tuned for stronger Russian-language capabilities. It was trained on a carefully curated dataset derived from tagengo-gpt4, with 80% of the training examples in Russian. On MT-Bench it scores 8.12 for Russian and 8.01 for English, matching or exceeding GPT-3.5-turbo on Russian-language tasks.
## Implementation Details
The model was trained with the Axolotl framework on 2 NVIDIA A100 GPUs for 1 epoch. Training used gradient checkpointing, flash attention, and a cosine learning-rate schedule with a 1e-5 learning rate.
- Sample packing enabled for efficient training
- 8,192-token sequence length
- Trained using DeepSpeed ZeRO-2 optimization
- Uses 8-bit AdamW optimizer
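The settings above map directly onto Axolotl configuration keys. A minimal sketch of such a config is shown below; it reflects only the values stated in this card, and anything else (dataset entries, batch sizes, the exact DeepSpeed JSON path) is an assumption, not the author's actual training file:

```yaml
# Hypothetical Axolotl config sketch; values not stated in the card are assumptions.
base_model: meta-llama/Meta-Llama-3-8B-Instruct

sequence_len: 8192
sample_packing: true

gradient_checkpointing: true
flash_attention: true

optimizer: adamw_bnb_8bit   # 8-bit AdamW
lr_scheduler: cosine
learning_rate: 1e-5
num_epochs: 1

deepspeed: deepspeed_configs/zero2.json  # ZeRO-2 optimization
```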
## Core Capabilities
- Superior Russian language understanding and generation
- Competitive performance in both Russian (8.12) and English (8.01) on MT-Bench
- Optimized for GGUF format for efficient deployment
- Compatible with llama.cpp for local execution
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its specialized Russian-language capabilities while maintaining strong English performance, achieved through focused training on high-quality GPT-4o-generated data. It matches the performance of models trained on datasets eight times larger.
Q: What are the recommended use cases?
The model is particularly well-suited for Russian language tasks, multilingual applications, and scenarios requiring efficient local deployment through GGUF format. It can be easily used with llama.cpp or the gptchain framework for chat-based applications.
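When driving the GGUF file through llama.cpp's raw completion interface rather than a chat endpoint, prompts must follow the Llama 3 instruct template that the base model was trained with. A minimal sketch of that formatting in Python (the Russian prompt texts are illustrative examples, not from this card):

```python
def format_llama3_prompt(system: str, user: str) -> str:
    """Build a Llama 3 instruct-style prompt for raw completion APIs."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Example: a Russian-language request (texts are illustrative).
prompt = format_llama3_prompt(
    "Ты полезный ассистент, отвечающий по-русски.",
    "Кратко объясни, что такое формат GGUF.",
)
print(prompt)
```

Chat front ends built on llama.cpp typically apply this template automatically, so manual formatting is only needed when calling the completion API directly.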