# llama-3-8b-gpt-4o-ru1.0-gguf
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | LLaMA3 |
| Base Model | meta-llama/Meta-Llama-3-8B-Instruct |
| Downloads | 73,877 |
## What is llama-3-8b-gpt-4o-ru1.0-gguf?
This is a specialized version of the LLaMA-3 8B model, fine-tuned for stronger Russian-language capabilities. It was trained on a carefully curated dataset derived from tagengo-gpt4, with 80% of the training examples in Russian. On MT-Bench it scores 8.12 for Russian and 8.01 for English, matching or exceeding GPT-3.5-turbo on Russian-language tasks.
## Implementation Details
The model was trained with the Axolotl framework on 2 NVIDIA A100 GPUs for 1 epoch. Training used gradient checkpointing, flash attention, and a cosine learning-rate schedule with a 1e-5 learning rate.
- Sample packing enabled for efficient training
- 8,192-token sequence length
- Trained using DeepSpeed ZeRO-2 optimization
- Uses 8-bit AdamW optimizer
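The settings above map directly onto Axolotl configuration keys. A minimal sketch of such a config is shown below; it reflects only the values stated in this card, and anything else (dataset entries, batch sizes, the exact DeepSpeed JSON path) is an assumption, not the author's actual training file:

```yaml
# Hypothetical Axolotl config sketch; values not stated in the card are assumptions.
base_model: meta-llama/Meta-Llama-3-8B-Instruct

sequence_len: 8192
sample_packing: true

gradient_checkpointing: true
flash_attention: true

optimizer: adamw_bnb_8bit   # 8-bit AdamW
lr_scheduler: cosine
learning_rate: 1e-5
num_epochs: 1

deepspeed: deepspeed_configs/zero2.json  # ZeRO-2 optimization
```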
## Core Capabilities
- Superior Russian language understanding and generation
- Competitive performance in both Russian (8.12) and English (8.01) on MT-Bench
- Optimized for GGUF format for efficient deployment
- Compatible with llama.cpp for local execution
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its specialized Russian-language capabilities while maintaining strong English performance, achieved through focused training on high-quality GPT-4o-generated data. It matches the performance of models trained on datasets eight times larger.
Q: What are the recommended use cases?
The model is particularly well-suited for Russian language tasks, multilingual applications, and scenarios requiring efficient local deployment through GGUF format. It can be easily used with llama.cpp or the gptchain framework for chat-based applications.
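When driving the GGUF file through llama.cpp's raw completion interface rather than a chat endpoint, prompts must follow the Llama 3 instruct template that the base model was trained with. A minimal sketch of that formatting in Python (the Russian prompt texts are illustrative examples, not from this card):

```python
def format_llama3_prompt(system: str, user: str) -> str:
    """Build a Llama 3 instruct-style prompt for raw completion APIs."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Example: a Russian-language request (texts are illustrative).
prompt = format_llama3_prompt(
    "Ты полезный ассистент, отвечающий по-русски.",
    "Кратко объясни, что такое формат GGUF.",
)
print(prompt)
```

Chat front ends built on llama.cpp typically apply this template automatically, so manual formatting is only needed when calling the completion API directly.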