Llama3-German-8B

Maintained By
DiscoResearch

Property         Value
---------------  -------------
Parameter Count  8.03B
License          Llama3
Research Paper   Link to Paper
Tensor Type      BF16

What is Llama3-German-8B?

Llama3-German-8B is an advanced language model specifically optimized for German language processing. Built upon Meta's Llama3-8B architecture, this model underwent extensive continued pretraining on 65 billion high-quality German tokens, resulting in significantly improved German language capabilities while maintaining strong English performance. The model represents a collaborative effort between DiscoResearch, Occiglot, and the German Research Center for Artificial Intelligence (DFKI).

Implementation Details

The model was trained for approximately 60 hours on 128 GPUs of hessian.AI's 42 supercomputer. Training used a sequence length of 8192 tokens and a cosine learning rate schedule decaying from 1.5e-5 to 1.5e-6, along with an intelligent document packing strategy.
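The schedule described above can be sketched as a small function. This is a hypothetical reimplementation from the reported hyperparameters (linear warmup to the peak rate, then cosine decay to the minimum), not the project's actual training code:

```python
import math

def lr_at(step, total_steps=15_500, warmup_steps=155,
          peak_lr=1.5e-5, min_lr=1.5e-6):
    """Learning rate at a given step: linear warmup to peak_lr over
    warmup_steps, then cosine decay to min_lr by total_steps.
    Hypothetical sketch based on the hyperparameters in this card."""
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to min_lr.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

At step 155 this returns the peak rate of 1.5e-5, and at step 15,500 it has decayed to 1.5e-6.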

  • Training conducted over 15,500 steps with 155 warmup steps
  • Batch size of 4,194,304 tokens
  • AdamW optimizer with 0.05 weight decay
  • Innovative document packing strategy based on the "Fewer Truncations" approach
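The idea behind truncation-reducing document packing can be illustrated with a greedy first-fit-decreasing sketch: instead of concatenating documents and chunking the stream (which splits documents across sequence boundaries), each document is placed whole into the first training sequence with enough room. This is an illustrative simplification, not the actual packing code used for training:

```python
def pack_documents(doc_lengths, capacity=8192):
    """Pack documents (given as token counts) into fixed-size training
    sequences without splitting any document across a boundary.
    Greedy first-fit-decreasing sketch; documents longer than `capacity`
    would still need chunking and are skipped here for simplicity."""
    remaining = []     # free space left in each packed sequence
    assignments = []   # document indices assigned to each sequence
    # Place larger documents first to reduce wasted space.
    order = sorted(range(len(doc_lengths)), key=lambda i: -doc_lengths[i])
    for i in order:
        n = doc_lengths[i]
        if n > capacity:
            continue  # would require truncation; out of scope for this sketch
        for b, free in enumerate(remaining):
            if n <= free:
                remaining[b] -= n
                assignments[b].append(i)
                break
        else:
            # No existing sequence fits: open a new one.
            remaining.append(capacity - n)
            assignments.append([i])
    return assignments
```

For example, documents of 5000, 4000, 3000, and 800 tokens fit into two 8192-token sequences with no document truncated.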

Core Capabilities

  • Enhanced German language understanding and generation
  • Strong performance on German benchmarks, particularly HellaSwag
  • Maintained English language capabilities
  • Available in multiple configurations including long-context (32k) and instruction-tuned versions
  • Supports efficient text generation and processing tasks
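A minimal text-generation sketch with the Hugging Face transformers library is shown below. The repository id `DiscoResearch/Llama3-German-8B` is assumed from the model name; as a base (non-instruct) model, it is prompted with plain text completions rather than a chat template:

```python
def build_prompt(text):
    """Plain completion prompt for a base model (no chat template)."""
    return text.strip()

def main():
    # Heavy imports kept inside main so the helper above stays importable.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "DiscoResearch/Llama3-German-8B"  # assumed Hub repository id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto")

    prompt = build_prompt("Die Hauptstadt von Deutschland ist")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=30, do_sample=False)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

Loading in BF16 (matching the card's tensor type) keeps the 8B model at roughly 16 GB of weights.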

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its specialized German language capabilities achieved through continued pretraining on a massive German dataset, while maintaining strong English performance without replay training. It also implements innovative document packing strategies that improve overall benchmark scores.

Q: What are the recommended use cases?

As a base model, it's recommended for fine-tuning to specific tasks. It's particularly well-suited for German language processing tasks, including text generation, understanding, and analysis. Different versions are available for specific needs, including long-context processing (32k tokens) and instruction-tuned variants.
