Llama3-German-8B

Maintained by: DiscoResearch

Property         Value
Parameter Count  8.03B
License          llama3
Paper            Research Paper
Tensor Type      BF16
Language         German (Primary), English (Secondary)
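
As a rough illustration of what the parameter count and tensor type imply, the following Python sketch estimates the memory needed for the weights alone. It assumes every parameter is stored as a 2-byte BF16 value and ignores activations, KV cache, and optimizer state; the variable names are illustrative.

```python
# Weights-only memory estimate from the listed properties.
PARAMS = 8.03e9            # parameter count from the table above
BYTES_PER_PARAM_BF16 = 2   # bfloat16 uses 16 bits per value

weight_bytes = PARAMS * BYTES_PER_PARAM_BF16
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

print(f"Weights: {weight_gb:.2f} GB ({weight_gib:.2f} GiB)")
```

Actual memory use during inference or fine-tuning will be higher than this figure.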

What is Llama3-German-8B?

Llama3-German-8B is a language model specialized for German that retains strong English capabilities. It is built on Meta's Llama3-8B and was continually pretrained on 65 billion high-quality German tokens to improve German language understanding and generation.

Implementation Details

The model was trained for approximately 60 hours on 128 GPUs on hessian.AI 42, using a document packing strategy designed to reduce truncation. Training used a sequence length of 8192 tokens and a cosine learning rate schedule decaying from 1.5e-5 to 1.5e-6.

  • Training used a batch size of 4.19M tokens with the AdamW optimizer
  • Documents were packed following the "Fewer Truncations" methodology
  • Performance improves on German benchmarks while English capabilities are retained
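
To make the schedule figures above concrete, here is a minimal Python sketch of a cosine decay from 1.5e-5 to 1.5e-6, together with the step count implied by 65 billion tokens at 4.19M tokens per batch. It assumes no warmup phase and a single pass over the data, neither of which the description confirms. The function `cosine_lr` and the constant names are illustrative and not taken from the actual training code.

```python
import math

PEAK_LR = 1.5e-5           # starting learning rate (from the text)
FINAL_LR = 1.5e-6          # final learning rate (from the text)
TOTAL_TOKENS = 65e9        # training tokens (from the text)
TOKENS_PER_BATCH = 4.19e6  # batch size in tokens (from the text, rounded)
SEQ_LEN = 8192             # sequence length (from the text)


def cosine_lr(step, total_steps, peak=PEAK_LR, final=FINAL_LR):
    """Cosine decay from `peak` at step 0 to `final` at `total_steps`.

    Assumption: no warmup. Steps outside [0, total_steps] are clamped.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final + 0.5 * (peak - final) * (1.0 + math.cos(math.pi * progress))


# Implied optimizer steps, assuming one pass over the 65B tokens.
total_steps = round(TOTAL_TOKENS / TOKENS_PER_BATCH)

# 4.19M is a rounded figure; 512 sequences * 8192 tokens = 4,194,304 tokens.
sequences_per_batch = TOKENS_PER_BATCH / SEQ_LEN

for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step, total_steps):.3e}")
```

Under these assumptions the run comes to roughly 15.5k optimizer steps, with each batch holding about 512 sequences of 8192 tokens.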

Core Capabilities

  • Enhanced German language understanding and generation
  • Strong performance on German benchmarks, particularly Hellaswag
  • Available in multiple configurations including long-context (32k) and instruction-tuned versions
  • Efficient processing with optimized document packing
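
The "Fewer Truncations" approach to document packing is usually described as a best-fit-decreasing bin packing over document lengths: only documents longer than the context window are split, and the resulting pieces are placed into the training sequence with the least remaining room that still fits them. The Python sketch below illustrates that general idea on token counts only. It is not DiscoResearch's implementation, and the function names and example lengths are hypothetical.

```python
def chunk_lengths(doc_lengths, max_len):
    """Split only documents longer than max_len; return (doc_id, start, size) pieces."""
    pieces = []
    for doc_id, n in enumerate(doc_lengths):
        start = 0
        while start < n:
            size = min(max_len, n - start)
            pieces.append((doc_id, start, size))
            start += size
    return pieces


def best_fit_decreasing(pieces, max_len):
    """Pack pieces into sequences of capacity max_len using best-fit decreasing."""
    bins = []  # each bin: {"free": remaining tokens, "items": [pieces]}
    for piece in sorted(pieces, key=lambda p: p[2], reverse=True):
        size = piece[2]
        best = None
        for b in bins:
            # Choose the fullest bin that still has room for this piece.
            if b["free"] >= size and (best is None or b["free"] < best["free"]):
                best = b
        if best is None:
            best = {"free": max_len, "items": []}
            bins.append(best)
        best["items"].append(piece)
        best["free"] -= size
    return bins


# Hypothetical document lengths in tokens; 8192 matches the sequence length above.
doc_lengths = [5000, 3000, 8192, 2000, 6000, 192, 9000]
packed = best_fit_decreasing(chunk_lengths(doc_lengths, 8192), 8192)
for i, b in enumerate(packed):
    print(i, 8192 - b["free"], [(d, s) for d, _, s in b["items"]])
```

The linear scan over bins keeps the sketch short; a production implementation would typically use a more efficient search structure to find the best-fitting sequence.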

Frequently Asked Questions

Q: What makes this model unique?

The model combines specialized German capabilities, gained through pretraining on 65 billion high-quality German tokens, with strong English performance. It also uses document packing based on the "Fewer Truncations" methodology and is available in several configurations, including long-context (32k) and instruction-tuned versions.

Q: What are the recommended use cases?

The model is primarily designed as a base model for further fine-tuning. It's particularly well-suited for German language tasks, including content generation, understanding, and analysis. Users can choose from various versions including instruction-tuned and long-context variants depending on their specific needs.
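
As a starting point for experimenting with the base model, the sketch below shows one way to load it with the Hugging Face `transformers` library and generate a plain text continuation. The repository id `DiscoResearch/Llama3-German-8B` is an assumption based on the maintainer and model name and should be verified before use. The helper names `load_base_model` and `complete` are illustrative, and the code assumes `torch` and `transformers` are installed.

```python
MODEL_ID = "DiscoResearch/Llama3-German-8B"  # assumed repository id; verify before use


def load_base_model(model_id=MODEL_ID):
    """Load tokenizer and model in bfloat16, matching the listed tensor type."""
    # Imported lazily so the module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


def complete(prompt, tokenizer, model, max_new_tokens=40):
    """Continue `prompt` as plain text; a base model is not tuned to follow instructions."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A typical call would be `tokenizer, model = load_base_model()` followed by `complete("Die Hauptstadt von Hessen ist", tokenizer, model)`. Because this is a base model, prompts work best when phrased as text to be continued, not as instructions.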