# Llama3-German-8B
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | Llama3 |
| Research Paper | Link to Paper |
| Tensor Type | BF16 |
## What is Llama3-German-8B?
Llama3-German-8B is an advanced language model specifically optimized for German language processing. Built upon Meta's Llama3-8B architecture, this model underwent extensive continued pretraining on 65 billion high-quality German tokens, resulting in significantly improved German language capabilities while maintaining strong English performance. The model represents a collaborative effort between DiscoResearch, Occiglot, and the German Research Center for Artificial Intelligence (DFKI).
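A minimal generation sketch with the Hugging Face `transformers` library, assuming the model is published under the repo id `DiscoResearch/Llama3-German-8B` (verify the exact id on the model hub):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the model name; check the hub for the exact spelling.
model_id = "DiscoResearch/Llama3-German-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16 (see table above)
    device_map="auto",
)

prompt = "Die Hauptstadt von Deutschland ist"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```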
## Implementation Details
The model was trained on 128 GPUs on hessian.AI's 42 supercomputer for approximately 60 hours, using advanced training techniques including an intelligent document packing strategy. It features a sequence length of 8192 tokens and employs a cosine learning rate schedule decaying from 1.5e-5 to 1.5e-6; both the schedule and the packing strategy are sketched in code after the list below.
- Training conducted over 15,500 steps with 155 warmup steps
- Batch size of 4,194,304 tokens
- AdamW optimizer with 0.05 weight decay
- Document packing strategy based on the "Fewer Truncations" approach (see the packing sketch below)
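The learning rate schedule can be reproduced directly from the figures above. The sketch below assumes a linear warmup over the 155 warmup steps; that warmup shape is a common choice but is an assumption, not something the card states:

```python
import math

def lr_at_step(step, max_lr=1.5e-5, min_lr=1.5e-6,
               warmup_steps=155, total_steps=15_500):
    """Cosine decay from max_lr to min_lr after a linear warmup.

    Hyperparameters match the training description above; the linear
    warmup shape itself is an assumption.
    """
    if step < warmup_steps:
        return max_lr * step / warmup_steps            # linear warmup to peak
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0
    return min_lr + (max_lr - min_lr) * cosine

print(lr_at_step(155))     # peak:  1.5e-05
print(lr_at_step(15_500))  # floor: 1.5e-06
```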
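The "Fewer Truncations" idea is, roughly, to treat sequence construction as a bin-packing problem, so documents land in fixed-length training sequences without being cut at arbitrary boundaries. The following is an illustrative best-fit-decreasing sketch, not the exact pipeline used in training:

```python
def pack_documents(doc_lengths, seq_len=8192):
    """Best-fit-decreasing packing sketch in the spirit of the
    "Fewer Truncations" approach: place each document into the training
    sequence with the least remaining room that still fits, instead of
    concatenating everything and truncating at sequence boundaries.

    Simplified illustration: documents longer than seq_len would need
    to be split first, which is omitted here.
    """
    bins = []  # each bin is [remaining_space, [doc_indices]]
    order = sorted(range(len(doc_lengths)), key=lambda i: -doc_lengths[i])
    for i in order:
        length = doc_lengths[i]
        # best fit: smallest remaining space that can still hold the document
        candidates = [b for b in bins if b[0] >= length]
        if candidates:
            best = min(candidates, key=lambda b: b[0])
            best[0] -= length
            best[1].append(i)
        else:
            bins.append([seq_len - length, [i]])
    return [docs for _, docs in bins]

# Toy example: six documents packed into 8192-token sequences
print(pack_documents([8000, 5000, 3000, 1000, 200, 150]))
# -> [[0, 5], [1, 2], [3, 4]]
```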
## Core Capabilities
- Enhanced German language understanding and generation
- Strong performance on German benchmarks, particularly HellaSwag
- Maintained English language capabilities
- Available in multiple configurations, including long-context (32k) and instruction-tuned versions (a chat sketch for the instruct variant follows this list)
- Supports efficient text generation and processing tasks
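For the instruction-tuned variant, generation typically goes through the tokenizer's chat template. The repo id below is an assumption; check the DiscoResearch hub page for the actual name of the instruct release:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for the instruction-tuned variant; verify on the hub.
model_id = "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Erkläre kurz den Unterschied zwischen 'seit' und 'seid'."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids, max_new_tokens=200, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```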
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's distinctive feature is its specialized German capability, achieved through continued pretraining on a massive German dataset while maintaining strong English performance, notably without replaying English data during training. It also implements a document packing strategy that improves overall benchmark scores.
**Q: What are the recommended use cases?**
As a base model, it is best suited for fine-tuning on specific tasks; a fine-tuning sketch follows below. It is particularly well-suited for German language processing, including text generation, understanding, and analysis. Different versions are available for specific needs, including long-context (32k tokens) and instruction-tuned variants.
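As an illustration of the fine-tuning recommendation, here is a parameter-efficient LoRA sketch using the `peft` library and the `transformers` Trainer API. The dataset path, LoRA ranks, and training hyperparameters are placeholder assumptions, not values from the model authors:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "DiscoResearch/Llama3-German-8B"  # base model, per the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# LoRA adapters on the attention projections; rank and targets are
# illustrative assumptions, not recommendations from the model card.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Placeholder dataset: one German example per line; substitute your own data.
dataset = load_dataset("text", data_files={"train": "german_task.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama3-german-ft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    # Causal LM collator: pads batches and copies input_ids to labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```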