SauerkrautLM-Mixtral-8x7B-Instruct

Property	Value
Parameter Count	46.7B
Model Type	Mixture of Experts (MoE)
Languages	English, German, French, Italian, Spanish
License	Apache 2.0
Tensor Type	BF16

What is SauerkrautLM-Mixtral-8x7B-Instruct?

SauerkrautLM-Mixtral-8x7B-Instruct is an advanced multilingual language model based on Mixtral-8x7B-Instruct-v0.1, specifically optimized for German language processing while maintaining strong capabilities in English, French, Italian, and Spanish. The model has been fine-tuned using Direct Preference Optimization (DPO) with a carefully curated German dataset.

Implementation Details

The model employs a Mixture of Experts (MoE) architecture with sophisticated data augmentation techniques to ensure natural German language generation. It uses a specific prompt template and has been tested for data contamination to ensure high-quality outputs.

Trained with German data augmentation and translated content
Aligned through DPO with SauerkrautLM-DPO dataset
Incorporates translated parts of ultrafeedback_binarized dataset
Includes mathematical reasoning capabilities from argilla/distilabel-math-preference-dpo

Core Capabilities

Multilingual instruction following and generation
Strong performance in German RAG evaluation (97.5% accuracy)
Natural German language processing with grammatical accuracy
Mathematical reasoning and problem-solving
Context-aware responses with a 4096 token window

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its specialized German language optimization while maintaining high performance in multiple languages. It uses advanced data augmentation techniques rather than simple translation, ensuring more natural German language output.

Q: What are the recommended use cases?

The model is well-suited for multilingual applications, particularly those requiring strong German language capabilities. It excels in instruction-following tasks, RAG applications, and mathematical reasoning scenarios.