notux-8x7b-v1

argilla

A powerful 46.7B parameter MoE model fine-tuned from Mixtral-8x7B using DPO, supporting 5 languages and achieving top performance among MoE models on the Open LLM Leaderboard.

Parameter Count: 46.7B
Model Type: Mixture of Experts (MoE)
License: Apache 2.0
Languages: English, German, Spanish, French, Italian
Base Model: Mixtral-8x7B-Instruct-v0.1

What is notux-8x7b-v1?

Notux-8x7b-v1 is an advanced language model developed by Argilla, built upon the Mixtral-8x7B-Instruct-v0.1 architecture. It was fine-tuned using Direct Preference Optimization (DPO) on the UltraFeedback preferences dataset and, at the time of its release, achieved top performance among MoE models on the Hugging Face Open LLM Leaderboard.
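For orientation, here is a minimal inference sketch using the transformers library. The Hub id argilla/notux-8x7b-v1 is the model's public repository; the prompt and generation settings are illustrative choices rather than recommendations from the model card, and in full BF16 precision the 46.7B weights need on the order of 90 GB of GPU memory, so quantized loading may be necessary in practice.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "argilla/notux-8x7b-v1"  # public Hugging Face Hub repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card reports BF16 training
    device_map="auto",           # shard the weights across available GPUs
)

# Mixtral-Instruct derivatives use the [INST] ... [/INST] chat format,
# which apply_chat_template reads from the tokenizer config.
messages = [{"role": "user", "content": "Resume en una frase qué es un modelo MoE."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```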

Implementation Details

The model was trained on 8 H100 80GB GPUs for one epoch (approximately 10 hours) on a sparse Mixture of Experts architecture in BF16 precision. Training used a learning rate of 5e-7 with a linear scheduler and a 0.1 warmup ratio; a hedged sketch of this setup follows the list below.

  • Trained with the Adam optimizer (betas = 0.9, 0.999)
  • Total training batch size of 64
  • Strong results across multiple benchmarks (detailed under Core Capabilities below)
  • Sparse MoE routing activates only a subset of experts per token, improving compute efficiency
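As promised above, here is a hedged sketch of this training setup using TRL's DPOTrainer (the 0.7-era API, contemporary with the model's release). The hyperparameters mirror the card; the dataset revision, column mapping, DPO beta, and sequence-length budget are assumptions for illustration, not details from the authors' actual training script.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Mixtral ships no pad token
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# UltraFeedback preference pairs; the exact revision used for notux is an
# assumption, and the rename maps columns to TRL's expected schema.
dataset = load_dataset("argilla/ultrafeedback-binarized-preferences", split="train")
dataset = dataset.rename_columns(
    {"instruction": "prompt", "chosen_response": "chosen", "rejected_response": "rejected"}
)

args = TrainingArguments(
    output_dir="notux-8x7b-dpo",
    num_train_epochs=1,              # one epoch, per the card
    learning_rate=5e-7,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    per_device_train_batch_size=8,   # x 8 H100s = total batch size 64
    bf16=True,
)

trainer = DPOTrainer(
    model,                           # with ref_model=None, TRL keeps a frozen copy as reference
    args=args,
    beta=0.1,                        # DPO temperature; assumed, not stated in the card
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=2048,                 # assumed sequence budget
    max_prompt_length=1024,
)
trainer.train()
```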

Core Capabilities

  • Achieves a 73.18% average score on the Open LLM Leaderboard (an evaluation sketch follows this list)
  • Excels in the HellaSwag (87.73%) and Winogrande (81.61%) benchmarks
  • Strong performance on reasoning tasks (ARC, the AI2 Reasoning Challenge: 70.99%)
  • Multilingual support across 5 major European languages
  • Enhanced preference alignment through DPO training
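The figures above come from EleutherAI's lm-evaluation-harness, which powers the Open LLM Leaderboard. Below is a hedged sketch of re-running one benchmark locally with the harness's v0.4 Python API; the 10-shot setting matches the leaderboard's HellaSwag configuration, and the exact metric key can vary across harness versions.

```python
import lm_eval  # pip install lm-eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=argilla/notux-8x7b-v1,dtype=bfloat16",
    tasks=["hellaswag"],
    num_fewshot=10,   # the Open LLM Leaderboard uses 10-shot for HellaSwag
    batch_size=4,
)
# acc_norm is the leaderboard's reported HellaSwag metric (87.73% for notux).
print(results["results"]["hellaswag"]["acc_norm,none"])
```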

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for applying a further round of preference tuning to a base model, Mixtral-8x7B-Instruct-v0.1, that was itself already DPO-trained, producing a more refined and aligned language model. It is also notable for achieving top performance among MoE models on the Open LLM Leaderboard while retaining multilingual capabilities.

Q: What are the recommended use cases?

The model excels in tasks requiring strong reasoning capabilities, multilingual understanding, and aligned responses. It's particularly well-suited for applications requiring both high performance and ethical alignment, such as content generation, analysis, and complex reasoning tasks across multiple languages.
