mamba-2.8b-zephyr

Maintained By
xiuyul

Mamba-2.8B-Zephyr

Property          Value
Base Model        state-spaces/mamba-2.8b-slimpj
Training Method   Direct Preference Optimization (DPO)
Dataset           UltraFeedback Binarized
Model Size        2.8B parameters
Accuracy          78.57%
Author            xiuyul

What is mamba-2.8b-zephyr?

Mamba-2.8b-zephyr is a 2.8B-parameter language model built on the Mamba architecture and fine-tuned with Direct Preference Optimization (DPO) on the UltraFeedback Binarized dataset. It is a preference-aligned model: it reaches 78.57% accuracy in distinguishing preferred responses from rejected ones.

Implementation Details

The model was developed in two stages. First, the base model (mamba-2.8b-slimpj) was instruction-tuned on the UltraChat 200k dataset. Second, the result was preference-optimized with DPO on the UltraFeedback Binarized dataset. Training ran on a multi-GPU setup of 8 devices, with a learning rate of 5e-07 and a linear learning-rate schedule with a warmup ratio of 0.1.

  • Trained for 3 epochs with a batch size of 64
  • Used the Adam optimizer with betas=(0.9, 0.999)
  • Reached a final validation loss of 0.4996
  • Reached a reward margin of 1.1582 between chosen and rejected responses
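
To make the learning-rate settings above concrete, here is a minimal sketch of a linear schedule with a 0.1 warmup ratio and a peak of 5e-07. It assumes the common convention of a linear ramp from zero during warmup followed by a linear decay back to zero; the source does not spell out the exact scheduler. The function name and the total step count are hypothetical and chosen only for illustration.

```python
def linear_schedule_lr(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    Assumption: this mirrors the usual "linear" scheduler convention;
    it is not taken from the model's actual training code.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)


# Hypothetical run length; the real number of optimizer steps is not given.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps))
```

With these settings the rate peaks at 5e-07 once 10% of the steps have run, then falls steadily until the end of training.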

Core Capabilities

  • 78.57% accuracy in ranking the chosen response above the rejected one
  • Effective distinction between chosen and rejected responses
  • Robust performance across varied input contexts
  • Optimized for instruction following and preference learning
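
The loss, reward margin, and accuracy figures reported for this model are the standard DPO evaluation quantities. The sketch below shows how they are typically computed from sequence log-probabilities under the trained policy and a frozen reference model. The `beta` value of 0.1, the function name, and the toy log-probabilities are assumptions made for illustration; none of them come from the source.

```python
import math


def dpo_metrics(pairs, beta=0.1):
    """Mean DPO loss, reward margin, and accuracy over preference pairs.

    Each pair holds summed log-probs of the chosen and rejected responses
    under the policy and the reference model. `beta` is an assumed value.
    """
    losses, margins, correct = [], [], 0
    for p in pairs:
        # Implicit reward: beta * (policy log-prob - reference log-prob).
        r_chosen = beta * (p["policy_chosen"] - p["ref_chosen"])
        r_rejected = beta * (p["policy_rejected"] - p["ref_rejected"])
        margin = r_chosen - r_rejected
        margins.append(margin)
        # DPO loss: -log(sigmoid(margin)) == log(1 + exp(-margin)).
        losses.append(math.log1p(math.exp(-margin)))
        # A pair counts as correct when the chosen response scores higher.
        correct += margin > 0
    n = len(pairs)
    return {
        "loss": sum(losses) / n,
        "reward_margin": sum(margins) / n,
        "accuracy": correct / n,
    }


# Toy log-probabilities, invented for illustration only.
pairs = [
    {"policy_chosen": -10.0, "ref_chosen": -12.0,
     "policy_rejected": -15.0, "ref_rejected": -13.0},
    {"policy_chosen": -20.0, "ref_chosen": -19.0,
     "policy_rejected": -18.0, "ref_rejected": -19.0},
]
print(dpo_metrics(pairs))
```

For reference, a model whose margin is zero on every pair would score a loss of ln 2 ≈ 0.693 under this formula, so the reported validation loss of 0.4996 sits below that no-preference baseline.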

Frequently Asked Questions

Q: What makes this model unique?

The model combines the Mamba architecture with DPO training. This gives it strong preference alignment while keeping the efficient processing characteristics of state-space models.

Q: What are the recommended use cases?

While specific use cases aren't detailed in the model card, the model's strong preference alignment makes it suitable for tasks requiring nuanced understanding of user preferences and high-quality instruction following.
