phixtral-2x2_8

Maintained By
mlabonne

Property              Value
Parameter Count       4.46B
Model Type            Mixture of Experts (MoE)
License               MIT
Tensor Type           FP16
Primary Language      English

What is phixtral-2x2_8?

phixtral-2x2_8 is a Mixture of Experts (MoE) model that combines two fine-tuned Microsoft phi-2 models into a single, more capable system. Created by mlabonne and inspired by the Mixtral-8x7B-v0.1 architecture, it shows how smaller expert models can be merged to reach performance beyond either expert on its own.

Implementation Details

The model was built with a custom version of the mergekit library and combines two expert models: dolphin-2_6-phi-2 and phi-2-dpo. Expert routing is controlled by the configurable parameters num_experts_per_tok and num_local_experts, both defaulting to 2 (a loading sketch follows the list below).

  • Custom MoE implementation with gate_mode: cheap_embed
  • 4-bit precision support for efficient inference
  • Integrated transformer architecture with expert routing
  • Flexible configuration through config.json
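
The points above translate fairly directly into how the model would be loaded in practice. The following is a minimal sketch, assuming the model is published on the Hugging Face Hub as mlabonne/phixtral-2x2_8, that bitsandbytes is installed for 4-bit loading, and that the custom MoE code shipped with the repository is enabled via trust_remote_code; check the model card for the exact repo id and recommended settings.

```python
# Minimal sketch: loading phixtral-2x2_8 in 4-bit and inspecting its MoE routing
# parameters. The repo id "mlabonne/phixtral-2x2_8" is an assumption; a CUDA GPU
# and the bitsandbytes package are required for 4-bit loading.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mlabonne/phixtral-2x2_8"  # assumed Hugging Face repo id

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit precision for efficient inference
    bnb_4bit_compute_dtype=torch.float16,  # matches the FP16 tensor type listed above
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,  # custom MoE modeling/config code ships with the repo
)

# Routing parameters exposed through config.json, both defaulting to 2.
print(model.config.num_experts_per_tok)
print(model.config.num_local_experts)
```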

Core Capabilities

  • Stronger average benchmark performance than its individual expert models (47.78% across multiple evaluations)
  • Enhanced text generation and code completion
  • Improved truthfulness in responses (48.78% on TruthfulQA)
  • Efficient resource utilization through expert routing

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness lies in its successful implementation of the Mixture of Experts architecture with phi-2 models, achieving better performance than individual experts while maintaining efficiency.

Q: What are the recommended use cases?

The model excels in text generation, code completion, and general NLP tasks. It's particularly suitable for applications requiring balanced performance across multiple domains while maintaining computational efficiency.
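
As a rough illustration of the code-completion use case, the sketch below uses the transformers text-generation pipeline. The repo id mlabonne/phixtral-2x2_8 and the plain-completion prompt style are assumptions; if the model card specifies a prompt template, that should take precedence.

```python
# Sketch: code completion via the transformers text-generation pipeline.
# The repo id "mlabonne/phixtral-2x2_8" and plain prompt style are assumptions.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mlabonne/phixtral-2x2_8",
    torch_dtype=torch.float16,  # FP16, matching the tensor type listed above
    device_map="auto",
    trust_remote_code=True,     # enables the custom MoE modeling code
)

prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
result = generator(prompt, max_new_tokens=96, do_sample=False)
print(result[0]["generated_text"])
```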
