Yuan2-M32-hf

Maintained By: IEITYuan

  • Total Parameters: 40B
  • Active Parameters: 3.7B
  • Sequence Length: 16K
  • Training Tokens: 2000B
  • License: Apache 2.0
  • Paper: View Paper

What is Yuan2-M32-hf?

Yuan2-M32-hf is a Mixture-of-Experts (MoE) language model designed for computational efficiency. Of its 32 experts, only 2 are active for any given token, so it delivers strong benchmark results while using just 9.25% of the per-token computation of a dense model at the same scale. The model introduces an Attention Router mechanism that improves accuracy by 3.8% compared with a classical router network.

Implementation Details

The architecture supports sequences of up to 16K tokens, and its forward computation requires only 7.4 GFLOPs per token, approximately 1/19th of what Llama3-70B needs.
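As a rough check on that ratio, the snippet below applies the common rule of thumb that a decoder-only transformer spends about 2 FLOPs per active parameter per token in the forward pass; the 2-FLOPs-per-parameter figure is an approximation used for illustration, not a number taken from the Yuan paper.

```python
# Back-of-the-envelope forward-pass cost, assuming ~2 FLOPs per active parameter
# per token (a common rule of thumb for decoder-only transformers).
yuan_active_params = 3.7e9   # Yuan2-M32 active parameters per token
llama3_70b_params = 70e9     # Llama3-70B is dense, so all parameters are active

yuan_gflops = 2 * yuan_active_params / 1e9    # ~7.4 GFLOPs per token
llama_gflops = 2 * llama3_70b_params / 1e9    # ~140 GFLOPs per token

print(f"Yuan2-M32:  {yuan_gflops:.1f} GFLOPs/token")
print(f"Llama3-70B: {llama_gflops:.1f} GFLOPs/token")
print(f"Ratio: ~1/{llama_gflops / yuan_gflops:.0f}")  # ~1/19
```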

  • Attention Router network for expert selection (illustrated in the sketch after this list)
  • 32 total experts with 2 active experts per forward pass
  • Only 3.7B active parameters out of 40B total
  • Trained on 2000B tokens from scratch
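As a rough illustration of how such a layer operates, the sketch below implements top-2 routing over 32 toy experts with an attention-style router, loosely following the paper's idea of letting routing scores reflect correlations between experts rather than a single linear projection per token. All class names, dimensions, and layer shapes here are invented for the example; this is not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionRouterMoE(nn.Module):
    """Illustrative top-2 MoE layer with an attention-style router (hypothetical sketch)."""

    def __init__(self, d_model=512, d_router=64, n_experts=32, top_k=2):
        super().__init__()
        self.n_experts, self.top_k, self.d_router = n_experts, top_k, d_router
        # One query/key/value slot per expert (illustrative dimensions).
        self.wq = nn.Linear(d_model, n_experts * d_router, bias=False)
        self.wk = nn.Linear(d_model, n_experts * d_router, bias=False)
        self.wv = nn.Linear(d_model, n_experts * d_router, bias=False)
        # Toy experts: small feed-forward networks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.SiLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def route(self, x):
        # x: (tokens, d_model) -> routing logits of shape (tokens, n_experts)
        t = x.shape[0]
        q = self.wq(x).view(t, self.n_experts, self.d_router)
        k = self.wk(x).view(t, self.n_experts, self.d_router)
        v = self.wv(x).view(t, self.n_experts, self.d_router)
        # Attention among the expert "slots" of each token, so a given expert's
        # score also depends on the other experts' representations.
        attn = torch.softmax(q @ k.transpose(-1, -2) / self.d_router ** 0.5, dim=-1)
        return (attn @ v).sum(dim=-1)

    def forward(self, x):
        logits = self.route(x)                                 # (tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)  # pick 2 of 32 experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run, so only a small fraction of expert
        # parameters is active for each token.
        for slot in range(self.top_k):
            for e in range(self.n_experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(8, 512)               # 8 tokens of dimension 512
print(AttentionRouterMoE()(tokens).shape)  # torch.Size([8, 512])
```

Because only the two selected experts run for each token, the active parameter count stays a small fraction of the total, which is where the 3.7B-active-of-40B-total figure comes from.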

Core Capabilities

  • Outperforms Llama3-70B on MATH (55.9%) and ARC-Challenge (95.8%)
  • Strong performance in coding (74.4% on HumanEval)
  • Exceptional mathematical reasoning (92.7% on GSM8K)
  • Robust general knowledge (72.2% on MMLU)

Frequently Asked Questions

Q: What makes this model unique?

The model's key innovation lies in its Attention Router and efficient MoE architecture, achieving state-of-the-art performance with significantly fewer active parameters and computational requirements than comparable models.

Q: What are the recommended use cases?

The model excels in coding tasks, mathematical reasoning, and complex problem-solving scenarios. It's particularly well-suited for applications requiring high performance with limited computational resources.
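For hands-on evaluation, the checkpoint can typically be loaded through the standard Hugging Face transformers interface. The snippet below is a minimal sketch that assumes the repo id IEITYuan/Yuan2-M32-hf and relies on trust_remote_code=True for the custom MoE modeling code; check the official model card for the exact tokenizer setup (Yuan2 models register additional special tokens) and recommended generation settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; the custom MoE modeling code ships with the checkpoint,
# hence trust_remote_code=True.
model_id = "IEITYuan/Yuan2-M32-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # only ~3.7B parameters are active per token,
    device_map="auto",           # but all 40B must fit in memory
    trust_remote_code=True,
)

prompt = "Write a Python function that checks whether a number is prime."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```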
