MiniMax-Text-01

Maintained By
MiniMaxAI

  • Total Parameters: 456B
  • Active Parameters per Token: 45.9B
  • Architecture Type: Hybrid (Lightning Attention + Softmax Attention with MoE)
  • Context Length: up to 4M tokens (inference)
  • Model Access: Hugging Face
  • Paper: arXiv:2501.08313

What is MiniMax-Text-01?

MiniMax-Text-01 represents a significant advancement in large language model architecture, combining Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE) in a hybrid design. With 456B total parameters and 45.9B activated parameters per token, it achieves impressive performance while maintaining computational efficiency through innovative parallel processing strategies.

Implementation Details

The model features an 80-layer architecture with a hybrid attention scheme in which one softmax attention layer follows every seven lightning attention layers. Each attention layer uses 64 heads with 128 dimensions per head, complemented by a Mixture-of-Experts system with 32 experts and a top-2 routing strategy.

  • Hidden size: 6144
  • Vocab size: 200,064
  • Expert hidden dimension: 9216
  • Rotary Position Embedding (RoPE) with 10,000,000 base frequency
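For reference, these hyperparameters can be gathered into a small configuration sketch. The dictionary keys below are illustrative rather than the actual Hugging Face config.json field names, but the values mirror the figures listed above.

```python
# Illustrative hyperparameter summary for MiniMax-Text-01.
# Key names are made up for readability; values follow the published spec.
MINIMAX_TEXT_01 = {
    "num_layers": 80,
    "hidden_size": 6144,
    "num_attention_heads": 64,
    "head_dim": 128,
    "vocab_size": 200_064,
    "rope_base": 10_000_000,
    # Hybrid attention: one softmax attention layer after every 7 lightning attention layers.
    "attention_pattern": ["lightning"] * 7 + ["softmax"],
    # Mixture-of-Experts settings.
    "num_experts": 32,
    "experts_per_token": 2,          # top-2 routing
    "expert_hidden_dim": 9216,
    # Scale and context.
    "total_params": "456B",
    "active_params_per_token": "45.9B",
    "max_inference_context_tokens": 4_000_000,
}

def attention_type(layer_idx: int) -> str:
    """Return which attention variant layer `layer_idx` (0-indexed) would use."""
    pattern = MINIMAX_TEXT_01["attention_pattern"]
    return pattern[layer_idx % len(pattern)]

assert attention_type(6) == "lightning"
assert attention_type(7) == "softmax"  # every 8th layer uses softmax attention
```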

Core Capabilities

  • Extended context handling up to 4M tokens during inference
  • Strong performance on academic benchmarks including MMLU, GSM8K, and HumanEval
  • Advanced long-context capabilities demonstrated through the 4M-token Needle-In-A-Haystack test
  • Competitive performance in multilingual translation tasks

Frequently Asked Questions

Q: What makes this model unique?

The model's hybrid architecture, combining Lightning Attention with Softmax Attention and MoE, along with its ability to handle extremely long contexts of up to 4M tokens, sets it apart from other large language models. Its parallel processing strategies, such as LASP+ and ETP, enable efficient processing of long sequences.
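Lightning Attention is, at its core, an I/O-aware blockwise implementation of linear attention. The toy function below is not the MiniMax implementation (it omits the blockwise tiling, decay, and normalization the real kernel uses), and its names and shapes are purely illustrative; it only shows the underlying linear-attention recurrence that makes compute grow linearly, rather than quadratically, with sequence length.

```python
import torch

def causal_linear_attention(q, k, v):
    """Naive recurrent form of causal linear attention (no normalization).

    q, k, v: tensors of shape (seq_len, head_dim).
    A running state S_t = sum_{s <= t} k_s v_s^T is updated once per token,
    so a full pass costs O(seq_len * head_dim^2) instead of the
    O(seq_len^2 * head_dim) of softmax attention.
    """
    seq_len, d = q.shape
    state = torch.zeros(d, v.shape[-1], dtype=q.dtype)
    outputs = []
    for t in range(seq_len):
        state = state + torch.outer(k[t], v[t])  # rank-1 state update
        outputs.append(q[t] @ state)             # read-out for position t
    return torch.stack(outputs)

q, k, v = (torch.randn(16, 8) for _ in range(3))
out = causal_linear_attention(q, k, v)  # shape (16, 8)
```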

Q: What are the recommended use cases?

MiniMax-Text-01 excels in tasks requiring long context understanding, complex reasoning, and mathematical problem-solving. It's particularly well-suited for applications needing extensive context processing, such as document analysis, complex coding tasks, and detailed question-answering scenarios.
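As a rough sketch of how such a workload might be run through the Hugging Face transformers API (assuming the repo id MiniMaxAI/MiniMax-Text-01 and that the custom hybrid architecture requires trust_remote_code=True; the model card's own instructions for quantization and multi-GPU placement take precedence):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-Text-01"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # custom hybrid-attention modeling code
    device_map="auto",       # 456B total parameters: expect multi-GPU or offloading
    torch_dtype="auto",
)

prompt = "Summarize the key obligations in the following contract:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```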
