MiniMax-Text-01

Maintained By
MiniMaxAI

  • Total Parameters: 456B
  • Active Parameters per Token: 45.9B
  • Architecture Type: Hybrid (Lightning Attention + Softmax Attention with MoE)
  • Context Length: up to 4M tokens (inference)
  • Model Access: Hugging Face
  • Paper: arXiv:2501.08313

What is MiniMax-Text-01?

MiniMax-Text-01 represents a significant advancement in large language model architecture, combining Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE) in a hybrid design. With 456B total parameters and 45.9B activated parameters per token, it achieves impressive performance while maintaining computational efficiency through innovative parallel processing strategies.

Implementation Details

The model features an 80-layer architecture with a hybrid attention scheme in which one softmax attention layer follows every seven lightning attention layers. Each attention layer uses 64 heads with 128 dimensions per head, complemented by a Mixture-of-Experts system with 32 experts and a top-2 routing strategy.

  • Hidden size: 6144
  • Vocab size: 200,064
  • Expert hidden dimension: 9216
  • Rotary Position Embedding (RoPE) with 10,000,000 base frequency
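For reference, these hyperparameters can be gathered into a small configuration sketch. The dictionary keys below are illustrative rather than the actual Hugging Face config.json field names, but the values mirror the figures listed above.

```python
# Illustrative hyperparameter summary for MiniMax-Text-01.
# Key names are made up for readability; values follow the published spec.
MINIMAX_TEXT_01 = {
    "num_layers": 80,
    "hidden_size": 6144,
    "num_attention_heads": 64,
    "head_dim": 128,
    "vocab_size": 200_064,
    "rope_base": 10_000_000,
    # Hybrid attention: one softmax attention layer after every 7 lightning attention layers.
    "attention_pattern": ["lightning"] * 7 + ["softmax"],
    # Mixture-of-Experts settings.
    "num_experts": 32,
    "experts_per_token": 2,          # top-2 routing
    "expert_hidden_dim": 9216,
    # Scale and context.
    "total_params": "456B",
    "active_params_per_token": "45.9B",
    "max_inference_context_tokens": 4_000_000,
}

def attention_type(layer_idx: int) -> str:
    """Return which attention variant layer `layer_idx` (0-indexed) would use."""
    pattern = MINIMAX_TEXT_01["attention_pattern"]
    return pattern[layer_idx % len(pattern)]

assert attention_type(6) == "lightning"
assert attention_type(7) == "softmax"  # every 8th layer uses softmax attention
```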

Core Capabilities

  • Extended context handling up to 4M tokens during inference
  • Strong performance on academic benchmarks including MMLU, GSM8K, and HumanEval
  • Advanced long-context capabilities demonstrated through the 4M-token Needle-In-A-Haystack test
  • Competitive performance in multilingual translation tasks

Frequently Asked Questions

Q: What makes this model unique?

The model's hybrid architecture, combining Lightning Attention with Softmax Attention and MoE, along with its ability to handle extremely long contexts of up to 4M tokens, sets it apart from other large language models. Its parallel processing strategies, such as LASP+ and ETP, enable efficient processing of long sequences.
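Lightning Attention is, at its core, an I/O-aware blockwise implementation of linear attention. The toy function below is not the MiniMax implementation (it omits the blockwise tiling, decay, and normalization the real kernel uses), and its names and shapes are purely illustrative; it only shows the underlying linear-attention recurrence that makes compute grow linearly, rather than quadratically, with sequence length.

```python
import torch

def causal_linear_attention(q, k, v):
    """Naive recurrent form of causal linear attention (no normalization).

    q, k, v: tensors of shape (seq_len, head_dim).
    A running state S_t = sum_{s <= t} k_s v_s^T is updated once per token,
    so a full pass costs O(seq_len * head_dim^2) instead of the
    O(seq_len^2 * head_dim) of softmax attention.
    """
    seq_len, d = q.shape
    state = torch.zeros(d, v.shape[-1], dtype=q.dtype)
    outputs = []
    for t in range(seq_len):
        state = state + torch.outer(k[t], v[t])  # rank-1 state update
        outputs.append(q[t] @ state)             # read-out for position t
    return torch.stack(outputs)

q, k, v = (torch.randn(16, 8) for _ in range(3))
out = causal_linear_attention(q, k, v)  # shape (16, 8)
```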

Q: What are the recommended use cases?

MiniMax-Text-01 excels in tasks requiring long context understanding, complex reasoning, and mathematical problem-solving. It's particularly well-suited for applications needing extensive context processing, such as document analysis, complex coding tasks, and detailed question-answering scenarios.
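As a rough sketch of how such a workload might be run through the Hugging Face transformers API (assuming the repo id MiniMaxAI/MiniMax-Text-01 and that the custom hybrid architecture requires trust_remote_code=True; the model card's own instructions for quantization and multi-GPU placement take precedence):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-Text-01"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # custom hybrid-attention modeling code
    device_map="auto",       # 456B total parameters: expect multi-GPU or offloading
    torch_dtype="auto",
)

prompt = "Summarize the key obligations in the following contract:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```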
