# Tencent-Hunyuan-Large
| Property | Value |
|---|---|
| Total Parameters | 389 billion |
| Active Parameters | 52 billion |
| License | Tencent Hunyuan Community License |
| Paper | arXiv:2411.02265 |
| Maximum Context Length | 256K tokens (pretrain), 128K tokens (instruct) |
## What is Tencent-Hunyuan-Large?
Tencent-Hunyuan-Large is, at the time of its release, the largest open-source Transformer-based Mixture of Experts (MoE) model in the industry. It pairs a 389B-parameter expert pool with only 52B parameters activated per token, aiming for strong performance while keeping per-token compute close to that of a 52B dense model.
## Implementation Details
The model employs several techniques to achieve its performance:
- KV cache compression through Grouped-Query Attention (GQA) combined with Cross-Layer Attention (CLA), which shares key/value caches across adjacent layers (see the sketch after this list)
- Expert-specific learning rate scaling for more effective MoE training
- Large-scale, high-quality synthetic data to improve generalization
- Long-context processing, up to 256K tokens in pretraining and 128K tokens in the instruct model
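The following is a minimal sketch of how GQA and CLA shrink the KV cache: GQA stores fewer key/value heads than query heads, and CLA lets a later layer reuse the KV tensors written by an earlier one instead of caching its own. All sizes and the two-layer structure here are illustrative assumptions, not the actual Hunyuan-Large configuration.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration only -- not the real Hunyuan-Large config.
NUM_Q_HEADS = 8    # query heads
NUM_KV_HEADS = 2   # GQA: several query heads share each KV head
HEAD_DIM = 64
SEQ_LEN = 16

def gqa_attention(q, k, v):
    """Grouped-query attention: expand shared KV heads to match the query heads."""
    group = NUM_Q_HEADS // NUM_KV_HEADS
    k = k.repeat_interleave(group, dim=1)   # (B, NUM_Q_HEADS, T, D)
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v)

# Toy two-layer stack with cross-layer attention (CLA):
# layer 1 reuses the KV cache produced by layer 0, so only one KV tensor is stored.
q_per_layer = [torch.randn(1, NUM_Q_HEADS, SEQ_LEN, HEAD_DIM) for _ in range(2)]
shared_k = torch.randn(1, NUM_KV_HEADS, SEQ_LEN, HEAD_DIM)  # written once by layer 0
shared_v = torch.randn(1, NUM_KV_HEADS, SEQ_LEN, HEAD_DIM)

out0 = gqa_attention(q_per_layer[0], shared_k, shared_v)  # layer 0: computes and caches KV
out1 = gqa_attention(q_per_layer[1], shared_k, shared_v)  # layer 1: reuses layer 0's cache
print(out0.shape, out1.shape)  # torch.Size([1, 8, 16, 64]) twice
```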
## Core Capabilities
- Exceptional performance on MMLU (89.9% for instruct version)
- Superior mathematical reasoning (77.4% on MATH dataset)
- Strong multilingual capabilities, particularly in Chinese language tasks
- Robust performance in commonsense reasoning and knowledge-based tasks
## Frequently Asked Questions
Q: What makes this model unique?
The model's MoE architecture has 389B total parameters but activates only 52B (roughly 13%) per token, making it far cheaper to run than a dense model of comparable size while maintaining SOTA performance. Its KV cache compression (GQA plus CLA) and expert-specific learning rate scaling further set it apart from conventional dense Transformers. A toy illustration of the sparse expert routing follows below.
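The snippet below is a minimal, self-contained sketch of top-k expert routing, the general mechanism that lets a sparse MoE layer activate only a subset of its parameters per token. The expert count, hidden sizes, and top-k value are placeholders, not the real Hunyuan-Large configuration (see arXiv:2411.02265 for the actual architecture).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy dimensions chosen for illustration only.
D_MODEL, D_FF, NUM_EXPERTS, TOP_K = 32, 64, 8, 1

class ToyMoELayer(nn.Module):
    """Sparse MoE feed-forward layer: each token is routed to TOP_K experts,
    so only a small fraction of the layer's parameters is active per token."""
    def __init__(self):
        super().__init__()
        self.router = nn.Linear(D_MODEL, NUM_EXPERTS)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(D_MODEL, D_FF), nn.GELU(), nn.Linear(D_FF, D_MODEL))
            for _ in range(NUM_EXPERTS)
        )

    def forward(self, x):                          # x: (tokens, D_MODEL)
        gates = F.softmax(self.router(x), dim=-1)  # routing probabilities
        weight, idx = gates.topk(TOP_K, dim=-1)    # pick TOP_K experts per token
        out = torch.zeros_like(x)
        for k in range(TOP_K):
            for e in range(NUM_EXPERTS):
                mask = idx[:, k] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weight[mask, k:k+1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, D_MODEL)
print(ToyMoELayer()(tokens).shape)  # torch.Size([4, 32])
```

Because each token passes through only TOP_K experts, the compute and memory traffic per token scale with the active parameters rather than the total expert pool, which is the efficiency argument behind the 52B-active / 389B-total design.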
Q: What are the recommended use cases?
The model excels in diverse applications including complex reasoning, mathematical problem-solving, multilingual tasks, and long-context processing. It's particularly strong in academic and knowledge-intensive applications.
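A minimal loading sketch with Hugging Face `transformers` is shown below. The repository id, chat-template support, and multi-GPU settings are assumptions; consult the official model page for the exact checkpoint names and the recommended inference setup (a 389B-parameter checkpoint needs substantial GPU memory even with sharding or quantization).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id -- verify against the official Hugging Face page.
model_id = "tencent/Tencent-Hunyuan-Large"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",      # shard the checkpoint across available GPUs
    torch_dtype="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What is the derivative of x**3 + 2*x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```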