YAYI2-30B
Property | Value |
---|---|
Parameter Count | 30 Billion |
Architecture | Transformer (64 layers, 64 heads) |
Context Length | 4096 tokens |
Training Data | 2.65T tokens (multilingual) |
License | Apache-2.0 (code) / Custom (model) |
Paper | arXiv:2312.14862 |
What is YAYI2-30B?
YAYI2-30B is a 30-billion-parameter multilingual large language model developed by Wenge Technology. It is built on a Transformer architecture and was pretrained on 2.65 trillion tokens of multilingual data.
Implementation Details
The model has 64 Transformer layers, 64 attention heads, and a hidden size of 7168. It uses a vocabulary of 81,920 tokens and supports a context length of 4096 tokens. Inference requires at least 80GB of GPU memory.
- Transformer architecture optimized for multilingual processing
- Pretraining across diverse datasets
- Available in both base and chat versions
- Aligned using reinforcement learning from human feedback (RLHF)
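The figures above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not the authors' own sizing method: it assumes the hidden size is split evenly across attention heads and that weights are stored in 16-bit precision (2 bytes per parameter). The function names are illustrative only.

```python
# Figures taken from the model card; everything else is an assumption.
NUM_PARAMS = 30e9      # "30 Billion" parameters (rounded)
NUM_HEADS = 64
HIDDEN_SIZE = 7168
VOCAB_SIZE = 81_920


def head_dim(hidden_size: int, num_heads: int) -> int:
    """Per-head dimension, assuming the hidden size divides evenly across heads."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden size is not divisible by the number of heads")
    return hidden_size // num_heads


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    The default of 2 bytes per parameter assumes fp16/bf16 storage.
    """
    return num_params * bytes_per_param / 1e9


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameter count of one token-embedding matrix (vocab x hidden)."""
    return vocab_size * hidden_size


if __name__ == "__main__":
    print("head dimension:", head_dim(HIDDEN_SIZE, NUM_HEADS))
    print("16-bit weights (GB):", weight_memory_gb(NUM_PARAMS))
    print("embedding parameters:", embedding_params(VOCAB_SIZE, HIDDEN_SIZE))
```

Under these assumptions the 16-bit weights alone take roughly 60GB, which is consistent with the stated 80GB minimum once activations and the attention cache are accounted for.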
Core Capabilities
- Knowledge benchmarks (80.5% on MMLU)
- Mathematical reasoning (71.2% on GSM8K)
- Code generation (53.1% on HumanEval)
- Multilingual understanding and generation
- Logical reasoning and problem solving
Frequently Asked Questions
Q: What makes this model unique?
YAYI2-30B stands out for its benchmark results, particularly in knowledge testing and mathematical reasoning. Among models of similar size, it reports leading scores on MMLU (80.5%) and CMMLU (84.0%).
Q: What are the recommended use cases?
The model is suited to multilingual text generation, mathematical problem solving, code generation, and complex reasoning tasks. It is particularly effective where a task requires broad knowledge and logical reasoning.
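For the text-generation use cases above, a minimal sketch using the Hugging Face `transformers` library might look like the following. Several things here are assumptions rather than facts from this card: the repository ID `wenge-research/yayi2-30b`, the need for `trust_remote_code=True`, the choice of bfloat16, and the presence of `accelerate` for `device_map="auto"`. The helper `max_new_tokens` is a hypothetical name; it only keeps prompt plus output within the 4096-token context length.

```python
CONTEXT_LENGTH = 4096  # context length stated in the model card


def max_new_tokens(prompt_tokens: int, requested: int,
                   context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp the generation budget so prompt + output fits in the context window."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt already fills the context window")
    return min(requested, context_length - prompt_tokens)


def generate(prompt: str,
             model_id: str = "wenge-research/yayi2-30b",  # assumed repository ID
             requested: int = 256) -> str:
    """Generate a completion; needs torch, transformers, accelerate, and a large GPU."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",           # place weights on the available GPU(s)
        torch_dtype=torch.bfloat16,  # assumed 16-bit precision
        trust_remote_code=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    budget = max_new_tokens(inputs["input_ids"].shape[1], requested)
    output_ids = model.generate(**inputs, max_new_tokens=budget)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Write a function that reverses a string.")` would download the weights on first use, so it is only practical on hardware that meets the memory requirement stated earlier.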