yayi2-30b

wenge-research

YAYI2-30B is a 30B-parameter multilingual large language model trained on 2.65T tokens, with strong results on knowledge, math, and code benchmarks, including an 80.5% MMLU score.

Property	Value
Parameter Count	30 Billion
Architecture	Transformer (64 layers, 64 heads)
Context Length	4096 tokens
Training Data	2.65T tokens (multilingual)
License	Apache-2.0 (code) / Custom (model)
Paper	arXiv:2312.14862
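With a 4,096-token context window, the prompt and the generated continuation share one budget. The sketch below shows one common, generic way to handle an over-long prompt: keep only the most recent tokens. It works on plain lists of token IDs, and the names `CONTEXT_LENGTH` and `fit_prompt` are hypothetical. They are not part of any YAYI2 tooling.

```python
from typing import List

# Context window from the property table; the helper itself is a
# hypothetical, generic sketch, not part of the YAYI2 release.
CONTEXT_LENGTH = 4096


def fit_prompt(token_ids: List[int], max_new_tokens: int,
               context_length: int = CONTEXT_LENGTH) -> List[int]:
    """Left-truncate token_ids so that prompt + generation fits the window.

    Keeps the most recent tokens, assuming the end of the prompt matters
    most for the continuation.
    """
    if max_new_tokens < 0 or max_new_tokens >= context_length:
        raise ValueError("max_new_tokens must be in [0, context_length)")
    budget = context_length - max_new_tokens
    return token_ids[-budget:] if len(token_ids) > budget else token_ids


if __name__ == "__main__":
    prompt = list(range(5000))            # stand-in for 5,000 token IDs
    kept = fit_prompt(prompt, max_new_tokens=256)
    print(len(kept), kept[0])             # 3840 1160
```

Reserving room for `max_new_tokens` up front avoids a generation call that silently stops short because the prompt already filled the window.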

What is YAYI2-30B?

YAYI2-30B is a large language model developed by Wenge Technology (wenge-research). It is a Transformer-based model pretrained from scratch on a multilingual corpus of 2.65 trillion tokens.

Implementation Details

The model has 64 Transformer layers, 64 attention heads, and a hidden size of 7168, which gives a per-head dimension of 7168 / 64 = 112. It uses a vocabulary of 81,920 tokens and a context length of 4,096 tokens. Inference requires at least 80 GB of GPU memory.
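The 80 GB figure is consistent with a back-of-the-envelope estimate. The sketch below uses the architecture numbers above and assumes, beyond what this page states, 16-bit (bf16/fp16) storage at 2 bytes per value and a KV cache holding one key and one value vector of size `hidden_size` per layer per token (standard multi-head attention, no grouped-query sharing). It ignores activations and framework overhead, so real usage is higher. The function names are illustrative.

```python
# Rough inference-memory estimate for YAYI2-30B from the figures on this page.
# Assumptions (not from the model card): 2 bytes per value (bf16/fp16) and a
# full multi-head KV cache; activations and runtime overhead are ignored.
GIB = 1024 ** 3

N_PARAMS = 30e9          # 30 billion parameters
N_LAYERS = 64
N_HEADS = 64
HIDDEN_SIZE = 7168
CONTEXT_LENGTH = 4096
BYTES_PER_VALUE = 2      # assumed 16-bit weights and cache


def weight_bytes(n_params: float = N_PARAMS,
                 bytes_per_value: int = BYTES_PER_VALUE) -> float:
    """Memory needed just to hold the weights."""
    return n_params * bytes_per_value


def kv_cache_bytes(seq_len: int = CONTEXT_LENGTH, batch_size: int = 1,
                   n_layers: int = N_LAYERS, hidden_size: int = HIDDEN_SIZE,
                   bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Keys + values for every layer and cached token (factor 2 = K and V)."""
    return 2 * n_layers * hidden_size * seq_len * batch_size * bytes_per_value


if __name__ == "__main__":
    head_dim = HIDDEN_SIZE // N_HEADS
    w = weight_bytes() / GIB
    kv = kv_cache_bytes() / GIB
    print(f"head_dim={head_dim}")                      # head_dim=112
    print(f"weights ~ {w:.2f} GiB")                    # weights ~ 55.88 GiB
    print(f"KV cache @ 4096 tokens ~ {kv:.2f} GiB")    # ~ 7.00 GiB
    print(f"total ~ {w + kv:.2f} GiB")                 # total ~ 62.88 GiB
```

Under these assumptions one full-length sequence needs roughly 63 GiB before activations and overhead, which is in line with the stated 80 GB minimum. Larger batches scale the KV-cache term linearly.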

Transformer architecture pretrained on multilingual data
Pretraining corpus drawn from diverse sources
Base and chat variants described in the accompanying paper
Alignment via supervised fine-tuning and reinforcement learning from human feedback (RLHF)

Core Capabilities

Strong performance on knowledge benchmarks (80.5% on MMLU)
Strong mathematical reasoning (71.2% on GSM8K)
Code generation (53.1% on HumanEval)
Multilingual understanding and generation
Logical reasoning and problem solving
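HumanEval results like the 53.1% above are typically reported as pass@k: the probability that at least one of k sampled completions passes the problem's unit tests. This page does not state which k was used, so reading the figure as pass@1 is an assumption. The sketch below implements the unbiased pass@k estimator from the Codex paper (Chen et al., 2021), where n samples are drawn per problem and c of them are correct. The function names and the sample counts in the demo are hypothetical, not YAYI2 results.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical (samples, correct) counts for three problems.
    results = [(10, 3), (10, 0), (10, 10)]
    print(round(mean_pass_at_k(results, k=1), 4))  # 0.4333
```

For k = 1 the estimator reduces to c / n, the plain fraction of correct samples; for larger k it avoids the variance of drawing just k samples once.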

Frequently Asked Questions

Q: What makes this model unique?

YAYI2-30B scores well across multiple benchmarks, particularly on knowledge and mathematical reasoning tests. Its results on MMLU (80.5%) and CMMLU (84.0%) are reported as state of the art among models of similar size.

Q: What are the recommended use cases?

The model suits applications such as multilingual text generation, mathematical problem solving, code generation, and multi-step reasoning. Its benchmark results suggest it is strongest on tasks that require broad factual knowledge and logical reasoning.