# Ziya-LLaMA-13B-v1
| Property | Value |
|---|---|
| Parameter Count | 13 Billion |
| Model Type | Large Language Model |
| Architecture | LLaMA-based |
| License | LLaMA License (Non-commercial) |
| Languages | English & Chinese |
## What is Ziya-LLaMA-13B-v1?
Ziya-LLaMA-13B-v1 is a bilingual large language model built on the LLaMA architecture, with particular emphasis on Chinese language processing. The model was continually pre-trained on 125B tokens of diverse data, including content from OpenWebText, Books, Wikipedia, and specialized Chinese datasets.
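A minimal loading-and-generation sketch with Hugging Face `transformers` is shown below. The `<human>:`/`<bot>:` dialogue format follows the Ziya series' model card; the checkpoint path is a placeholder, since the public v1 release ships delta weights that must first be merged with the original LLaMA-13B checkpoint before use.

```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# Placeholder path: point this at the *merged* checkpoint
# (original LLaMA-13B weights + the released Ziya deltas).
ckpt = "path/to/merged/Ziya-LLaMA-13B-v1"

tokenizer = AutoTokenizer.from_pretrained(ckpt, use_fast=False)
model = LlamaForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.float16, device_map="auto"
)

# Dialogue prompt format used by the Ziya series: "<human>:" / "<bot>:"
query = "Write a short travel plan for Xi'an."
prompt = "<human>:" + query.strip() + "\n<bot>:"

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
generate_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.85,
    temperature=1.0,
)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0])
```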
## Implementation Details
The model uses a custom vocabulary of 39,410 tokens, including more than 7,000 Chinese characters, and was trained on 160 A100 GPUs. Training proceeded in three stages: large-scale pre-training, supervised fine-tuning, and human feedback learning. Throughput reached 118 TFLOPS per GPU during training.
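The card does not publish the exact tokenizer-surgery recipe, but the general pattern of extending a LLaMA vocabulary with Chinese characters and resizing the embedding matrix looks roughly like this sketch (the checkpoint path and token list are placeholders; Ziya's real vocabulary grew to 39,410 entries):

```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

base = "path/to/llama-13b"  # placeholder: original LLaMA checkpoint
tokenizer = AutoTokenizer.from_pretrained(base, use_fast=False)
model = LlamaForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# Hypothetical sample of Chinese tokens to add; in practice the Ziya
# vocabulary gained 7,000+ Chinese characters.
new_tokens = ["中", "国", "模", "型"]
added = tokenizer.add_tokens(
    [t for t in new_tokens if t not in tokenizer.get_vocab()]
)

# Grow the input/output embedding matrices to the new vocabulary size;
# the new rows start randomly initialized and are learned during
# continual pre-training.
model.resize_token_embeddings(len(tokenizer))
print(f"added {added} tokens, vocab size is now {len(tokenizer)}")
```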
- Advanced tokenizer optimization for Chinese language
- Curriculum learning approach for supervised fine-tuning
- Comprehensive human feedback training using a reward model (RM) and PPO (see the sketch after this list)
- Distributed training across 160 40GB A100 GPUs
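The card names RM and PPO training but provides no code. As a rough illustration only, the standard pairwise reward-model objective used in pipelines of this kind can be sketched as follows; the checkpoint path, prompt, and responses are placeholders, not Ziya's actual training setup:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder backbone; a real RM typically starts from the SFT checkpoint.
rm_name = "path/to/sft-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(rm_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers lack a pad token
reward_model = AutoModelForSequenceClassification.from_pretrained(
    rm_name, num_labels=1  # single scalar reward per sequence
)

def pairwise_rm_loss(prompt: str, chosen: str, rejected: str) -> torch.Tensor:
    """Bradley-Terry style loss: push the chosen answer's scalar reward
    above the rejected answer's reward."""
    batch = tokenizer(
        [prompt + chosen, prompt + rejected],
        return_tensors="pt", padding=True, truncation=True,
    )
    rewards = reward_model(**batch).logits.squeeze(-1)  # shape: (2,)
    return -F.logsigmoid(rewards[0] - rewards[1])

loss = pairwise_rm_loss(
    "<human>:What is 2+2?\n<bot>:",
    "2+2 equals 4.",
    "I don't know.",
)
loss.backward()  # an optimizer step would follow in a real training loop
```

The trained reward model then scores sampled generations during the PPO stage, steering the policy toward higher-reward responses.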
## Core Capabilities
- Translation between Chinese and English
- Programming and code generation
- Text classification and information extraction
- Summary generation
- Mathematical computation
- Common sense question answering
- Copywriting and content generation
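All of these tasks are posed through the same `<human>:`/`<bot>:` dialogue format. As an illustrative (not prescribed) example, a Chinese-to-English translation request could be issued like this, reusing the `model` and `tokenizer` from the loading sketch above:

```python
# "Translate the following sentence into English: the weather is great today."
query = "将下面的句子翻译成英文：今天天气真好。"
prompt = "<human>:" + query + "\n<bot>:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
out = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```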
## Frequently Asked Questions
### Q: What makes this model unique?
Its distinctive feature is optimized Chinese language processing, achieved through an expanded vocabulary and bilingual training, while maintaining strong performance in English. This makes it well suited to bilingual applications.
### Q: What are the recommended use cases?
The model excels in bilingual applications, particularly those requiring Chinese-English translation, programming tasks, text analysis, and creative content generation. However, due to LLaMA licensing restrictions, it cannot be used for commercial purposes.