Ziya-LLaMA-13B-v1

Maintained by: IDEA-CCNL

Property         Value
Parameter Count  13 billion
Model Type       Large language model
Architecture     LLaMA-based
License          LLaMA License (non-commercial)
Languages        English and Chinese

What is Ziya-LLaMA-13B-v1?

Ziya-LLaMA-13B-v1 is a 13-billion-parameter large language model built on the LLaMA architecture and extended for bilingual use, with particular attention to Chinese. It was trained on 125B tokens of data drawn from OpenWebText, Books, Wikipedia, and dedicated Chinese datasets.

Implementation Details

The model uses a custom vocabulary of 39,410 tokens, including more than 7,000 Chinese characters, and was trained on 160 A100 GPUs. Training proceeded in three stages: large-scale pre-training, supervised fine-tuning, and learning from human feedback. Training throughput reached 118 TFLOPS per GPU.

  • Advanced tokenizer optimization for Chinese language
  • Curriculum learning approach for supervised fine-tuning
  • Human feedback training using a reward model (RM) and Proximal Policy Optimization (PPO)
  • Distributed training across 160 40GB A100 GPUs
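
To put the training figures above in perspective, the following is a rough back-of-the-envelope sketch in Python. It combines the numbers stated in this section (160 GPUs, 118 TFLOPS per GPU, 13 billion parameters, 125B tokens) with one assumption that is not from the source: the common rule of thumb of about 6 floating-point operations per parameter per token for a forward and backward pass. It also assumes all 125B tokens are processed exactly once at the quoted throughput, so the result is an order-of-magnitude estimate, not a reported training time.

```python
# Back-of-the-envelope training compute estimate (illustrative only).

NUM_GPUS = 160          # stated in the text
TFLOPS_PER_GPU = 118    # stated in the text (achieved training throughput)
PARAMS = 13e9           # 13 billion parameters, stated in the text
TOKENS = 125e9          # 125B training tokens, stated in the text

# ASSUMPTION (not from the source): roughly 6 FLOPs per parameter per token
# for one forward + backward pass, a widely used rule of thumb.
FLOPS_PER_PARAM_PER_TOKEN = 6

SECONDS_PER_DAY = 86_400


def aggregate_flops_per_second(num_gpus: int, tflops_per_gpu: float) -> float:
    """Total cluster throughput in FLOP/s."""
    return num_gpus * tflops_per_gpu * 1e12


def estimated_training_days(params: float, tokens: float, cluster_flops: float) -> float:
    """Estimated wall-clock days to process `tokens` once at `cluster_flops`."""
    total_flops = FLOPS_PER_PARAM_PER_TOKEN * params * tokens
    return total_flops / cluster_flops / SECONDS_PER_DAY


cluster = aggregate_flops_per_second(NUM_GPUS, TFLOPS_PER_GPU)
days = estimated_training_days(PARAMS, TOKENS, cluster)

print(f"Aggregate throughput: {cluster / 1e15:.2f} PFLOP/s")
print(f"Estimated pre-training time: {days:.1f} days")
```

Under these assumptions the cluster delivers about 18.9 PFLOP/s in aggregate, and a single pass over the data would take roughly six days. Real training runs include restarts, evaluation, and the later fine-tuning stages, so the actual duration may differ considerably.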

Core Capabilities

  • Translation between Chinese and English
  • Programming and code generation
  • Text classification and information extraction
  • Summary generation
  • Mathematical computation
  • Common sense question answering
  • Copywriting and content generation

Frequently Asked Questions

Q: What makes this model unique?

What sets the model apart is its Chinese language processing, optimized through a custom vocabulary and training designed for bilingual use, while it retains strong performance in English.

Q: What are the recommended use cases?

The model is best suited to bilingual applications, particularly Chinese-English translation, programming tasks, text analysis, and creative content generation. Because of the LLaMA license restrictions, it cannot be used for commercial purposes.
