# Randeng-T5-77M
| Property | Value |
|---|---|
| Parameter Count | 77M |
| Model Type | T5-based NLT model |
| Author | IDEA-CCNL |
| Training Data | WuDao Corpora (180GB) |
| GitHub | Fengshenbang-LM |
## What is Randeng-T5-77M?
Randeng-T5-77M is the Chinese version of mT5-small, designed for Natural Language Transformation (NLT) tasks. It adapts the original mT5 architecture to Chinese through corpus-adaptive pre-training and a tokenizer vocabulary trimmed down to Chinese and English.
## Implementation Details
The model was trained with Corpus-Adaptive Pre-Training (CAPT) on the WuDao Corpora (180GB version). To improve training efficiency, the team retained only the Chinese and English tokens from the original T5 SentencePiece tokenizer, which substantially shrinks the vocabulary and the embedding table.
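The exact filtering procedure is not documented here; the following is a minimal sketch of script-based vocabulary trimming, where the character ranges and the `keep_token` helper are illustrative assumptions, not the documented Randeng-T5-77M method.

```python
import re

# Hypothetical sketch: keep only SentencePiece tokens composed of ASCII,
# CJK characters/punctuation, full/half-width forms, or the SentencePiece
# word-boundary marker (U+2581). Ranges are assumptions for illustration.
KEEP = re.compile(r'^[\u0020-\u007E\u2581\u3000-\u303F\u4E00-\u9FFF\uFF00-\uFFEF]+$')

def keep_token(token: str) -> bool:
    return bool(KEEP.match(token))

vocab = ["▁hello", "世界", "Привет", "▁the", "こんにちは"]
print([t for t in vocab if keep_token(t)])  # ['▁hello', '世界', '▁the']
```

After trimming, the corresponding rows of the embedding matrix would be dropped and the remaining token ids remapped.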
- Training Infrastructure: 8 A100 GPUs for approximately 24 hours
- Pre-training Objective: Span corruption (see the sketch after this list)
- Framework: Fengshen framework
- Vocabulary: Trimmed to Chinese and English tokens
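As a rough illustration of the span-corruption objective, the sketch below masks random spans with T5-style sentinel tokens. The noise rate and span length are raised above T5's usual ~15% density and mean span length 3 so this tiny demo visibly corrupts a span; none of these values are documented Randeng settings.

```python
import random

def span_corrupt(tokens, corruption_rate=0.3, mean_span_len=2, seed=1):
    """Replace random spans with <extra_id_N> sentinels (T5 convention);
    return (encoder input, decoder target) token lists."""
    rng = random.Random(seed)
    inp, tgt = [], []
    i = sid = 0
    while i < len(tokens):
        # Start a span here with probability corruption_rate / mean_span_len,
        # so roughly corruption_rate of all tokens end up masked.
        if rng.random() < corruption_rate / mean_span_len:
            sentinel = f"<extra_id_{sid}>"
            inp.append(sentinel)
            tgt.append(sentinel)
            tgt.extend(tokens[i:i + mean_span_len])
            i += mean_span_len
            sid += 1
        else:
            inp.append(tokens[i])
            i += 1
    tgt.append(f"<extra_id_{sid}>")  # closing sentinel
    return inp, tgt

inp, tgt = span_corrupt("明天 的 天气 预报 说 会 下雨".split())
print(" ".join(inp))  # <extra_id_0> 天气 预报 说 会 下雨
print(" ".join(tgt))  # <extra_id_0> 明天 的 <extra_id_1>
```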
## Core Capabilities
- Specialized Chinese language processing
- Efficient natural language transformation tasks
- Optimized performance for Chinese-English applications
- Compact Chinese-English vocabulary, giving a smaller embedding table and faster decoding
## Frequently Asked Questions
### Q: What makes this model unique?
Its value lies in its specialization for Chinese: Corpus-Adaptive Pre-Training (CAPT) on the WuDao Corpora combined with a vocabulary trimmed to Chinese and English, which keeps the model compact and the pre-training inexpensive (8 A100 GPUs for about 24 hours).
### Q: What are the recommended use cases?
Randeng-T5-77M is well suited to Chinese natural language transformation tasks, text generation, and other NLP applications that require strong Chinese language understanding.
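As a minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as IDEA-CCNL/Randeng-T5-77M, the model can be loaded with the standard transformers T5 classes. Since it is pre-trained with span corruption rather than instruction-tuned, a sentinel-infilling prompt is the most direct way to probe it:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# use_fast=False assumes the repo ships only a SentencePiece (slow) tokenizer;
# drop it if a fast tokenizer is available.
tokenizer = AutoTokenizer.from_pretrained("IDEA-CCNL/Randeng-T5-77M", use_fast=False)
model = T5ForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-T5-77M")

# Span-infilling prompt: the model generates text to fill <extra_id_0>.
inputs = tokenizer("北京是中国的<extra_id_0>。", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For downstream tasks such as translation or summarization, the model would typically be fine-tuned first rather than used zero-shot.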