t5-v1_1-small-chinese-cluecorpussmall

by uer

A Chinese T5 v1.1 small model trained on the CLUECorpusSmall dataset, optimized for text-to-text generation with GEGLU activation and an improved architecture (8 layers, 512 hidden size).

Architecture: T5 Version 1.1
Size: Small (8 layers, 512 hidden size)
Training Data: CLUECorpusSmall
Paper: UER Paper
Author: UER

What is t5-v1_1-small-chinese-cluecorpussmall?

This is a Chinese-language T5 (Text-to-Text Transfer Transformer) model implementing version 1.1 of the architecture, pre-trained on the CLUECorpusSmall dataset. It is designed for Chinese text generation tasks and incorporates several improvements over the original T5, notably GEGLU activation in the feed-forward layers and no parameter sharing between the embedding and classifier layers.
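As a minimal sketch of inference, the model can be loaded through the Hugging Face transformers library. UER's Chinese T5 checkpoints conventionally pair a BERT-style tokenizer with the MT5 model class (which implements the v1.1 gated-GELU feed-forward layers) and mark masked spans with extra0-style sentinel tokens; verify these conventions against the official model card before relying on them.

```python
from transformers import BertTokenizer, MT5ForConditionalGeneration, Text2TextGenerationPipeline

# UER's Chinese T5 checkpoints use a BERT-style vocabulary; the v1.1 variant
# is loaded through the MT5 class, which shares the gated-GELU architecture.
tokenizer = BertTokenizer.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")
model = MT5ForConditionalGeneration.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")

generator = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)

# "extra0" is a sentinel token marking the span the model should fill in.
# The input reads: "The capital of China is [extra0]-jing."
print(generator("中国的首都是extra0京", max_length=50, do_sample=False))
```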

Implementation Details

The model was trained with the UER-py framework in two stages: 1,000,000 steps at a sequence length of 128, followed by 250,000 additional steps at a sequence length of 512. Pre-training uses span masking, with span lengths drawn from a geometric distribution (p = 0.3) and capped at 5 tokens; a sketch of this scheme follows the list below.

  • Improved architecture with GEGLU activation
  • Dropout disabled during pre-training
  • Independent embedding and classifier layer parameters
  • 8 layers with 512 hidden size (Small configuration)
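The span-corruption objective described above can be sketched as follows. This is an illustrative reimplementation, not UER-py's actual code: the 15% corruption rate and the extra0 sentinel naming are assumptions for demonstration; only the geometric span-length distribution (p = 0.3, capped at 5 tokens) comes from the training description.

```python
import random

def sample_span_length(p=0.3, max_len=5):
    """Draw a span length from a geometric distribution, capped at max_len."""
    length = 1
    while length < max_len and random.random() > p:
        length += 1
    return length

def span_mask(tokens, mask_prob=0.15, p=0.3, max_len=5):
    """Corrupt a token sequence T5-style: replace random spans with sentinel
    tokens and build the target sequence the decoder must reconstruct."""
    source, target = [], []
    sentinel, i = 0, 0
    while i < len(tokens):
        if random.random() < mask_prob:
            span = min(sample_span_length(p, max_len), len(tokens) - i)
            source.append(f"extra{sentinel}")
            target.append(f"extra{sentinel}")
            target.extend(tokens[i:i + span])
            sentinel += 1
            i += span
        else:
            source.append(tokens[i])
            i += 1
    return source, target

source, target = span_mask(list("北京是中华人民共和国的首都"))
print(source)  # e.g. ['北', '京', 'extra0', '中', ...]
print(target)  # e.g. ['extra0', '是', ...]
```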

Core Capabilities

  • Text-to-Text Generation for Chinese language
  • Supports span masking with sentinel tokens
  • Efficient processing with smaller parameter count
  • Optimized for Chinese language understanding and generation

Frequently Asked Questions

Q: What makes this model unique?

Its uniqueness lies in combining the T5 v1.1 architectural improvements (GEGLU feed-forward activation, dropout disabled during pre-training, independent embedding and classifier weights) with a small parameter count, while being pre-trained specifically on Chinese text using sentinel-token span masking.

Q: What are the recommended use cases?

The model is best suited for Chinese text generation tasks, including text completion, summarization, and other text-to-text transformation tasks. It's particularly efficient for applications requiring a balance between performance and computational resources.
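As a rough sketch of how such a downstream task might be set up, the following runs one step of summarization-style fine-tuning. Everything here is illustrative: the toy document/summary pair, the learning rate, and the sequence-length limits are assumptions, and a real setup would iterate over a proper dataset with padding and label masking.

```python
from torch.optim import AdamW
from transformers import BertTokenizer, MT5ForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")
model = MT5ForConditionalGeneration.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")
optimizer = AdamW(model.parameters(), lr=3e-4)  # hypothetical learning rate

# One toy (document, summary) pair; real fine-tuning iterates over a dataset.
document = "北京是中华人民共和国的首都，也是全国的政治和文化中心。"
summary = "北京是中国的首都。"

enc = tokenizer(document, return_tensors="pt", truncation=True, max_length=512)
labels = tokenizer(summary, return_tensors="pt", truncation=True, max_length=64).input_ids

# Standard seq2seq cross-entropy: the decoder learns to emit the summary.
loss = model(input_ids=enc.input_ids, attention_mask=enc.attention_mask, labels=labels).loss
loss.backward()
optimizer.step()
print(float(loss))
```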
