t5-v1_1-small-chinese-cluecorpussmall

by uer

A Chinese T5 v1.1 small model trained on the CLUECorpusSmall dataset, optimized for text-to-text generation with GEGLU activation and an improved architecture (8 layers, 512 hidden size).

Architecture: T5 Version 1.1
Size: Small (8 layers, 512 hidden size)
Training Data: CLUECorpusSmall
Paper: UER Paper
Author: UER

What is t5-v1_1-small-chinese-cluecorpussmall?

This is a Chinese-language T5 (Text-to-Text Transfer Transformer) model implementing version 1.1 of the architecture, pre-trained on the CLUECorpusSmall dataset. It is designed for Chinese text generation tasks and incorporates several improvements over the original T5, notably GEGLU activation in the feed-forward layers and no parameter sharing between the embedding and classifier layers.
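As a minimal sketch of inference, the model can be loaded through the Hugging Face transformers library. UER's Chinese T5 checkpoints conventionally pair a BERT-style tokenizer with the MT5 model class (which implements the v1.1 gated-GELU feed-forward layers) and mark masked spans with extra0-style sentinel tokens; verify these conventions against the official model card before relying on them.

```python
from transformers import BertTokenizer, MT5ForConditionalGeneration, Text2TextGenerationPipeline

# UER's Chinese T5 checkpoints use a BERT-style vocabulary; the v1.1 variant
# is loaded through the MT5 class, which shares the gated-GELU architecture.
tokenizer = BertTokenizer.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")
model = MT5ForConditionalGeneration.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")

generator = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)

# "extra0" is a sentinel token marking the span the model should fill in.
# The input reads: "The capital of China is [extra0]-jing."
print(generator("中国的首都是extra0京", max_length=50, do_sample=False))
```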

Implementation Details

The model was trained with the UER-py framework in two stages: 1,000,000 steps at a sequence length of 128, followed by 250,000 additional steps at a sequence length of 512. Pre-training uses span masking, with span lengths drawn from a geometric distribution (p = 0.3) and capped at 5 tokens; a sketch of this scheme follows the list below.

  • Improved architecture with GEGLU activation
  • Dropout disabled during pre-training
  • Independent embedding and classifier layer parameters
  • 8 layers with 512 hidden size (Small configuration)
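The span-corruption objective described above can be sketched as follows. This is an illustrative reimplementation, not UER-py's actual code: the 15% corruption rate and the extra0 sentinel naming are assumptions for demonstration; only the geometric span-length distribution (p = 0.3, capped at 5 tokens) comes from the training description.

```python
import random

def sample_span_length(p=0.3, max_len=5):
    """Draw a span length from a geometric distribution, capped at max_len."""
    length = 1
    while length < max_len and random.random() > p:
        length += 1
    return length

def span_mask(tokens, mask_prob=0.15, p=0.3, max_len=5):
    """Corrupt a token sequence T5-style: replace random spans with sentinel
    tokens and build the target sequence the decoder must reconstruct."""
    source, target = [], []
    sentinel, i = 0, 0
    while i < len(tokens):
        if random.random() < mask_prob:
            span = min(sample_span_length(p, max_len), len(tokens) - i)
            source.append(f"extra{sentinel}")
            target.append(f"extra{sentinel}")
            target.extend(tokens[i:i + span])
            sentinel += 1
            i += span
        else:
            source.append(tokens[i])
            i += 1
    return source, target

source, target = span_mask(list("北京是中华人民共和国的首都"))
print(source)  # e.g. ['北', '京', 'extra0', '中', ...]
print(target)  # e.g. ['extra0', '是', ...]
```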

Core Capabilities

  • Text-to-Text Generation for Chinese language
  • Supports span masking with sentinel tokens
  • Efficient processing with smaller parameter count
  • Optimized for Chinese language understanding and generation

Frequently Asked Questions

Q: What makes this model unique?

Its uniqueness lies in combining the T5 v1.1 architectural improvements (GEGLU feed-forward activation, dropout disabled during pre-training, independent embedding and classifier weights) with a small parameter count, while being pre-trained specifically on Chinese text using sentinel-token span masking.

Q: What are the recommended use cases?

The model is best suited for Chinese text generation tasks, including text completion, summarization, and other text-to-text transformation tasks. It's particularly efficient for applications requiring a balance between performance and computational resources.
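As a rough sketch of how such a downstream task might be set up, the following runs one step of summarization-style fine-tuning. Everything here is illustrative: the toy document/summary pair, the learning rate, and the sequence-length limits are assumptions, and a real setup would iterate over a proper dataset with padding and label masking.

```python
from torch.optim import AdamW
from transformers import BertTokenizer, MT5ForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")
model = MT5ForConditionalGeneration.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")
optimizer = AdamW(model.parameters(), lr=3e-4)  # hypothetical learning rate

# One toy (document, summary) pair; real fine-tuning iterates over a dataset.
document = "北京是中华人民共和国的首都，也是全国的政治和文化中心。"
summary = "北京是中国的首都。"

enc = tokenizer(document, return_tensors="pt", truncation=True, max_length=512)
labels = tokenizer(summary, return_tensors="pt", truncation=True, max_length=64).input_ids

# Standard seq2seq cross-entropy: the decoder learns to emit the summary.
loss = model(input_ids=enc.input_ids, attention_mask=enc.attention_mask, labels=labels).loss
loss.backward()
optimizer.step()
print(float(loss))
```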
