gpt2-distil-chinese-cluecorpussmall
| Property | Value |
|---|---|
| Architecture | GPT-2 Distilled (6 layers, hidden size 768) |
| Training Data | CLUECorpusSmall |
| Developer | UER |
| Framework | UER-py |
| License | Open Source |
What is gpt2-distil-chinese-cluecorpussmall?
This is a distilled version of GPT-2 trained specifically for Chinese text generation. It has 6 layers and a hidden size of 768, following the configuration of distilgpt2, but was trained independently on Chinese text rather than converted from an English checkpoint. The model was developed with the UER-py framework and pre-trained on the CLUECorpusSmall dataset.
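The layer and hidden-size figures can be checked directly from the published configuration; the sketch below assumes the checkpoint is hosted on the Hugging Face Hub under the id uer/gpt2-distil-chinese-cluecorpussmall.

```python
from transformers import AutoConfig

# Assumed Hub id for this checkpoint.
config = AutoConfig.from_pretrained("uer/gpt2-distil-chinese-cluecorpussmall")

# Expect the distilgpt2-style geometry described above: 6 layers, hidden size 768.
print(config.n_layer, config.n_embd)
```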
Implementation Details
The model underwent a two-stage training process: it was first trained for 1,000,000 steps with a sequence length of 128, then for an additional 250,000 steps with a sequence length of 1024. The learning rate was 1e-4 in the first stage and 5e-5 in the second, with batch sizes of 64 and 16 respectively.
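For quick reference, the schedule above can be summarized as plain data; this is only an illustrative restatement of the hyperparameters listed in this section, not a UER-py configuration file.

```python
# Illustrative summary of the two-stage pre-training schedule (not a UER-py config).
TRAINING_STAGES = [
    {"steps": 1_000_000, "seq_length": 128,  "learning_rate": 1e-4, "batch_size": 64},
    {"steps": 250_000,   "seq_length": 1024, "learning_rate": 5e-5, "batch_size": 16},
]
```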
- Utilizes BertTokenizer for text tokenization
- Implements GPT2LMHeadModel architecture
- Supports text generation via TextGenerationPipeline (see the example after this list)
- Optimized for Chinese language understanding and generation
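A minimal generation sketch using these components, assuming the checkpoint id uer/gpt2-distil-chinese-cluecorpussmall on the Hugging Face Hub:

```python
from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline

model_name = "uer/gpt2-distil-chinese-cluecorpussmall"  # assumed Hub id

# The model uses a BERT-style Chinese vocabulary, hence BertTokenizer.
tokenizer = BertTokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Wrap model and tokenizer in a text-generation pipeline.
text_generator = TextGenerationPipeline(model, tokenizer)

# Continue a Chinese prompt ("This happened a long time ago").
print(text_generator("这是很久之前的事情了", max_length=100, do_sample=True))
```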
Core Capabilities
- Chinese text generation with controllable length (see the sketch after this list)
- Context-aware text completion
- Efficient inference with reduced parameter count
- Support for both short and long-form text generation
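The output length can be bounded explicitly at inference time. The snippet below is a sketch using generate() with nucleus sampling; the decoding parameters are illustrative choices, not values recommended by the model authors.

```python
import torch
from transformers import BertTokenizer, GPT2LMHeadModel

model_name = "uer/gpt2-distil-chinese-cluecorpussmall"  # assumed Hub id
tokenizer = BertTokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Encode a short Chinese prompt ("Beijing is a ...").
input_ids = tokenizer("北京是一座", return_tensors="pt").input_ids

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=50,   # cap the number of newly generated tokens
        do_sample=True,      # sample rather than decode greedily
        top_p=0.95,          # nucleus sampling
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```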
Frequently Asked Questions
Q: What makes this model unique?
This model stands out as a compact GPT-2 optimized specifically for Chinese, offering a good balance between model size and generation quality. Unlike a conventional distilled model, it was trained directly on Chinese data in the smaller distilgpt2 configuration, without knowledge distillation from a larger teacher model.
Q: What are the recommended use cases?
The model is best suited for Chinese text generation tasks such as creative writing, content completion, and text continuation. It is a particularly good fit for applications that need to balance computational cost against generation quality.