gpt2-distil-chinese-cluecorpussmall

Maintained By
uer


Property        Value
Architecture    GPT-2 Distilled (6 layers, 768 hidden)
Training Data   CLUECorpusSmall
Developer       UER
Framework       UER-py
License         Open Source

What is gpt2-distil-chinese-cluecorpussmall?

This is a compact, distilgpt2-sized version of GPT-2 trained for Chinese language generation. It has 6 layers and a hidden size of 768, following the configuration of distilgpt2, but it was trained independently on Chinese text. The model was developed with the UER-py framework and trained on the CLUECorpusSmall dataset.

Implementation Details

The model was trained in two stages: 1,000,000 steps at sequence length 128, followed by 250,000 additional steps at sequence length 1024. The first stage used a learning rate of 1e-4 and a batch size of 64; the second stage used a learning rate of 5e-5 and a batch size of 16.
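To put the two stages in proportion, the sketch below multiplies steps, batch size, and sequence length to get an upper bound on the token positions processed in each stage. It assumes the batch size counts sequences per optimizer step and that every sequence is filled to the stage's full length; the source states neither, so the real totals may be lower. The `Stage` class and `max_token_positions` method are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hyperparameters for one training stage, as listed in the text above."""
    steps: int
    seq_length: int
    batch_size: int       # assumed to mean sequences per optimizer step
    learning_rate: float

    def max_token_positions(self) -> int:
        # Upper bound: assumes every sequence is filled to seq_length.
        return self.steps * self.batch_size * self.seq_length


STAGES = [
    Stage(steps=1_000_000, seq_length=128, batch_size=64, learning_rate=1e-4),
    Stage(steps=250_000, seq_length=1024, batch_size=16, learning_rate=5e-5),
]

for index, stage in enumerate(STAGES, start=1):
    print(
        f"stage {index}: lr={stage.learning_rate:g}, "
        f"at most {stage.max_token_positions():,} token positions"
    )
```

Under these assumptions the first stage covers at most 8,192,000,000 token positions and the second at most 4,096,000,000. The second stage has a quarter of the steps and a quarter of the batch size, but each sequence is eight times longer, so it still covers about half as many positions as the first.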

  • Utilizes BertTokenizer for text tokenization
  • Implements GPT2LMHeadModel architecture
  • Supports text generation via TextGenerationPipeline
  • Optimized for Chinese language understanding and generation
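The components listed above can be combined with the Hugging Face `transformers` library roughly as follows. This is a minimal sketch, not a definitive recipe: the Hub identifier `uer/gpt2-distil-chinese-cluecorpussmall` is assumed from the model name and maintainer, the prompt and sampling settings are illustrative, and `build_generator`, `generation_kwargs`, and `demo` are hypothetical helper names.

```python
# Assumed Hugging Face Hub identifier (maintainer "uer" + model name).
MODEL_ID = "uer/gpt2-distil-chinese-cluecorpussmall"


def build_generator(model_id: str = MODEL_ID):
    """Load tokenizer and model, then wrap them in a text-generation pipeline."""
    # Imported inside the function so this module can be loaded
    # even where transformers is not installed.
    from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline

    tokenizer = BertTokenizer.from_pretrained(model_id)
    model = GPT2LMHeadModel.from_pretrained(model_id)
    return TextGenerationPipeline(model=model, tokenizer=tokenizer)


def generation_kwargs(max_length: int = 100, sample: bool = True) -> dict:
    """Illustrative generation settings; tune them for your application."""
    return {"max_length": max_length, "do_sample": sample}


def demo(prompt: str = "这是很久之前的事情了") -> str:
    """Download the model (network access required) and continue the prompt."""
    generator = build_generator()
    outputs = generator(prompt, **generation_kwargs())
    return outputs[0]["generated_text"]
```

Calling `demo()` downloads the weights on first use and returns the prompt followed by a sampled continuation. Because `do_sample=True`, the output differs from run to run.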

Core Capabilities

  • Chinese text generation with controllable length
  • Context-aware text completion
  • Efficient inference with reduced parameter count
  • Support for both short and long-form text generation
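To make "reduced parameter count" concrete, the sketch below counts the weights of a standard GPT-2 layout at 6 and 12 layers. Several inputs are assumptions not stated in the source: a vocabulary of 21,128 tokens (the size of the common Chinese BERT vocabulary), 1,024 position embeddings, an output layer tied to the input embeddings, and a 12-layer baseline with the same width. The function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab_size: int, n_positions: int) -> int:
    """Count parameters of a standard GPT-2 layout with a tied output embedding."""
    # Token and position embedding tables.
    embeddings = vocab_size * d_model + n_positions * d_model
    # Attention: fused QKV projection plus output projection (weights + biases).
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward: expand to 4 * d_model, then project back (weights + biases).
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two layer norms per block, each with a scale and a bias vector.
    layer_norms = 2 * (2 * d_model)
    per_layer = attention + mlp + layer_norms
    # Final layer norm after the last block.
    final_norm = 2 * d_model
    return embeddings + n_layer * per_layer + final_norm


VOCAB_SIZE = 21_128   # assumption: Chinese BERT vocabulary size
N_POSITIONS = 1_024   # assumption: matches the longest training sequence length

small = gpt2_param_count(6, 768, VOCAB_SIZE, N_POSITIONS)
base = gpt2_param_count(12, 768, VOCAB_SIZE, N_POSITIONS)
print(f"6 layers:  {small:,} parameters")
print(f"12 layers: {base:,} parameters")
print(f"ratio:     {small / base:.2f}")
```

Under these assumptions the 6-layer model has about 59.5 million parameters against about 102 million for a 12-layer model of the same width. The saving is less than half because the embedding tables do not shrink with depth.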

Frequently Asked Questions

Q: What makes this model unique?

This model combines the compact distilgpt2 configuration with training on Chinese text, offering a good balance between model size and performance. Despite the "distil" in its name, it was not produced by knowledge distillation from a larger model; it was trained directly on Chinese data.

Q: What are the recommended use cases?

The model is best suited for Chinese text generation tasks, including creative writing, content completion, and text continuation. It's particularly efficient for applications requiring a balance between computational resources and generation quality.
