# CPM-Generate
| Property | Value |
|---|---|
| Parameters | 2.6 billion |
| License | MIT |
| Training Data | 100 GB Chinese corpus |
| Paper | View Research Paper |
## What is CPM-Generate?
CPM-Generate is a state-of-the-art Chinese language model developed by TsinghuaAI, representing one of the largest Chinese pre-trained language models available. Built on the Transformer architecture, it leverages 2.6 billion parameters trained on a diverse 100GB corpus of Chinese text, including encyclopedia entries, webpages, stories, news, and dialogues.
## Implementation Details
The model utilizes a dense attention mechanism with a maximum sequence length of 1,024 tokens. Training was conducted over 20,000 steps using 64 NVIDIA V100 GPUs, with the first 5,000 steps dedicated to warm-up. The model employs the Adam optimizer with a learning rate of 1.5×10^-4 and a batch size of 3,072.
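The schedule above (a 5,000-step warm-up within 20,000 total steps, peaking at 1.5×10⁻⁴) can be sketched as a simple function. The linear shape of both the warm-up and the subsequent decay is an assumption; the card specifies only the peak rate and the step counts:

```python
def lr_at_step(step, peak_lr=1.5e-4, warmup_steps=5_000, total_steps=20_000):
    """Learning rate at a given training step.

    Linear warm-up from 0 to peak_lr over the first warmup_steps,
    then linear decay back to 0 by total_steps. (The linear shapes
    are assumptions; only the peak rate and step counts are stated.)
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```

For example, `lr_at_step(2_500)` is half the peak rate, and the rate returns to zero at step 20,000.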
- Architecture: Transformer-based autoregressive language model
- Training Data Distribution: Encyclopedia (40GB), Webpage (39GB), Story (10GB), News (10GB), Dialog (1GB)
- Available Variants: Small (109M params), Medium (334M params), Large (2.6B params)
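As a rough sanity check on the variant sizes, the standard GPT-style estimate of about 12·L·d² weights per Transformer stack plus a token-embedding matrix reproduces the listed counts. The per-variant layer/width values and the ~30k vocabulary below are assumptions taken from the CPM paper's configurations, not stated on this card:

```python
def estimate_params(layers, d_model, vocab_size=30_000):
    """Rough GPT-style parameter count.

    Each Transformer block contributes ~12 * d_model^2 weights
    (attention Q/K/V/output projections plus a 4x-wide MLP);
    the embedding matrix adds vocab_size * d_model. Biases and
    layer norms are ignored, so the result is a slight undercount.
    """
    return 12 * layers * d_model ** 2 + vocab_size * d_model

# Assumed configs (from the CPM paper): Small 12L/768, Medium 24L/1024, Large 32L/2560
for name, layers, d_model in [("Small", 12, 768), ("Medium", 24, 1024), ("Large", 32, 2560)]:
    print(f"{name}: ~{estimate_params(layers, d_model) / 1e6:.0f}M parameters")
```

The estimates land within ~2% of the listed 109M, 334M, and 2.6B figures.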
## Core Capabilities
- Text Generation and Completion
- Zero-shot Text Classification
- Chinese Idiom Cloze Tests
- Conversational Response Generation
- Few-shot Learning Tasks
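Zero- and few-shot tasks with an autoregressive model like this are typically driven purely by prompt formatting: labelled examples are concatenated ahead of the query, and the model's continuation is read off as the answer. A minimal sketch for few-shot sentiment classification (the template wording and labels are illustrative, not taken from the paper):

```python
def few_shot_prompt(examples, query):
    """Format (text, label) pairs and a query as one autoregressive prompt.

    The model is expected to continue the text after the final colon
    with a label. Template wording here is hypothetical.
    """
    lines = [f"评论:{text} 情感:{label}" for text, label in examples]
    lines.append(f"评论:{query} 情感:")
    return "\n".join(lines)

prompt = few_shot_prompt(
    [("这部电影太精彩了", "正面"), ("浪费了两个小时", "负面")],
    "剧情紧凑,演员演技在线",
)
print(prompt)
```

The tokens the model generates after the trailing "情感:" are interpreted as the predicted sentiment label; no task-specific fine-tuning is involved.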
## Frequently Asked Questions
**Q: What makes this model unique?**
CPM-Generate stands out for being one of the largest Chinese language models, demonstrating superior performance in zero-shot and few-shot learning scenarios across various NLP tasks. Its comprehensive training data spanning multiple domains makes it particularly versatile for Chinese language processing tasks.
**Q: What are the recommended use cases?**
The model excels in text generation, conversation systems, essay writing, cloze tests, and language understanding tasks. It's particularly effective for applications requiring few-shot learning capabilities in Chinese language processing.