# CPM-Generate
| Property | Value |
|---|---|
| Parameters | 2.6 billion |
| License | MIT |
| Training Data | 100 GB Chinese corpus |
| Paper | View Research Paper |
## What is CPM-Generate?
CPM-Generate is a state-of-the-art Chinese language model developed by TsinghuaAI, representing one of the largest Chinese pre-trained language models available. Built on the Transformer architecture, it leverages 2.6 billion parameters trained on a diverse 100GB corpus of Chinese text, including encyclopedia entries, webpages, stories, news, and dialogues.
## Implementation Details
The model utilizes a dense attention mechanism with a maximum sequence length of 1,024 tokens. Training was conducted over 20,000 steps using 64 NVIDIA V100 GPUs, with the first 5,000 steps dedicated to warm-up. The model employs the Adam optimizer with a learning rate of 1.5×10^-4 and a batch size of 3,072.
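The schedule above (a 5,000-step warm-up within 20,000 total steps, peaking at 1.5×10⁻⁴) can be sketched as a simple function. The linear shape of both the warm-up and the subsequent decay is an assumption; the card specifies only the peak rate and the step counts:

```python
def lr_at_step(step, peak_lr=1.5e-4, warmup_steps=5_000, total_steps=20_000):
    """Learning rate at a given training step.

    Linear warm-up from 0 to peak_lr over the first warmup_steps,
    then linear decay back to 0 by total_steps. (The linear shapes
    are assumptions; only the peak rate and step counts are stated.)
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```

For example, `lr_at_step(2_500)` is half the peak rate, and the rate returns to zero at step 20,000.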
- Architecture: Transformer-based autoregressive language model
- Training Data Distribution: Encyclopedia (40GB), Webpage (39GB), Story (10GB), News (10GB), Dialog (1GB)
- Available Variants: Small (109M params), Medium (334M params), Large (2.6B params)
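As a rough sanity check on the variant sizes, the standard GPT-style estimate of about 12·L·d² weights per Transformer stack plus a token-embedding matrix reproduces the listed counts. The per-variant layer/width values and the ~30k vocabulary below are assumptions taken from the CPM paper's configurations, not stated on this card:

```python
def estimate_params(layers, d_model, vocab_size=30_000):
    """Rough GPT-style parameter count.

    Each Transformer block contributes ~12 * d_model^2 weights
    (attention Q/K/V/output projections plus a 4x-wide MLP);
    the embedding matrix adds vocab_size * d_model. Biases and
    layer norms are ignored, so the result is a slight undercount.
    """
    return 12 * layers * d_model ** 2 + vocab_size * d_model

# Assumed configs (from the CPM paper): Small 12L/768, Medium 24L/1024, Large 32L/2560
for name, layers, d_model in [("Small", 12, 768), ("Medium", 24, 1024), ("Large", 32, 2560)]:
    print(f"{name}: ~{estimate_params(layers, d_model) / 1e6:.0f}M parameters")
```

The estimates land within ~2% of the listed 109M, 334M, and 2.6B figures.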
## Core Capabilities
- Text Generation and Completion
- Zero-shot Text Classification
- Chinese Idiom Cloze Tests
- Conversational Response Generation
- Few-shot Learning Tasks
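Zero- and few-shot tasks with an autoregressive model like this are typically driven purely by prompt formatting: labelled examples are concatenated ahead of the query, and the model's continuation is read off as the answer. A minimal sketch for few-shot sentiment classification (the template wording and labels are illustrative, not taken from the paper):

```python
def few_shot_prompt(examples, query):
    """Format (text, label) pairs and a query as one autoregressive prompt.

    The model is expected to continue the text after the final colon
    with a label. Template wording here is hypothetical.
    """
    lines = [f"评论:{text} 情感:{label}" for text, label in examples]
    lines.append(f"评论:{query} 情感:")
    return "\n".join(lines)

prompt = few_shot_prompt(
    [("这部电影太精彩了", "正面"), ("浪费了两个小时", "负面")],
    "剧情紧凑,演员演技在线",
)
print(prompt)
```

The tokens the model generates after the trailing "情感:" are interpreted as the predicted sentiment label; no task-specific fine-tuning is involved.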
## Frequently Asked Questions
**Q: What makes this model unique?**
CPM-Generate stands out for being one of the largest Chinese language models, demonstrating superior performance in zero-shot and few-shot learning scenarios across various NLP tasks. Its comprehensive training data spanning multiple domains makes it particularly versatile for Chinese language processing tasks.
**Q: What are the recommended use cases?**
The model excels in text generation, conversation systems, essay writing, cloze tests, and language understanding tasks. It's particularly effective for applications requiring few-shot learning capabilities in Chinese language processing.