# Randeng-DELLA-CVAE-226M-NER-Chinese
| Property | Value |
|---|---|
| Parameter Count | 226M |
| Model Type | Conditional Variational Autoencoder (CVAE) |
| Architecture | GPT-2 based encoder-decoder |
| Research Paper | Link to Paper |
| Training Data | Wudao dataset + NER fine-tuning |
## What is Randeng-DELLA-CVAE-226M-NER-Chinese?
This is a specialized Chinese language model that combines deep variational autoencoding with controllable text generation. It was first pretrained on the large-scale Wudao corpus and then fine-tuned on Named Entity Recognition (NER) data, so that it can generate contextually appropriate sentences containing specified named entities of given types.
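Below is a minimal sketch of how such entity-conditioned generation might be driven. The `build_entity_condition` helper and the `<ent>`/`<type>` markers are illustrative placeholders, and the commented-out loading and generation calls (`load_della_cvae`, `generate_with_entities`) are hypothetical; consult the model repository (Fengshenbang-LM) for the actual loading code and entity-marker format.

```python
# Hypothetical usage sketch: build an entity-conditioned prompt and (in the
# commented section) pass it to the model. Marker tokens and helper names are
# placeholders, not the model's real API.
from typing import List, Tuple

def build_entity_condition(entities: List[Tuple[str, str]]) -> str:
    """Join (entity, entity_type) pairs into one conditioning string.

    The '<ent>'/'<type>' markers are illustrative only; the real model uses
    its own specialized entity-marker tokens (see the model repository).
    """
    return "".join(f"<ent>{text}<type>{etype}" for text, etype in entities)

# Example condition: ask for a sentence mentioning a location and a time.
condition = build_entity_condition([("上海", "地点"), ("2023年", "时间")])
print(condition)  # <ent>上海<type>地点<ent>2023年<type>时间

# model, tokenizer = load_della_cvae("Randeng-DELLA-CVAE-226M-NER-Chinese")
# output = generate_with_entities(model, tokenizer, condition, max_length=64)
# print(output)  # a fluent Chinese sentence containing 上海 and 2023年
```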
## Implementation Details
The model's encoder and decoder are both built from GPT-2 components. Unlike the original DELLA paper's implementation, it fuses latent information using a linear transformation followed by element-wise addition rather than a low-rank tensor product; this simplification has proven more stable for open-domain pretraining (a minimal sketch of the fusion step follows the list below). Key features:
- Layer-wise recurrent latent variable structure
- Modified information fusion mechanism for improved stability
- Specialized tokenization with support for entity markers
- 226 million trainable parameters
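The simplified fusion described above can be pictured with a short, self-contained sketch (illustrative only, not the released training code): the layer-wise latent vector is linearly projected into the decoder's hidden dimension and added element-wise to the hidden states.

```python
# Illustrative sketch of the fusion step: project the latent z with a linear
# map and add it element-wise to the decoder hidden states, instead of fusing
# via a low-rank tensor product as in the original DELLA paper.
import torch
import torch.nn as nn

class AdditiveLatentFusion(nn.Module):
    def __init__(self, latent_dim: int, hidden_dim: int):
        super().__init__()
        # Linear transformation mapping the latent variable into the
        # decoder's hidden-state space.
        self.proj = nn.Linear(latent_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim); z: (batch, latent_dim)
        # Broadcast the projected latent over the sequence dimension and add.
        return hidden_states + self.proj(z).unsqueeze(1)

# Example: fuse a 32-dim latent into GPT-2-sized hidden states.
fusion = AdditiveLatentFusion(latent_dim=32, hidden_dim=768)
h = torch.randn(2, 16, 768)   # batch of 2, sequence length 16
z = torch.randn(2, 32)        # one latent vector per sequence (per layer)
print(fusion(h, z).shape)     # torch.Size([2, 16, 768])
```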
## Core Capabilities
- Generate coherent Chinese text containing specified named entities
- Control generation through entity type specifications
- Handle multiple entity constraints simultaneously
- Support various entity types, including locations and temporal expressions
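When several entities must appear at once, generated candidates can be post-filtered so that only outputs containing every required surface form are kept. The snippet below is a generic helper sketch, not part of the model's own decoding logic.

```python
# Generic post-filter for multiple entity constraints: keep only generated
# candidates that contain every requested entity string.
from typing import Iterable, List

def satisfies_constraints(text: str, entities: Iterable[str]) -> bool:
    """Return True if every required entity surface form appears in the text."""
    return all(ent in text for ent in entities)

def filter_candidates(candidates: List[str], entities: Iterable[str]) -> List[str]:
    """Drop candidates that miss any of the required entities."""
    ents = list(entities)
    return [c for c in candidates if satisfies_constraints(c, ents)]

# Example with two simultaneous constraints (a location and a time expression):
cands = ["2023年我去了上海旅行。", "我去了北京。"]
print(filter_candidates(cands, ["上海", "2023年"]))  # ['2023年我去了上海旅行。']
```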
## Frequently Asked Questions
### Q: What makes this model unique?
The model's distinctive feature is its ability to generate Chinese text while maintaining control over the inclusion of specific named entities. Its modified architecture provides better stability for open-domain applications while preserving the benefits of variational modeling.
### Q: What are the recommended use cases?
The model is ideal for applications requiring controlled text generation in Chinese, such as automated content creation with specific entity requirements, data augmentation for NER tasks, and generating context-rich examples for language learning or testing.
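For the NER data-augmentation use case, one workable pattern is to derive character-level BIO labels directly from the entities used to condition generation, since their surface forms and types are known in advance. The sketch below uses a made-up sentence and simple string matching; it does not handle overlapping or repeated mentions.

```python
# Sketch of NER data augmentation: label a generated sentence with BIO tags
# derived from the known (entity, type) pairs used to condition generation.
from typing import List, Tuple

def to_bio(sentence: str, entities: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Produce (character, BIO-tag) pairs for known (entity, type) spans."""
    tags = ["O"] * len(sentence)
    for text, etype in entities:
        start = sentence.find(text)
        if start == -1:
            continue  # entity missing from the generated sentence; skip it
        tags[start] = f"B-{etype}"
        for i in range(start + 1, start + len(text)):
            tags[i] = f"I-{etype}"
    return list(zip(sentence, tags))

# Example: a (made-up) generated sentence conditioned on a location and a time.
sentence = "2023年我在上海参加了会议。"
print(to_bio(sentence, [("上海", "LOC"), ("2023年", "TIME")]))
```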