# Randeng-DELLA-CVAE-226M-NER-Chinese
| Property | Value |
|---|---|
| Parameter Count | 226M |
| Model Type | Conditional Variational Autoencoder (CVAE) |
| Architecture | GPT-2 based encoder-decoder |
| Research Paper | Link to Paper |
| Training Data | Wudao dataset + NER fine-tuning |
## What is Randeng-DELLA-CVAE-226M-NER-Chinese?
This is a specialized Chinese language model that combines deep variational autoencoding with controllable text generation. It was first pretrained on the large-scale Wudao corpus and then fine-tuned on Named Entity Recognition (NER) data, so that it can generate contextually appropriate sentences containing specified named entities of given types.
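Below is a minimal sketch of how such entity-conditioned generation might be driven. The `build_entity_condition` helper and the `<ent>`/`<type>` markers are illustrative placeholders, and the commented-out loading and generation calls (`load_della_cvae`, `generate_with_entities`) are hypothetical; consult the model repository (Fengshenbang-LM) for the actual loading code and entity-marker format.

```python
# Hypothetical usage sketch: build an entity-conditioned prompt and (in the
# commented section) pass it to the model. Marker tokens and helper names are
# placeholders, not the model's real API.
from typing import List, Tuple

def build_entity_condition(entities: List[Tuple[str, str]]) -> str:
    """Join (entity, entity_type) pairs into one conditioning string.

    The '<ent>'/'<type>' markers are illustrative only; the real model uses
    its own specialized entity-marker tokens (see the model repository).
    """
    return "".join(f"<ent>{text}<type>{etype}" for text, etype in entities)

# Example condition: ask for a sentence mentioning a location and a time.
condition = build_entity_condition([("上海", "地点"), ("2023年", "时间")])
print(condition)  # <ent>上海<type>地点<ent>2023年<type>时间

# model, tokenizer = load_della_cvae("Randeng-DELLA-CVAE-226M-NER-Chinese")
# output = generate_with_entities(model, tokenizer, condition, max_length=64)
# print(output)  # a fluent Chinese sentence containing 上海 and 2023年
```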
## Implementation Details
The model's encoder and decoder are both built from GPT-2 components. Unlike the original DELLA paper's implementation, it fuses latent information using a linear transformation followed by element-wise addition rather than a low-rank tensor product; this simplification has proven more stable for open-domain pretraining (a minimal sketch of the fusion step follows the list below). Key features:
- Layer-wise recurrent latent variable structure
- Modified information fusion mechanism for improved stability
- Specialized tokenization with support for entity markers
- 226 million trainable parameters
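The simplified fusion described above can be pictured with a short, self-contained sketch (illustrative only, not the released training code): the layer-wise latent vector is linearly projected into the decoder's hidden dimension and added element-wise to the hidden states.

```python
# Illustrative sketch of the fusion step: project the latent z with a linear
# map and add it element-wise to the decoder hidden states, instead of fusing
# via a low-rank tensor product as in the original DELLA paper.
import torch
import torch.nn as nn

class AdditiveLatentFusion(nn.Module):
    def __init__(self, latent_dim: int, hidden_dim: int):
        super().__init__()
        # Linear transformation mapping the latent variable into the
        # decoder's hidden-state space.
        self.proj = nn.Linear(latent_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim); z: (batch, latent_dim)
        # Broadcast the projected latent over the sequence dimension and add.
        return hidden_states + self.proj(z).unsqueeze(1)

# Example: fuse a 32-dim latent into GPT-2-sized hidden states.
fusion = AdditiveLatentFusion(latent_dim=32, hidden_dim=768)
h = torch.randn(2, 16, 768)   # batch of 2, sequence length 16
z = torch.randn(2, 32)        # one latent vector per sequence (per layer)
print(fusion(h, z).shape)     # torch.Size([2, 16, 768])
```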
## Core Capabilities
- Generate coherent Chinese text containing specified named entities
- Control generation through entity type specifications
- Handle multiple entity constraints simultaneously
- Support various entity types, including locations and temporal expressions
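When several entities must appear at once, generated candidates can be post-filtered so that only outputs containing every required surface form are kept. The snippet below is a generic helper sketch, not part of the model's own decoding logic.

```python
# Generic post-filter for multiple entity constraints: keep only generated
# candidates that contain every requested entity string.
from typing import Iterable, List

def satisfies_constraints(text: str, entities: Iterable[str]) -> bool:
    """Return True if every required entity surface form appears in the text."""
    return all(ent in text for ent in entities)

def filter_candidates(candidates: List[str], entities: Iterable[str]) -> List[str]:
    """Drop candidates that miss any of the required entities."""
    ents = list(entities)
    return [c for c in candidates if satisfies_constraints(c, ents)]

# Example with two simultaneous constraints (a location and a time expression):
cands = ["2023年我去了上海旅行。", "我去了北京。"]
print(filter_candidates(cands, ["上海", "2023年"]))  # ['2023年我去了上海旅行。']
```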
## Frequently Asked Questions
### Q: What makes this model unique?
The model's distinctive feature is its ability to generate Chinese text while maintaining control over the inclusion of specific named entities. Its modified architecture provides better stability for open-domain applications while preserving the benefits of variational modeling.
### Q: What are the recommended use cases?
The model is ideal for applications requiring controlled text generation in Chinese, such as automated content creation with specific entity requirements, data augmentation for NER tasks, and generating context-rich examples for language learning or testing.
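For the NER data-augmentation use case, one workable pattern is to derive character-level BIO labels directly from the entities used to condition generation, since their surface forms and types are known in advance. The sketch below uses a made-up sentence and simple string matching; it does not handle overlapping or repeated mentions.

```python
# Sketch of NER data augmentation: label a generated sentence with BIO tags
# derived from the known (entity, type) pairs used to condition generation.
from typing import List, Tuple

def to_bio(sentence: str, entities: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Produce (character, BIO-tag) pairs for known (entity, type) spans."""
    tags = ["O"] * len(sentence)
    for text, etype in entities:
        start = sentence.find(text)
        if start == -1:
            continue  # entity missing from the generated sentence; skip it
        tags[start] = f"B-{etype}"
        for i in range(start + 1, start + len(text)):
            tags[i] = f"I-{etype}"
    return list(zip(sentence, tags))

# Example: a (made-up) generated sentence conditioned on a location and a time.
sentence = "2023年我在上海参加了会议。"
print(to_bio(sentence, [("上海", "LOC"), ("2023年", "TIME")]))
```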