LongLM-large
| Property | Value |
|---|---|
| Parameter Count | 993M |
| Model Type | Text-to-Text Generation |
| Architecture | T5-based Transformer |
| Paper | arXiv:2108.12960 |
| Tensor Type | FP16 |
What is LongLM-large?
LongLM-large is a Chinese language model developed by thu-coai, designed specifically for long-text understanding and generation. With 993M parameters, it uses a T5-based encoder-decoder architecture with 1,536-dimensional hidden states and 12 attention heads.
Implementation Details
The model uses an asymmetric configuration with 24 encoder layers and 32 decoder layers, a feed-forward dimension of 3,072, and a key/value dimension of 64 per attention head. It is pretrained on 120GB of Chinese novel data with two primary objectives: text infilling and conditional continuation.
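For orientation, these dimensions correspond to standard T5 hyperparameters. The sketch below shows how they would map onto a Hugging Face `T5Config`; it is illustrative only, the checkpoint's bundled config.json is authoritative, and fields not set here (such as the vocabulary size) fall back to `T5Config` defaults rather than LongLM's actual values.

```python
from transformers import T5Config

# Sketch: how the stated LongLM-large dimensions map onto T5Config fields.
# The released checkpoint's own config.json is authoritative; any field not
# listed here (e.g. vocab_size) falls back to T5Config defaults.
config = T5Config(
    d_model=1536,           # hidden state dimension
    d_kv=64,                # key/value projection dimension per attention head
    d_ff=3072,              # feed-forward (inner) dimension
    num_layers=24,          # encoder layers
    num_decoder_layers=32,  # decoder layers
    num_heads=12,           # attention heads
)
print(config)
```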
- Text infilling with masked span lengths drawn from a Poisson distribution (λ=3); see the sketch after this list
- 15% masking ratio over the original texts
- Conditional continuation: generating the latter part of a randomly split text from the former part
- Implemented in PyTorch with Hugging Face Transformers support
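To make the infilling objective concrete, here is a small, self-contained sketch, an illustrative reconstruction from the description above rather than the authors' released preprocessing code: span lengths are drawn from Poisson(λ=3), roughly 15% of the tokens are hidden, and each hidden span is replaced by a T5-style sentinel token.

```python
import numpy as np

def mask_spans(tokens, mask_ratio=0.15, poisson_lambda=3, seed=0):
    """Illustrative T5-style text infilling: hide ~mask_ratio of the tokens in
    spans whose lengths follow Poisson(poisson_lambda), replacing each span
    with a sentinel token. Returns (corrupted_source, infilling_target)."""
    rng = np.random.default_rng(seed)
    n = len(tokens)
    budget = max(1, int(n * mask_ratio))   # total number of tokens to hide
    to_mask = set()
    while len(to_mask) < budget:           # may slightly overshoot the budget
        span_len = max(1, int(rng.poisson(poisson_lambda)))
        start = int(rng.integers(0, n))
        to_mask.update(range(start, min(start + span_len, n)))
    source, target, sentinel = [], [], -1
    for i, tok in enumerate(tokens):
        if i in to_mask:
            if i - 1 not in to_mask:       # first token of a new masked span
                sentinel += 1
                source.append(f"<extra_id_{sentinel}>")
                target.append(f"<extra_id_{sentinel}>")
            target.append(tok)             # decoder learns to emit the hidden span
        else:
            source.append(tok)
    return source, target

src, tgt = mask_spans("今天 天气 很 好 我们 一起 去 公园 散步 吧".split())
print(src)  # corrupted input with sentinel placeholders
print(tgt)  # sentinel-delimited spans the decoder must reconstruct
```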
Core Capabilities
- Long-form Chinese text generation
- Text completion and infilling
- Conditional text generation
- Story continuation and narrative generation (see the usage sketch below)
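For reference, a minimal usage sketch with Hugging Face Transformers is shown below. It assumes the checkpoint is available on the Hub as `thu-coai/LongLM-large` and loads with the standard `T5Tokenizer` and `T5ForConditionalGeneration` classes; the generation settings are illustrative rather than tuned recommendations.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Assumption: the checkpoint is hosted on the Hugging Face Hub as
# "thu-coai/LongLM-large" and loads with the standard T5 classes.
model_name = "thu-coai/LongLM-large"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

# Story continuation: feed the opening of a story and sample a continuation.
prompt = "很久很久以前，山脚下住着一位年迈的樵夫。"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_length=200,      # illustrative settings, not tuned recommendations
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```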
Frequently Asked Questions
Q: What makes this model unique?
LongLM-large stands out for an architecture optimized for long-text processing and for combining text infilling and conditional continuation as pretraining objectives. Its large parameter count and pretraining on Chinese novels make it particularly effective for creative text generation in Chinese.
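Because the model is pretrained with T5-style text infilling, a masked span can also be filled in at inference time by marking it with a sentinel token. The sketch below is illustrative and assumes the same Hub checkpoint as above; if the released tokenizer does not already expose the sentinel tokens, they can be registered as additional special tokens.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("thu-coai/LongLM-large")
model = T5ForConditionalGeneration.from_pretrained("thu-coai/LongLM-large")

# Assumption: register T5-style sentinels in case the released tokenizer
# does not already include them by default.
tokenizer.add_special_tokens(
    {"additional_special_tokens": [f"<extra_id_{i}>" for i in range(100)]}
)

# Mark the span to be filled in with a sentinel token.
text = "小女孩推开门，发现<extra_id_0>，她高兴得跳了起来。"
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    out = model.generate(input_ids, max_length=32, do_sample=True, top_p=0.9)

# The output repeats the sentinel followed by the model's predicted span.
print(tokenizer.decode(out[0], skip_special_tokens=False))
```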
Q: What are the recommended use cases?
The model is best suited for applications requiring long-form Chinese text generation, such as story continuation, creative writing assistance, and document completion. It is particularly effective for tasks that depend on understanding and maintaining context over longer sequences.