LongLM-large
| Property | Value |
|---|---|
| Parameter Count | 993M |
| Model Type | Text-to-Text Generation |
| Architecture | T5-based Transformer |
| Paper | arXiv:2108.12960 |
| Tensor Type | FP16 |
What is LongLM-large?
LongLM-large is a Chinese language model developed by thu-coai, designed specifically for long-text understanding and generation. With 993M parameters, it uses a T5-based encoder-decoder architecture with 1,536-dimensional hidden states and 12 attention heads.
Implementation Details
The model uses an asymmetric configuration with 24 encoder layers and 32 decoder layers, a feed-forward dimension of 3,072, and a key/value dimension of 64 per attention head. It is pretrained on 120GB of Chinese novel data with two primary objectives: text infilling and conditional continuation.
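For orientation, these dimensions correspond to standard T5 hyperparameters. The sketch below shows how they would map onto a Hugging Face `T5Config`; it is illustrative only, the checkpoint's bundled config.json is authoritative, and fields not set here (such as the vocabulary size) fall back to `T5Config` defaults rather than LongLM's actual values.

```python
from transformers import T5Config

# Sketch: how the stated LongLM-large dimensions map onto T5Config fields.
# The released checkpoint's own config.json is authoritative; any field not
# listed here (e.g. vocab_size) falls back to T5Config defaults.
config = T5Config(
    d_model=1536,           # hidden state dimension
    d_kv=64,                # key/value projection dimension per attention head
    d_ff=3072,              # feed-forward (inner) dimension
    num_layers=24,          # encoder layers
    num_decoder_layers=32,  # decoder layers
    num_heads=12,           # attention heads
)
print(config)
```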
- Text infilling with masked span lengths drawn from a Poisson distribution (λ=3); see the sketch after this list
- 15% masking ratio over the original texts
- Conditional continuation: generating the latter part of a randomly split text from the former part
- Implemented in PyTorch with Hugging Face Transformers support
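To make the infilling objective concrete, here is a small, self-contained sketch, an illustrative reconstruction from the description above rather than the authors' released preprocessing code: span lengths are drawn from Poisson(λ=3), roughly 15% of the tokens are hidden, and each hidden span is replaced by a T5-style sentinel token.

```python
import numpy as np

def mask_spans(tokens, mask_ratio=0.15, poisson_lambda=3, seed=0):
    """Illustrative T5-style text infilling: hide ~mask_ratio of the tokens in
    spans whose lengths follow Poisson(poisson_lambda), replacing each span
    with a sentinel token. Returns (corrupted_source, infilling_target)."""
    rng = np.random.default_rng(seed)
    n = len(tokens)
    budget = max(1, int(n * mask_ratio))   # total number of tokens to hide
    to_mask = set()
    while len(to_mask) < budget:           # may slightly overshoot the budget
        span_len = max(1, int(rng.poisson(poisson_lambda)))
        start = int(rng.integers(0, n))
        to_mask.update(range(start, min(start + span_len, n)))
    source, target, sentinel = [], [], -1
    for i, tok in enumerate(tokens):
        if i in to_mask:
            if i - 1 not in to_mask:       # first token of a new masked span
                sentinel += 1
                source.append(f"<extra_id_{sentinel}>")
                target.append(f"<extra_id_{sentinel}>")
            target.append(tok)             # decoder learns to emit the hidden span
        else:
            source.append(tok)
    return source, target

src, tgt = mask_spans("今天 天气 很 好 我们 一起 去 公园 散步 吧".split())
print(src)  # corrupted input with sentinel placeholders
print(tgt)  # sentinel-delimited spans the decoder must reconstruct
```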
Core Capabilities
- Long-form Chinese text generation
- Text completion and infilling
- Conditional text generation
- Story continuation and narrative generation (see the usage sketch below)
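For reference, a minimal usage sketch with Hugging Face Transformers is shown below. It assumes the checkpoint is available on the Hub as `thu-coai/LongLM-large` and loads with the standard `T5Tokenizer` and `T5ForConditionalGeneration` classes; the generation settings are illustrative rather than tuned recommendations.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Assumption: the checkpoint is hosted on the Hugging Face Hub as
# "thu-coai/LongLM-large" and loads with the standard T5 classes.
model_name = "thu-coai/LongLM-large"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

# Story continuation: feed the opening of a story and sample a continuation.
prompt = "很久很久以前，山脚下住着一位年迈的樵夫。"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_length=200,      # illustrative settings, not tuned recommendations
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```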
Frequently Asked Questions
Q: What makes this model unique?
LongLM-large stands out for an architecture optimized for long-text processing and for combining text infilling and conditional continuation as pretraining objectives. Its large parameter count and pretraining on Chinese novels make it particularly effective for creative text generation in Chinese.
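Because the model is pretrained with T5-style text infilling, a masked span can also be filled in at inference time by marking it with a sentinel token. The sketch below is illustrative and assumes the same Hub checkpoint as above; if the released tokenizer does not already expose the sentinel tokens, they can be registered as additional special tokens.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("thu-coai/LongLM-large")
model = T5ForConditionalGeneration.from_pretrained("thu-coai/LongLM-large")

# Assumption: register T5-style sentinels in case the released tokenizer
# does not already include them by default.
tokenizer.add_special_tokens(
    {"additional_special_tokens": [f"<extra_id_{i}>" for i in range(100)]}
)

# Mark the span to be filled in with a sentinel token.
text = "小女孩推开门，发现<extra_id_0>，她高兴得跳了起来。"
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    out = model.generate(input_ids, max_length=32, do_sample=True, top_p=0.9)

# The output repeats the sentinel followed by the model's predicted span.
print(tokenizer.decode(out[0], skip_special_tokens=False))
```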
Q: What are the recommended use cases?
The model is best suited for applications requiring long-form Chinese text generation, such as story continuation, creative writing assistance, and document completion. It is particularly effective for tasks that depend on understanding and maintaining context over longer sequences.