# Longformer_zh
| Property | Value |
|---|---|
| Author | ValkyriaLenneth |
| Model Type | Longformer (Chinese) |
| Maximum Sequence Length | 4096 tokens |
| Training Hardware | 4 x Titan RTX |
| Model URL | huggingface.co/ValkyriaLenneth/longformer_zh |
## What is longformer_zh?
Longformer_zh is a Chinese language model designed to process long documents efficiently: its attention cost scales linearly with sequence length, O(n), rather than quadratically, O(n²), as in standard transformers. Built on RoBERTa-zh, it adopts the Longformer attention pattern, combining local windowed attention with task-specific global attention, which allows it to handle sequences of up to 4096 tokens.
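The snippet below is a minimal usage sketch, assuming the checkpoint loads through the standard Hugging Face transformers classes (`BertTokenizer` for the RoBERTa-zh vocabulary and `LongformerModel` for the weights); consult the model card for the exact loading procedure.

```python
# Minimal sketch: load the checkpoint and run it on a long document.
# Assumes standard transformers classes; the tokenizer class is an assumption
# based on the RoBERTa-zh vocabulary the model card describes.
import torch
from transformers import BertTokenizer, LongformerModel

tokenizer = BertTokenizer.from_pretrained("ValkyriaLenneth/longformer_zh")
model = LongformerModel.from_pretrained("ValkyriaLenneth/longformer_zh")

text = "这是一篇很长的中文文档。"  # in practice, a document of several thousand characters
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# Longformer applies local windowed attention by default; positions marked in
# global_attention_mask additionally attend to, and are attended by, every token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1  # e.g. give the [CLS] token global attention

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```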
## Implementation Details
The model was pretrained on a mixture of Chinese corpora from nlp_chinese_corpus, using Whole-Word-Masking (WWM) applied over Chinese word boundaries. Pretraining took approximately four days on 4 Titan RTX GPUs, with Nvidia Apex used for mixed-precision acceleration.
- Based on the RoBERTa-zh architecture, with the attention layers replaced by Longformer-style attention
- Implements Whole-Word-Masking for better Chinese language understanding (a simplified sketch of the masking step follows this list)
- Uses the Jieba tokenizer and JIONLP for data preprocessing
- Achieves a bits-per-character (BPC) score of 3.10 after pretraining
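To illustrate how whole-word masking interacts with Chinese word segmentation, the sketch below masks every character of a randomly selected Jieba word together. It is a simplified, hypothetical stand-in for the actual pretraining data pipeline; the function name and masking probability are illustrative, not taken from the original code.

```python
# Simplified whole-word-masking sketch (illustrative only, not the actual
# pretraining pipeline): Jieba supplies word boundaries, and all characters
# of a selected word are masked as a unit rather than individually.
import random
import jieba

def whole_word_mask(sentence: str, mask_token: str = "[MASK]", mask_prob: float = 0.15):
    words = list(jieba.cut(sentence))      # e.g. ["自然语言", "处理", "很", "有趣"]
    masked = []
    for word in words:
        if random.random() < mask_prob:
            # Mask every character of the word together.
            masked.extend([mask_token] * len(word))
        else:
            masked.extend(list(word))
    return masked

print(whole_word_mask("自然语言处理很有趣"))
```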
## Core Capabilities
- Sentiment Analysis: 80.51 F1 on CCF-Sentiment-Analysis (a hedged fine-tuning sketch follows this list)
- Machine Reading Comprehension: 86.15 F1 and 66.84 EM
- Coreference Resolution: 67.81 CoNLL F1
- Efficient processing of long documents (up to 4096 tokens)
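For downstream tasks such as sentiment analysis, the checkpoint can in principle be wrapped in a standard sequence-classification head. The sketch below uses generic transformers classes and assumed hyperparameters (label count, tokenizer class); it is not the configuration behind the reported scores.

```python
# Hedged sketch of a sentiment-classification setup; the number of labels and
# the loading classes are assumptions, not the reported fine-tuning recipe.
import torch
from transformers import BertTokenizer, LongformerForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("ValkyriaLenneth/longformer_zh")
model = LongformerForSequenceClassification.from_pretrained(
    "ValkyriaLenneth/longformer_zh", num_labels=3  # e.g. negative / neutral / positive
)

batch = tokenizer(
    ["这部电影的情节和演技都非常出色。"],
    return_tensors="pt", truncation=True, max_length=4096,
)
with torch.no_grad():
    logits = model(**batch).logits
print(logits.argmax(dim=-1))  # predicted sentiment class per input
```

After loading, the classification head is randomly initialized, so the model would still need fine-tuning on a labeled corpus before its predictions are meaningful.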
## Frequently Asked Questions
Q: What makes this model unique?
The model combines the linear-complexity Longformer attention mechanism with Chinese-specific Whole-Word-Masking, making it well suited to long Chinese document processing while remaining competitive with BERT and RoBERTa variants on standard benchmarks.
Q: What are the recommended use cases?
The model is particularly well suited to tasks involving long Chinese documents, including document classification, sentiment analysis, machine reading comprehension, and coreference resolution. It is especially valuable for documents that exceed the 512-token limit of standard BERT-style transformers.