# Longformer_zh
| Property | Value |
|---|---|
| Author | ValkyriaLenneth |
| Model Type | Longformer (Chinese) |
| Maximum Sequence Length | 4096 tokens |
| Training Hardware | 4 x Titan RTX |
| Model URL | huggingface.co/ValkyriaLenneth/longformer_zh |
## What is longformer_zh?
Longformer_zh is a Chinese language model designed to process long documents efficiently: its attention cost scales linearly with sequence length, O(n), rather than quadratically, O(n²), as in standard transformers. Built on RoBERTa-zh, it adopts the Longformer attention pattern, combining local windowed attention with task-specific global attention, which allows it to handle sequences of up to 4096 tokens.
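The snippet below is a minimal usage sketch, assuming the checkpoint loads through the standard Hugging Face transformers classes (`BertTokenizer` for the RoBERTa-zh vocabulary and `LongformerModel` for the weights); consult the model card for the exact loading procedure.

```python
# Minimal sketch: load the checkpoint and run it on a long document.
# Assumes standard transformers classes; the tokenizer class is an assumption
# based on the RoBERTa-zh vocabulary the model card describes.
import torch
from transformers import BertTokenizer, LongformerModel

tokenizer = BertTokenizer.from_pretrained("ValkyriaLenneth/longformer_zh")
model = LongformerModel.from_pretrained("ValkyriaLenneth/longformer_zh")

text = "这是一篇很长的中文文档。"  # in practice, a document of several thousand characters
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# Longformer applies local windowed attention by default; positions marked in
# global_attention_mask additionally attend to, and are attended by, every token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1  # e.g. give the [CLS] token global attention

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```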
## Implementation Details
The model was pretrained on a mixture of Chinese corpora from nlp_chinese_corpus, using Whole-Word-Masking (WWM) applied over Chinese word boundaries. Pretraining took approximately four days on 4 Titan RTX GPUs, with Nvidia Apex used for mixed-precision acceleration.
- Based on the RoBERTa-zh architecture, with the attention layers replaced by Longformer-style attention
- Implements Whole-Word-Masking for better Chinese language understanding (a simplified sketch of the masking step follows this list)
- Uses the Jieba tokenizer and JIONLP for data preprocessing
- Achieves a bits-per-character (BPC) score of 3.10 after pretraining
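To illustrate how whole-word masking interacts with Chinese word segmentation, the sketch below masks every character of a randomly selected Jieba word together. It is a simplified, hypothetical stand-in for the actual pretraining data pipeline; the function name and masking probability are illustrative, not taken from the original code.

```python
# Simplified whole-word-masking sketch (illustrative only, not the actual
# pretraining pipeline): Jieba supplies word boundaries, and all characters
# of a selected word are masked as a unit rather than individually.
import random
import jieba

def whole_word_mask(sentence: str, mask_token: str = "[MASK]", mask_prob: float = 0.15):
    words = list(jieba.cut(sentence))      # e.g. ["自然语言", "处理", "很", "有趣"]
    masked = []
    for word in words:
        if random.random() < mask_prob:
            # Mask every character of the word together.
            masked.extend([mask_token] * len(word))
        else:
            masked.extend(list(word))
    return masked

print(whole_word_mask("自然语言处理很有趣"))
```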
## Core Capabilities
- Sentiment Analysis: 80.51 F1 on CCF-Sentiment-Analysis (a hedged fine-tuning sketch follows this list)
- Machine Reading Comprehension: 86.15 F1 and 66.84 EM
- Coreference Resolution: 67.81 CoNLL F1
- Efficient processing of long documents (up to 4096 tokens)
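For downstream tasks such as sentiment analysis, the checkpoint can in principle be wrapped in a standard sequence-classification head. The sketch below uses generic transformers classes and assumed hyperparameters (label count, tokenizer class); it is not the configuration behind the reported scores.

```python
# Hedged sketch of a sentiment-classification setup; the number of labels and
# the loading classes are assumptions, not the reported fine-tuning recipe.
import torch
from transformers import BertTokenizer, LongformerForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("ValkyriaLenneth/longformer_zh")
model = LongformerForSequenceClassification.from_pretrained(
    "ValkyriaLenneth/longformer_zh", num_labels=3  # e.g. negative / neutral / positive
)

batch = tokenizer(
    ["这部电影的情节和演技都非常出色。"],
    return_tensors="pt", truncation=True, max_length=4096,
)
with torch.no_grad():
    logits = model(**batch).logits
print(logits.argmax(dim=-1))  # predicted sentiment class per input
```

After loading, the classification head is randomly initialized, so the model would still need fine-tuning on a labeled corpus before its predictions are meaningful.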
## Frequently Asked Questions
Q: What makes this model unique?
The model combines the linear-complexity Longformer attention mechanism with Chinese-specific Whole-Word-Masking, making it well suited to long Chinese document processing while remaining competitive with BERT and RoBERTa variants on standard benchmarks.
Q: What are the recommended use cases?
The model is particularly well suited to tasks involving long Chinese documents, including document classification, sentiment analysis, machine reading comprehension, and coreference resolution. It is especially valuable for documents that exceed the 512-token limit of standard BERT-style transformers.