longformer_zh

ValkyriaLenneth

A Chinese Longformer model optimized for long-document processing (up to 4096 tokens) with linear attention complexity. It features whole-word masking and achieves performance comparable to BERT/RoBERTa on a range of Chinese NLP tasks.

Author: ValkyriaLenneth
Model Type: Longformer (Chinese)
Maximum Sequence Length: 4096 tokens
Training Hardware: 4 × Titan RTX
Model URL: huggingface.co/ValkyriaLenneth/longformer_zh

What is longformer_zh?

Longformer_zh is a Chinese language model designed to process long documents efficiently, with linear complexity O(n) in sequence length rather than the quadratic O(n²) of standard transformer self-attention. Built upon RoBERTa-zh, it adopts the Longformer attention pattern, which combines local windowed attention with task-specific global attention, enabling it to handle sequences of up to 4096 tokens.
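
To make this concrete, here is a minimal usage sketch with the HuggingFace transformers library. It assumes the checkpoint loads through the generic Auto* classes and accepts a Longformer-style global_attention_mask; the upstream repository may document a different loading path, so treat this as illustrative rather than the repo's official API.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumption: the checkpoint is loadable via the generic Auto* classes.
MODEL_ID = "ValkyriaLenneth/longformer_zh"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

# Placeholder long Chinese document
text = "这是一个长文档处理的示例。" * 300

inputs = tokenizer(text, return_tensors="pt", max_length=4096, truncation=True)

# Longformer-style attention: most tokens use a local sliding window;
# a few task-specific positions (here, the first token) attend globally.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```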

Implementation Details

The model was pretrained on a mixture of Chinese corpora from nlp_chinese_corpus, using Whole-Word Masking (WWM) adapted to Chinese word segmentation. Pretraining took approximately 4 days on 4 Titan RTX GPUs, with NVIDIA Apex used to accelerate mixed-precision training.
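
For readers unfamiliar with Apex, the sketch below shows how mixed-precision training is typically wired with it; the model, optimizer settings, and loss are placeholders, not the authors' actual training script.

```python
import torch
import torch.nn as nn
from apex import amp  # NVIDIA Apex; assumes a CUDA build is installed

# Toy stand-in model: the Apex wiring is the point here, not the network.
model = nn.Linear(768, 768).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# opt_level="O1" runs most ops in FP16 while keeping numerically
# sensitive ops (e.g. softmax, norms) in FP32.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(8, 768).cuda()
loss = model(x).pow(2).mean()

# Loss scaling guards FP16 gradients against underflow.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```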

  • Based on RoBERTa-zh architecture with modified attention mechanism
  • Implements Whole-Word Masking for better Chinese language understanding (see the segmentation sketch after this list)
  • Uses Jieba word segmentation and JIONLP for data preprocessing
  • Reaches a bits-per-character (BPC) score of 3.10 after pretraining
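
To illustrate what Whole-Word Masking means for Chinese, here is a toy sketch that segments text with Jieba and masks whole words rather than individual characters. The function and masking probability are illustrative, not the repository's actual preprocessing code.

```python
import random
import jieba

def whole_word_mask(text: str, mask_token: str = "[MASK]", prob: float = 0.15):
    """Toy WWM: mask every character of a Jieba-segmented word at once,
    instead of masking characters independently."""
    tokens, labels = [], []
    for word in jieba.cut(text):
        if random.random() < prob:
            tokens.extend([mask_token] * len(word))  # mask the whole word
            labels.extend(list(word))                # characters to predict
        else:
            tokens.extend(list(word))
            labels.extend([None] * len(word))        # not a prediction target
    return tokens, labels

tokens, labels = whole_word_mask("自然语言处理是人工智能的重要方向")
print(tokens)
```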

Core Capabilities

  • Sentiment Analysis: achieves an 80.51 F1 score on CCF-Sentiment-Analysis (a fine-tuning sketch follows this list)
  • Machine Reading Comprehension: 86.15 F1 and 66.84 EM scores
  • Coreference Resolution: 67.81 CoNLL F1 score
  • Efficient processing of long documents (4K+ tokens)
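
As a starting point for the classification-style tasks above, the following sketch attaches a sequence-classification head to the checkpoint. The label count and example sentence are placeholders, and the head is randomly initialized until fine-tuned.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "ValkyriaLenneth/longformer_zh"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# num_labels=2 is a placeholder (e.g. positive/negative sentiment);
# the classification head starts untrained and needs fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

inputs = tokenizer("这部电影的情节非常精彩。", return_tensors="pt")
logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities from the untrained head
```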

Frequently Asked Questions

Q: What makes this model unique?

The model pairs a linear-complexity attention mechanism with Chinese-specific Whole-Word Masking, making it particularly well suited to long Chinese document processing while remaining competitive with BERT and RoBERTa variants.

Q: What are the recommended use cases?

The model is particularly well suited to tasks involving long Chinese documents, including document classification, sentiment analysis, machine reading comprehension, and coreference resolution. It is especially valuable for documents that exceed the 512-token limit of standard BERT-style transformers.
