longformer_zh

Maintained By
ValkyriaLenneth

Longformer_zh

Author: ValkyriaLenneth
Model Type: Longformer (Chinese)
Maximum Sequence Length: 4096 tokens
Training Hardware: 4 x Titan RTX
Model URL: huggingface.co/ValkyriaLenneth/longformer_zh

What is longformer_zh?

Longformer_zh is a Chinese language model designed to process long documents efficiently, scaling linearly (O(n)) with sequence length rather than quadratically (O(n²)) as in standard full-attention transformers. Built upon RoBERTa-zh, it adopts the Longformer attention pattern, which combines local windowed attention with task-specific global attention and enables processing of sequences up to 4096 tokens.
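
As a quick orientation, the sketch below loads the published checkpoint and encodes a long Chinese document with the Hugging Face transformers API. It assumes the checkpoint is compatible with the generic LongformerModel and AutoTokenizer classes; the repository may recommend a different loading class, so treat this as a minimal illustration rather than the reference usage.

```python
# Minimal loading sketch (assumes the checkpoint works with the standard
# Longformer classes in the transformers library; check the model repository
# for the exact class it expects).
from transformers import AutoTokenizer, LongformerModel

tokenizer = AutoTokenizer.from_pretrained("ValkyriaLenneth/longformer_zh")
model = LongformerModel.from_pretrained("ValkyriaLenneth/longformer_zh")

# A long Chinese document (repeated here only to exceed the usual 512-token limit).
text = "这是一个用于处理长文档的中文预训练模型。" * 100
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```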

Implementation Details

The model was pretrained on a mixture of Chinese corpora from nlp_chinese_corpus, using Whole-Word-Masking (WWM) so that all characters of a segmented Chinese word are masked together rather than individually. Pretraining took approximately 4 days on 4 Titan RTX GPUs, with NVIDIA Apex used for mixed-precision training.

  • Based on RoBERTa-zh architecture with modified attention mechanism
  • Implements Whole-Word-Masking for better Chinese language understanding (illustrated in the sketch after this list)
  • Utilizes Jieba tokenizer and JIONLP for data preprocessing
  • Achieves BPC (bits-per-character) score of 3.10 after training
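
To make the Whole-Word-Masking idea concrete, the following sketch simulates Chinese WWM with Jieba: the text is segmented into words, and when a word is selected for masking, every character in it is masked together. The function name whole_word_mask and the 15% masking probability are illustrative assumptions, not the actual pretraining code.

```python
# Illustrative sketch of Chinese Whole-Word-Masking: jieba segments the text
# into words, and a word chosen for masking has *all* of its characters
# replaced with [MASK] together.
import random
import jieba

def whole_word_mask(text, mask_prob=0.15, mask_token="[MASK]"):
    words = list(jieba.cut(text))  # word-level segmentation
    masked = []
    for word in words:
        if random.random() < mask_prob:
            masked.extend(mask_token for _ in word)  # mask every character of the word
        else:
            masked.extend(word)
    return masked

random.seed(0)
print("".join(whole_word_mask("长文本处理是自然语言处理中的重要任务")))
```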

Core Capabilities

  • Sentiment Analysis: Achieves 80.51 F1 score on CCF-Sentiment-Analysis
  • Machine Reading Comprehension: 86.15 F1 and 66.84 EM scores
  • Coreference Resolution: 67.81 Conll-F1 score
  • Efficient processing of long documents (4K+ tokens)
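
The sketch below shows how such a downstream task might be set up for a long document: local windowed attention covers the whole sequence, while the [CLS] token is given global attention so it can summarize the document for classification. The use of LongformerForSequenceClassification, the three-label setup, and the explicit global attention mask are assumptions for illustration; the reported benchmark results may have used a different head or training recipe.

```python
# Hypothetical fine-tuning setup for long-document sentiment classification.
import torch
from transformers import AutoTokenizer, LongformerForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("ValkyriaLenneth/longformer_zh")
model = LongformerForSequenceClassification.from_pretrained(
    "ValkyriaLenneth/longformer_zh", num_labels=3  # label count is an assumption
)

# A document much longer than the 512-token limit of standard BERT-style models.
document = "这部电影的剧情跌宕起伏，演员的表演也十分出色。" * 200
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=4096)

# Longformer uses local windowed attention everywhere; giving the [CLS] token
# global attention lets it aggregate information from the whole document.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.logits.softmax(dim=-1))
```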

Frequently Asked Questions

Q: What makes this model unique?

The model combines an efficient linear-complexity attention mechanism with Chinese-specific optimization through Whole-Word-Masking, making it particularly suitable for long Chinese document processing while remaining competitive with BERT and RoBERTa variants.

Q: What are the recommended use cases?

The model is particularly well-suited for tasks involving long Chinese documents, including document classification, sentiment analysis, machine reading comprehension, and coreference resolution. It is especially valuable for documents that exceed the 512-token limit of standard BERT-style transformers.
