longformer-base-plagiarism-detection

jpwahle

A Longformer model with 149M parameters, fine-tuned for plagiarism detection and particularly effective against machine-paraphrased text, with an average F1 score of 80.99%.

Property	Value
Parameter Count	149M
Model Type	Text Classification
Paper	Longformer: The Long-Document Transformer
Author	jpwahle
Performance	80.99% average F1 score

What is longformer-base-plagiarism-detection?

This is a Longformer model trained to detect machine-paraphrased plagiarism. It has been fine-tuned on the Machine-Paraphrased Plagiarism Dataset and identifies artificially modified text well, outperforming human evaluators in many cases.

Implementation Details

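Built on the Longformer-base-4096 architecture, the model accepts inputs of up to 4,096 tokens and uses Longformer's sparse attention (local sliding-window attention plus global attention on selected tokens) to process long documents efficiently. It runs on PyTorch, and its weights are distributed in the Safetensors format, which avoids pickle-based deserialization and loads quickly.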
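As an illustration of the setup described above, the following is a minimal sketch of loading the model with the Hugging Face transformers library and classifying a single text. The repository id is an assumption built from the author and model name shown on this page, and the label names returned depend on the model's own configuration, which is not described here.

```python
# Assumed Hub repository id, built from the author and model name on this page.
MODEL_ID = "jpwahle/longformer-base-plagiarism-detection"
# Maximum input length of the Longformer-base-4096 architecture, in tokens.
MAX_TOKENS = 4096


def classify(text: str, model_id: str = MODEL_ID) -> dict:
    """Return the predicted label and its probability for one text."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Truncate anything beyond the architecture's context window.
    inputs = tokenizer(
        text, truncation=True, max_length=MAX_TOKENS, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Convert logits to probabilities and pick the most likely class.
    probs = torch.softmax(logits, dim=-1)[0]
    idx = int(probs.argmax())
    return {"label": model.config.id2label[idx], "score": float(probs[idx])}
```

Calling classify("some paragraph to check") downloads the weights on first use and returns a dictionary with the predicted label and its score. Loading the tokenizer and model once and reusing them would be preferable when checking many texts.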

Achieves a 99.68% F1 score on SpinBot detection
Achieves a 71.64% F1 score on SpinnerChief cases
Outperforms human evaluators, who achieved 78.4% and 65.6%, respectively
Implements efficient attention mechanisms for long document processing
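For readers unfamiliar with the metric quoted in the figures above, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from raw counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not taken from the model's evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute F1 from true positives, false positives, and false negatives."""
    # Precision: share of flagged texts that really were paraphrased.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of paraphrased texts that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct detections, 20 false alarms, 20 misses.
example = f1_score(tp=80, fp=20, fn=20)
print(f"F1 = {example:.2%}")
```

Because F1 penalizes both false alarms and misses, a detector cannot score well simply by flagging everything or flagging nothing.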

Core Capabilities

Machine-paraphrased text detection
Long document processing capability
Integration with popular plagiarism detection workflows
Superior performance compared to traditional text-matching systems

Frequently Asked Questions

Q: What makes this model unique?

This model specifically targets machine-paraphrased plagiarism, a growing concern, and detects it better than both human evaluators and traditional plagiarism detection tools such as Turnitin and PlagScan.

Q: What are the recommended use cases?

The model is particularly suited for academic integrity applications, including checking research papers, graduation theses, and Wikipedia articles for machine-paraphrased content. It's especially effective against content modified by tools like SpinBot and SpinnerChief.
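One way such a check could be organized is to split a document into paragraphs, score each one, and flag those above a threshold. The sketch below shows that flow with a pluggable scorer. The paragraph splitting rule, the 0.5 threshold, and all function names are assumptions for illustration, and toy_scorer is a stand-in that would be replaced by a call to the actual model.

```python
from typing import Callable, List, Tuple


def split_paragraphs(document: str) -> List[str]:
    """Split on blank lines and drop empty fragments (assumed convention)."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def flag_paragraphs(
    document: str,
    scorer: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not a documented value
) -> List[Tuple[int, float, str]]:
    """Return (index, score, text) for each paragraph scoring at or above threshold."""
    flagged = []
    for index, paragraph in enumerate(split_paragraphs(document)):
        score = scorer(paragraph)
        if score >= threshold:
            flagged.append((index, score, paragraph))
    return flagged


def toy_scorer(paragraph: str) -> float:
    """Placeholder scorer for demonstration only; it is not a real detector."""
    return 0.9 if "spun" in paragraph.lower() else 0.1


sample = "Original text.\n\nThis was spun by a tool."
for index, score, paragraph in flag_paragraphs(sample, toy_scorer):
    print(index, score, paragraph)
```

Keeping the scorer as a parameter means the same loop works whether the score comes from this model, another classifier, or a test stub. Flagged paragraphs are candidates for human review rather than final verdicts.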