Longformer-base-plagiarism-detection
Property | Value |
---|---|
Parameter Count | 149M |
Model Type | Text Classification |
Paper | Longformer: The Long-Document Transformer |
Author | jpwahle |
Performance | 80.99% average F1 score |
What is longformer-base-plagiarism-detection?
This model is a Longformer fine-tuned on the Machine-Paraphrased Plagiarism Dataset to detect machine-paraphrased plagiarism. It reports an 80.99% average F1 score and outperforms human evaluators on both the SpinBot and SpinnerChief test cases.
Implementation Details
Built on Longformer-base-4096, the model accepts inputs of up to 4,096 tokens, which makes it suitable for long documents. It runs on PyTorch, and its weights are distributed in the Safetensors format, which loads quickly and does not execute arbitrary code when the weights are read.
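- Achieves a 99.68% F1 score on SpinBot-paraphrased text
- Achieves a 71.64% F1 score on SpinnerChief-paraphrased text
- Outperforms human evaluators, who scored 78.4% and 65.6% on the same two tools, respectively
- Uses Longformer's sliding-window attention with selective global attention, whose cost grows linearly rather than quadratically with sequence length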
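The F1 scores listed above are the harmonic mean of precision and recall. As a small illustration of how such a score is derived from raw counts, the sketch below uses a hypothetical helper, `f1_from_counts`, and made-up counts; neither comes from the model's evaluation.

```python
def f1_from_counts(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1 score for one class from its confusion-matrix counts."""
    if true_pos == 0:
        # No correct positive predictions: precision and recall are both 0.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative counts only, not taken from the paper.
    print(f1_from_counts(true_pos=8, false_pos=2, false_neg=2))
```

Because F1 penalizes both missed paraphrases and false alarms, it is a stricter summary than accuracy when the two classes are imbalanced.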
Core Capabilities
- Machine-paraphrased text detection
- Long-document processing (inputs of up to 4,096 tokens)
- Integration with popular plagiarism detection workflows
- Superior performance compared to traditional text-matching systems
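As a minimal sketch of how the detector might be called from Python, the example below assumes that the weights are published on the Hugging Face Hub under `jpwahle/longformer-base-plagiarism-detection` (an identifier inferred from the author and model name, not confirmed here) and that `torch` and `transformers` are installed. The helper names `softmax` and `classify` are illustrative. Label names are read from the model's own configuration rather than assumed.

```python
import math

# Assumed Hub identifier, inferred from the author and model name.
MODEL_ID = "jpwahle/longformer-base-plagiarism-detection"


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text, model_id=MODEL_ID, max_length=4096):
    """Return a {label: probability} mapping for a single text."""
    # Imported lazily so that the helper above works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Inputs longer than max_length tokens are truncated.
    inputs = tokenizer(
        text, truncation=True, max_length=max_length, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probabilities = softmax(logits)
    return {model.config.id2label[i]: p for i, p in enumerate(probabilities)}
```

Calling `classify("Some paragraph to check.")` downloads the weights on first use and returns one probability per label defined in the model configuration.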
Frequently Asked Questions
Q: What makes this model unique?
This model targets machine-paraphrased plagiarism specifically. It detects such text more reliably than both human evaluators and traditional plagiarism detection tools such as Turnitin and PlagScan.
Q: What are the recommended use cases?
The model is suited to academic integrity applications, including checking research papers, graduation theses, and Wikipedia articles for machine-paraphrased content. It is most effective against text rewritten with SpinBot (99.68% F1) and less so against text rewritten with SpinnerChief (71.64% F1).
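Theses and long papers can exceed the 4,096-token input limit, so one option is to split the tokenized document into overlapping windows and score each window separately. The sketch below is one way to do that, not the author's method. The helper `chunk_tokens` and its defaults are hypothetical, and the default window of 4,094 assumes the tokenizer adds two special tokens per sequence.

```python
def chunk_tokens(token_ids, window=4094, stride=3584):
    """Split a list of token ids into overlapping windows.

    window: maximum number of tokens per chunk (assumed to leave room
            for two special tokens within a 4,096-token limit).
    stride: how far the window start advances each step; a stride smaller
            than the window makes consecutive chunks overlap.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("require 0 < stride <= window")
    if not token_ids:
        return []

    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # the last chunk reached the end of the document
        start += stride
    return chunks


if __name__ == "__main__":
    # Toy example with small numbers so the overlap is easy to see.
    for chunk in chunk_tokens(list(range(10)), window=4, stride=2):
        print(chunk)
```

How the per-window scores are combined is a design choice: taking the maximum flags a document if any passage looks paraphrased, while averaging is more tolerant of a single suspicious window.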