longformer-base-plagiarism-detection

jpwahle

A Longformer model fine-tuned for plagiarism detection, with a focus on machine-paraphrased text. It has 149M parameters and reports an average F1 score of 80.99%.

Property          Value
Parameter Count   149M
Model Type        Text Classification
Paper             Longformer: The Long-Document Transformer
Author            jpwahle
Performance       80.99% average F1 score

What is longformer-base-plagiarism-detection?

This model is a Longformer fine-tuned to detect machine-paraphrased plagiarism. It was trained on the Machine-Paraphrased Plagiarism Dataset and, on the reported SpinBot and SpinnerChief test cases, achieves higher F1 scores than human evaluators.

Implementation Details

Built on the Longformer-base-4096 architecture, this model is designed for long documents and accepts inputs of up to 4,096 tokens. It runs on PyTorch, and its weights are distributed in the Safetensors format, which is designed for safe and fast weight loading.

  • Achieves a 99.68% F1 score on SpinBot-paraphrased text
  • Achieves a 71.64% F1 score on SpinnerChief-paraphrased text
  • Outperforms human evaluators, who scored 78.4% and 65.6% respectively
  • Uses Longformer's sliding-window attention with selective global attention to process long documents efficiently
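
The following is a minimal sketch of how the checkpoint might be loaded and queried with the Hugging Face `transformers` library. The repository ID `jpwahle/longformer-base-plagiarism-detection` is inferred from the author and model name above, the helper names `softmax` and `classify` are illustrative, and the meaning of each output label is an assumption that should be checked against the checkpoint's `id2label` configuration.

```python
import math

# Assumed Hugging Face repository ID, inferred from the author and model name.
MODEL_ID = "jpwahle/longformer-base-plagiarism-detection"


def softmax(logits):
    """Convert raw classifier logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text, model_id=MODEL_ID, max_length=4096):
    """Return {label: probability} for one passage.

    Downloads the checkpoint on first use, so it needs network access
    plus the `torch` and `transformers` packages.
    """
    # Imported lazily so that softmax() stays usable without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Truncate to the 4096-token limit of Longformer-base-4096.
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_length
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probs = softmax(logits)
    # Label names come from the checkpoint's config; their order is not assumed.
    return {model.config.id2label[i]: p for i, p in enumerate(probs)}


# Example call (not run here because it downloads the model):
# print(classify("Text of the passage to check goes here."))
```

Reading the label names from `model.config.id2label` avoids hard-coding which class index means "paraphrased", which this page does not state.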

Core Capabilities

  • Machine-paraphrased text detection
  • Long document processing capability
  • Integration with popular plagiarism detection workflows
  • Superior performance compared to traditional text-matching systems
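
For documents longer than the 4096-token input limit of Longformer-base-4096, one common approach is to split the token sequence into overlapping windows and score each window separately. The sketch below illustrates that idea on a plain list of token IDs. The function name `split_into_windows` and the default window and overlap sizes are illustrative choices, not part of this model's documented workflow.

```python
def split_into_windows(token_ids, window=4096, overlap=256):
    """Split token_ids into windows of at most `window` tokens.

    Consecutive windows share `overlap` tokens, so text near a window
    boundary is seen in both neighboring windows.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    step = window - overlap
    windows = []
    start = 0
    while start < len(token_ids):
        windows.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return windows


# Small illustration with tiny sizes so the result is easy to inspect.
print(split_into_windows(list(range(10)), window=4, overlap=1))
# → [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

In practice the window size would need to leave room for the tokenizer's special tokens, and how per-window scores are combined into a document-level decision is a separate design choice.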

Frequently Asked Questions

Q: What makes this model unique?

The model targets machine-paraphrased plagiarism and is reported to detect it better than both human evaluators and traditional plagiarism detection tools such as Turnitin and PlagScan.

Q: What are the recommended use cases?

The model is particularly suited for academic integrity applications, including checking research papers, graduation theses, and Wikipedia articles for machine-paraphrased content. It's especially effective against content modified by tools like SpinBot and SpinnerChief.
