paraphrase-bert-pt

Property	Value
Author	Prompsit
Base Model	neuralmind/bert-base-portuguese-cased
Accuracy	78.09%
Model URL	Hugging Face

What is paraphrase-bert-pt?

paraphrase-bert-pt is a specialized Portuguese language model designed for paraphrase detection. Developed by Prompsit under a TSI project co-financed by Spain's Ministry of Economic Affairs and Digital Transformation, this model evaluates whether two given phrases express the same meaning using different words.

Implementation Details

The model is fine-tuned from neuralmind/bert-base-portuguese-cased, a pretrained Portuguese BERT model, as a binary classifier. For each pair of phrases it outputs a probability for each of two classes: 0 (not a paraphrase) and 1 (paraphrase). It is built for phrases rather than full sentences, so it is best applied to short fragments.
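As a minimal sketch of how such a pair classifier could be queried, the snippet below loads the checkpoint with the Hugging Face transformers library and returns the probability of class 1. The Hub identifier "Prompsit/paraphrase-bert-pt" is an assumption inferred from the author and model name and should be checked against the model page. The function name paraphrase_probability and the softmax helper are illustrative, not part of the model's own code.

```python
import math

# Assumed Hub identifier (author + model name); confirm it on the model page.
MODEL_ID = "Prompsit/paraphrase-bert-pt"


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def paraphrase_probability(phrase_a, phrase_b, model_id=MODEL_ID):
    """Return the probability of class 1 (paraphrase) for two phrases."""
    # Imported lazily so the softmax helper works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Passing two strings makes the tokenizer encode them as one sequence pair.
    inputs = tokenizer(phrase_a, phrase_b, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    # Index 1 is the "paraphrase" class described above.
    return softmax(logits)[1]
```

Calling paraphrase_probability("logo após o homicídio", "pouco depois do assassinato") downloads the checkpoint on first use and returns a value between 0 and 1. The two phrases are only an example pair.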

Binary classification output (paraphrase/non-paraphrase)
Tested on 16,500 human-tagged phrase pairs
Achieves 71.57% precision and 40.55% recall
F1 score of 0.518 and Matthews correlation coefficient (MCC) of 0.416
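The reported F1 score can be cross-checked against the precision and recall figures, since F1 is their harmonic mean. The short sketch below does that arithmetic. The function name f1_score is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported for the model, expressed as fractions.
precision = 0.7157
recall = 0.4055

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.518"
```

The result matches the reported value. It also shows how the low recall pulls F1 well below the precision figure.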

Core Capabilities

Phrase-level paraphrase detection in Portuguese
Probability-based classification output
Reported evaluation throughput of 607.587 samples per second (hardware not stated)
Optimized for short text fragments without punctuation
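Because the model expects short fragments without punctuation and returns a probability, callers may want to normalize their input and choose a decision threshold. The following is a sketch under assumptions. The helper names, the choice to keep hyphens (common inside Portuguese words such as "disse-me"), and the default threshold of 0.5 are illustrative and do not come from the model authors.

```python
import unicodedata

# Hyphens are kept because they occur inside Portuguese words (assumption).
KEEP = {"-"}


def strip_punctuation(text):
    """Replace Unicode punctuation with spaces, then collapse whitespace."""
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") and ch not in KEEP else ch
        for ch in text
    )
    return " ".join(cleaned.split())


def is_paraphrase(probability, threshold=0.5):
    """Turn the class-1 probability into a yes/no decision."""
    return probability >= threshold


print(strip_punctuation("disse-me: «sim»!"))  # prints "disse-me sim"
print(is_paraphrase(0.62))                    # prints "True"
```

Lowering the threshold will generally accept more pairs, trading precision for recall. A suitable value should be chosen on labeled data from the target application.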

Frequently Asked Questions

Q: What makes this model unique?

The model targets paraphrase detection in Portuguese at the phrase level, which makes it useful for applications that need to judge whether two short Portuguese text fragments express the same meaning.

Q: What are the recommended use cases?

The model is best suited to phrase-level paraphrase verification, semantic similarity checks on Portuguese phrases, content matching systems, and automated text analysis in which identifying equivalent expressions matters.