paraphrase-bert-pt
Property | Value |
---|---|
Author | Prompsit |
Base Model | neuralmind/bert-base-portuguese-cased |
Accuracy (16,500-pair evaluation set) | 78.09% |
Model URL | Hugging Face |
What is paraphrase-bert-pt?
paraphrase-bert-pt is a specialized Portuguese language model designed for paraphrase detection. Developed by Prompsit under a TSI project co-financed by Spain's Ministry of Economic Affairs and Digital Transformation, this model evaluates whether two given phrases express the same meaning using different words.
Implementation Details
The model is fine-tuned from the neuralmind/bert-base-portuguese-cased checkpoint as a two-class sequence classifier that takes a pair of phrases as input. Label 0 means non-paraphrase and label 1 means paraphrase; applying a softmax to the output logits gives the probability of each label. It is intended for phrase-level input rather than full sentences.
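A minimal usage sketch with the Hugging Face `transformers` library. It assumes the model is published under the repository id `Prompsit/paraphrase-bert-pt` and that `torch` and `transformers` are installed. The two phrases are encoded as one sentence pair, and a softmax over the two logits gives the probability of label 1 (paraphrase). The helper names `softmax`, `_load` and `paraphrase_probability` are hypothetical, chosen for this example.

```python
import math
from functools import lru_cache
from typing import Sequence

# Assumed Hugging Face repository id for this model.
MODEL_ID = "Prompsit/paraphrase-bert-pt"


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


@lru_cache(maxsize=1)
def _load():
    """Load tokenizer and model once; imported lazily so softmax works without them."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def paraphrase_probability(phrase_a: str, phrase_b: str) -> float:
    """Probability of label 1 (paraphrase) for a pair of Portuguese phrases."""
    import torch

    tokenizer, model = _load()
    # The pair is encoded as a single input: [CLS] phrase_a [SEP] phrase_b [SEP].
    inputs = tokenizer(phrase_a, phrase_b, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()  # two logits: [label 0, label 1]
    return softmax(logits)[1]
```

For example, `paraphrase_probability("logo após o homicídio", "pouco depois do assassinato")` returns a float between 0 and 1 (the phrase pair is illustrative). Caching the loader with `lru_cache` avoids reloading the weights on every call.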
- Binary classification output (paraphrase/non-paraphrase)
- Tested on 16,500 human-tagged phrase pairs
- Achieves 71.57% precision and 40.55% recall
- F1 score of 0.518 and Matthews correlation coefficient (MCC) of 0.416
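The reported F1 score follows directly from the precision $P$ and recall $R$ above, since F1 is their harmonic mean:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.7157 \times 0.4055}{0.7157 + 0.4055}
    = \frac{0.5804}{1.1212}
    \approx 0.518
```

The gap between the two inputs matters in practice: roughly 72% of the pairs the model labels as paraphrases are correct, while it recovers only about 41% of the true paraphrases in the evaluation set.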
Core Capabilities
- Phrase-level paraphrase detection in Portuguese
- Probability-based classification output
- Reported evaluation throughput of 607.587 samples per second
- Optimized for short text fragments without punctuation
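Because the model targets short fragments without punctuation and returns probabilities, a caller will typically clean its inputs and turn the paraphrase probability into a yes/no decision. The sketch below assumes that replacing punctuation with spaces is an acceptable approximation of the expected input; the rule that keeps word-internal hyphens (as in *guarda-chuva*) and the default threshold of 0.5 are illustrative choices, not values from the model card. `normalize_phrase` and `is_paraphrase` are hypothetical helper names.

```python
import unicodedata


def normalize_phrase(text: str) -> str:
    """Replace punctuation with spaces and collapse whitespace; case and accents are kept."""
    text = unicodedata.normalize("NFC", text)  # compose accented letters consistently
    out = []
    for i, ch in enumerate(text):
        # Unicode categories starting with "P" are punctuation (Pc, Pd, Ps, Pe, Pi, Pf, Po).
        if unicodedata.category(ch).startswith("P"):
            # Keep a hyphen that joins two letters, e.g. "guarda-chuva".
            joins_letters = (
                ch == "-"
                and 0 < i < len(text) - 1
                and text[i - 1].isalpha()
                and text[i + 1].isalpha()
            )
            out.append(ch if joins_letters else " ")
        else:
            out.append(ch)
    return " ".join("".join(out).split())


def is_paraphrase(probability: float, threshold: float = 0.5) -> bool:
    """Turn the model's paraphrase probability into a yes/no decision."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    return probability >= threshold
```

Raising the threshold tends to favor precision and lowering it tends to favor recall; given the low reported recall, a threshold below 0.5 may be worth testing on held-out data.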
Frequently Asked Questions
Q: What makes this model unique?
It is fine-tuned specifically for Portuguese paraphrase detection at the phrase level, which suits applications that need to judge whether short Portuguese text fragments are semantically equivalent.
Q: What are the recommended use cases?
The model is best suited to phrase-level paraphrase verification, semantic similarity checks on Portuguese text, content matching systems, and automated text analysis where identifying equivalent expressions is essential.
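For content matching, one approach is to score a query phrase against a set of candidate phrases and keep the most probable paraphrases. In this sketch `best_matches` and its `threshold` and `top_k` parameters are hypothetical. `score` is any callable that returns the paraphrase probability for a pair, for instance a function wrapping the model. It is passed in so that the ranking logic can be tested with a stub.

```python
from typing import Callable, Iterable

# A scorer maps (query, candidate) to the probability that they are paraphrases.
Scorer = Callable[[str, str], float]


def best_matches(
    query: str,
    candidates: Iterable[str],
    score: Scorer,
    threshold: float = 0.5,
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return up to top_k (candidate, probability) pairs at or above threshold, best first."""
    scored = [(candidate, score(query, candidate)) for candidate in candidates]
    kept = [pair for pair in scored if pair[1] >= threshold]
    # Sort by probability, highest first; the sort is stable, so ties keep input order.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

Because a BERT pair classifier sees the two phrases in a fixed order, the scores for `(a, b)` and `(b, a)` are not guaranteed to be identical; averaging both orders is one option if symmetry matters.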