sbert-uncased-finnish-paraphrase

TurkuNLP

An uncased Finnish Sentence-BERT (SBERT) model built on FinBERT and trained on paraphrase data. It uses mean pooling and targets semantic similarity tasks.

Property | Value
Author | TurkuNLP
Base Model | FinBERT (bert-base-finnish-uncased-v1)
Training Data | Finnish Paraphrase Corpus + 500K positive / 5M negative samples
Model Hub | HuggingFace

What is sbert-uncased-finnish-paraphrase?

This is a Finnish Sentence-BERT model that maps Finnish sentences to semantic embeddings. It is built on FinBERT (bert-base-finnish-uncased-v1) and trained for paraphrase detection and semantic similarity. The training data combines the manually curated Finnish Paraphrase Corpus with automatically collected paraphrase candidates, and sentence embeddings are produced with a mean pooling strategy.

Implementation Details

The model is implemented with the sentence-transformers library and can be loaded through either the SentenceTransformer API or the HuggingFace Transformers API. Sentence embeddings are obtained by mean pooling over token embeddings. Training uses a binary classification task on sentence pairs: pairs with a score of 3 or 4 count as paraphrases, and pairs with a score of 1 or 2 count as non-paraphrases.

  • Uncased text processing for better generalization
  • 128 token maximum sequence length
  • 768-dimensional word embeddings (mean pooling yields a sentence vector of the same size)
  • Optimized for Finnish language understanding
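
The two mechanics described above, the binary label mapping and mean pooling, can be sketched in a few lines of plain Python. This is only an illustration and not the authors' training or inference code. The Hub identifier `TurkuNLP/sbert-uncased-finnish-paraphrase` and the helper names `to_binary_label`, `mean_pool`, and `load_encoder` are assumptions introduced here. `load_encoder` relies on the sentence-transformers package and downloads the model, so it is defined but not called.

```python
from typing import Callable, List, Sequence

# Assumed Hub identifier (author/name); it is not spelled out in the text above.
MODEL_ID = "TurkuNLP/sbert-uncased-finnish-paraphrase"


def to_binary_label(score: int) -> int:
    """Map a 1-4 paraphrase score to a binary target: 3 and 4 -> 1, 1 and 2 -> 0."""
    if score not in (1, 2, 3, 4):
        raise ValueError(f"expected a score from 1 to 4, got {score}")
    return 1 if score >= 3 else 0


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding (mask 0) is ignored."""
    kept = [vec for vec, mask in zip(token_embeddings, attention_mask) if mask == 1]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def load_encoder(model_id: str = MODEL_ID) -> Callable[[Sequence[str]], List[List[float]]]:
    """Return a function that embeds sentences via sentence-transformers.

    The import is lazy so the rest of this sketch runs without the package.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return lambda sentences: model.encode(list(sentences)).tolist()


# Toy example: three 2-dimensional "token" vectors, the last one is padding.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0])
print(pooled)
```

When the model is loaded through the SentenceTransformer API, pooling is handled by the library. A hand-written step like `mean_pool` is only needed when working with raw token embeddings from the HuggingFace Transformers API.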

Core Capabilities

  • Semantic similarity computation between Finnish sentences
  • Paraphrase detection and verification
  • Sentence embedding generation for downstream tasks
  • Large-scale text similarity search (demonstrated on 400M sentences)
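
As a rough illustration of how these capabilities fit together once embeddings exist, the sketch below scores sentence pairs with cosine similarity and ranks a small corpus against a query. The vectors are toy values rather than real model output. The function names and the 0.8 threshold are assumptions made for this example, not values published for the model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_paraphrase(a: Sequence[float], b: Sequence[float],
                  threshold: float = 0.8) -> bool:
    """Treat a pair as a paraphrase when similarity reaches a (hypothetical) threshold."""
    return cosine_similarity(a, b) >= threshold


def top_k(query: Sequence[float], corpus: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Brute-force search: return (corpus index, similarity) for the k best matches."""
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for encoded Finnish sentences.
query_vec = [1.0, 0.0]
corpus_vecs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
hits = top_k(query_vec, corpus_vecs, k=2)
print(hits)
```

A brute-force scan like this is only practical for small collections. Searching hundreds of millions of sentences would in practice call for an approximate nearest-neighbor index, and any decision threshold should be tuned on labeled pairs.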

Frequently Asked Questions

Q: What makes this model unique?

The model targets Finnish specifically: it starts from FinBERT, a BERT model pretrained on Finnish text, and is then trained further on paraphrase detection. It is one of the few models designed for Finnish semantic similarity tasks.

Q: What are the recommended use cases?

The model suits tasks that depend on the meaning of Finnish sentences, including paraphrase detection, information retrieval, and semantic search. It is particularly useful when an application needs to compare the meanings of two Finnish sentences.
