sentence-flaubert-base

Lajavaness

French sentence embedding model based on FlauBERT, with 137M parameters. It reports state-of-the-art results for French, including an 85.5% Pearson correlation on the STS-B benchmark.

Property	Value
Parameter Count	137M
Model Type	Sentence Embedding
Architecture	FlauBERT Base (Fine-tuned)
Author	Lajavaness
HuggingFace URL	Link

What is sentence-flaubert-base?

sentence-flaubert-base is a French sentence embedding model built on the FlauBERT architecture. It maps French sentences to dense vectors whose similarity reflects the semantic similarity of the sentences. The model was fine-tuned on the STS-B dataset with the Siamese BERT-Networks and Augmented SBERT methods, and its training pairs were sampled with the help of the CrossEncoder-camembert-large and sentence-camembert-large models.

Implementation Details

The model combines the pre-trained FlauBERT base uncased checkpoint with a Siamese network setup, in which the same encoder embeds both sentences of a pair. It reaches 85.5% Pearson correlation on the STS-B benchmark and is reported to outperform other French language models on several evaluation metrics.
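A Pearson score on an STS benchmark is usually obtained by comparing the cosine similarity of each embedded sentence pair with its human-assigned score. The following is a minimal sketch of that computation in plain Python, not the evaluation script used for this model; the function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sts_pearson(emb_a, emb_b, gold):
    """Correlate predicted cosine similarities with gold similarity scores.

    emb_a[i] and emb_b[i] are the embeddings of the two sentences in pair i;
    gold[i] is the human similarity score for that pair.
    """
    predicted = [cosine(u, v) for u, v in zip(emb_a, emb_b)]
    return pearson(predicted, gold)
```

Because Pearson correlation is invariant to linear rescaling, the gold scores do not need to be normalized to the range of the cosine values before the comparison.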

Fine-tuned using Siamese BERT-Networks
Incorporates Augmented SBERT methodology
Optimized with pair sampling strategies
Achieves superior performance on French STS benchmarks
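The Augmented SBERT idea listed above can be read as a two-model pipeline: a bi-encoder proposes sentence pairs worth labeling, and a cross-encoder assigns them "silver" similarity scores that extend the gold training set. The sketch below illustrates that idea only. `embed` and `cross_score` are hypothetical stand-ins for the sentence-camembert-large and CrossEncoder-camembert-large models, and the top-k sampling rule is an assumption, not the authors' documented procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def sample_silver_pairs(sentences, embed, cross_score, top_k=2):
    """Build (sentence_a, sentence_b, silver_score) training triples.

    embed:       hypothetical callable, sentence -> vector (the bi-encoder)
    cross_score: hypothetical callable, (sentence_a, sentence_b) -> float
                 (the cross-encoder used as labeler)
    top_k:       how many nearest neighbours to pair with each sentence
    """
    vectors = [embed(s) for s in sentences]
    seen = set()
    silver = []
    for i in range(len(sentences)):
        # Rank every other sentence by bi-encoder similarity to sentence i.
        others = [j for j in range(len(sentences)) if j != i]
        others.sort(key=lambda j: cosine(vectors[i], vectors[j]), reverse=True)
        for j in others[:top_k]:
            key = (min(i, j), max(i, j))
            if key in seen:
                continue  # skip pairs already sampled in the other direction
            seen.add(key)
            a, b = sentences[key[0]], sentences[key[1]]
            # The cross-encoder provides the silver label for the new pair.
            silver.append((a, b, cross_score(a, b)))
    return silver
```

The resulting triples would then be merged with the gold STS pairs to fine-tune the Siamese bi-encoder.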

Core Capabilities

Generate high-quality French sentence embeddings
Strong performance on semantic similarity tasks
Consistent results across multiple benchmark datasets (STS12-16, SICK-fr)
Easy integration with the sentence-transformers library
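As an illustration of the sentence-transformers integration mentioned above, the sketch below loads the model and computes pairwise cosine similarities. The Hub identifier `Lajavaness/sentence-flaubert-base` is an assumption pieced together from the author and model name on this page, and `load_model` and `pairwise_cosine` are hypothetical helper names.

```python
import math

# Assumed Hugging Face Hub identifier (author name + model name from this page).
MODEL_ID = "Lajavaness/sentence-flaubert-base"


def load_model(model_id=MODEL_ID):
    """Load the model with sentence-transformers (downloads weights on first use)."""
    # Imported lazily so pairwise_cosine can be used without the library installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id)


def pairwise_cosine(model, sentences):
    """Return an n x n matrix of cosine similarities between the sentences.

    `model` is any object with an encode(list_of_str) method that returns
    one vector per sentence, as SentenceTransformer.encode does.
    """
    vectors = [[float(x) for x in row] for row in model.encode(sentences)]
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]
```

With the library installed, calling `pairwise_cosine(load_model(), ["Un chat dort sur le canapé.", "Un félin se repose sur le sofa."])` would return a 2 x 2 similarity matrix for the two sentences.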

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its results on French semantic similarity tasks: it achieves the highest scores on several benchmarks (87.24% on STS13-fr, 88.00% on STS15-fr) and performs consistently across evaluation metrics.

Q: What are the recommended use cases?

This model is ideal for French natural language processing tasks including semantic similarity analysis, text classification, and document comparison. It's particularly well-suited for applications requiring accurate semantic understanding of French text.
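For the text classification use case, one simple option is a nearest-centroid classifier over sentence embeddings: average the embeddings of each class, then assign a new text to the class with the most similar centroid. The sketch below assumes the embeddings have already been computed; the function names and the toy two-dimensional vectors in the usage note are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def fit_centroids(embeddings, labels):
    """Average the embeddings of each label into one centroid per class."""
    grouped = {}
    for vector, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, embedding):
    """Return the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine(centroids[label], embedding))
```

For example, after `centroids = fit_centroids([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], ["sport", "sport", "cuisine", "cuisine"])`, a new vector is classified with `predict(centroids, vector)`. In practice the vectors would be the model's sentence embeddings, not hand-written pairs.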