CamemBERT2CamemBERT French Summarization Model
| Property | Value |
|---|---|
| Base Architecture | CamemBERT (RoBERTa) |
| Task | French Text Summarization |
| Dataset | MLSUM French |
| ROUGE-2 F1 Score | 13.30 |
| Author | Manuel Romero (mrm8488) |
What is camembert2camembert_shared-finetuned-french-summarization?
This model is a specialized French text summarization system built on CamemBERT, a RoBERTa-based model designed specifically for French language processing. It has been fine-tuned on the French portion of MLSUM, a multilingual summarization corpus containing over 1.5 million article/summary pairs across several languages.
Implementation Details
The model uses an encoder-decoder architecture in which the encoder and decoder share weights, both initialized from CamemBERT. It accepts input texts of up to 512 tokens and generates concise summaries that preserve the essential information from the source text.
- Built on CamemBERT base architecture
- Implements shared encoder-decoder framework
- Supports maximum input length of 512 tokens
- Achieves ROUGE-2 precision of 14.47 and recall of 12.90 on the MLSUM French test set
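The shared encoder-decoder setup described above can be loaded through the standard Hugging Face `EncoderDecoderModel` API. The following is a minimal usage sketch; the checkpoint name is assumed to follow the author's usual `mrm8488/<model-name>` naming on the Hugging Face Hub, so verify it against the model's hub page before use:

```python
import torch
from transformers import AutoTokenizer, EncoderDecoderModel

# Assumed checkpoint name on the Hugging Face Hub
ckpt = "mrm8488/camembert2camembert_shared-finetuned-french-summarization"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = EncoderDecoderModel.from_pretrained(ckpt).to(device)

def generate_summary(text: str) -> str:
    # Inputs longer than 512 tokens are truncated, matching the model's limit
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    output = model.generate(
        inputs.input_ids.to(device),
        attention_mask=inputs.attention_mask.to(device),
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Calling `generate_summary` on a French news article returns a short abstractive summary; generation parameters such as `num_beams` or `max_new_tokens` can be passed to `model.generate` to trade quality against speed.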
Core Capabilities
- French text summarization with high precision
- Handles long-form input text efficiently (inputs beyond 512 tokens are truncated)
- Generates coherent and concise summaries
- Maintains semantic accuracy in summarization
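Because inputs beyond the 512-token limit are truncated, a common workaround for long documents is to split the text into chunks, summarize each chunk, and combine the results. The helper below is a hedged sketch of that pre-processing step: it uses word count as a rough proxy for token count (an assumption; production code should measure length with the model's own tokenizer):

```python
def chunk_text(text: str, max_words: int = 350) -> list[str]:
    """Split text into word-bounded chunks intended to stay under the
    model's 512-token input limit.

    Word count is only an approximation of subword token count, so
    max_words is set conservatively below 512.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk can then be passed to the summarizer independently, and the partial summaries concatenated or summarized a second time.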
Frequently Asked Questions
Q: What makes this model unique?
This model specializes in French text summarization using a shared encoder-decoder architecture, a relatively uncommon design in which one set of CamemBERT weights serves both encoding and decoding. Because it is trained on the MLSUM French dataset, it is particularly well adapted to French news article summarization.
Q: What are the recommended use cases?
The model is ideal for automated French news article summarization, content condensation for French texts, and generating brief overviews of longer French documents. It's particularly well-suited for journalistic content given its training data.