# DistilCamemBERT Base
| Property | Value |
|---|---|
| Author | cmarkea |
| Model Type | Distilled French Language Model |
| Training Data | OSCAR Dataset (140 GB) |
| Training Duration | 18 days on an Nvidia Titan RTX |
| Paper | *DistilCamemBERT: a distillation of the French model CamemBERT* |
## What is distilcamembert-base?
DistilCamemBERT is a compressed version of the CamemBERT model, built for French language processing. Through knowledge distillation, it retains most of the teacher model's performance while substantially reducing computational requirements, reaching an 83% F1-score on the FLUE CLS task and 98% accuracy on French NER tasks.
## Implementation Details
The model is trained with a three-part loss function: DistilLoss (weighted at 50%), CosineLoss (30%), and MLMLoss (20%). This combination lets the student model learn from the teacher's output distribution and internal representations while retaining standard masked-language-modeling ability, as sketched below.
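To make the weighting concrete, here is a minimal PyTorch sketch of such a combined loss. The 0.5 / 0.3 / 0.2 weights come from the description above; the temperature value, the use of KL divergence for the distillation term, and the tensor layout are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def combined_distillation_loss(student_logits, teacher_logits,
                               student_hidden, teacher_hidden,
                               mlm_labels, temperature=2.0):
    """Weighted three-part loss: 0.5 * DistilLoss + 0.3 * CosineLoss + 0.2 * MLMLoss.

    Assumed shapes: logits (batch, seq, vocab); hidden states (batch, seq, dim);
    mlm_labels (batch, seq) with -100 on unmasked positions.
    """
    vocab = student_logits.size(-1)
    dim = student_hidden.size(-1)

    # DistilLoss: KL divergence between temperature-softened student and
    # teacher output distributions (one common distillation formulation).
    distil = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # CosineLoss: align student and teacher hidden states token by token.
    flat_student = student_hidden.view(-1, dim)
    flat_teacher = teacher_hidden.view(-1, dim)
    target = torch.ones(flat_student.size(0), device=student_hidden.device)
    cosine = F.cosine_embedding_loss(flat_student, flat_teacher, target)

    # MLMLoss: standard masked-language-modeling cross-entropy.
    mlm = F.cross_entropy(student_logits.view(-1, vocab),
                          mlm_labels.view(-1), ignore_index=-100)

    return 0.5 * distil + 0.3 * cosine + 0.2 * mlm
```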
- Trained on the OSCAR dataset (same as original CamemBERT)
- Implements masked language modeling capabilities
- Optimized for French language understanding tasks
- Preserves core functionality while reducing model size
## Core Capabilities
- Text Classification (83% FLUE CLS score)
- Named Entity Recognition (98% accuracy)
- Cross-lingual Natural Language Inference (77% XNLI score)
- Masked Language Modeling
- Semantic Analysis
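As a quick illustration of the masked-language-modeling capability, the snippet below uses the Hugging Face `transformers` fill-mask pipeline. It assumes the checkpoint is published as `cmarkea/distilcamembert-base` on the Hub; CamemBERT-family tokenizers use `<mask>` as the mask token.

```python
from transformers import pipeline

# Fill-mask sketch; the checkpoint id is an assumption based on the author name.
fill_mask = pipeline("fill-mask", model="cmarkea/distilcamembert-base")

# Print the top predictions for the masked French word.
for pred in fill_mask("Paris est la <mask> de la France."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```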
## Frequently Asked Questions
**Q: What makes this model unique?**
DistilCamemBERT is a distilled version of CamemBERT that retains strong performance while requiring far fewer computational resources. It is optimized specifically for French language tasks and approaches the original model's accuracy on standard benchmarks.
**Q: What are the recommended use cases?**
The model is well suited to French language processing tasks including text classification, named entity recognition, and masked language modeling. It is a particularly good fit for applications that need high accuracy in French language understanding under tight computational budgets.
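For a downstream task such as text classification, a typical pattern is to load the base checkpoint with a fresh classification head and fine-tune it. A minimal sketch follows; the checkpoint id and `num_labels=2` (a binary task) are illustrative assumptions.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint id; swap in the actual Hub id of the model.
model_id = "cmarkea/distilcamembert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer("Ce film était vraiment excellent !", return_tensors="pt")
logits = model(**inputs).logits  # head is untrained: fine-tune before relying on it
print(logits)
```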