# DistilCamemBERT Base
| Property | Value |
|---|---|
| Author | cmarkea |
| Model Type | Distilled French Language Model |
| Training Data | OSCAR Dataset (140 GB) |
| Training Duration | 18 days on an Nvidia Titan RTX |
| Paper | *DistilCamemBERT: a distillation of the French model CamemBERT* |
## What is distilcamembert-base?
DistilCamemBERT is a compressed version of the CamemBERT model, built for French language processing. Through knowledge distillation, it retains most of the teacher model's performance while substantially reducing computational requirements, reaching an 83% F1-score on the FLUE CLS task and 98% accuracy on French NER tasks.
## Implementation Details
The model is trained with a three-part loss function: DistilLoss (weighted at 50%), CosineLoss (30%), and MLMLoss (20%). This combination lets the student model learn from the teacher's output distribution and internal representations while retaining standard masked-language-modeling ability, as sketched below.
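To make the weighting concrete, here is a minimal PyTorch sketch of such a combined loss. The 0.5 / 0.3 / 0.2 weights come from the description above; the temperature value, the use of KL divergence for the distillation term, and the tensor layout are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def combined_distillation_loss(student_logits, teacher_logits,
                               student_hidden, teacher_hidden,
                               mlm_labels, temperature=2.0):
    """Weighted three-part loss: 0.5 * DistilLoss + 0.3 * CosineLoss + 0.2 * MLMLoss.

    Assumed shapes: logits (batch, seq, vocab); hidden states (batch, seq, dim);
    mlm_labels (batch, seq) with -100 on unmasked positions.
    """
    vocab = student_logits.size(-1)
    dim = student_hidden.size(-1)

    # DistilLoss: KL divergence between temperature-softened student and
    # teacher output distributions (one common distillation formulation).
    distil = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # CosineLoss: align student and teacher hidden states token by token.
    flat_student = student_hidden.view(-1, dim)
    flat_teacher = teacher_hidden.view(-1, dim)
    target = torch.ones(flat_student.size(0), device=student_hidden.device)
    cosine = F.cosine_embedding_loss(flat_student, flat_teacher, target)

    # MLMLoss: standard masked-language-modeling cross-entropy.
    mlm = F.cross_entropy(student_logits.view(-1, vocab),
                          mlm_labels.view(-1), ignore_index=-100)

    return 0.5 * distil + 0.3 * cosine + 0.2 * mlm
```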
- Trained on the OSCAR dataset (same as original CamemBERT)
- Implements masked language modeling capabilities
- Optimized for French language understanding tasks
- Preserves core functionality while reducing model size
## Core Capabilities
- Text Classification (83% FLUE CLS score)
- Named Entity Recognition (98% accuracy)
- Cross-lingual Natural Language Inference (77% XNLI score)
- Masked Language Modeling
- Semantic Analysis
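As a quick illustration of the masked-language-modeling capability, the snippet below uses the Hugging Face `transformers` fill-mask pipeline. It assumes the checkpoint is published as `cmarkea/distilcamembert-base` on the Hub; CamemBERT-family tokenizers use `<mask>` as the mask token.

```python
from transformers import pipeline

# Fill-mask sketch; the checkpoint id is an assumption based on the author name.
fill_mask = pipeline("fill-mask", model="cmarkea/distilcamembert-base")

# Print the top predictions for the masked French word.
for pred in fill_mask("Paris est la <mask> de la France."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```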
## Frequently Asked Questions
**Q: What makes this model unique?**
DistilCamemBERT is a distilled version of CamemBERT that retains strong performance while requiring far fewer computational resources. It is optimized specifically for French language tasks and approaches the original model's accuracy on standard benchmarks.
**Q: What are the recommended use cases?**
The model is well suited to French language processing tasks including text classification, named entity recognition, and masked language modeling. It is a particularly good fit for applications that need high accuracy in French language understanding under tight computational budgets.
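For a downstream task such as text classification, a typical pattern is to load the base checkpoint with a fresh classification head and fine-tune it. A minimal sketch follows; the checkpoint id and `num_labels=2` (a binary task) are illustrative assumptions.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint id; swap in the actual Hub id of the model.
model_id = "cmarkea/distilcamembert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer("Ce film était vraiment excellent !", return_tensors="pt")
logits = model(**inputs).logits  # head is untrained: fine-tune before relying on it
print(logits)
```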