distilcamembert-base

Maintained By
cmarkea

DistilCamemBERT Base

Property           Value
Author             cmarkea
Model Type         Distilled French language model
Training Data      OSCAR dataset (140 GB)
Training Duration  18 days on an Nvidia Titan RTX
Paper              Download PDF

What is distilcamembert-base?

DistilCamemBERT is a compressed version of the CamemBERT model for French language processing. It is produced through knowledge distillation, which retains most of the original model's performance while reducing computational requirements. The model reaches an 83% F1-score on the FLUE CLS task and 98% accuracy on French named entity recognition (NER).

Implementation Details

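Training uses a weighted three-part loss: DistilLoss (50%), CosineLoss (30%), and MLMLoss (20%), i.e. L = 0.5·DistilLoss + 0.3·CosineLoss + 0.2·MLMLoss. The first two terms make the student (DistilCamemBERT) imitate the teacher (CamemBERT), matching its output distribution and the direction of its hidden representations, while the MLM term keeps the student trained directly on the masked-language-modeling task.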
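The weighted objective can be sketched in plain Python for a single masked position. This is a minimal illustration under stated assumptions, not the authors' training code: it assumes a DistilBERT-style setup in which DistilLoss is a KL divergence between temperature-softened teacher and student distributions (for the student's gradients this is equivalent to soft-target cross-entropy, since the two differ only by the teacher's entropy), CosineLoss is one minus the cosine similarity of hidden-state vectors, and MLMLoss is ordinary cross-entropy against the true token. The temperature of 2.0 and all function names are hypothetical.

```python
import math
from typing import List, Sequence

# Loss weights as reported for DistilCamemBERT's training objective.
W_DISTIL, W_COSINE, W_MLM = 0.5, 0.3, 0.2


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distil_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions (assumed form)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_teacher, p_student) if pt > 0.0)


def cosine_loss(student_hidden: Sequence[float], teacher_hidden: Sequence[float]) -> float:
    """1 - cosine similarity: pulls the student's hidden vector toward the teacher's direction."""
    dot = sum(s * t for s, t in zip(student_hidden, teacher_hidden))
    norm_s = math.sqrt(sum(s * s for s in student_hidden))
    norm_t = math.sqrt(sum(t * t for t in teacher_hidden))
    return 1.0 - dot / (norm_s * norm_t)


def mlm_loss(student_logits: Sequence[float], target_id: int) -> float:
    """Cross-entropy of the student's prediction for the true masked token."""
    return -math.log(softmax(student_logits)[target_id])


def combined_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                  student_hidden: Sequence[float], teacher_hidden: Sequence[float],
                  target_id: int, temperature: float = 2.0) -> float:
    """0.5 * DistilLoss + 0.3 * CosineLoss + 0.2 * MLMLoss for one masked position."""
    return (W_DISTIL * distil_loss(student_logits, teacher_logits, temperature)
            + W_COSINE * cosine_loss(student_hidden, teacher_hidden)
            + W_MLM * mlm_loss(student_logits, target_id))
```

When the student's logits and hidden state equal the teacher's, the distillation and cosine terms drop to (numerically) zero and only the 0.2-weighted MLM term remains. In real training these terms would be averaged over all masked positions in a batch and computed on tensors rather than Python lists.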

  • Trained on the OSCAR dataset, the same corpus used for the original CamemBERT
  • Supports masked language modeling (fill-mask)
  • Optimized for French language understanding tasks
  • Smaller than CamemBERT while preserving most of its capabilities
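Masked language modeling can be tried through the Hugging Face `transformers` fill-mask pipeline. The sketch below assumes the checkpoint is published on the Hub as `cmarkea/distilcamembert-base` and uses CamemBERT's `<mask>` token; the helper names `top_tokens` and `demo` are hypothetical. The `transformers` import is deferred into `demo()` so the ranking helper works without the library installed.

```python
from typing import Dict, List

MODEL_ID = "cmarkea/distilcamembert-base"  # assumed Hub identifier
MASK_TOKEN = "<mask>"  # CamemBERT-style mask token


def top_tokens(predictions: List[Dict], k: int = 3) -> List[str]:
    """Return the k highest-scoring filled-in tokens from fill-mask pipeline output.

    Each prediction is a dict with (among others) "score" and "token_str" keys.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["token_str"].strip() for p in ranked[:k]]


def demo(sentence: str = f"Le camembert est {MASK_TOKEN} :)") -> List[str]:
    """Run the fill-mask pipeline on one sentence (downloads weights on first use)."""
    from transformers import pipeline  # imported lazily on purpose

    fill_mask = pipeline("fill-mask", model=MODEL_ID, tokenizer=MODEL_ID)
    return top_tokens(fill_mask(sentence))
```

Calling `demo()` returns the three most likely completions for the masked word; any French sentence containing exactly one `<mask>` can be passed instead.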

Core Capabilities

  • Text Classification (83% F1-score on FLUE CLS)
  • Named Entity Recognition (98% accuracy)
  • Natural Language Inference (77% score on FLUE XNLI, the French portion of XNLI)
  • Masked Language Modeling
  • Semantic Analysis
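For classification or NER, the base checkpoint is typically loaded with a task-specific head and then fine-tuned. The outline below is a sketch under assumptions: it uses the `transformers` classes `AutoModelForSequenceClassification` and `AutoModelForTokenClassification`, while the Hub identifier, label sets, and helper names are illustrative. The new heads start from random weights, so predictions are not meaningful until the model has been fine-tuned on labelled data.

```python
from typing import Dict, List, Tuple

MODEL_ID = "cmarkea/distilcamembert-base"  # assumed Hub identifier

# Hypothetical label sets, for illustration only.
SENTIMENT_LABELS = ["negative", "positive"]
NER_LABELS = ["O", "PER", "LOC", "ORG", "MISC"]


def build_label_maps(labels: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build the id2label / label2id mappings stored in a transformers model config."""
    id2label = {i: name for i, name in enumerate(labels)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_classifier(labels: List[str]):
    """Sentence-level classifier (e.g. sentiment); the head is newly initialised."""
    from transformers import AutoModelForSequenceClassification

    id2label, label2id = build_label_maps(labels)
    return AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID, num_labels=len(labels), id2label=id2label, label2id=label2id
    )


def load_token_classifier(labels: List[str]):
    """Token-level classifier for NER; the head is newly initialised."""
    from transformers import AutoModelForTokenClassification

    id2label, label2id = build_label_maps(labels)
    return AutoModelForTokenClassification.from_pretrained(
        MODEL_ID, num_labels=len(labels), id2label=id2label, label2id=label2id
    )
```

Storing the label maps in the config means a saved fine-tuned model reports readable label names instead of `LABEL_0`, `LABEL_1`, and so on.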

Frequently Asked Questions

Q: What makes this model unique?

DistilCamemBERT is a distilled version of CamemBERT that keeps strong performance while requiring fewer computational resources. It is specialized for French and comes close to the original model's performance levels.

Q: What are the recommended use cases?

The model suits French language processing tasks such as text classification, named entity recognition, and masked language modeling. It is particularly well suited to applications that need high accuracy on French text under a limited compute budget.
