camembert-large

Maintained By
almanach

CamemBERT-large

Property | Value
Parameter Count | 337M parameters
Model Type | Transformer-based Language Model
Training Data | CCNet (135GB of text)
Paper | Research Paper
Tensor Type | F32

What is camembert-large?

CamemBERT-large is a French language model based on the RoBERTa architecture. It is the larger variant of the CamemBERT family, with 337M parameters, and was trained on 135GB of text from CCNet. It is intended as a pretrained encoder for French NLP tasks, either used directly for masked-token prediction and feature extraction or fine-tuned for a downstream task.
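The parameter count and tensor type listed above give a rough idea of how much memory the weights alone occupy. The sketch below is only back-of-the-envelope arithmetic: it assumes the rounded figure of 337M parameters, and the F16 entry is included purely for comparison (the card lists F32 only). Activations, optimizer state, and framework overhead are not counted.

```python
# Rough weight-memory estimate from the figures quoted on this page.
PARAM_COUNT = 337_000_000  # "337M parameters" (rounded figure)
BYTES_PER_PARAM = {"F32": 4, "F16": 2}  # F16 is a comparison point, not from the card


def weight_memory_gib(param_count: int, tensor_type: str = "F32") -> float:
    """Return the approximate size of the raw weights in GiB."""
    return param_count * BYTES_PER_PARAM[tensor_type] / 2**30


f32_gib = weight_memory_gib(PARAM_COUNT, "F32")
f16_gib = weight_memory_gib(PARAM_COUNT, "F16")
print(f"F32 weights: ~{f32_gib:.2f} GiB")
print(f"F16 weights: ~{f16_gib:.2f} GiB")
```

At F32 this comes to roughly 1.26 GiB for the weights, which is a lower bound on what inference needs in practice.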

Implementation Details

The model is distributed for the Transformers library with a PyTorch backend and uses a SentencePiece tokenizer. Weights are available in Safetensors format, and the model can be served through inference endpoints.

  • Large-scale architecture with 337M parameters for enhanced modeling capacity
  • Trained on CCNet corpus (135GB) for comprehensive French language understanding
  • Built on RoBERTa architecture with optimizations for French language processing
  • Supports masked language modeling and feature extraction
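As an illustration of the masked language modeling support listed above, here is a minimal sketch using the Transformers `pipeline` API. The Hub ID `almanach/camembert-large` is an assumption based on the maintainer name shown on this page, and `mask_word` and `top_completions` are hypothetical helper names. Running `top_completions` requires `transformers`, `torch`, and `sentencepiece` to be installed and downloads the weights on first use.

```python
MODEL_ID = "almanach/camembert-large"  # assumed Hub ID, not stated on this page
MASK_TOKEN = "<mask>"  # mask token used by CamemBERT's tokenizer


def mask_word(sentence: str, word: str) -> str:
    """Replace the first occurrence of `word` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, MASK_TOKEN, 1)


def top_completions(masked_sentence: str, top_k: int = 5):
    """Return (token, score) pairs predicted for the masked position."""
    # Imported lazily so mask_word works even without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    results = fill_mask(masked_sentence, top_k=top_k)
    return [(r["token_str"], r["score"]) for r in results]
```

A call such as `top_completions(mask_word("Le camembert est délicieux", "délicieux"))` would return the model's five highest-scoring candidates for the masked word. In a real application the pipeline would be created once and reused rather than rebuilt on every call.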

Core Capabilities

  • Contextual word embeddings generation
  • Masked language modeling for text completion
  • Feature extraction from any of the 24 Transformer layers
  • Support for both inference and fine-tuning workflows
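Contextual embeddings and per-layer features can be obtained by asking the model to return its hidden states. The following is a sketch under stated assumptions: the Hub ID is assumed, and `hidden_state_index` and `layer_features` are hypothetical helpers. One detail worth knowing is that Transformers returns one more hidden state than there are layers, because index 0 holds the embedding output.

```python
MODEL_ID = "almanach/camembert-large"  # assumed Hub ID, not stated on this page
NUM_LAYERS = 24  # number of Transformer layers quoted above


def hidden_state_index(layer: int, num_layers: int = NUM_LAYERS) -> int:
    """Map a 1-based layer number to its index in `outputs.hidden_states`.

    Index 0 is the embedding output, so layer k is found at index k.
    """
    if not 1 <= layer <= num_layers:
        raise ValueError(f"layer must be between 1 and {num_layers}")
    return layer


def layer_features(text: str, layer: int = NUM_LAYERS):
    """Return per-token features from one layer, shape (1, seq_len, hidden)."""
    # Imported lazily; requires torch, transformers and sentencepiece.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID, output_hidden_states=True)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states holds NUM_LAYERS + 1 tensors: embeddings, then each layer.
    return outputs.hidden_states[hidden_state_index(layer)]
```

Which layer works best is task dependent; the last layer is a common default, and averaging token vectors over the sequence is one simple way to get a single sentence vector.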

Frequently Asked Questions

Q: What makes this model unique?

CamemBERT-large is distinguished by its size (337M parameters) and by its training on 135GB of French text from CCNet, which makes it well suited to French language tasks. It is also compatible with the Hugging Face ecosystem, so it loads through the standard Transformers classes.

Q: What are the recommended use cases?

The model is suited to French NLP tasks such as text classification, named entity recognition, and masked language modeling. It fits applications that need deep language understanding in French, such as content analysis and semantic search. As an encoder-only model trained with a masked language modeling objective, it fills in masked tokens rather than generating open-ended text.
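For a task such as text classification, the pretrained encoder is typically loaded with a task-specific head and then fine-tuned. The sketch below shows only the setup step, under the assumption that the Hub ID is `almanach/camembert-large`; `build_label_maps` and `load_classifier` are hypothetical helpers, and the label names are placeholders. The classification head is newly initialized, so the model needs fine-tuning on labeled data before its predictions mean anything.

```python
MODEL_ID = "almanach/camembert-large"  # assumed Hub ID, not stated on this page


def build_label_maps(labels):
    """Build the id2label / label2id dictionaries stored in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_classifier(labels):
    """Load the tokenizer and the encoder with an untrained classification head."""
    # Imported lazily; requires torch, transformers and sentencepiece.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

For example, `load_classifier(["négatif", "positif"])` would prepare a two-class sentiment model ready to be passed to a training loop.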
