camembertav2-base

almanach

CamemBERTav2 is a French language model with 111M parameters, built on the DebertaV2 architecture and trained on 275B tokens of French text.

Property          Value
Parameter Count   111M parameters
License           MIT
Language          French
Paper             View Paper
Training Data     275B tokens

What is camembertav2-base?

CamemBERTav2-base is a French language model built on the DebertaV2 architecture. It was trained on 275B tokens of French text drawn from three sources: OSCAR dumps, scientific documents from HALvest, and French Wikipedia.

Implementation Details

The model introduces several technical changes relative to its predecessor:

  • Extended context window of 1024 tokens
  • New WordPiece tokenizer with 32,768 tokens
  • Improved number handling and emoji support
  • Trained using Replaced Token Detection (RTD) with a 20% mask rate
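
To make the RTD objective in the list above more concrete, here is a minimal sketch of how one training example could be built. It is an illustration under stated assumptions, not the authors' training code: the function name make_rtd_example is hypothetical, and the generator network that normally proposes replacement tokens is swapped for uniform random sampling. Only the 20% rate and the 32,768-token vocabulary size come from this page.

```python
import random


def make_rtd_example(token_ids, vocab_size=32768, replace_rate=0.20, seed=0):
    """Build one Replaced Token Detection example (hypothetical helper).

    Returns (corrupted_ids, labels), where labels[i] is 1 if position i
    no longer holds the original token and 0 otherwise.
    """
    rng = random.Random(seed)
    n = len(token_ids)
    # Select roughly replace_rate of the positions (at least one if possible).
    n_replace = min(n, max(1, round(n * replace_rate)))
    positions = rng.sample(range(n), n_replace)

    corrupted = list(token_ids)
    for pos in positions:
        # Assumption: a real setup samples from a small generator model;
        # uniform random sampling stands in for it here.
        corrupted[pos] = rng.randrange(vocab_size)

    # The discriminator's target: was each token replaced?
    # A sampled token that happens to equal the original counts as "original".
    labels = [int(new != old) for new, old in zip(corrupted, token_ids)]
    return corrupted, labels
```

The discriminator is then trained to predict the labels from the corrupted sequence, so every position contributes to the loss rather than only the masked ones.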

Core Capabilities

  • POS tagging: 97.71%
  • Named entity recognition: 93.40% on FTB-NER
  • Natural language inference: 84.82% on XNLI
  • Question answering: 83.04% F1 on FQuAD
  • Medical NER: 73.98%

Frequently Asked Questions

Q: What makes this model unique?

CamemBERTav2 differs from its predecessor in three ways: a much larger training dataset (275B tokens versus 32B), a redesigned tokenizer, and state-of-the-art reported results across multiple French NLP tasks. Its results hold up on both general and specialized domains.

Q: What are the recommended use cases?

The model is suited to POS tagging, named entity recognition, text classification, and question answering. It can be applied both to general French text and to specialized domains such as medical text.
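
As a starting point for these use cases, the sketch below extracts contextual token vectors with the Hugging Face transformers library. It rests on assumptions: the Hub identifier almanach/camembertav2-base is inferred from the organization and model name on this page, the helper name embed is hypothetical, and torch and transformers must be installed. Task-specific heads (tagging, classification, question answering) would still need fine-tuning on top.

```python
MODEL_ID = "almanach/camembertav2-base"  # assumed Hub id, inferred from this page


def embed(sentences, model_id=MODEL_ID, max_length=1024):
    """Return contextual token vectors for a list of French sentences.

    Hypothetical helper: downloads the tokenizer and weights on first call.
    """
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    # Truncate to the 1024-token context window stated on this page.
    batch = tokenizer(
        sentences,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**batch)
    # Tensor of shape (batch, sequence_length, hidden_size).
    return outputs.last_hidden_state


# Example call (requires network access on first use):
# vectors = embed(["Le camembert est un fromage normand."])
```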
