camembert-ner-with-dates

Property	Value
Author	Jean-Baptiste
Task	Named Entity Recognition with Dates
Base Model	CamemBERT
Training Data	Enhanced wikiner-fr (~170,634 sentences)
Performance	F1 score: ~83% on mixed chat and email test data

What is camembert-ner-with-dates?

camembert-ner-with-dates is a French Named Entity Recognition (NER) model built on CamemBERT. Besides the standard entity types, it also detects dates, which makes it well suited to French text that contains temporal information.

Implementation Details

The model is implemented with the Hugging Face Transformers library. Its reported F1 scores per entity type are 93.1% for locations (LOC), 95.9% for persons (PER), 86.5% for organizations (ORG), and 86.0% for miscellaneous entities (MISC). Dates were not evaluated in the same way; their F1 score is estimated at approximately 90%.
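Each per-entity F1 score is the harmonic mean of precision and recall for that entity type. The following sketch shows one common way to compute such scores, by exact matching of (start, end, label) spans. The spans are made-up toy data and the function name `per_type_f1` is hypothetical; this is not the evaluation script used for this model.

```python
from typing import Dict, Set, Tuple

# A span is (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def per_type_f1(gold: Set[Span], pred: Set[Span]) -> Dict[str, float]:
    """Exact-match, entity-level F1 for every label seen in gold or pred."""
    scores: Dict[str, float] = {}
    for label in sorted({s[2] for s in gold | pred}):
        g = {s for s in gold if s[2] == label}
        p = {s for s in pred if s[2] == label}
        tp = len(g & p)  # spans whose boundaries and label both match
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        scores[label] = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
    return scores


# Hypothetical toy annotations: one LOC mislabelled as ORG, one DATE missed.
gold = {(0, 5, "PER"), (10, 16, "LOC"), (20, 32, "DATE"), (40, 46, "DATE")}
pred = {(0, 5, "PER"), (10, 16, "ORG"), (20, 32, "DATE")}
scores = per_type_f1(gold, pred)
print(scores)
```

Here the DATE class has precision 1.0 and recall 0.5, so its F1 is 2/3: missing spans lowers the score even when every predicted span is correct.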

Built on the CamemBERT architecture
Trained on an enhanced wikiner-fr dataset with ~170,634 sentences
Supports integration with the dateparser library to convert recognized dates into datetime objects
Groups token-level predictions into entities with the "simple" aggregation strategy
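The "simple" aggregation strategy merges adjacent token predictions that share an entity type into one entity. The sketch below is a simplified, hypothetical re-implementation of that idea (`aggregate_simple` is not a Transformers function): a B- tag or a change of type starts a new entity, O tokens close the current one, and the entity score is the mean of its token scores. The token predictions are invented for illustration; the real pipeline also handles sub-word tokens and other edge cases.

```python
from typing import Dict, List, Optional


def _finish(text: str, group: Dict) -> Dict:
    """Turn an open group of tokens into an entity dict."""
    return {
        "entity_group": group["label"],
        "score": sum(group["scores"]) / len(group["scores"]),  # mean token score
        "word": text[group["start"]:group["end"]],  # slice original text by offsets
        "start": group["start"],
        "end": group["end"],
    }


def aggregate_simple(text: str, tokens: List[Dict]) -> List[Dict]:
    """Simplified sketch: merge adjacent tokens of the same entity type."""
    entities: List[Dict] = []
    group: Optional[Dict] = None
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":  # outside any entity: close the open group
            if group is not None:
                entities.append(_finish(text, group))
                group = None
            continue
        prefix, label = tag.split("-", 1) if "-" in tag else ("I", tag)
        if group is None or prefix == "B" or label != group["label"]:
            if group is not None:
                entities.append(_finish(text, group))
            group = {"label": label, "start": tok["start"],
                     "end": tok["end"], "scores": [tok["score"]]}
        else:  # same type continues: extend the current entity
            group["end"] = tok["end"]
            group["scores"].append(tok["score"])
    if group is not None:
        entities.append(_finish(text, group))
    return entities


# Hypothetical word-level predictions for a French sentence.
text = "Rendez-vous avec Jean Dupont demain à Paris"
tokens = [
    {"entity": "O", "score": 0.99, "start": 0, "end": 11},
    {"entity": "O", "score": 0.99, "start": 12, "end": 16},
    {"entity": "I-PER", "score": 0.98, "start": 17, "end": 21},
    {"entity": "I-PER", "score": 0.96, "start": 22, "end": 28},
    {"entity": "I-DATE", "score": 0.90, "start": 29, "end": 35},
    {"entity": "O", "score": 0.99, "start": 36, "end": 37},
    {"entity": "I-LOC", "score": 0.97, "start": 38, "end": 43},
]
entities = aggregate_simple(text, tokens)
for ent in entities:
    print(ent["entity_group"], ent["word"], round(ent["score"], 3))
```

Merging "Jean" and "Dupont" into one PER entity is what lets downstream code treat a multi-word name or date expression as a single span.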

Core Capabilities

Named Entity Recognition for standard entities (PER, LOC, ORG, MISC)
Date recognition in addition to the standard entity types
F1 score of ~83% on mixed chat and email data
Compatible with dateparser for converting recognized dates to Python datetime objects
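Recognized dates can be handed to a date parser to obtain datetime objects. The sketch below keeps only date entities from pipeline-style output (dicts with `entity_group` and `word` keys, as returned by the Transformers token-classification pipeline with aggregation) and passes each span to a parser function. The helper `extract_dates`, the `DATE` label name and the model id in the comments are assumptions; a stub parser stands in for dateparser so the example runs offline.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: dates carry the entity_group "DATE"; check the model's label set.
DATE_LABEL = "DATE"


def extract_dates(
    entities: List[Dict],
    parse: Callable[[str], Optional[datetime]],
    date_label: str = DATE_LABEL,
) -> List[Tuple[str, datetime]]:
    """Return (span text, datetime) pairs for date entities the parser understands."""
    results: List[Tuple[str, datetime]] = []
    for ent in entities:
        if ent.get("entity_group") != date_label:
            continue  # skip PER, LOC, ORG, MISC entities
        span = ent["word"].strip()
        parsed = parse(span)
        if parsed is not None:  # spans the parser rejects (None) are dropped
            results.append((span, parsed))
    return results


# Offline stub standing in for a real parser such as dateparser.parse.
def stub_parse(span: str) -> Optional[datetime]:
    known = {"12 mars 2024": datetime(2024, 3, 12)}
    return known.get(span)


# Hypothetical pipeline-style output for one French sentence.
entities = [
    {"entity_group": "PER", "word": "Marie", "score": 0.99},
    {"entity_group": "DATE", "word": " 12 mars 2024", "score": 0.95},
    {"entity_group": "DATE", "word": "bientôt", "score": 0.60},
]
dates = extract_dates(entities, stub_parse)
print(dates)

# With the real libraries (not executed here; needs network access), roughly:
#   from transformers import pipeline
#   import dateparser
#   ner = pipeline("ner", model="Jean-Baptiste/camembert-ner-with-dates",
#                  aggregation_strategy="simple")
#   extract_dates(ner(sentence), lambda s: dateparser.parse(s, languages=["fr"]))
```

Passing the parser in as a function keeps the filtering logic testable without downloading the model, and lets you swap in dateparser settings (for example a reference date for relative expressions) without changing the helper.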

Frequently Asked Questions

Q: What makes this model unique?

The model combines standard French NER with date recognition in a single pass. It also scores higher than the dateparser library at detecting dates (83% vs 70%), which makes it valuable for applications that need to extract both entities and temporal information.

Q: What are the recommended use cases?

The model is particularly well-suited for processing French text in various formats, including chat messages and emails. It's ideal for applications requiring extraction of both named entities and temporal information, such as automated scheduling systems, content analysis tools, and information extraction pipelines working with French language content.