camembert-ner-with-dates
Property | Value |
---|---|
Author | Jean-Baptiste |
Task | Named Entity Recognition with Dates |
Base Model | CamemBERT |
Training Data | Enhanced wikiner-fr (~170,634 sentences) |
Performance | F1 score: ~83% on mixed chat and email test data |
What is camembert-ner-with-dates?
camembert-ner-with-dates is a French Named Entity Recognition (NER) model built on CamemBERT. It extends standard NER by also recognizing dates, which makes it useful for French text that contains temporal information.
Implementation Details
The model is implemented with the Hugging Face Transformers library. Its reported per-entity F1 scores are 93.1% for locations (LOC), 95.9% for persons (PER), 86.5% for organizations (ORG), and 86.0% for miscellaneous entities (MISC). Date recognition was not evaluated the same way; its F1 score is estimated at approximately 90%.
- Built on CamemBERT architecture
- Trained on enhanced wikiner-fr dataset with ~170,634 sentences
- Supports integration with dateparser library for datetime object conversion
- Uses the "simple" aggregation strategy, which merges sub-word tokens into whole entities
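A minimal sketch of loading the model through the Transformers `pipeline` with `aggregation_strategy="simple"`. It assumes the Hub ID `Jean-Baptiste/camembert-ner-with-dates`, inferred from the author and model name above. `group_by_label` and its `min_score` parameter are hypothetical helpers for illustration, not part of the model's distribution.

```python
from collections import defaultdict
from typing import Dict, List

# Assumption: the model is published on the Hugging Face Hub under this ID.
MODEL_ID = "Jean-Baptiste/camembert-ner-with-dates"


def load_ner_pipeline():
    """Build a token-classification pipeline that merges sub-word tokens
    into whole entities using the "simple" aggregation strategy."""
    # Imported lazily so group_by_label can be used without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
    return pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")


def group_by_label(entities: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Hypothetical helper: group aggregated pipeline output by entity label,
    dropping entities whose confidence is below min_score."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)
```

Usage: `group_by_label(load_ner_pipeline()("Apple est créée le 1er avril 1976 par Steve Jobs."), min_score=0.5)` downloads the model on first call and returns a dict mapping each label to the matched surface strings. With the "simple" strategy each result carries `entity_group`, `score`, `word`, `start`, and `end`, so the grouping works on whole entities rather than sub-word pieces.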
Core Capabilities
- Named Entity Recognition for standard entities (PER, LOC, ORG, MISC)
- Date recognition as an additional entity type
- F1 score of ~83% on mixed chat and email test data
- Compatible with dateparser for converting recognized dates to Python datetime objects
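The hand-off to dateparser can be sketched as follows. This assumes the model emits dates under the entity label `DATE` and that each detected span is passed to `dateparser.parse` with French as the language hint; `extract_dates` and `french_dateparser` are hypothetical helper names.

```python
from datetime import datetime
from typing import Callable, List, Optional, Tuple

# A parser takes a date expression and returns a datetime, or None if it cannot parse it.
DateParser = Callable[[str], Optional[datetime]]


def french_dateparser(text: str) -> Optional[datetime]:
    """Parse a French date expression with dateparser; returns None if unparseable."""
    import dateparser  # lazy import; requires `pip install dateparser`

    return dateparser.parse(text, languages=["fr"])


def extract_dates(
    entities: List[dict], parse_fn: Optional[DateParser] = None
) -> List[Tuple[str, Optional[datetime]]]:
    """Keep only DATE spans from aggregated NER output (assumed label name)
    and convert each span's text to a datetime with the given parser."""
    parse = parse_fn or french_dateparser
    return [(ent["word"], parse(ent["word"])) for ent in entities if ent["entity_group"] == "DATE"]
```

The parser is injectable so it can be swapped or stubbed in tests. Spans that dateparser cannot interpret come back paired with `None` instead of raising, so callers can decide how to handle them.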
Frequently Asked Questions
Q: What makes this model unique?
The model combines standard NER with date recognition in French text. On mixed chat and email test data it reached an F1 score of ~83%, compared with ~70% for the dateparser library. This makes it valuable for applications that need to extract both entities and temporal information.
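The evaluation procedure behind the 83% vs 70% comparison is not described. As an assumption, a common way to score such comparisons is entity-level F1 with exact matching on span boundaries and label, sketched here with a hypothetical `span_f1` helper:

```python
from typing import Iterable, Set, Tuple

# A span is (start, end, label); a prediction counts only if all three match exactly.
Span = Tuple[int, int, str]


def span_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 with exact span-and-label matching."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Scoring two systems against the same gold spans with this function yields directly comparable numbers; a partial-overlap criterion would give more lenient scores.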
Q: What are the recommended use cases?
The model is well suited to French text in various formats, including chat messages and emails. It fits applications that need to extract both named entities and temporal information, such as automated scheduling systems, content analysis tools, and information extraction pipelines for French-language content.