journaux-lm-v1

Maintained By
PleIAs


Property            Value
License             Apache 2.0
Language            French
Training Data Size  408 GB
Architecture        ELECTRA with TEAMS approach

What is journaux-lm-v1?

Journaux-LM-v1 is a French language model specialized in processing historical newspapers. It is built on the ELECTRA architecture and pretrained with the TEAMS approach, with the goal of handling historical French text documents better than general-purpose models. Its training data is the PleIAs/French-PD-Newspapers dataset, which contains 408 GB of public-domain historical French newspaper content.

Implementation Details

The model uses an ELECTRA architecture pretrained with TEAMS (Training ELECTRA Augmented with Multi-word Selection). Its evaluation focuses on Named Entity Recognition (NER): after fine-tuning, it outperforms the French Europeana BERT model across multiple benchmark datasets.
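TEAMS keeps ELECTRA's replaced-token detection, in which the discriminator labels every token as original or replaced, and adds a multi-word selection task, in which the model must pick the original token out of a small candidate set at each masked position. The following is a minimal sketch of how these two discriminator losses could be computed and combined for a single sequence, in plain Python with made-up scores. The function names, the toy inputs, and the equal weighting of the two terms (`mws_weight=1.0`) are illustrative assumptions, not the actual Journaux-LM-v1 training code.

```python
import math
from typing import List, Sequence


def replaced_token_detection_loss(logits: Sequence[float], is_replaced: Sequence[int]) -> float:
    """ELECTRA-style replaced-token detection: mean binary cross-entropy.

    logits[i] is the discriminator's score that token i was replaced;
    is_replaced[i] is 1 if the generator changed token i, else 0.
    """
    total = 0.0
    for z, y in zip(logits, is_replaced):
        # Numerically stable BCE-with-logits: max(z, 0) - z*y + log(1 + e^-|z|)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def multi_word_selection_loss(candidate_scores: List[Sequence[float]], gold_index: Sequence[int]) -> float:
    """Multi-word selection: at each masked position, softmax cross-entropy
    over a small candidate set that includes the original token."""
    total = 0.0
    for scores, gold in zip(candidate_scores, gold_index):
        m = max(scores)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[gold]
    return total / len(candidate_scores)


def teams_discriminator_loss(rtd_logits, is_replaced, candidate_scores, gold_index, mws_weight=1.0):
    """Hypothetical sum of both objectives; the weighting is an assumption."""
    return (replaced_token_detection_loss(rtd_logits, is_replaced)
            + mws_weight * multi_word_selection_loss(candidate_scores, gold_index))


# Toy example: 4 tokens, token 2 was replaced; one masked position with 3 candidates.
loss = teams_discriminator_loss(
    rtd_logits=[-3.0, -2.5, 2.0, -4.0],
    is_replaced=[0, 0, 1, 0],
    candidate_scores=[[1.5, 0.2, -0.3]],  # the original token is candidate 0
    gold_index=[0],
)
print(f"combined toy loss: {loss:.4f}")
```

Only the discriminator-side losses are shown here. The generator that produces the replacements, and any parameter sharing between generator and discriminator, are left out of this sketch.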

  • Trained on PleIAs/French-PD-Newspapers dataset
  • Pretrained with the TEAMS approach (ELECTRA plus multi-word selection)
  • Focused on processing historical French text
  • Outperforms the French Europeana BERT model on multiple NER benchmarks

Core Capabilities

  • Named Entity Recognition, with an average F1 score of 77.17% on the test sets
  • Specialized processing of historical French texts
  • Better NER results than the French Europeana BERT model
  • Handles various historical document types and formats
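NER results such as the 77.17% figure are typically reported as entity-level F1. Under this metric, a predicted entity counts as correct only if both its span and its type match the gold annotation. The following sketch computes that score from BIO tag sequences and then macro-averages per-dataset scores. It assumes BIO tagging and a plain unweighted mean across test sets, because the card does not state exactly how its average was computed. The helper names `bio_spans`, `entity_f1`, and `macro_average` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Extract entity spans from BIO tags; a stray I- tag opens a new entity (conlleval-style)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == etype:
            continue  # the current entity keeps going
        if etype is not None:
            spans.add((start, i, etype))  # close the open entity
        start, etype = (i, label) if prefix in ("B", "I") else (None, None)
    return spans


def entity_f1(gold: List[Sequence[str]], pred: List[Sequence[str]]) -> float:
    """Micro-averaged entity-level F1 over all sentences of one dataset."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        fp += len(ps - gs)
        fn += len(gs - ps)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def macro_average(scores: Sequence[float]) -> float:
    """Unweighted mean of per-dataset F1 scores (an assumed averaging scheme)."""
    return sum(scores) / len(scores)


# Toy sentence: the prediction finds the person but misses the location.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(round(entity_f1(gold, pred), 3))
```

In this toy case precision is 1.0 and recall is 0.5, so F1 is 2/3. The tags are invented for illustration and do not come from the model's evaluation.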

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its specialized focus on historical French newspapers, utilizing the TEAMS approach with ELECTRA architecture. It consistently outperforms the French Europeana BERT model, showing improvements of up to 1.12% on development sets and 0.98% on test sets for NER tasks.

Q: What are the recommended use cases?

The model is particularly well suited to Named Entity Recognition in historical French texts, processing of historical newspaper content, and analysis of public-domain French-language materials. Because it was pretrained on historical newspapers, it is most relevant to tasks that depend on the vocabulary and language patterns of historical French text.
