POET: French Extended Part-of-Speech Tagger
| Property | Value |
|---|---|
| Architecture | Bi-LSTM-CRF |
| Training Data | ANTILLES Corpus |
| Embeddings | FastText |
| Accuracy | 95.2% |
| Number of Tags | 60 |
| Paper | ANTILLES: An Open French Linguistically Enriched Part-of-Speech Corpus |
What is pos-french?
pos-french is a French part-of-speech (POS) tagger that extends conventional POS tags with morphological information: gender, number, mood, person, and tense. It is trained on the ANTILLES corpus, an enriched version of UD_French-GSD.
Implementation Details
The model uses a Bi-LSTM-CRF architecture over FastText embeddings and was trained for 115 epochs. It processes raw text without normalization, so its predictions are sensitive to case and punctuation: input should keep its original capitalization and punctuation. The training corpus contains 400,399 words across 16,341 sentences, an average of about 24.5 words per sentence.
- Bidirectional LSTM encoder with a CRF output layer for sequence labeling
- FastText embeddings for word representation
- Case- and punctuation-sensitive processing (no input normalization)
- 60 distinct POS tags for detailed linguistic analysis
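To illustrate what the CRF layer contributes, the sketch below decodes the best tag sequence with the Viterbi algorithm. It is a minimal illustration, not the model's actual code: the per-token emission scores stand in for what a Bi-LSTM would output, the transition scores are hand-picked, and the three-tag set (`DET`, `NOUN`, `VERB`) is a toy assumption rather than the model's 60-tag inventory.

```python
from typing import Dict, List, Tuple

Tag = str


def viterbi_decode(
    emissions: List[Dict[Tag, float]],
    transitions: Dict[Tuple[Tag, Tag], float],
) -> List[Tag]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[i][t] is the score of tag t at token i (assumed encoder output).
    transitions[(p, c)] is the score of moving from tag p to tag c;
    missing pairs default to 0.0.
    """
    if not emissions:
        return []
    tags = list(emissions[0])
    # best[t]: score of the best path that ends in tag t at the current token
    best = dict(emissions[0])
    backpointers: List[Dict[Tag, Tag]] = []
    for scores in emissions[1:]:
        new_best: Dict[Tag, float] = {}
        pointers: Dict[Tag, Tag] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy scores for the two tokens "La porte" (all values are made up).
emissions = [
    {"DET": 2.0, "NOUN": 0.0, "VERB": 0.0},  # "La"
    {"DET": 0.0, "NOUN": 1.0, "VERB": 1.2},  # "porte": VERB wins in isolation
]
transitions = {("DET", "NOUN"): 1.0, ("DET", "VERB"): -1.0}

print(viterbi_decode(emissions, transitions))
```

Taken alone, the second token's emission scores favor `VERB`, but the transition scores make `DET` followed by `NOUN` the best path overall. This joint decoding is the reason to put a CRF layer on top of per-token scores.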
Core Capabilities
- Tagging accuracy of 95.2%
- Morphological detail beyond the base category: gender, number, mood, person, and tense
- Support for proper nouns, demonstrative pronouns, and complex verb forms
- Handling of numbers, symbols, and unknown words
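The accuracy figure can be read as the share of tokens that receive the correct tag. The sketch below shows that computation, assuming the reported 95.2% is a token-level score (the text above does not say how it was measured). The function name `token_accuracy` and the toy tags are illustrative only.

```python
from typing import List, Sequence


def token_accuracy(
    gold: Sequence[Sequence[str]],
    predicted: Sequence[Sequence[str]],
) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        total += len(gold_sent)
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


# Two toy sentences, five tokens in total, one tagging error.
gold_tags: List[List[str]] = [["DET", "NOUN", "VERB"], ["PRON", "VERB"]]
pred_tags: List[List[str]] = [["DET", "NOUN", "VERB"], ["PRON", "NOUN"]]

print(token_accuracy(gold_tags, pred_tags))
```

With a fine-grained tag set, a prediction counts as correct only if the whole tag matches, so a wrong gender or number is scored as an error even when the base category is right.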
Frequently Asked Questions
Q: What makes this model unique?
The model's distinguishing feature is its tag set of 60 classes, which encodes considerably more linguistic detail than a coarse POS inventory such as the 17 universal POS tags of Universal Dependencies. It pairs a Bi-LSTM-CRF sequence labeler with French-specific morphological distinctions (gender, number, mood, person, tense), which makes it well suited to fine-grained linguistic analysis.
Q: What are the recommended use cases?
The model suits applications that need detailed French grammatical analysis, including linguistic research, text analysis tools, grammar checkers, and educational software. It is most useful when fine-grained grammatical information is required, such as gender-specific analysis or verb tense identification.
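As a sketch of the gender-specific analysis mentioned above, the example below filters tagged tokens by a grammatical feature. The tag labels (`DET_F_SG`, `NOUN_M_SG`, and so on), the `TAG_FEATURES` table, and the `(token, tag)` input format are all hypothetical placeholders chosen for readability. They are not the model's actual tag names or output format, which should be taken from the model's own documentation.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL mapping from extended tags to features (not the real tag set).
TAG_FEATURES: Dict[str, Dict[str, str]] = {
    "DET_F_SG": {"pos": "DET", "gender": "feminine", "number": "singular"},
    "DET_M_SG": {"pos": "DET", "gender": "masculine", "number": "singular"},
    "ADJ_F_SG": {"pos": "ADJ", "gender": "feminine", "number": "singular"},
    "NOUN_F_SG": {"pos": "NOUN", "gender": "feminine", "number": "singular"},
    "NOUN_M_SG": {"pos": "NOUN", "gender": "masculine", "number": "singular"},
    "CONJ": {"pos": "CONJ"},
}


def filter_by_feature(
    tagged: List[Tuple[str, str]], feature: str, value: str
) -> List[str]:
    """Return the tokens whose tag carries the given feature value."""
    return [
        token
        for token, tag in tagged
        if TAG_FEATURES.get(tag, {}).get(feature) == value
    ]


# Hand-written tagger output for "La petite maison et le jardin".
tagged_sentence = [
    ("La", "DET_F_SG"),
    ("petite", "ADJ_F_SG"),
    ("maison", "NOUN_F_SG"),
    ("et", "CONJ"),
    ("le", "DET_M_SG"),
    ("jardin", "NOUN_M_SG"),
]

print(filter_by_feature(tagged_sentence, "gender", "feminine"))
print(filter_by_feature(tagged_sentence, "gender", "masculine"))
```

Keeping the tag-to-feature mapping in one table means downstream code queries features rather than parsing tag strings, so only the table needs updating once the real tag names are filled in.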