POET: French Extended Part-of-Speech Tagger

Property	Value
Architecture	Bi-LSTM-CRF
Training Data	ANTILLES Corpus
Embeddings	FastText
Accuracy	95.2%
Number of Tags	60
Paper	ANTILLES: An Open French Linguistically Enriched Part-of-Speech Corpus

What is pos-french?

pos-french is a French part-of-speech (POS) tagger that goes beyond coarse word classes: its tags also encode gender, number, mood, person, and tense. It is trained on the ANTILLES corpus, an enriched version of UD_French-GSD whose extended annotation gives the model its inventory of 60 tags.

Implementation Details

The model is a Bi-LSTM-CRF sequence tagger that uses FastText embeddings and was trained for 115 epochs. It processes raw text without normalization, so it is sensitive to case and punctuation: input should keep its original capitalization and punctuation. The training corpus contains 400,399 words in 16,341 sentences, an average of about 24.5 words per sentence.

Bidirectional LSTM with a CRF output layer for sequence labeling
FastText embeddings for word representation
Case- and punctuation-sensitive processing
60 distinct POS tags for detailed linguistic analysis
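
To make the role of the CRF layer concrete, the sketch below shows Viterbi decoding, the standard way a CRF picks the best tag sequence from per-token scores plus tag-to-tag transition scores. It is a minimal illustration, not the model's actual code: the function name `viterbi`, the two-tag inventory, and all scores are assumptions chosen for the example. In the real model the per-token scores would come from the Bi-LSTM.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[i][tag] is the score of `tag` for token i.
    transitions[prev][tag] is the score for moving from `prev` to `tag`.
    """
    if not emissions:
        return []
    tags = list(emissions[0])
    # best[tag] = best score of any path ending in `tag` at the current token
    best = {t: emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + transitions[p][t])
            new_best[t] = best[prev] + transitions[prev][t] + scores[t]
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)
    # Walk back from the best final tag to recover the full path.
    last = max(tags, key=lambda t: best[t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores for "la porte" with two illustrative tags (made-up numbers).
TOY_EMISSIONS = [
    {"DET": 2.0, "NOUN": 0.5},  # "la"
    {"DET": 1.2, "NOUN": 1.0},  # "porte": the token score alone prefers DET
]
TOY_TRANSITIONS = {
    "DET": {"DET": -2.0, "NOUN": 1.0},
    "NOUN": {"DET": 0.0, "NOUN": 0.0},
}
print(viterbi(TOY_EMISSIONS, TOY_TRANSITIONS))
```

Picking the best tag for each token independently would label both words DET here. The transition scores penalize DET followed by DET, so the decoded sequence is DET, NOUN. That is the benefit a CRF layer adds on top of per-token predictions.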

Core Capabilities

Reported tagging accuracy of 95.2%
Detailed grammatical analysis including gender and number
Support for proper nouns, demonstrative pronouns, and complex verb forms
Handling of numbers, symbols, and unknown words
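
For readers who want to reproduce an accuracy figure on their own data, here is a minimal sketch of token-level accuracy, the usual metric for POS tagging. Whether the 95.2% above was computed exactly this way is an assumption; the function name `token_accuracy` and the toy tag sequences are hypothetical.

```python
from typing import Sequence


def token_accuracy(gold: Sequence[Sequence[str]],
                   pred: Sequence[Sequence[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("gold and predicted sentences must align")
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    return correct / total if total else 0.0


# Hand-written toy data, not output from the model.
gold = [["DET", "NOUN", "VERB"], ["PRON", "VERB"]]
pred = [["DET", "NOUN", "NOUN"], ["PRON", "VERB"]]
print(f"{token_accuracy(gold, pred):.1%}")  # 4 of 5 tokens match
```

With a 60-tag inventory, a prediction counts as correct only if the full tag matches, including gender and number, so this metric is stricter than accuracy over coarse word classes.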

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its tag set of 60 classes, which carries more grammatical detail than the coarse tag sets of conventional POS taggers (Universal Dependencies, for example, defines 17 universal POS tags). It pairs a Bi-LSTM-CRF architecture with French morphological features such as gender, number, and tense, which makes it suited to fine-grained linguistic analysis.
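
To show what a fine-grained tag buys downstream, the sketch below maps a few extended tags to structured features. The tag names and the `TAG_TABLE` subset are illustrative assumptions, not the authoritative inventory; the ANTILLES paper remains the reference for the actual 60 tags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TagFeatures:
    category: str
    gender: Optional[str] = None
    number: Optional[str] = None


# Hypothetical subset of an extended tag set (names are assumptions).
TAG_TABLE = {
    "NMS": TagFeatures("noun", "masculine", "singular"),
    "NFS": TagFeatures("noun", "feminine", "singular"),
    "NMP": TagFeatures("noun", "masculine", "plural"),
    "NFP": TagFeatures("noun", "feminine", "plural"),
    "ADJMS": TagFeatures("adjective", "masculine", "singular"),
    "ADJFS": TagFeatures("adjective", "feminine", "singular"),
    "VERB": TagFeatures("verb"),
    "PREP": TagFeatures("preposition"),
}


def describe(tag: str) -> TagFeatures:
    """Look up a tag; unknown tags get a neutral placeholder."""
    return TAG_TABLE.get(tag, TagFeatures("unknown"))


print(describe("NFS"))
print(describe("PREP"))
```

An explicit lookup table is used instead of parsing tag suffixes because a suffix rule would misread tags such as PREP, whose final letter is not a number marker.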

Q: What are the recommended use cases?

The model is well suited to applications that need detailed French language analysis, including linguistic research, text analysis tools, grammar checkers, and educational software. It is particularly useful when fine-grained grammatical information is needed, such as gender-specific analysis or verb tense identification.
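
As one example of the grammar-checking use case, the sketch below flags determiner and noun pairs whose gender or number disagree, given tokens that have already been tagged. The tag names, the `FEATURES` table, and the function `agreement_errors` are hypothetical, and the tagged input is written by hand for illustration.

```python
from typing import List, Tuple

# Hypothetical tags mapped to (gender, number); names are assumptions.
FEATURES = {
    "DETMS": ("M", "S"), "DETFS": ("F", "S"),
    "NMS": ("M", "S"), "NFS": ("F", "S"),
    "NMP": ("M", "P"), "NFP": ("F", "P"),
}
DETERMINERS = {"DETMS", "DETFS"}
NOUNS = {"NMS", "NFS", "NMP", "NFP"}


def agreement_errors(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (determiner, noun) word pairs whose features disagree."""
    errors = []
    # Compare each token with the one that immediately follows it.
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 in DETERMINERS and t2 in NOUNS and FEATURES[t1] != FEATURES[t2]:
            errors.append((w1, w2))
    return errors


# Hand-written tags for illustration, not model output.
sentence = [("le", "DETMS"), ("maison", "NFS"), ("est", "AUX")]
print(agreement_errors(sentence))
```

A real checker would also need to handle intervening adjectives and elided forms, and should allow for the tagger itself being misled by the very error it is meant to detect.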
