pos-french

Maintained by: qanastek

POET: French Extended Part-of-Speech Tagger

Property          Value
Architecture      Bi-LSTM-CRF
Training Data     ANTILLES Corpus
Embeddings        FastText
Accuracy          95.2%
Number of Tags    60
Paper             ANTILLES: An Open French Linguistically Enriched Part-of-Speech Corpus

What is pos-french?

pos-french is a French part-of-speech (POS) tagger that goes beyond coarse word classes. Its tags also encode linguistic features such as gender, number, mood, person, and tense. It is trained on the ANTILLES corpus, an extended version of the UD_French-GSD treebank.

Implementation Details

The model uses a Bi-LSTM-CRF architecture on top of FastText embeddings and was trained for 115 epochs. It processes raw text without normalization, so its output is sensitive to case and punctuation. The training corpus contains 400,399 words across 16,341 sentences.

  • Bidirectional LSTM encoder with a CRF output layer that scores whole tag sequences rather than isolated tokens
  • FastText embeddings for word representation
  • Case- and punctuation-sensitive processing
  • 60 distinct POS tags for detailed linguistic analysis
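The following minimal sketch in Python shows what the CRF layer adds on top of the Bi-LSTM: Viterbi decoding picks the best whole tag sequence instead of the best tag per token. It assumes the Bi-LSTM has already produced a score per tag for each token (`emissions`). It also assumes the CRF has learned a tag-to-tag `transitions` matrix plus `start` scores. The three-tag set, all scores, and the `viterbi_decode` name are hypothetical toy values, not taken from pos-french. The end-of-sentence transition that a full CRF includes is omitted for brevity.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path (as tag indices) and its total score.

    emissions[t][j]   score of tag j for token t (e.g. from a Bi-LSTM)
    transitions[i][j] score of tag j directly following tag i
    start[j]          score of tag j opening the sentence
    """
    n_tags = len(start)
    # best[j]: score of the best path that ends in tag j at the current token
    best = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # choose the previous tag i that maximises the score of reaching j
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # follow the back-pointers from the best final tag to recover the path
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical hand-picked scores for "le chat dort" with tags DET=0, NOUN=1, VERB=2.
TAGS = ["DET", "NOUN", "VERB"]
emissions = [[2.0, 0.5, 0.1], [0.1, 1.0, 1.2], [0.0, 0.3, 2.0]]
transitions = [[-1.0, 1.0, -1.0], [-0.5, -0.5, 1.0], [0.5, 0.0, -1.0]]
start = [0.0, 0.0, 0.0]

path, score = viterbi_decode(emissions, transitions, start)
print([TAGS[i] for i in path], score)  # ['DET', 'NOUN', 'VERB'] 7.0
```

Choosing the best tag per token independently would label "chat" as VERB here, because its VERB emission (1.2) beats its NOUN emission (1.0). The transition scores favour DET → NOUN → VERB, so the jointly decoded path is DET, NOUN, VERB.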

Core Capabilities

  • 95.2% tagging accuracy
  • Detailed grammatical analysis including gender and number
  • Support for proper nouns, demonstrative pronouns, and complex verb forms
  • Handling of numbers, symbols, and unknown words
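The tag labels carry gender and number themselves, so downstream code often has to split a tag back into separate features. The sketch below assumes a naming scheme in which a base category (for example `DET`, `N`, `ADJ`) is followed by an optional gender letter (`M`/`F`) and number letter (`S`/`P`), as in `DETFS`. This scheme, the listed categories, and the `decompose_tag` helper are illustrative assumptions. Check the actual 60-tag inventory in the ANTILLES paper before relying on them.

```python
import re
from typing import Dict, Optional

# ASSUMPTION: tags concatenate a base category with an optional gender letter
# (M/F) and number letter (S/P), e.g. "DETFS" or "NMP". The base categories
# below are illustrative, not the model's full tag set.
_TAG_RE = re.compile(r"^(?P<base>PDEM|DET|ADJ|VPP|N)(?P<gender>[MF])?(?P<number>[SP])?$")
_GENDER = {"M": "masculine", "F": "feminine"}
_NUMBER = {"S": "singular", "P": "plural"}


def decompose_tag(tag: str) -> Dict[str, Optional[str]]:
    """Split a fine-grained tag into base category, gender and number."""
    match = _TAG_RE.match(tag)
    if match is None:
        # Tags outside the assumed pattern (e.g. "ADV") carry no gender/number.
        return {"base": tag, "gender": None, "number": None}
    return {
        "base": match.group("base"),
        "gender": _GENDER.get(match.group("gender") or ""),
        "number": _NUMBER.get(match.group("number") or ""),
    }


for tag in ("DETFS", "NMP", "VPPMS", "ADV"):
    print(tag, decompose_tag(tag))
```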

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its tag set of 60 classes, which carries much more linguistic detail than coarse tag sets such as the 17 Universal POS (UPOS) tags. It pairs a Bi-LSTM-CRF sequence labeler with labels that encode French morphological features, which makes it particularly useful for fine-grained linguistic analysis.

Q: What are the recommended use cases?

The model suits applications that need detailed analysis of French text, including linguistic research, text analysis tools, grammar checkers, and educational software. It is particularly useful when fine-grained grammatical information is needed, such as analysis of grammatical gender or identification of verb tense.
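As a concrete example of the grammar-checking use case, the sketch below flags adjacent determiner and noun pairs whose gender or number disagree. It takes already-tagged (word, tag) pairs as input. It assumes tags such as `DETFS` (determiner, feminine, singular) and `NMS` (noun, masculine, singular). These tag names, the example tagger output, and the `agreement_errors` helper are all hypothetical. A real checker would also need to handle adjectives, elided forms, and words that sit between the determiner and the noun.

```python
import re
from typing import List, Optional, Tuple

# ASSUMPTION: determiner and noun tags end in a gender letter (M/F) and a
# number letter (S/P), e.g. "DETFS" or "NMS". Hypothetical tag scheme.
_AGREEING_TAG = re.compile(r"^(DET|N)([MF])([SP])$")


def agreement_features(tag: str) -> Optional[Tuple[str, ...]]:
    """Return (category, gender, number), or None if the tag has no such features."""
    match = _AGREEING_TAG.match(tag)
    return match.groups() if match else None


def agreement_errors(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Flag adjacent determiner + noun pairs that disagree in gender or number."""
    errors = []
    for (word1, tag1), (word2, tag2) in zip(tagged, tagged[1:]):
        f1, f2 = agreement_features(tag1), agreement_features(tag2)
        if f1 and f2 and f1[0] == "DET" and f2[0] == "N" and f1[1:] != f2[1:]:
            errors.append((word1, word2))
    return errors


# Hypothetical tagger output for the ungrammatical "la chat dort".
print(agreement_errors([("la", "DETFS"), ("chat", "NMS"), ("dort", "VERB")]))
```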
