# ALBERTI BERT Base Multilingual Cased
| Property | Value |
|---|---|
| Parameter Count | 178M |
| License | CC-BY-4.0 |
| Primary Task | Fill-Mask |
| Framework | PyTorch, JAX, Flax |
## What is alberti-bert-base-multilingual-cased?
ALBERTI is a specialized multilingual BERT model designed specifically for poetry analysis. It consists of two variants: one for verses and another for stanzas, trained on the extensive PULPO (Prodigious Unannotated Literary Poetry Corpus) dataset containing over 95M words across multiple languages.
## Implementation Details
The model builds on the BERT architecture and was further pretrained using the Flax framework. It has 178M parameters, and its weights are distributed in multiple tensor types, including I64 and F32. Training leveraged the PULPO corpus, which includes poetry in Spanish, English, French, Italian, Czech, Portuguese, Arabic, Chinese, Finnish, German, Hungarian, and Russian.
- Multilingual poetry analysis capabilities
- Verse and stanza-specific variants
- Implementation in multiple frameworks (PyTorch, JAX, Flax)
- Extensive training on diverse poetry corpora
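Because ALBERTI is a fill-mask model, it can be queried through the Hugging Face `pipeline` API. The sketch below is illustrative: the Hub repository id used here is an assumption, not something stated in this card, so substitute the actual checkpoint name when running it.

```python
# Minimal fill-mask sketch for ALBERTI.
# NOTE: the Hub id below is an assumed placeholder, not confirmed
# by this card -- replace it with the real repository name.
from transformers import pipeline

MODEL_ID = "linhd-postdata/alberti-bert-base-multilingual-cased"  # assumed id

fill = pipeline("fill-mask", model=MODEL_ID)

# Complete a masked word in a Spanish verse (José Martí).
predictions = fill("Cultivo una rosa [MASK] en julio como en enero")
for p in predictions:
    print(f"{p['token_str']!r}: {p['score']:.3f}")
```

Each prediction is a dict with the candidate token, its score, and the completed sequence, which makes it easy to rank alternative completions for a verse.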
## Core Capabilities
- Poetry-specific masked language modeling
- Multi-language support across 12+ languages
- Specialized verse and stanza analysis
- Integration with popular deep learning frameworks
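For finer control than the pipeline offers, masked-token prediction can be done manually: tokenize the verse, locate the mask position, and take a softmax over the model's logits at that position. Again, the Hub id is an assumed placeholder; the verse is from Paul Valéry, illustrating the multilingual support.

```python
# Sketch of masked-token scoring without the pipeline helper.
# The Hub id is an assumed placeholder -- replace with the real repository name.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "linhd-postdata/alberti-bert-base-multilingual-cased"  # assumed id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

text = "Le vent se [MASK] ! il faut tenter de vivre !"
inputs = tokenizer(
    text.replace("[MASK]", tokenizer.mask_token), return_tensors="pt"
)

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and take the top-5 candidate tokens.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
probs = logits[0, mask_pos].softmax(dim=-1)
top = probs.topk(5)
for score, token_id in zip(top.values[0], top.indices[0]):
    print(tokenizer.decode(token_id), float(score))
```

This is the same computation the pipeline performs internally, but it exposes the full probability distribution, which is useful for structural analyses that need scores for specific candidate words rather than only the top few.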
## Frequently Asked Questions
Q: What makes this model unique?
ALBERTI is distinctive in specializing in poetry analysis across multiple languages, having been trained on one of the largest poetry corpora available. The dual-model approach, with separate variants for verses and stanzas, makes it particularly well suited to detailed poetic analysis.
Q: What are the recommended use cases?
The model is ideal for poetry analysis tasks, including verse completion, poetry generation, structural analysis of poems, and cross-lingual poetry studies. It's particularly useful for researchers and applications in computational literary studies.