# ALBERTI BERT Base Multilingual Cased
| Property | Value |
|---|---|
| Parameter Count | 178M |
| License | CC-BY-4.0 |
| Primary Task | Fill-Mask |
| Framework | PyTorch, JAX, Flax |
## What is alberti-bert-base-multilingual-cased?
ALBERTI is a specialized multilingual BERT model designed specifically for poetry analysis. It consists of two variants: one for verses and another for stanzas, trained on the extensive PULPO (Prodigious Unannotated Literary Poetry Corpus) dataset containing over 95M words across multiple languages.
## Implementation Details
The model builds on the BERT architecture and was further pretrained using the Flax framework. It has 178M parameters, and its weights are distributed in multiple tensor types, including I64 and F32. Training leveraged the PULPO corpus, which includes poetry in Spanish, English, French, Italian, Czech, Portuguese, Arabic, Chinese, Finnish, German, Hungarian, and Russian.
- Multilingual poetry analysis capabilities
- Verse and stanza-specific variants
- Implementation in multiple frameworks (PyTorch, JAX, Flax)
- Extensive training on diverse poetry corpora
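Because ALBERTI is a fill-mask model, it can be queried through the Hugging Face `pipeline` API. The sketch below is illustrative: the Hub repository id used here is an assumption, not something stated in this card, so substitute the actual checkpoint name when running it.

```python
# Minimal fill-mask sketch for ALBERTI.
# NOTE: the Hub id below is an assumed placeholder, not confirmed
# by this card -- replace it with the real repository name.
from transformers import pipeline

MODEL_ID = "linhd-postdata/alberti-bert-base-multilingual-cased"  # assumed id

fill = pipeline("fill-mask", model=MODEL_ID)

# Complete a masked word in a Spanish verse (José Martí).
predictions = fill("Cultivo una rosa [MASK] en julio como en enero")
for p in predictions:
    print(f"{p['token_str']!r}: {p['score']:.3f}")
```

Each prediction is a dict with the candidate token, its score, and the completed sequence, which makes it easy to rank alternative completions for a verse.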
## Core Capabilities
- Poetry-specific masked language modeling
- Multi-language support across 12+ languages
- Specialized verse and stanza analysis
- Integration with popular deep learning frameworks
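For finer control than the pipeline offers, masked-token prediction can be done manually: tokenize the verse, locate the mask position, and take a softmax over the model's logits at that position. Again, the Hub id is an assumed placeholder; the verse is from Paul Valéry, illustrating the multilingual support.

```python
# Sketch of masked-token scoring without the pipeline helper.
# The Hub id is an assumed placeholder -- replace with the real repository name.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "linhd-postdata/alberti-bert-base-multilingual-cased"  # assumed id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

text = "Le vent se [MASK] ! il faut tenter de vivre !"
inputs = tokenizer(
    text.replace("[MASK]", tokenizer.mask_token), return_tensors="pt"
)

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and take the top-5 candidate tokens.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
probs = logits[0, mask_pos].softmax(dim=-1)
top = probs.topk(5)
for score, token_id in zip(top.values[0], top.indices[0]):
    print(tokenizer.decode(token_id), float(score))
```

This is the same computation the pipeline performs internally, but it exposes the full probability distribution, which is useful for structural analyses that need scores for specific candidate words rather than only the top few.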
## Frequently Asked Questions
Q: What makes this model unique?
ALBERTI is distinctive in specializing in poetry analysis across multiple languages, having been trained on one of the largest poetry corpora available. The dual-model approach, with separate variants for verses and stanzas, makes it particularly well suited to detailed poetic analysis.
Q: What are the recommended use cases?
The model is ideal for poetry analysis tasks, including verse completion, poetry generation, structural analysis of poems, and cross-lingual poetry studies. It's particularly useful for researchers and applications in computational literary studies.