GePpeTto
| Property | Value |
|---|---|
| Model Type | GPT-2 Language Model |
| Parameters | 117M |
| Training Data | Italian Wikipedia + ItWac Corpus |
| Paper | GePpeTto Carves Italian into a Language Model |
| Authors | Lorenzo De Mattei et al. |
What is GePpeTto?
GePpeTto is a pioneering Italian language model based on the GPT-2 architecture. It was trained on 13.8GB of Italian text, combining Wikipedia dumps with web-crawled content from the ItWac corpus, and represents a significant advance for Italian natural language processing.
Implementation Details
The model was trained for 620,000 steps on 4 NVIDIA Tesla T4 GPUs using Hugging Face's implementation of GPT-2. It uses a vocabulary of 30k, a block size of 100 tokens, and the Adam optimizer with an initial learning rate of 5e-5 and 10,000 warm-up steps; a configuration sketch follows the highlights below.
- Comprehensive training on both formal (Wikipedia) and informal (web) Italian text
- Reports low perplexity across domains (26.10 on Wikipedia text, 30.39 on ItWac web text)
- Optimized for Italian language generation and understanding
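The hyperparameters above map directly onto Hugging Face's training API. The sketch below is a minimal reconstruction of a comparable setup, not the authors' original training script; the output directory, batch size, and data pipeline are placeholders I have assumed for illustration.

```python
from transformers import GPT2Config, GPT2LMHeadModel, TrainingArguments

# GPT-2 small configuration with the 30k vocabulary reported for GePpeTto;
# all other architectural settings are left at the GPT-2 defaults (~117M parameters).
config = GPT2Config(vocab_size=30_000)
model = GPT2LMHeadModel(config)

# Optimisation settings matching the figures cited above. Batch size and
# output directory are assumptions, not values from this card. During
# preprocessing, training texts would be chunked into blocks of 100 tokens.
training_args = TrainingArguments(
    output_dir="./geppetto-checkpoints",  # placeholder path
    max_steps=620_000,                    # total training steps
    learning_rate=5e-5,                   # initial Adam learning rate
    warmup_steps=10_000,                  # linear warm-up steps
    per_device_train_batch_size=8,        # assumption
)
```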
Core Capabilities
- Natural Italian text generation
- Domain adaptation across various text types
- Strong performance on both formal and informal Italian content
- Flexible integration through Hugging Face's transformers library
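Since the model integrates with the transformers library, generating Italian text takes only a few lines. The snippet below assumes the checkpoint is published on the Hugging Face Hub as `LorenzoDeMattei/GePpeTto`; adjust the identifier if your copy is hosted elsewhere.

```python
from transformers import pipeline

# Text-generation pipeline; the model identifier is assumed to be the
# public Hub checkpoint "LorenzoDeMattei/GePpeTto".
generator = pipeline("text-generation", model="LorenzoDeMattei/GePpeTto")

result = generator(
    "La città di Roma è",        # Italian prompt: "The city of Rome is"
    max_length=50,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```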
Frequently Asked Questions
Q: What makes this model unique?
GePpeTto is the first large-scale generative language model trained specifically for Italian. Its training corpus combines contemporary Wikipedia content with historical web texts from ItWac, giving it broad coverage of Italian language variation across registers and time periods.
Q: What are the recommended use cases?
The model excels at Italian text generation, performing particularly well on formal, Wikipedia-style content (26.10 perplexity) while remaining solid on informal web text (30.39 perplexity). It is suitable for applications ranging from content generation to text completion in Italian.
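For readers who want to sanity-check the model on their own domain, the sketch below shows one common way to estimate perplexity with transformers and PyTorch. It scores a single short passage, whereas the domain-level figures quoted above were computed over full held-out sets; the checkpoint name is the same assumed Hub identifier used earlier.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_id = "LorenzoDeMattei/GePpeTto"  # assumed Hub identifier
tokenizer = GPT2Tokenizer.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id)
model.eval()

text = "La Divina Commedia è un poema allegorico di Dante Alighieri."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean token-level cross-entropy.
    outputs = model(**inputs, labels=inputs["input_ids"])

# Perplexity is the exponential of the average cross-entropy loss.
perplexity = torch.exp(outputs.loss)
print(f"Perplexity: {perplexity.item():.2f}")
```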