GePpeTto
| Property | Value |
|---|---|
| Model Type | GPT-2 Language Model |
| Parameters | 117M |
| Training Data | Italian Wikipedia + ItWac Corpus |
| Paper | GePpeTto Carves Italian into a Language Model |
| Authors | Lorenzo De Mattei et al. |
What is GePpeTto?
GePpeTto is a pioneering Italian language model based on the GPT-2 architecture. It was trained on 13.8GB of Italian text, combining Wikipedia dumps with web-crawled content from the ItWac corpus, and represents a significant advance for Italian natural language processing.
Implementation Details
The model was trained for 620,000 steps on 4 NVIDIA Tesla T4 GPUs using Hugging Face's implementation of GPT-2. It uses a vocabulary of 30k, a block size of 100 tokens, and the Adam optimizer with an initial learning rate of 5e-5 and 10,000 warm-up steps; a configuration sketch follows the highlights below.
- Comprehensive training on both formal (Wikipedia) and informal (web) Italian text
- Reports low perplexity across domains (26.10 on Wikipedia text, 30.39 on ItWac web text)
- Optimized for Italian language generation and understanding
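The hyperparameters above map directly onto Hugging Face's training API. The sketch below is a minimal reconstruction of a comparable setup, not the authors' original training script; the output directory, batch size, and data pipeline are placeholders I have assumed for illustration.

```python
from transformers import GPT2Config, GPT2LMHeadModel, TrainingArguments

# GPT-2 small configuration with the 30k vocabulary reported for GePpeTto;
# all other architectural settings are left at the GPT-2 defaults (~117M parameters).
config = GPT2Config(vocab_size=30_000)
model = GPT2LMHeadModel(config)

# Optimisation settings matching the figures cited above. Batch size and
# output directory are assumptions, not values from this card. During
# preprocessing, training texts would be chunked into blocks of 100 tokens.
training_args = TrainingArguments(
    output_dir="./geppetto-checkpoints",  # placeholder path
    max_steps=620_000,                    # total training steps
    learning_rate=5e-5,                   # initial Adam learning rate
    warmup_steps=10_000,                  # linear warm-up steps
    per_device_train_batch_size=8,        # assumption
)
```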
Core Capabilities
- Natural Italian text generation
- Domain adaptation across various text types
- Strong performance on both formal and informal Italian content
- Flexible integration through Hugging Face's transformers library
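Since the model integrates with the transformers library, generating Italian text takes only a few lines. The snippet below assumes the checkpoint is published on the Hugging Face Hub as `LorenzoDeMattei/GePpeTto`; adjust the identifier if your copy is hosted elsewhere.

```python
from transformers import pipeline

# Text-generation pipeline; the model identifier is assumed to be the
# public Hub checkpoint "LorenzoDeMattei/GePpeTto".
generator = pipeline("text-generation", model="LorenzoDeMattei/GePpeTto")

result = generator(
    "La città di Roma è",        # Italian prompt: "The city of Rome is"
    max_length=50,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```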
Frequently Asked Questions
Q: What makes this model unique?
GePpeTto is the first large-scale generative language model trained specifically for Italian. Its training corpus combines contemporary Wikipedia content with historical web texts from ItWac, giving it broad coverage of Italian language variation across registers and time periods.
Q: What are the recommended use cases?
The model excels at Italian text generation, performing particularly well on formal, Wikipedia-style content (26.10 perplexity) while remaining solid on informal web text (30.39 perplexity). It is suitable for applications ranging from content generation to text completion in Italian.
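For readers who want to sanity-check the model on their own domain, the sketch below shows one common way to estimate perplexity with transformers and PyTorch. It scores a single short passage, whereas the domain-level figures quoted above were computed over full held-out sets; the checkpoint name is the same assumed Hub identifier used earlier.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_id = "LorenzoDeMattei/GePpeTto"  # assumed Hub identifier
tokenizer = GPT2Tokenizer.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id)
model.eval()

text = "La Divina Commedia è un poema allegorico di Dante Alighieri."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean token-level cross-entropy.
    outputs = model(**inputs, labels=inputs["input_ids"])

# Perplexity is the exponential of the average cross-entropy loss.
perplexity = torch.exp(outputs.loss)
print(f"Perplexity: {perplexity.item():.2f}")
```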