gpt3-finnish-small
Property | Value
---|---
Parameter Count | 186M
Architecture | BLOOM-based GPT-3
License | Apache 2.0
Training Tokens | 300B
Language | Finnish
What is gpt3-finnish-small?
gpt3-finnish-small is part of TurkuNLP's Finnish GPT-3 model family, specifically designed for Finnish language text generation. As the smallest variant in the series with 186M parameters, it features 12 layers, 768-dimensional embeddings, and 12 attention heads. The model is built on the BLOOM architecture and trained on a comprehensive dataset of 300B tokens from diverse Finnish sources.
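A minimal text-generation sketch with the Transformers library is shown below. The model id `TurkuNLP/gpt3-finnish-small` and the sampling parameters are assumptions for illustration, not settings taken from this card:

```python
# Hedged sketch: loads the model via Transformers and generates a short
# Finnish continuation. Assumes the Hub id "TurkuNLP/gpt3-finnish-small".
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TurkuNLP/gpt3-finnish-small")
model = AutoModelForCausalLM.from_pretrained("TurkuNLP/gpt3-finnish-small")

inputs = tokenizer("Suomi on", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,   # illustrative sampling settings, not recommendations
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is a plain (non-instruction-tuned) language model, it continues the prompt rather than answering it.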
Implementation Details
The model is implemented using PyTorch and Transformers, incorporating key architectural elements from the BLOOM framework. It utilizes a carefully curated training dataset that combines multiple Finnish resources, including Internet Parsebank, Common Crawl, Wikipedia, news archives, and social media content, with specific sampling ratios to ensure quality and diversity.
- 12 transformer layers with 768-dimensional embeddings
- 12 attention heads for efficient context processing
- Trained on a weighted combination of sources, with Parsebank (22.7%) and Common Crawl (34.4%) forming the majority
- Implements standard transformer architecture for autoregressive text generation
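The architecture numbers above can be sanity-checked against the stated 186M parameter count. This is a back-of-the-envelope sketch: the vocabulary size of 131,072 is an assumption (the card does not state it), and layer norms, biases, and position-related parameters are ignored as negligible:

```python
# Rough parameter count for a 12-layer, 768-dim transformer.
# vocab_size is an ASSUMPTION, not stated in the card.
d_model = 768
n_layers = 12
vocab_size = 131_072  # assumed tokenizer vocabulary

embedding = vocab_size * d_model           # token embedding matrix
attention = 4 * d_model * d_model          # Q, K, V and output projections
mlp = 2 * d_model * (4 * d_model)          # up- and down-projection (4x expansion)
per_layer = attention + mlp                # ~12 * d_model^2 per layer
total = embedding + n_layers * per_layer

print(f"{total / 1e6:.0f}M parameters")    # prints "186M parameters"
```

Under these assumptions the count lands almost exactly on the reported 186M, with roughly half the parameters sitting in the embedding matrix.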
Core Capabilities
- Pure language modeling for Finnish text generation
- Foundation model suitable for further fine-tuning
- Text completion and generation tasks
- Feature extraction for downstream NLP tasks
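For the feature-extraction use case, one common approach is to take the model's hidden states and mean-pool them into a sentence vector. A sketch, again assuming the Hub id `TurkuNLP/gpt3-finnish-small` (mean pooling is one reasonable choice, not a method prescribed by this card):

```python
# Hedged sketch: extract 768-dim hidden states for downstream NLP tasks.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TurkuNLP/gpt3-finnish-small")
model = AutoModel.from_pretrained("TurkuNLP/gpt3-finnish-small")

enc = tokenizer("Hyvää huomenta!", return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

hidden = out.last_hidden_state        # shape: (1, seq_len, 768)
sentence_vec = hidden.mean(dim=1)     # mean-pooled sentence embedding, (1, 768)
```

The resulting vectors can feed a lightweight classifier or similarity search without fine-tuning the model itself.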
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically designed for Finnish language processing, trained on an extensive and diverse dataset of Finnish text. It's part of a larger family of models offering a range of sizes to match different computational budgets.
Q: What are the recommended use cases?
The model is best suited as a foundation model for further fine-tuning on specific tasks. Note that it is not instruction-tuned for dialogue or question answering out of the box; it serves as a base model for such adaptations.