Poro-34B
| Property | Value |
|---|---|
| Parameter Count | 34.2B |
| Architecture | BLOOM with ALiBi embeddings |
| Training Data | 1 trillion tokens |
| License | Apache 2.0 |
| Paper | arXiv:2404.01856 |
What is Poro-34B?
Poro-34B is a multilingual language model developed in collaboration between SiloGen, the TurkuNLP group, and HPLT. Named after the Finnish word for reindeer, it is designed to excel at Finnish and English language processing while maintaining strong code generation capabilities.
Implementation Details
The model uses a BLOOM architecture with 54 layers, 56 attention heads, and a hidden dimension of 7168. It relies on ALiBi position biases rather than learned position embeddings, which enables context-length extrapolation, and was trained on the LUMI supercomputer using 512 AMD MI250X GPUs.
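ALiBi replaces learned position embeddings with a fixed, head-specific linear penalty on attention scores, which is what allows inference beyond the trained context length. The sketch below illustrates the idea; it uses the simple power-of-two slope schedule from the ALiBi paper, whereas real implementations adjust the schedule for head counts like Poro's 56 that are not powers of two.

```python
# Sketch of ALiBi: a fixed linear bias proportional to key-query
# distance is added to attention scores instead of using positional
# embeddings, which is why the model can extrapolate past its
# 2048-token training context.
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear distance penalties, shape (n_heads, seq_len, seq_len)."""
    # Head-specific slopes form a geometric sequence (per the ALiBi paper).
    # Note: real implementations use an adjusted schedule when n_heads
    # is not a power of two, as with Poro-34B's 56 heads.
    start = 2 ** (-8.0 / n_heads)
    slopes = torch.tensor([start ** (i + 1) for i in range(n_heads)])
    # distance[i, j] = -(i - j) for past keys, 0 on and above the diagonal
    # (future positions are handled by the causal mask anyway).
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).clamp(max=0)
    return slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(n_heads=56, seq_len=8)
print(bias.shape)  # torch.Size([56, 8, 8])
```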
- Trained on 1 trillion tokens spanning Finnish, English, and programming-language data
- Uses bfloat16 precision
- Custom tokenizer with a 128K vocabulary
- Trained with a 3D parallelism strategy (TP=2, PP=4, DP=128)
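For orientation, here is a minimal loading-and-generation sketch using the Hugging Face transformers library. The repo id LumiOpen/Poro-34B and the sample prompt are assumptions for illustration; confirm the id against the official model card.

```python
# Minimal loading sketch for Poro-34B with Hugging Face transformers.
# The repo id below is an assumption; verify it before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LumiOpen/Poro-34B"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the training precision
    device_map="auto",           # shard across available GPUs
)

prompt = "Suomi on"  # "Finland is" — the model continues in Finnish
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```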
Core Capabilities
- Bilingual proficiency in Finnish and English
- Code generation and understanding
- Translation between Finnish and English (illustrated after this list)
- Context length of 2048 tokens
- Support for context-length extrapolation via ALiBi
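Because Poro-34B is a base model rather than an instruction-tuned one, capabilities such as translation are typically elicited with few-shot prompts. Below is a sketch that reuses the model and tokenizer from the loading example above; the prompt template is an illustrative assumption, not an official Poro format.

```python
# Few-shot translation prompt for a base (non-instruction-tuned) model.
# Reuses `model` and `tokenizer` from the loading sketch above.
# The template is an illustrative assumption: in-context examples
# steer the base model toward the translation task.
few_shot_prompt = (
    "English: The weather is beautiful today.\n"
    "Finnish: Sää on tänään kaunis.\n"
    "English: Where is the nearest train station?\n"
    "Finnish:"
)
inputs = tokenizer(few_shot_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```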
Frequently Asked Questions
Q: What makes this model unique?
Poro-34B stands out for its focus on Finnish, a language underrepresented in most large language models, while maintaining strong English and coding capabilities. It is one of the few large language models specifically optimized for Finnish, trained on a broad mix of Finnish linguistic and cultural data.
Q: What are the recommended use cases?
As a base model, Poro-34B requires fine-tuning for specific applications. It is particularly well suited to bilingual applications involving Finnish and English, code generation tasks, and scenarios requiring an understanding of Finnish cultural context. Note, however, that this is primarily a research release, and production use cases will generally require additional fine-tuning and evaluation.
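One common way to adapt such a base checkpoint without updating all 34B weights is parameter-efficient fine-tuning. The sketch below uses LoRA via the peft library; the hyperparameters and repo id are illustrative assumptions, and query_key_value is the fused attention projection name used by BLOOM-style implementations.

```python
# Minimal LoRA fine-tuning sketch for a BLOOM-style causal LM.
# Assumptions (not from the Poro release): the repo id, the LoRA
# hyperparameters, and the "query_key_value" target module name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "LumiOpen/Poro-34B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the training precision
    device_map="auto",
)

# LoRA trains small low-rank adapter matrices instead of the full weights.
lora_config = LoraConfig(
    r=16,                                # adapter rank (assumed)
    lora_alpha=32,                       # scaling factor (assumed)
    target_modules=["query_key_value"],  # BLOOM's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights will train
```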