Bielik-7B-v0.1
| Property | Value |
|---|---|
| Parameter Count | 7.24B |
| License | Apache 2.0 |
| Language | Polish |
| Paper | arxiv:2410.18565 |
| Base Model | Mistral-7B-v0.1 |
What is Bielik-7B-v0.1?
Bielik-7B-v0.1 is a Polish language model developed through a collaboration between SpeakLeash and ACK Cyfronet AGH. Built on Mistral-7B-v0.1, it was trained on over 70 billion tokens of carefully curated Polish text, making it particularly strong at understanding and generating Polish.
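As a base (non-instruct) model, it can be loaded and prompted for plain text completion with the Hugging Face transformers library. The sketch below assumes the model is published on the Hub as `speakleash/Bielik-7B-v0.1`; the prompt and generation settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub id assumed from the model name in this card.
model_id = "speakleash/Bielik-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 training precision
    device_map="auto",
)

# Base-model usage: the model simply continues the Polish prompt.
prompt = "Najdłuższa rzeka w Polsce to"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```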
Implementation Details
The model was trained using the ALLaMo framework on the Helios supercomputer, utilizing 256 NVIDIA GH200 accelerators at a throughput exceeding 9,200 tokens/GPU/second. Training covered a 36-billion-token dataset for two epochs (the 70+ billion tokens noted above) in mixed precision (bfloat16).
- Context length: 4,096 tokens
- Batch size: 4,194,304 tokens
- Learning rate: 3e-05 → 2e-05 (cosine schedule; see the sketch after this list)
- Training-data quality filtering using an XGBoost classifier
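The cosine decay from 3e-05 to 2e-05 can be reproduced in a few lines. Warmup and the exact step count used for Bielik are not given in this card, so this is a minimal sketch of the decay shape only.

```python
import math

def cosine_lr(step: int, total_steps: int,
              lr_max: float = 3e-5, lr_min: float = 2e-5) -> float:
    """Cosine decay from lr_max down to lr_min over total_steps."""
    progress = min(step / total_steps, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# The schedule starts at 3e-05 and ends at 2e-05:
print(cosine_lr(0, 10_000))       # 3e-05
print(cosine_lr(10_000, 10_000))  # 2e-05
```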
Core Capabilities
- State-of-the-art performance in RAG Reader tasks (88.39% score)
- Perplexity of 123.31 (perplexity values are comparable only between models using the same tokenizer)
- Robust text generation and understanding in Polish
- Optimized for further fine-tuning across various use cases
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its specialized optimization for Polish language processing, achieved through extensive training on high-quality Polish corpora filtered with the XGBoost-based quality classification mentioned above.
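For illustration, a quality classifier of this kind can be built with xgboost over simple stylometric features. The features, labels, and toy data below are assumptions made for the sketch; the actual feature set and labeling scheme used by SpeakLeash are not described in this card.

```python
import numpy as np
from xgboost import XGBClassifier

def doc_features(text: str) -> list[float]:
    """Simple stylometric features (hypothetical, for illustration only)."""
    words = text.split() or [""]
    return [
        len(text),                                              # document length
        sum(len(w) for w in words) / len(words),                # mean word length
        sum(c in ".,;:!?" for c in text) / max(len(text), 1),   # punctuation ratio
        sum(c.isupper() for c in text) / max(len(text), 1),     # uppercase ratio
    ]

# Toy training data: documents labeled by quality (0 = LOW, 1 = HIGH).
texts = [
    "To jest poprawnie napisany, spójny akapit tekstu po polsku.",
    "!!!! kliknij TUTAJ teraz ZA DARMO !!!!",
    "Model językowy przetwarza tekst na sekwencje tokenów.",
    "aaa bbb ccc ddd eee fff ggg",
]
labels = [1, 0, 1, 0]

clf = XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
clf.fit(np.array([doc_features(t) for t in texts]), np.array(labels))

# Keep only documents the classifier scores as high quality.
new_doc = "Bielik to polski model językowy oparty na Mistral-7B."
print(clf.predict(np.array([doc_features(new_doc)]))[0])
```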
Q: What are the recommended use cases?
As a base model, Bielik-7B-v0.1 is designed for further fine-tuning across various applications. For direct chatting or instruction-following, users should consider the Bielik-7B-Instruct-v0.1 variant instead.
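For parameter-efficient fine-tuning of the base model, a common route is LoRA via the peft library. The sketch below is minimal and hedged: the target module names follow the Mistral-7B architecture, and all hyperparameters are illustrative rather than recommended values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "speakleash/Bielik-7B-v0.1"  # Hub id assumed from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# LoRA adapters on the attention projections; module names follow the
# Mistral architecture, hyperparameters here are illustrative only.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# ... continue with a standard Trainer / SFT loop on Polish task data.
```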