BLOOM-560M Language Model
| Property | Value |
|---|---|
| Parameters | 559,214,592 |
| License | BigScience RAIL License v1.0 |
| Supported Languages | 45 natural languages, 12 programming languages |
| Architecture | Transformer-based, decoder-only |
| Training Data | 1.5TB of preprocessed text |
What is BLOOM-560M?
BLOOM-560M is a multilingual language model developed by the BigScience workshop as part of the larger BLOOM model family. It was trained on 1.5TB of preprocessed text spanning 45 natural languages and 12 programming languages, making it one of the most linguistically diverse models of its size.
Implementation Details
The model uses a decoder-only Transformer architecture with 24 layers and 16 attention heads. It relies on ALiBi positional encodings, applies layer normalization to the word embeddings, processes sequences of up to 2,048 tokens, and runs in FP16 precision for efficient computation. A rough parameter-count check based on these figures follows the list below.
- 24-layer architecture with 16 attention heads
- 1024-dimensional hidden layers
- Trained on the Jean Zay supercomputer, which runs largely on nuclear energy
- Uses ALiBi positional encodings and GELU activation functions
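Taken together with the parameter figure in the table above, these dimensions can be sanity-checked with simple arithmetic. The sketch below assumes the standard published configuration for this checkpoint (vocabulary size 250,880, fused QKV projections with biases, tied input/output embeddings), details not stated in the text above:

```python
# Back-of-the-envelope parameter count for BLOOM-560M.
# Assumed config (standard for this checkpoint): vocab_size=250880, hidden_size=1024,
# n_layer=24. Word embeddings are tied to the LM head, so the output projection
# adds no extra parameters.
vocab_size, hidden, n_layer = 250_880, 1024, 24

embeddings = vocab_size * hidden          # token embedding matrix
embed_ln = 2 * hidden                     # layer norm applied to the word embeddings
final_ln = 2 * hidden                     # final layer norm before the LM head

per_layer = (
    2 * hidden                            # input layer norm
    + hidden * 3 * hidden + 3 * hidden    # fused QKV projection (+ bias)
    + hidden * hidden + hidden            # attention output projection (+ bias)
    + 2 * hidden                          # post-attention layer norm
    + hidden * 4 * hidden + 4 * hidden    # MLP up-projection (+ bias)
    + 4 * hidden * hidden + hidden        # MLP down-projection (+ bias)
)

total = embeddings + embed_ln + n_layer * per_layer + final_ln
print(f"{total:,}")  # 559,214,592 -- matches the table above
```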
Core Capabilities
- Multilingual text generation across 45 natural languages (see the generation sketch after this list)
- Code generation in 12 programming languages
- Natural language understanding and generation
- Pre-training base for fine-tuning on specific tasks
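As a quick illustration of text generation, here is a minimal sketch using the Hugging Face transformers library; the Hub model ID, prompt, and sampling settings are illustrative choices rather than values from the text above:

```python
# Minimal text-generation sketch with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-560m"  # assumed Hub ID for this checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The same interface works for any of the supported languages; the French prompt is just an example.
prompt = "La capitale du Sénégal est"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```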
Frequently Asked Questions
Q: What makes this model unique?
BLOOM-560M stands out for its language diversity: it supports 45 natural languages, including many low-resource African languages, making it one of the most inclusive language models of its size.
Q: What are the recommended use cases?
The model is well suited to text generation, research exploration, and use as a base model for fine-tuning on tasks such as information extraction, question answering, and summarization. It should not, however, be used for high-stakes decisions or other critical applications.
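For the fine-tuning route, a minimal causal-LM fine-tuning sketch with the Hugging Face Trainer might look like the following; the dataset, sequence length, and hyperparameters are placeholder assumptions to be adapted to the actual downstream task:

```python
# Minimal causal-LM fine-tuning sketch for BLOOM-560M with transformers + datasets.
# The corpus, batch size, and learning rate below are placeholders, not recommendations
# from the model card.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "bigscience/bloom-560m"  # assumed Hub ID for this checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder corpus; swap in task-specific text for extraction, QA, or summarization.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bloom-560m-finetuned",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-5,
        fp16=True,  # matches the FP16 precision noted above; requires a GPU
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```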