BLOOM-560M Language Model
| Property | Value |
|---|---|
| Parameters | 559,214,592 |
| License | BigScience RAIL License v1.0 |
| Supported Languages | 45 natural languages, 12 programming languages |
| Architecture | Transformer-based, decoder-only |
| Training Data | 1.5TB of preprocessed text |
What is BLOOM-560M?
BLOOM-560M is a multilingual language model developed by the BigScience workshop as part of the larger BLOOM model family. It was trained on 1.5TB of preprocessed text spanning 45 natural languages and 12 programming languages, making it one of the most linguistically diverse models of its size.
Implementation Details
The model uses a decoder-only Transformer architecture with 24 layers and 16 attention heads. It relies on ALiBi positional encodings, applies layer normalization to the word embeddings, processes sequences of up to 2,048 tokens, and runs in FP16 precision for efficient computation. A rough parameter-count check based on these figures follows the list below.
- 24-layer architecture with 16 attention heads
- 1024-dimensional hidden layers
- Trained on the Jean Zay supercomputer, which runs largely on nuclear energy
- Uses ALiBi positional encodings and GELU activation functions
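Taken together with the parameter figure in the table above, these dimensions can be sanity-checked with simple arithmetic. The sketch below assumes the standard published configuration for this checkpoint (vocabulary size 250,880, fused QKV projections with biases, tied input/output embeddings), details not stated in the text above:

```python
# Back-of-the-envelope parameter count for BLOOM-560M.
# Assumed config (standard for this checkpoint): vocab_size=250880, hidden_size=1024,
# n_layer=24. Word embeddings are tied to the LM head, so the output projection
# adds no extra parameters.
vocab_size, hidden, n_layer = 250_880, 1024, 24

embeddings = vocab_size * hidden          # token embedding matrix
embed_ln = 2 * hidden                     # layer norm applied to the word embeddings
final_ln = 2 * hidden                     # final layer norm before the LM head

per_layer = (
    2 * hidden                            # input layer norm
    + hidden * 3 * hidden + 3 * hidden    # fused QKV projection (+ bias)
    + hidden * hidden + hidden            # attention output projection (+ bias)
    + 2 * hidden                          # post-attention layer norm
    + hidden * 4 * hidden + 4 * hidden    # MLP up-projection (+ bias)
    + 4 * hidden * hidden + hidden        # MLP down-projection (+ bias)
)

total = embeddings + embed_ln + n_layer * per_layer + final_ln
print(f"{total:,}")  # 559,214,592 -- matches the table above
```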
Core Capabilities
- Multilingual text generation across 45 natural languages (see the generation sketch after this list)
- Code generation in 12 programming languages
- Natural language understanding and generation
- Pre-training base for fine-tuning on specific tasks
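As a quick illustration of text generation, here is a minimal sketch using the Hugging Face transformers library; the Hub model ID, prompt, and sampling settings are illustrative choices rather than values from the text above:

```python
# Minimal text-generation sketch with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-560m"  # assumed Hub ID for this checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The same interface works for any of the supported languages; the French prompt is just an example.
prompt = "La capitale du Sénégal est"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```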
Frequently Asked Questions
Q: What makes this model unique?
BLOOM-560M stands out for its language diversity: it supports 45 natural languages, including many low-resource African languages, making it one of the most inclusive language models of its size.
Q: What are the recommended use cases?
The model is well suited to text generation, research exploration, and use as a base model for fine-tuning on tasks such as information extraction, question answering, and summarization. It should not, however, be used for high-stakes decisions or other critical applications.
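For the fine-tuning route, a minimal causal-LM fine-tuning sketch with the Hugging Face Trainer might look like the following; the dataset, sequence length, and hyperparameters are placeholder assumptions to be adapted to the actual downstream task:

```python
# Minimal causal-LM fine-tuning sketch for BLOOM-560M with transformers + datasets.
# The corpus, batch size, and learning rate below are placeholders, not recommendations
# from the model card.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "bigscience/bloom-560m"  # assumed Hub ID for this checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder corpus; swap in task-specific text for extraction, QA, or summarization.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bloom-560m-finetuned",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-5,
        fp16=True,  # matches the FP16 precision noted above; requires a GPU
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```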