BLOOM-1B7
| Property | Value |
|---|---|
| Parameter Count | 1.72B parameters |
| License | BigScience RAIL 1.0 |
| Languages | 45 natural languages and 12 programming languages |
| Base Paper | Architecture Paper |
| Model Type | Decoder-only Transformer |
What is BLOOM-1B7?
BLOOM-1B7 is a multilingual large language model developed by the BigScience collaboration as part of its open-science approach to AI development. With 1.72 billion parameters, it is designed both to generate text directly and to serve as a pretrained base for fine-tuning on downstream tasks. The model supports 45 natural languages and 12 programming languages, making it a versatile tool for multilingual and code-related applications.
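A minimal generation sketch, assuming the checkpoint is available on the Hugging Face Hub under the ID `bigscience/bloom-1b7` and that `transformers` and `torch` are installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bigscience/bloom-1b7"  # assumed Hub ID for BLOOM-1B7

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

prompt = "Translate to French: The weather is nice today.\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of up to 50 new tokens; sampling settings can be tuned as needed.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```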
Implementation Details
The model uses a modified Megatron-LM GPT2 architecture with several key changes, including layer normalization applied to the word-embedding layer (StableEmbedding) and ALiBi positional encodings. The network has 24 layers with 16 attention heads each and a hidden dimension of 2048.
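These dimensions can be checked from the published configuration; a short sketch, again assuming the `bigscience/bloom-1b7` Hub ID and using the `n_layer`, `n_head`, and `hidden_size` attribute names of the `transformers` Bloom configuration:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/bloom-1b7")  # assumed Hub ID

# Expected per the description above: 24 layers, 16 attention heads, hidden size 2048.
print(config.n_layer, config.n_head, config.hidden_size)
```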
- Training utilized 64 V100 GPUs across 16 nodes
- Implements FP16 precision for efficient computation
- Uses a byte-level BPE tokenizer with a 250,680-token vocabulary (a loading sketch follows this list)
- Supports sequence lengths of up to 2048 tokens
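As noted above, the checkpoint is trained in FP16; the sketch below loads it in half precision and checks the tokenizer details, assuming the `bigscience/bloom-1b7` Hub ID (a GPU is used if available, but the calls also work on CPU):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bigscience/bloom-1b7"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
print(len(tokenizer))  # byte-level BPE vocabulary; should report roughly 250,680 tokens

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to(device)

# Inputs longer than the 2048-token training context should be truncated.
inputs = tokenizer("BLOOM-1B7 in half precision", return_tensors="pt",
                   truncation=True, max_length=2048).to(device)
print(inputs["input_ids"].shape)
```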
Core Capabilities
- Text generation across multiple languages (see the prompting sketch after this list)
- Code generation in 12 programming languages
- Information extraction and question answering
- Summarization tasks
- Cross-lingual understanding and generation
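A small prompting sketch for the generation-style capabilities above, using the `transformers` text-generation pipeline with the assumed `bigscience/bloom-1b7` Hub ID; the prompts are illustrative only:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-1b7")  # assumed Hub ID

prompts = [
    "Un poème court sur la mer :",   # multilingual text generation
    "def fibonacci(n):",             # code continuation
    "Summarize: The BigScience workshop trained BLOOM on a multilingual corpus.\nSummary:",
]

for prompt in prompts:
    out = generator(prompt, max_new_tokens=40, do_sample=False)
    print(out[0]["generated_text"])
```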
Frequently Asked Questions
Q: What makes this model unique?
BLOOM-1B7 stands out for its extensive language support and open-science approach. It's specifically designed to be accessible for public research while maintaining high performance across multiple languages and programming tasks.
Q: What are the recommended use cases?
The model is best suited to research, text generation tasks, and use as a pretrained base for fine-tuning on specific downstream tasks. However, it should not be used for high-stakes decisions or critical applications where reliability is crucial.
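A brief sketch of using the checkpoint as a base for a downstream task, here wrapping it with a sequence-classification head via `transformers`; the Hub ID, label count, and padding setup are assumptions for illustration:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "bigscience/bloom-1b7"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# A randomly initialized classification head is added on top of the pretrained trunk.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id  # needed for batched classification

# From here the model can be fine-tuned with a standard PyTorch loop or the Trainer API.
```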