BLOOM-1B7
| Property | Value |
|---|---|
| Parameter Count | 1.72B parameters |
| License | BigScience RAIL 1.0 |
| Languages | 45 natural languages and 12 programming languages |
| Base Paper | Architecture Paper |
| Model Type | Decoder-only Transformer |
What is BLOOM-1B7?
BLOOM-1B7 is a multilingual large language model developed by the BigScience collaboration as part of its open-science approach to AI development. With 1.72 billion parameters, it is designed both to generate text directly and to serve as a pretrained base for fine-tuning on downstream tasks. The model supports 45 natural languages and 12 programming languages, making it a versatile tool for multilingual and code-related applications.
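A minimal generation sketch, assuming the checkpoint is available on the Hugging Face Hub under the ID `bigscience/bloom-1b7` and that `transformers` and `torch` are installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bigscience/bloom-1b7"  # assumed Hub ID for BLOOM-1B7

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

prompt = "Translate to French: The weather is nice today.\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of up to 50 new tokens; sampling settings can be tuned as needed.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```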
Implementation Details
The model uses a modified Megatron-LM GPT2 architecture with several key changes, including layer normalization applied to the word-embedding layer (StableEmbedding) and ALiBi positional encodings. The network has 24 layers with 16 attention heads each and a hidden dimension of 2048.
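These dimensions can be checked from the published configuration; a short sketch, again assuming the `bigscience/bloom-1b7` Hub ID and using the `n_layer`, `n_head`, and `hidden_size` attribute names of the `transformers` Bloom configuration:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/bloom-1b7")  # assumed Hub ID

# Expected per the description above: 24 layers, 16 attention heads, hidden size 2048.
print(config.n_layer, config.n_head, config.hidden_size)
```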
- Training utilized 64 V100 GPUs across 16 nodes
- Implements FP16 precision for efficient computation
- Uses a byte-level BPE tokenizer with a 250,680-token vocabulary (a loading sketch follows this list)
- Supports sequence lengths of up to 2048 tokens
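As noted above, the checkpoint is trained in FP16; the sketch below loads it in half precision and checks the tokenizer details, assuming the `bigscience/bloom-1b7` Hub ID (a GPU is used if available, but the calls also work on CPU):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bigscience/bloom-1b7"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
print(len(tokenizer))  # byte-level BPE vocabulary; should report roughly 250,680 tokens

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to(device)

# Inputs longer than the 2048-token training context should be truncated.
inputs = tokenizer("BLOOM-1B7 in half precision", return_tensors="pt",
                   truncation=True, max_length=2048).to(device)
print(inputs["input_ids"].shape)
```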
Core Capabilities
- Text generation across multiple languages (see the prompting sketch after this list)
- Code generation in 12 programming languages
- Information extraction and question answering
- Summarization tasks
- Cross-lingual understanding and generation
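A small prompting sketch for the generation-style capabilities above, using the `transformers` text-generation pipeline with the assumed `bigscience/bloom-1b7` Hub ID; the prompts are illustrative only:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-1b7")  # assumed Hub ID

prompts = [
    "Un poème court sur la mer :",   # multilingual text generation
    "def fibonacci(n):",             # code continuation
    "Summarize: The BigScience workshop trained BLOOM on a multilingual corpus.\nSummary:",
]

for prompt in prompts:
    out = generator(prompt, max_new_tokens=40, do_sample=False)
    print(out[0]["generated_text"])
```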
Frequently Asked Questions
Q: What makes this model unique?
BLOOM-1B7 stands out for its extensive language support and open-science approach. It's specifically designed to be accessible for public research while maintaining high performance across multiple languages and programming tasks.
Q: What are the recommended use cases?
The model is best suited to research, text generation tasks, and use as a pretrained base for fine-tuning on specific downstream tasks. However, it should not be used for high-stakes decisions or critical applications where reliability is crucial.
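A brief sketch of using the checkpoint as a base for a downstream task, here wrapping it with a sequence-classification head via `transformers`; the Hub ID, label count, and padding setup are assumptions for illustration:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "bigscience/bloom-1b7"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# A randomly initialized classification head is added on top of the pretrained trunk.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id  # needed for batched classification

# From here the model can be fine-tuned with a standard PyTorch loop or the Trainer API.
```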