BLOOM-7B1 Language Model
Property | Value |
---|---|
Parameters | 7.07B |
License | BigScience RAIL License v1.0 |
Languages | 46 natural languages and 13 programming languages |
Training Data | 1.5TB of text |
Architecture | Decoder-only Transformer |
What is BLOOM-7B1?
BLOOM-7B1 is a multilingual language model developed by the BigScience research workshop as part of its open-science effort. With 7.07 billion parameters, it aims to broaden access to large language models while covering 46 natural languages, including many low-resource African and Indic languages.
Implementation Details
The model uses a modified Megatron-LM GPT-2 architecture. It has 30 layers with 32 attention heads, a hidden size of 4096, and a sequence length of 2048 tokens, and it uses ALiBi positional encodings and GeLU activation functions (these values can be verified from the published checkpoint configuration, as sketched after the list below).
- Trained on the Jean Zay supercomputer using 384 A100 80GB GPUs
- Uses a byte-level BPE tokenizer with 250,680 vocabulary size
- Implements stable embeddings with layer normalization
- Optimized using cross-entropy loss with mean reduction
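As a quick sanity check, most of these hyperparameters can be read directly from the published checkpoint. The sketch below assumes the Hugging Face transformers library and the bigscience/bloom-7b1 checkpoint name; only the configuration and tokenizer files are downloaded, not the weights.

```python
from transformers import AutoConfig, AutoTokenizer

# Fetch only the configuration and tokenizer files (no model weights).
config = AutoConfig.from_pretrained("bigscience/bloom-7b1")
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")

print(config.n_layer)          # expected: 30 transformer blocks
print(config.n_head)           # expected: 32 attention heads
print(config.hidden_size)      # expected: 4096-dimensional hidden states
print(tokenizer.vocab_size)    # expected: 250680 byte-level BPE entries
```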
Core Capabilities
- Multilingual text generation across 46 natural languages
- Code generation in 13 programming languages
- Natural language understanding and generation
- Transfer learning base for fine-tuning
- Research and exploration of language model behaviors
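For illustration, a minimal text-generation example follows, again assuming the transformers library (and accelerate for device_map="auto"); the prompt and sampling settings are arbitrary choices, not recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# float16 keeps the roughly 14 GB of weights within a single modern GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# A French prompt exercises the multilingual coverage.
prompt = "La capitale de la France est"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```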
Frequently Asked Questions
Q: What makes this model unique?
BLOOM-7B1 stands out for its broad language coverage, including many low-resource languages, and for its open-science development process. It was trained on the Jean Zay supercomputer, whose largely nuclear-powered electricity keeps the training carbon footprint comparatively low, and its weights are freely available for research.
Q: What are the recommended use cases?
The model is best suited for research, text generation tasks, and use as a base model for fine-tuning (a minimal adapter-based setup is sketched below). Its RAIL license permits use subject to behavioral restrictions, and the model should not be relied on for high-stakes decisions or critical applications.
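As one possible fine-tuning setup (illustrative only, not part of the original release), the sketch below attaches LoRA adapters with the peft library. The target module name query_key_value matches BLOOM's fused attention projection in the transformers implementation, but the rank and dropout values are placeholders rather than tuned hyperparameters.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1")

# Placeholder LoRA hyperparameters; tune these for the target task.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # BLOOM's fused QKV projection
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Training can then proceed with any standard causal-language-modeling loop, for example the transformers Trainer, while the frozen base weights stay unchanged.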