BLOOM Intermediate Model
| Property | Value |
|---|---|
| Parameters | 176 billion |
| License | BigScience RAIL License v1.0 |
| Developer | BigScience |
| Languages | 46 natural languages + 13 programming languages |
What is bloom-intermediate?
BLOOM-intermediate collects the intermediate training checkpoints of the 176-billion-parameter BLOOM language model, giving researchers and developers access to the model at various stages of training. The checkpoints, spanning training steps 5000 to 93000, offer a direct view of the model's progression over the course of training.
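As a minimal loading sketch, assuming the checkpoints are published as git revisions on the Hugging Face Hub (the revision name `global_step5000` below is illustrative, not confirmed from the repo):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository and revision names are illustrative; check the Hub repo's
# branches/tags for the actual checkpoint naming scheme.
repo = "bigscience/bloom-intermediate"
revision = "global_step5000"  # hypothetical checkpoint tag

tokenizer = AutoTokenizer.from_pretrained(repo, revision=revision)
# The full 176B model needs hundreds of GB of memory; device_map="auto"
# (via the accelerate library) shards it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    repo,
    revision=revision,
    device_map="auto",
    torch_dtype="auto",
)
```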
Implementation Details
The model uses a decoder-only Transformer architecture with 70 layers and 112 attention heads, and incorporates several advanced techniques, including ALiBi positional encodings and StableEmbedding layer normalization. Training was conducted on the Jean Zay supercomputer using 384 NVIDIA A100 80GB GPUs. Key hyperparameters (a quick parameter-count check follows the list):
- Hidden layer dimension: 14336
- Sequence length: 2048 tokens
- Vocabulary size: 250,680 tokens
- Training infrastructure: Megatron-DeepSpeed with PyTorch
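These figures are enough to sanity-check the headline parameter count. A back-of-the-envelope sketch, ignoring bias and LayerNorm terms (which contribute well under 0.1% of the total):

```python
# Rough parameter count from the hyperparameters above.
hidden = 14336
layers = 70
vocab = 250_680  # tokenizer vocabulary (the training config may pad this)

embedding = vocab * hidden                 # input embeddings (tied with the output head)
per_layer = 4 * hidden**2 + 8 * hidden**2  # attention (QKV + output proj) + 4x-wide MLP
total = embedding + layers * per_layer

print(f"{total / 1e9:.1f}B parameters")    # ~176.2B, matching the stated 176B
```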
Core Capabilities
- Multilingual text generation across 46 natural languages (see the usage sketch after this list)
- Code generation in 13 programming languages
- Base model for fine-tuning on specific tasks
- Research-oriented applications in NLP
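An illustrative generation sketch follows; the prompts and settings are arbitrary, and since running the full 176B model locally is impractical, a smaller BLOOM variant is a sensible stand-in for testing:

```python
from transformers import pipeline

# Illustrative only: the full model needs hundreds of GB of GPU memory,
# so a smaller variant such as "bigscience/bloom-560m" works for local tests.
generator = pipeline(
    "text-generation",
    model="bigscience/bloom-intermediate",
    device_map="auto",
)

# Multilingual generation: the same model handles prompts in different languages.
for prompt in ["La capitale de la France est", "The capital of France is"]:
    out = generator(prompt, max_new_tokens=20, do_sample=False)
    print(out[0]["generated_text"])
```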
Frequently Asked Questions
Q: What makes this model unique?
BLOOM-intermediate is unique in providing access to training checkpoints of one of the largest open-source multilingual models, allowing researchers to study model evolution during training. It's specifically designed for public research and non-commercial applications.
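For instance, a researcher might track how the model's loss on a fixed probe sentence changes across checkpoints. A minimal sketch, assuming the checkpoints are exposed as Hub revisions named like `global_step<N>` (a hypothetical naming scheme) and that each load fits in the available hardware:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bigscience/bloom-intermediate"
probe = "The quick brown fox jumps over the lazy dog."
steps = [5000, 50000, 93000]  # illustrative subset of the checkpoint range

# The tokenizer is fixed across training, so one load suffices.
tokenizer = AutoTokenizer.from_pretrained(repo)
inputs = tokenizer(probe, return_tensors="pt")

for step in steps:
    # Revision naming is an assumption; check the repo's branches/tags.
    model = AutoModelForCausalLM.from_pretrained(
        repo, revision=f"global_step{step}", device_map="auto", torch_dtype="auto"
    )
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"step {step}: loss {loss.item():.3f}")
    del model  # free memory before loading the next checkpoint
```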
Q: What are the recommended use cases?
The model is recommended for research purposes, including studying model training dynamics, exploring language model behavior, and developing downstream applications in areas like information extraction and question answering. However, it should not be used for high-stakes decisions or critical applications.