ALIA-40b
| Property | Value |
|---|---|
| Parameter Count | 40.4B |
| Architecture | Decoder-only Transformer |
| Context Length | 4,096 tokens |
| Training Data | 6.9T tokens |
| Languages | 35 European languages + code |
| License | Apache 2.0 |
What is ALIA-40b?
ALIA-40b is a multilingual language model developed by the Barcelona Supercomputing Center's Language Technologies unit. It was pre-trained from scratch on 6.9 trillion tokens spanning 35 European languages and code, with Spain's official and co-official languages (Spanish, Catalan, Galician, and Basque) upweighted through strategic data sampling.
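The "strategic data sampling" described above is commonly implemented as temperature-scaled sampling over per-language corpus sizes, so that low-resource languages are seen more often than their raw share of tokens would imply. The sketch below illustrates that general technique; the corpus sizes and the exponent are made-up values for illustration, not figures from the model card.

```python
# Hypothetical per-language corpus sizes in tokens (illustrative only,
# not the actual ALIA-40b data mix)
corpus = {"es": 1.5e12, "ca": 6e10, "gl": 1e10, "eu": 8e9}

def sampling_probs(sizes, alpha=0.5):
    # Temperature-scaled sampling: p_i proportional to n_i ** alpha.
    # With alpha < 1, smaller corpora get a larger share than their
    # natural (proportional) weight, upweighting minority languages.
    scaled = {lang: n ** alpha for lang, n in sizes.items()}
    total = sum(scaled.values())
    return {lang: s / total for lang, s in scaled.items()}

p = sampling_probs(corpus, alpha=0.5)
natural = {lang: n / sum(corpus.values()) for lang, n in corpus.items()}
print(f"Basque share: natural {natural['eu']:.4f} -> sampled {p['eu']:.4f}")
```

Lowering `alpha` pushes the distribution further toward uniform; `alpha = 1` recovers plain proportional sampling.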
Implementation Details
The model is a 48-layer decoder with a hidden size of 8,192 and 64 attention heads. It uses modern efficiency techniques including Flash Attention and Grouped Query Attention with 8 key-value groups shared among the query heads. Training was conducted on MareNostrum 5, a pre-exascale supercomputer, using NVIDIA's NeMo Framework.
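To see what the 8 key-value groups buy, the sketch below counts attention-projection parameters per layer from the figures above. It assumes a head dimension of 8,192 / 64 = 128 and no projection biases, which the card does not state explicitly.

```python
hidden = 8192
n_heads = 64           # query heads
n_kv_groups = 8        # key/value heads under Grouped Query Attention
head_dim = hidden // n_heads  # 128, assuming standard head sizing

q_proj = hidden * n_heads * head_dim            # query projection
kv_proj = 2 * hidden * n_kv_groups * head_dim   # key and value projections together
o_proj = n_heads * head_dim * hidden            # output projection
per_layer_attn = q_proj + kv_proj + o_proj

# For comparison: full multi-head attention would use one K/V head per query head
mha_kv = 2 * hidden * n_heads * head_dim
print(f"attention params per layer: {per_layer_attn:,}")
print(f"K/V projection reduction vs. MHA: {mha_kv // kv_proj}x")
```

The same 8x factor applies to the K/V cache at inference time, which is the main practical benefit of GQA for a model of this size.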
- Vocabulary size: 256,000 tokens
- Precision: bfloat16
- Positional embeddings: RoPE (rotary)
- Activation function: SwiGLU
- Normalization: RMSNorm
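The normalization and activation choices listed above can be sketched in a few lines. This is a generic NumPy illustration of RMSNorm and a SwiGLU feed-forward block, not the model's actual code; the weight shapes are arbitrary.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-5):
    # RMSNorm rescales by the root-mean-square of the activations;
    # unlike LayerNorm it subtracts no mean, which is cheaper to compute.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * gain

def silu(x):
    # SiLU (swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_mlp(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward block: a gated linear unit where the gate
    # branch passes through SiLU before the elementwise product.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down
```

Note that SwiGLU uses three weight matrices per feed-forward block (gate, up, down) rather than the two of a plain GELU MLP, which affects how the feed-forward width trades off against parameter count.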
Core Capabilities
- Multilingual text generation across 35 European languages
- Programming language understanding and generation
- Strong performance in Spain's official and co-official languages
- Versatile applications from research to commercial use
- Base model suitable for further fine-tuning
Frequently Asked Questions
Q: What makes this model unique?
ALIA-40b stands out for its balanced multilingual capabilities, especially its strong performance in Spain's official and co-official languages. Its training data was curated and sampled to give minority languages adequate representation while maintaining quality across all supported languages.
Q: What are the recommended use cases?
The model is intended for both research and commercial applications in the supported languages. As a base (non-instruction-tuned) model, it is well suited to language-generation tasks and as a starting point for fine-tuning on specific use cases. It should not be used for harmful purposes, and it should not be deployed in production without a prior risk assessment.