ALIA-40b

Maintained By
BSC-LT

  • Parameter Count: 40.4B
  • Architecture: Decoder-only Transformer
  • Context Length: 4,096 tokens
  • Training Data: 6.9T tokens
  • Languages: 35 European languages + code
  • License: Apache 2.0

What is ALIA-40b?

ALIA-40b is a multilingual large language model developed by the Barcelona Supercomputing Center's Language Technologies unit (BSC-LT). It was pre-trained from scratch on 6.9 trillion tokens spanning 35 European languages and code, with Spain's co-official languages (Spanish, Catalan, Galician, and Basque) given increased weight through targeted data sampling.

Implementation Details

The model is a 48-layer decoder-only Transformer with a hidden size of 8,192 and 64 attention heads. It incorporates modern efficiency techniques, including Flash Attention and Grouped Query Attention (GQA) with 8 query groups. Training was conducted on MareNostrum 5, a pre-exascale supercomputer, using NVIDIA's NeMo Framework.

  • Vocabulary size: 256,000 tokens
  • Precision: bfloat16
  • Positional embedding: RoPE
  • Activation Function: SwiGLU
  • Layer normalization: RMS Norm

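As a rough sanity check, these hyperparameters are consistent with the stated 40.4B parameter count if one assumes a SwiGLU intermediate size of 24,576 and untied input/output embeddings (neither is stated in this card). The same numbers also show the KV-cache saving from using 8 query groups instead of 64 full key-value heads:

```python
# Back-of-the-envelope parameter count for ALIA-40b.
# Assumptions (NOT stated in the card): SwiGLU intermediate size of
# 24,576 and untied input/output embedding matrices. Norm weights
# are negligible and omitted.
hidden, layers, heads, kv_groups = 8192, 48, 64, 8
vocab, d_ff = 256_000, 24_576
head_dim = hidden // heads                       # 128

embeddings = 2 * vocab * hidden                  # input + output (untied)
attn = 2 * hidden * hidden                       # Q and output projections
attn += 2 * hidden * kv_groups * head_dim        # K and V (GQA: 8 groups)
mlp = 3 * hidden * d_ff                          # SwiGLU: gate, up, down
per_layer = attn + mlp

total = embeddings + layers * per_layer
print(f"{total / 1e9:.1f}B parameters")          # → 40.4B parameters

# GQA keeps only 8 KV heads per layer, so the bf16 KV cache is
# 8x smaller than with full 64-head attention:
kv_bytes_per_token = layers * 2 * kv_groups * head_dim * 2
print(f"{kv_bytes_per_token / 1024:.0f} KiB KV cache per token")  # → 192 KiB
```

With these assumptions the total lands within rounding distance of the published 40.4B figure, which suggests the card's architecture numbers are internally consistent.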
Core Capabilities

  • Multilingual text generation across 35 European languages
  • Programming language understanding and generation
  • Strong performance in Spanish co-official languages
  • Versatile applications from research to commercial use
  • Base model suitable for further fine-tuning

Frequently Asked Questions

Q: What makes this model unique?

ALIA-40b stands out for its balanced multilingual capabilities, especially its enhanced performance in Spanish co-official languages. The model's training data was carefully curated and sampled to ensure proper representation of minority languages while maintaining strong performance across all supported languages.
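A common way to give minority languages more weight during pre-training is temperature-based sampling: a corpus with relative size f is drawn with probability proportional to f^alpha for some alpha < 1, which flattens the distribution toward smaller corpora. The card does not specify ALIA's exact scheme, so the following is an illustrative sketch with made-up language shares, not the actual recipe:

```python
# Illustrative temperature-based corpus sampling. alpha < 1 upweights
# smaller corpora relative to their raw frequency. The language shares
# below are hypothetical, chosen only to show the effect.
def sampling_probs(sizes: dict[str, float], alpha: float = 0.5) -> dict[str, float]:
    weights = {lang: size ** alpha for lang, size in sizes.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}

corpus = {"es": 0.64, "ca": 0.04, "gl": 0.01, "eu": 0.01}  # hypothetical shares
probs = sampling_probs(corpus)
# Spanish drops from ~91% of this subset to ~67%, while Basque and
# Galician each rise from ~1.4% to ~8.3% of sampled tokens.
```

Lower values of alpha flatten the mix further; alpha = 1 reproduces raw corpus frequencies.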

Q: What are the recommended use cases?

The model is designed for both research and commercial applications in supported languages. As a base model, it's particularly well-suited for language generation tasks or further fine-tuning for specific use cases. However, it should not be used for malicious activities or in production environments without proper risk assessment.
