Sarvam-1
| Property | Value |
|---|---|
| Parameter Count | 2 billion |
| Model Type | Language Model (Text Completion) |
| License | Sarvam non-commercial license |
| Training Infrastructure | 1,024 GPUs on Yotta Shakti cluster |
| Framework | NVIDIA NeMo |
What is Sarvam-1?
Sarvam-1 is a 2-billion-parameter language model built specifically for Indian languages. Developed with the NVIDIA NeMo Framework, it represents a significant step in multilingual AI, delivering strong performance across 10 major Indian languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.
Implementation Details
The model uses a hidden size of 2,048, 28 hidden layers, and 16 attention heads. It employs the SwiGLU activation function and rotary positional embeddings (RoPE), and was trained with grouped-query attention in bfloat16 mixed precision. It posts strong benchmark scores, including 44.44 on MMLU and 58.50 on ARC-Challenge in zero-shot settings.
- Hidden size: 2,048; intermediate (feed-forward) size: 11,008
- 28 hidden layers with 16 attention heads
- Maximum position embeddings: 8,192
- Training completed in 5 days on 1,024 GPUs
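As a sketch, these hyperparameters map onto a standard Llama-style decoder configuration in Hugging Face transformers (SwiGLU and RoPE are defaults in that family). The grouped-query-attention KV-head count below is an assumption; the card states only that GQA is used:

```python
from transformers import LlamaConfig

# Llama-style decoder config matching the hyperparameters listed above.
# num_key_value_heads is an assumption: the card mentions grouped-query
# attention but does not state the KV-head count.
config = LlamaConfig(
    hidden_size=2048,
    intermediate_size=11008,        # SwiGLU feed-forward width
    num_hidden_layers=28,
    num_attention_heads=16,
    num_key_value_heads=8,          # assumed GQA grouping
    max_position_embeddings=8192,
    hidden_act="silu",              # SwiGLU uses the SiLU gate
    torch_dtype="bfloat16",
)
```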
Core Capabilities
- Superior token efficiency, with fertility (average tokens per word) of 1.4-2.1 across supported languages; see the measurement sketch after this list
- 4-6x faster inference compared to larger models
- Trained on roughly 4 trillion tokens, including 2 trillion high-quality Indic tokens
- Competitive performance against larger models like Llama-3.1-8B
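Fertility here is the average number of tokens the tokenizer emits per word; lower means cheaper inference for the same text. A minimal way to measure it, assuming the tokenizer is published on the Hugging Face Hub under sarvamai/sarvam-1:

```python
from transformers import AutoTokenizer

# Hub ID assumed from the public release.
tokenizer = AutoTokenizer.from_pretrained("sarvamai/sarvam-1")

def fertility(text: str) -> float:
    """Average tokens per whitespace-delimited word (crude but standard)."""
    words = text.split()
    tokens = tokenizer.tokenize(text)
    return len(tokens) / len(words)

print(fertility("भारत एक विशाल और विविधतापूर्ण देश है"))  # Hindi sample sentence
```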
Frequently Asked Questions
Q: What makes this model unique?
Sarvam-1's unique strength lies in its specialized optimization for Indian languages, offering unprecedented token efficiency and performance while maintaining a relatively compact 2B parameter size. It's specifically designed for text completion tasks and serves as an excellent base model for fine-tuning.
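Since Sarvam-1 is positioned as a base model, a common next step is parameter-efficient fine-tuning. Below is a minimal LoRA sketch using the peft library; the Hub ID sarvamai/sarvam-1 and the adapter hyperparameters are illustrative assumptions, not official recommendations:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model in bfloat16 (the precision it was trained in).
model = AutoModelForCausalLM.from_pretrained(
    "sarvamai/sarvam-1", torch_dtype=torch.bfloat16
)

# Illustrative LoRA settings; tune rank and target modules for your task.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter weights train
```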
Q: What are the recommended use cases?
The model is primarily intended for text completion tasks and should be fine-tuned for specific downstream applications. It excels in tasks involving Indian languages and can be particularly effective for translation, summarization, and general text generation in supported Indic languages.
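Because it is a completion model rather than an instruction-tuned chat model, prompts should be written as text to be continued. A minimal generation sketch, again assuming the sarvamai/sarvam-1 Hub ID:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sarvamai/sarvam-1")
model = AutoModelForCausalLM.from_pretrained(
    "sarvamai/sarvam-1", torch_dtype=torch.bfloat16, device_map="auto"
)

# Plain completion prompt (Hindi): the model continues the text.
prompt = "भारतीय भाषाओं की विविधता"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```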