sarvam-1

Maintained by: sarvamai

Property | Value
Parameter Count | 2 billion
Model Type | Language Model (Text Completion)
License | Sarvam non-commercial license
Training Infrastructure | 1,024 GPUs on Yotta Shakti cluster
Framework | NVIDIA NeMo

What is Sarvam-1?

Sarvam-1 is a 2-billion-parameter language model designed specifically for Indian languages. Built with the NVIDIA NeMo Framework, it is reported to perform strongly across 10 major Indian languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.

Implementation Details

The architecture uses a hidden size of 2048, 28 hidden layers, and 16 attention heads, with SwiGLU activations, rotary positional embeddings (RoPE), and grouped-query attention. It was trained in bfloat16 mixed precision. Reported zero-shot benchmark scores include 44.44 on MMLU and 58.50 on ARC-Challenge.

  • Hidden size: 2048; intermediate (feed-forward) size: 11,008
  • 28 hidden layers with 16 attention heads
  • Maximum position embeddings: 8,192
  • Training completed in 5 days on 1,024 GPUs
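
The figures above imply a few derived quantities, which the following minimal sketch works out. The function names are illustrative and are not part of any Sarvam or NeMo API. The number of key-value heads used by grouped-query attention is not stated here, so it is left as a required argument. Biases, normalization weights, and embeddings are ignored, since the vocabulary size is not given either.

```python
# Figures quoted in this document.
HIDDEN_SIZE = 2048
INTERMEDIATE_SIZE = 11_008
NUM_LAYERS = 28
NUM_HEADS = 16
NUM_GPUS = 1_024
TRAINING_DAYS = 5


def head_dim(hidden_size: int = HIDDEN_SIZE, num_heads: int = NUM_HEADS) -> int:
    """Width of one attention head: the hidden size split evenly across heads."""
    assert hidden_size % num_heads == 0, "hidden size must divide evenly across heads"
    return hidden_size // num_heads


def mlp_params(hidden_size: int = HIDDEN_SIZE,
               intermediate_size: int = INTERMEDIATE_SIZE) -> int:
    """SwiGLU feed-forward weights: gate, up, and down projections (no biases assumed)."""
    return 3 * hidden_size * intermediate_size


def attention_params(num_kv_heads: int,
                     hidden_size: int = HIDDEN_SIZE,
                     num_heads: int = NUM_HEADS) -> int:
    """Attention weights per layer under grouped-query attention.

    Q and output projections are hidden x hidden; the K and V projections
    shrink when there are fewer key-value heads than query heads.
    num_kv_heads is an assumption supplied by the caller.
    """
    d = head_dim(hidden_size, num_heads)
    q_and_o = 2 * hidden_size * hidden_size
    k_and_v = 2 * hidden_size * num_kv_heads * d
    return q_and_o + k_and_v


def gpu_hours(num_gpus: int = NUM_GPUS, days: int = TRAINING_DAYS) -> int:
    """Total GPU-hours for the quoted training run."""
    return num_gpus * days * 24


print("head dimension:", head_dim())
print("feed-forward weights per layer:", mlp_params())
print("feed-forward weights, all layers:", NUM_LAYERS * mlp_params())
print("GPU-hours:", gpu_hours())
```

Passing `num_kv_heads=NUM_HEADS` to `attention_params` gives the plain multi-head case. Any smaller divisor of 16 shows how much grouped-query attention trims the key and value projections.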

Core Capabilities

  • Token efficiency: tokenizer fertility (average tokens per word) of 1.4-2.1 across supported languages
  • 4-6x faster inference compared to larger models
  • Trained on ~4 trillion tokens, including 2 trillion high-quality Indic tokens
  • Competitive performance against larger models like Llama-3.1-8B
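
Fertility is the average number of tokens a tokenizer produces per word, so a lower value means more text fits into the same context window. The sketch below shows one way to measure it and what the quoted 1.4-2.1 range implies for the 8,192-position limit. It assumes whitespace-separated words. `toy_tokenize` is a hypothetical stand-in, not the Sarvam-1 tokenizer.

```python
from typing import Callable, Iterable, List


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average tokens per whitespace-separated word over a collection of texts."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("no words to measure")
    return total_tokens / total_words


def words_per_context(context_tokens: int, fertility_rate: float) -> int:
    """Approximate number of words that fit in a context of the given token length."""
    return int(context_tokens / fertility_rate)


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: splits each word into chunks of up to 3 characters."""
    return [word[i:i + 3] for word in text.split() for i in range(0, len(word), 3)]


print(fertility(["ab cdef"], toy_tokenize))  # 3 tokens over 2 words
print(words_per_context(8192, 2.1), "to", words_per_context(8192, 1.4), "words")
```

To measure a real tokenizer, pass its tokenize method as the `tokenize` argument in place of `toy_tokenize`.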

Frequently Asked Questions

Q: What makes this model unique?

Sarvam-1 is optimized specifically for Indian languages, combining high token efficiency with strong benchmark performance at a relatively compact 2B parameter size. It is designed for text completion and is intended to serve as a base model for fine-tuning.

Q: What are the recommended use cases?

The model is primarily intended for text completion and should be fine-tuned for specific downstream applications. It is best suited to tasks involving the supported Indic languages, such as translation, summarization, and general text generation.
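
Because this is a text-completion model and not a chat model, prompts work best as a prefix for the model to continue, for example a few worked examples followed by an open slot. The sketch below builds such a prompt and passes it to the Hugging Face `transformers` generation API. The repository id `sarvamai/sarvam-1` is an assumption inferred from the maintainer and model name on this page. The helper names, labels, and decoding settings are illustrative, not recommendations from the model's authors.

```python
from typing import List, Tuple

# Assumed Hugging Face repository id (maintainer/model name from this page).
MODEL_ID = "sarvamai/sarvam-1"


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str,
                          source_label: str = "English",
                          target_label: str = "Hindi") -> str:
    """Format worked examples plus an open slot for the model to complete."""
    blocks = [f"{source_label}: {src}\n{target_label}: {tgt}" for src, tgt in examples]
    blocks.append(f"{source_label}: {query}\n{target_label}:")
    return "\n\n".join(blocks)


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy completion of a prompt; downloads the model weights on first use."""
    # Imported lazily so the prompt helper works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                    do_sample=False)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


prompt = build_few_shot_prompt([("Hello", "नमस्ते")], "Thank you")
print(prompt)
```

Calling `complete(prompt)` would then ask the model to fill in the final slot. For production use, the source recommends fine-tuning over relying on prompting alone.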
