sarvam-1

sarvamai

Sarvam-1 is a 2B parameter LLM optimized for 10 Indian languages, offering superior token efficiency and performance comparable to larger models like Llama-3.1-8B.

Property: Value
Parameter Count: 2 billion
Model Type: Language Model (Text Completion)
License: Sarvam non-commercial license
Training Infrastructure: 1,024 GPUs on Yotta Shakti cluster
Framework: NVIDIA NeMo

What is sarvam-1?

Sarvam-1 is a 2-billion parameter language model designed specifically for Indian languages. Built with the NVIDIA NeMo Framework, it targets 10 major Indian languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.

Implementation Details

The architecture uses a hidden size of 2048, 28 hidden layers, and 16 attention heads, with SwiGLU activations, rotary positional embeddings, and grouped-query attention. Training used bfloat16 mixed precision. Reported zero-shot benchmark scores include 44.44 on MMLU and 58.50 on ARC-Challenge.

  • Hidden size: 2048 with 11,008 intermediate size
  • 28 hidden layers with 16 attention heads
  • Maximum position embeddings: 8,192
  • Training completed in 5 days on 1,024 GPUs
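The figures above are enough to derive a few quantities that are not listed, such as the per-head dimension and the per-layer weight counts. The following is a rough sketch rather than the official configuration: the class name `SarvamArch` is made up for illustration, and the number of key/value heads used by grouped-query attention is not stated here, so `num_kv_heads = 8` is a placeholder assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SarvamArch:
    # Values stated in this document.
    hidden_size: int = 2048
    intermediate_size: int = 11_008
    num_layers: int = 28
    num_heads: int = 16
    max_positions: int = 8_192
    # ASSUMPTION: the key/value head count for grouped-query attention is
    # not given in this document; 8 is a placeholder, not a confirmed value.
    num_kv_heads: int = 8

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the hidden size.
        return self.hidden_size // self.num_heads

    def mlp_params_per_layer(self) -> int:
        # SwiGLU uses three projections: gate, up, and down (biases ignored).
        return 3 * self.hidden_size * self.intermediate_size

    def attention_params_per_layer(self) -> int:
        # Query and output projections are full-width; key and value
        # projections shrink with the number of key/value heads.
        q_and_o = 2 * self.hidden_size * self.hidden_size
        k_and_v = 2 * self.hidden_size * self.num_kv_heads * self.head_dim
        return q_and_o + k_and_v

    def kv_cache_bytes_per_token(self, bytes_per_value: int = 2) -> int:
        # Keys plus values for every layer; 2 bytes per value for bfloat16.
        return (2 * self.num_layers * self.num_kv_heads
                * self.head_dim * bytes_per_value)


arch = SarvamArch()
print("head_dim:", arch.head_dim)
print("MLP params per layer:", arch.mlp_params_per_layer())
print("attention params per layer:", arch.attention_params_per_layer())
print("KV cache bytes per token:", arch.kv_cache_bytes_per_token())
```

Under these assumptions, the feed-forward block accounts for most of the weights in each layer, and halving the key/value head count halves the per-token cache size, which is the usual motivation for grouped-query attention.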

Core Capabilities

  • Superior token efficiency, with tokenizer fertility rates (average tokens per word) of 1.4-2.1 across supported languages
  • 4-6x faster inference compared to larger models
  • Trained on ~4 trillion tokens with 2 trillion high-quality Indic tokens
  • Competitive performance against larger models like Llama-3.1-8B
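Fertility, as used in the list above, is the average number of tokens a tokenizer produces per word, so lower values mean more text fits in the same context window. The sketch below shows one way to measure it. It assumes whitespace-separated words, and `toy_tokenize` is a hypothetical stand-in, not the Sarvam-1 tokenizer; any callable that maps a string to a list of tokens can be passed instead.

```python
from typing import Callable, List


def fertility(text: str, tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)


def toy_tokenize(text: str, piece_len: int = 3) -> List[str]:
    # Hypothetical stand-in tokenizer: chops each word into fixed-size pieces.
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + piece_len]
                      for i in range(0, len(word), piece_len))
    return pieces


def words_per_context(context_tokens: int, fertility_rate: float) -> int:
    # Rough number of words that fit in a context window of a given size.
    return int(context_tokens / fertility_rate)


sample = "ab cdef ghijklm"
print("fertility:", fertility(sample, toy_tokenize))
for rate in (1.4, 2.1):
    print(f"fertility {rate}: ~{words_per_context(8192, rate)} words in 8,192 tokens")
```

At the stated fertility range, an 8,192-token context corresponds to roughly 3,900 to 5,850 words, depending on the language.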

Frequently Asked Questions

Q: What makes this model unique?

Sarvam-1's main strength is its optimization for Indian languages: it combines high token efficiency with strong performance at a compact 2B parameter size. It is designed for text completion tasks and serves as a base model for fine-tuning.

Q: What are the recommended use cases?

The model is primarily intended for text completion tasks and should be fine-tuned for specific downstream applications. It excels in tasks involving Indian languages and can be particularly effective for translation, summarization, and general text generation in supported Indic languages.
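Because this is a text completion model, a task such as translation is typically framed as text for the model to continue, for example with a few worked examples in the prompt. The following is a minimal sketch using the Hugging Face `transformers` library. The Hub id `sarvamai/sarvam-1` is an assumption inferred from the names on this page, and the `English:`/`Hindi:` prompt layout and helper names are illustrative choices, not a documented format.

```python
from typing import List, Tuple

# ASSUMPTION: Hugging Face Hub id inferred from the author and model names.
MODEL_ID = "sarvamai/sarvam-1"


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format worked examples so a completion model continues the pattern."""
    blocks = [f"English: {source}\nHindi: {target}\n"
              for source, target in examples]
    # Leave the final answer open for the model to complete.
    blocks.append(f"English: {query}\nHindi:")
    return "\n".join(blocks)


def complete(prompt: str, max_new_tokens: int = 32) -> str:
    # Imported lazily so the prompt helper works without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


prompt = build_few_shot_prompt([("Hello", "नमस्ते")], "Thank you")
print(prompt)
```

Calling `complete(prompt)` downloads the model weights on first use and returns the continuation. Output quality for any specific task will depend on the prompt and, as noted above, on fine-tuning.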
