shuka_v1

Maintained By
sarvamai

Shuka v1

PropertyValue
Authorsarvamai
Model TypeAudio Language Model
ArchitectureEncoder-Decoder with Projector
Model URLHuggingFace

What is shuka_v1?

Shuka v1 is an innovative language model specifically designed for understanding audio in Indic languages. It combines a state-of-the-art Saaras v1 audio encoder with Meta's Llama3-8B-Instruct decoder, connected through a lightweight projector containing approximately 60M parameters. The model demonstrates impressive efficiency, requiring less than 100 hours of audio for training.

Implementation Details

The model employs a unique training approach where only the projector weights are fine-tuned while keeping the encoder and decoder frozen. Despite being trained primarily on English and Hindi data, the model exhibits strong zero-shot performance across multiple Indic languages.

  • Efficient training methodology using only projector fine-tuning
  • Integration with popular libraries like transformers and librosa
  • Support for bfloat16 precision
  • 16kHz audio sampling rate requirement

Core Capabilities

  • Native audio understanding in 11+ Indic languages
  • Zero-shot performance in Bengali, Gujarati, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu
  • Natural and informative responses to audio queries
  • Efficient processing with minimal parameter tuning

Frequently Asked Questions

Q: What makes this model unique?

Shuka v1's uniqueness lies in its ability to understand multiple Indic languages without explicit training, achieved through its innovative architecture combining Saaras v1 encoder and Llama3-8B-Instruct decoder.

Q: What are the recommended use cases?

The model is ideal for audio question-answering tasks in Indic languages, multilingual audio understanding, and natural language processing applications requiring audio input in South Asian languages.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.