asr-whisper-large-v2-commonvoice-mn

Maintained By
speechbrain

ASR Whisper Large-v2 CommonVoice Mongolian

Property    Value
License     Apache 2.0
Framework   PyTorch / SpeechBrain
Test WER    64.92%
Test CER    25.73%

What is asr-whisper-large-v2-commonvoice-mn?

This is an automatic speech recognition (ASR) model for the Mongolian language, built on OpenAI's Whisper Large-v2 architecture and fine-tuned on the CommonVoice dataset. It extends ASR coverage to a lower-resource language.

Implementation Details

The pretrained Whisper-large-v2 encoder is kept frozen, and only the decoder is fine-tuned for Mongolian speech recognition. The model uses the original Whisper tokenizer and expects single-channel audio sampled at 16 kHz.

  • Frozen pretrained Whisper-large-v2 encoder
  • Fine-tuned decoder architecture
  • Integrated Whisper tokenizer
  • Automatic audio normalization capabilities
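As an illustration of what audio normalization to the expected input format involves, here is a minimal pure-Python sketch. It assumes that normalization means reducing the signal to one channel and resampling it to 16 kHz. The function names `to_mono` and `resample_linear` are hypothetical, and the linear interpolation used here (with no anti-aliasing filter) is a simplification, not the method SpeechBrain itself applies.

```python
def to_mono(frames):
    """Average the channels of each frame.

    `frames` is a list of per-sample tuples, one value per channel.
    Averaging is an assumption; a pipeline could also pick a single channel.
    """
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for k in range(n_out):
        pos = k * step
        i = int(pos)
        frac = pos - i
        a = samples[min(i, last)]
        b = samples[min(i + 1, last)]
        out.append(a + (b - a) * frac)
    return out


# Toy stereo signal at 32 kHz, reduced to mono and resampled to 16 kHz.
stereo = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0), (3.0, 5.0)]
mono = to_mono(stereo)
mono_16k = resample_linear(mono, src_rate=32000, dst_rate=16000)
```

A production pipeline would normally rely on a proper resampler with low-pass filtering; the sketch only shows the shape of the two steps.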

Core Capabilities

  • Mongolian speech recognition with 64.92% WER and 25.73% CER on the test set
  • Automatic audio preprocessing and normalization
  • GPU-compatible inference
  • Support for 16kHz audio processing
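Word error rate (WER) and character error rate (CER) are both edit-distance metrics: the number of substitutions, insertions, and deletions needed to turn the model output into the reference, divided by the reference length in words or characters. The sketch below shows one common way to compute them. It is not the evaluation script used for this model; the helper names are hypothetical, the example sentence is made up, and conventions such as whether spaces count as characters vary between toolkits.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces are counted as characters here."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up example: one character is missing from the middle word.
reference = "сайн байна уу"
hypothesis = "сайн байн уу"
example_wer = wer(reference, hypothesis)  # 1 wrong word out of 3
example_cer = cer(reference, hypothesis)  # 1 wrong character out of 13
```

The example also shows why CER is typically lower than WER: a single wrong character makes the whole word count as an error.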

Frequently Asked Questions

Q: What makes this model unique?

This model targets Mongolian ASR specifically. Freezing the Whisper Large-v2 encoder preserves its pretrained representations, so only the decoder has to be fine-tuned on the target language, which reduces the number of trainable parameters.

Q: What are the recommended use cases?

The model is intended for transcribing Mongolian audio. It is best suited to settings where 16 kHz audio is available and a GPU can be used for inference.
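For the transcription use case, a minimal sketch of loading the model through SpeechBrain's pretrained-model interface might look like the following. It assumes SpeechBrain 1.0 or later (earlier releases exposed the same class under `speechbrain.pretrained`), and it assumes the model is published under the Hugging Face repository ID formed from the maintainer and model name given here. The wrapper function `transcribe`, the `savedir` path, and the audio file name in the usage note are hypothetical.

```python
# Assumed repository ID: maintainer name plus model name from this page.
MODEL_SOURCE = "speechbrain/asr-whisper-large-v2-commonvoice-mn"


def transcribe(audio_path, device="cpu"):
    """Transcribe one audio file; pass device="cuda" to run on a GPU."""
    # Imported lazily so that defining this function does not require
    # SpeechBrain to be installed.
    from speechbrain.inference.ASR import WhisperASR

    asr_model = WhisperASR.from_hparams(
        source=MODEL_SOURCE,
        savedir="pretrained_models/asr-whisper-large-v2-commonvoice-mn",
        run_opts={"device": device},
    )
    return asr_model.transcribe_file(audio_path)
```

Calling `transcribe("example-mn.wav", device="cuda")` would download the checkpoint on first use and run inference on the GPU. The exact type of the returned transcription depends on the SpeechBrain version.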
