Sharif-wav2vec2

Maintained By
SLPL


Property          Value
Model Type        Speech Recognition (ASR)
Language          Farsi/Persian
Training Data     108 hours (CommonVoice)
Model Hub         Hugging Face
Best WER          6% (cleaned test set)

What is Sharif-wav2vec2?

Sharif-wav2vec2 is an automatic speech recognition (ASR) model for Farsi. It is built on the wav2vec2 architecture and fine-tuned on 108 hours of Farsi audio from CommonVoice. To improve transcription accuracy, its output is decoded together with a 5-gram language model built with the KenLM toolkit.
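To illustrate what an n-gram language model contributes, here is a toy Python sketch that scores a word sequence by estimating the probability of each word given the previous n-1 words from counts. It uses add-one smoothing for simplicity; KenLM itself estimates modified Kneser-Ney smoothed models and stores them in ARPA or binary files, so this is not KenLM's algorithm. The function names `train_ngram` and `log10_prob` and the transliterated example sentences are hypothetical.

```python
import math
from collections import Counter

def train_ngram(sentences, n):
    """Count n-grams and their (n-1)-word contexts; pad with <s> and </s>."""
    ngram_counts, context_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            ngram_counts[context + (tokens[i],)] += 1
            context_counts[context] += 1
    return ngram_counts, context_counts, vocab

def log10_prob(sentence, model, n):
    """Sum of log10 P(word | previous n-1 words), with add-one smoothing."""
    ngram_counts, context_counts, vocab = model
    tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
    total = 0.0
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        count = ngram_counts[context + (tokens[i],)] + 1
        total += math.log10(count / (context_counts[context] + len(vocab)))
    return total

corpus = ["salam donya", "salam bar shoma"]  # made-up, transliterated Farsi
model = train_ngram(corpus, n=2)  # bigram keeps the toy small; the real LM is 5-gram
print(log10_prob("salam donya", model, 2) > log10_prob("donya salam", model, 2))  # True
```

A word order seen in the training text scores higher than a scrambled one; that preference for fluent word sequences is the signal a decoder takes from the language model.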

Implementation Details

The model takes 16 kHz audio as input. Its wav2vec2 acoustic model is trained with CTC (Connectionist Temporal Classification), and at decoding time the CTC output is combined with the custom 5-gram language model. The maintainers present this combination as state-of-the-art for Farsi ASR.

  • Requires 16 kHz audio input
  • Integrated 5-gram KenLM language model
  • Batch processing support
  • Attention mask support
  • CTC decoding with language model integration
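Without a language model, CTC output is usually decoded greedily (best-path decoding): take the most likely symbol at each audio frame, merge consecutive repeats, then drop the blank symbol. In Transformers, the processor's `batch_decode` performs this step for plain CTC models. The sketch below reproduces the logic on a made-up frame sequence; the blank symbol `_`, the word-boundary symbol `|` and the tiny vocabulary are illustrative assumptions, not this model's actual vocabulary.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = "_"  # hypothetical CTC blank symbol
DELIM = "|"  # hypothetical word-boundary symbol

def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], vocab: List[str]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_ids = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    merged = [token_id for token_id, _ in groupby(best_ids)]  # "ss" -> "s"
    chars = [vocab[i] for i in merged if vocab[i] != BLANK]   # a blank keeps real double letters apart
    return "".join(chars).replace(DELIM, " ").strip()

vocab = [BLANK, DELIM] + list("salmdony")
path = "ss_a_ll_aa_mm||dd_o_n_yy_a"  # made-up most likely symbol for each frame
frame_scores = [[1.0 if sym == ch else 0.0 for sym in vocab] for ch in path]
print(ctc_greedy_decode(frame_scores, vocab))  # salam donya
```

An LM-based decoder replaces the per-frame argmax with a beam search that keeps several candidate transcripts and lets the language model help rank them.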

Core Capabilities

  • Farsi speech recognition with 6% WER on the cleaned test set
  • 16% WER on the other test set
  • Batch processing of multiple audio files
  • Integration with the Hugging Face Transformers library
  • Support for both local and API-based inference
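The WER figures above are word error rates: the minimum number of word substitutions, deletions and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. A 6% WER therefore means roughly 6 word errors per 100 reference words. A minimal sketch of the computation (the function name and example sentences are hypothetical; evaluation toolkits may also normalize text before scoring):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far and hyp[:j].
    # Before any reference word is processed, hyp[:j] needs j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # ref[:i] vs empty hypothesis: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)

# Made-up transliterated example: one deletion ("be") and one substitution.
print(word_error_rate("man be madrese miravam", "man madrese miraftam"))  # 0.5
```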

Frequently Asked Questions

Q: What makes this model unique?

The model pairs wav2vec2 acoustic modeling, fine-tuned on Farsi CommonVoice audio, with a 5-gram KenLM language model for Farsi. This combination is what achieves the reported results on CommonVoice: 6% WER on the cleaned test set and 16% on the other test set.
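The card does not spell out how the language model is combined with the CTC output. A common approach, used by beam-search decoders such as pyctcdecode (which Transformers' `Wav2Vec2ProcessorWithLM` relies on), is shallow fusion: each candidate transcript is scored as its acoustic log-probability plus `alpha` times its LM log-probability plus `beta` times its word count. The sketch below applies that formula to a fixed n-best list, which is a simplification (real decoders apply the LM during the beam search); the hypotheses, scores, weights and toy LM are invented for illustration.

```python
from typing import Callable, List, Tuple

def fused_score(acoustic_logp: float, lm_logp: float, num_words: int,
                alpha: float = 0.5, beta: float = 1.0) -> float:
    """Shallow fusion: acoustic log-prob + alpha * LM log-prob + beta * word count."""
    return acoustic_logp + alpha * lm_logp + beta * num_words

def rescore(nbest: List[Tuple[str, float]], lm_logp: Callable[[str], float],
            alpha: float = 0.5, beta: float = 1.0) -> str:
    """Pick the (text, acoustic_logp) hypothesis with the highest fused score."""
    def score(hyp: Tuple[str, float]) -> float:
        text, acoustic_logp = hyp
        return fused_score(acoustic_logp, lm_logp(text), len(text.split()), alpha, beta)
    return max(nbest, key=score)[0]

# The acoustic model slightly prefers a misspelling; the toy LM strongly prefers the real word.
nbest = [("salam donia", -4.0), ("salam donya", -4.2)]
toy_lm = {"salam donya": -2.0, "salam donia": -6.0}
print(rescore(nbest, toy_lm.get))             # salam donya
print(rescore(nbest, toy_lm.get, alpha=0.0))  # salam donia
```

Setting `alpha` to zero removes the language model's influence, and the decision falls back to the acoustic scores alone.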

Q: What are the recommended use cases?

The model is intended for Farsi speech transcription where accuracy matters. It supports both single-file and batch transcription, so it suits anything from transcribing one recording to processing large audio collections.
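For batch transcription, clips of different lengths are padded to a common length, and an attention mask marks which samples are real audio so the model can ignore the padding. With Transformers this is typically handled by the processor, for example `processor(clips, sampling_rate=16000, padding=True, return_tensors="pt")`; whether an attention mask is returned depends on the feature extractor's configuration. The plain-Python sketch below reproduces only the padding and mask logic, assuming right zero-padding; `pad_batch` is a hypothetical helper, and it also enforces the 16 kHz input requirement.

```python
from typing import List, Sequence, Tuple

TARGET_SR = 16_000  # the model expects 16 kHz input

def pad_batch(clips: Sequence[Sequence[float]], sampling_rate: int,
              pad_value: float = 0.0) -> Tuple[List[List[float]], List[List[int]]]:
    """Right-pad 1-D clips to the longest one and build a 0/1 attention mask."""
    if sampling_rate != TARGET_SR:
        raise ValueError(f"expected {TARGET_SR} Hz audio, got {sampling_rate} Hz; resample first")
    if not clips:
        return [], []
    max_len = max(len(clip) for clip in clips)
    values, mask = [], []
    for clip in clips:
        n_pad = max_len - len(clip)
        values.append(list(clip) + [pad_value] * n_pad)  # real samples, then padding
        mask.append([1] * len(clip) + [0] * n_pad)       # 1 = real sample, 0 = padding
    return values, mask

values, mask = pad_batch([[0.1, 0.2, 0.3], [0.5]], sampling_rate=16_000)
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```

Audio recorded at another rate (for example 44.1 kHz or 8 kHz) should be resampled to 16 kHz before this step rather than passed through unchanged.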
