Sharif-wav2vec2
| Property | Value |
|---|---|
| Model Type | Speech Recognition (ASR) |
| Language | Farsi/Persian |
| Training Data | 108 hours of CommonVoice audio |
| Model Hub | Hugging Face |
| Best WER | 6% (cleaned test set) |
What is Sharif-wav2vec2?
Sharif-wav2vec2 is an automatic speech recognition (ASR) model for the Farsi language. It is built on the wav2vec2 architecture and fine-tuned on 108 hours of CommonVoice Farsi audio samples. A 5-gram language model, created with the KenLM toolkit, is integrated into decoding to improve transcription accuracy.
Implementation Details
The model expects audio sampled at 16 kHz. It uses wav2vec2 for acoustic feature extraction, decodes with CTC (Connectionist Temporal Classification), and integrates the 5-gram language model into decoding to improve accuracy. On CommonVoice, this combination reaches a reported 6% WER on the cleaned test set. Key implementation points:
- 16 kHz sampling rate requirement
- Integrated 5-gram KenLM language model
- Batch processing support
- Attention mask functionality
- CTC-based decoding with language model integration
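To make the CTC step concrete, here is a minimal sketch of greedy CTC decoding: collapse runs of repeated frame labels, then drop the blank symbol. It does not reproduce the model's actual decoder, which also integrates the KenLM language model. The toy vocabulary, the `blank_id` value, and the function name `ctc_greedy_decode` are assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame labels, then remove CTC blanks."""
    # groupby merges each run of identical consecutive ids into one label.
    collapsed = [label for label, _ in groupby(frame_ids)]
    # Blanks separate genuine repeats, so they are dropped only after collapsing.
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary; the real model ships its own Farsi tokenizer.
vocab = {0: "<blank>", 1: "س", 2: "ل", 3: "ا", 4: "م"}

# One predicted label per audio frame (argmax over the model's logits).
frames = [1, 1, 0, 2, 2, 2, 3, 0, 0, 4, 4]
print(ctc_greedy_decode(frames, vocab))
```

A blank between two identical labels keeps them as two characters, which is how CTC can represent doubled letters.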
Core Capabilities
- Farsi speech recognition with 6% WER on cleaned test sets
- 16% WER on other test sets
- Efficient batch processing for multiple audio files
- Simple integration with the Transformers library
- Support for both local and API-based inference
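The WER figures above are word error rates: the word-level edit distance between reference and hypothesis transcripts, divided by the number of reference words. The following is a minimal sketch of that computation, assuming whitespace tokenization; the original evaluation may normalize text differently, and `word_error_rate` is a name chosen here for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a 5-word reference.
print(word_error_rate("من به مدرسه می روم", "من به مدرسه رفتم"))  # 0.4
```

Under this definition, a 6% WER means roughly 6 word errors per 100 reference words.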
Frequently Asked Questions
Q: What makes this model unique?
The model combines wav2vec2 acoustic modeling with a 5-gram KenLM language model, a pairing optimized specifically for Farsi speech recognition. On CommonVoice, it reaches 6% WER on the cleaned test set.
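One common way to combine an acoustic model with an n-gram language model is shallow fusion during beam search. The formula below is a sketch of that general technique, not a confirmed description of this model's decoder. The weights $\alpha$ and $\beta$ are assumed tunable hyperparameters; the source gives neither the fusion rule nor any values.

```latex
\hat{W} = \arg\max_{W} \Big[ \log P_{\mathrm{CTC}}(W \mid X)
          + \alpha \, \log P_{\mathrm{LM}}(W)
          + \beta \, |W| \Big]
```

Here $X$ is the input audio, $W$ a candidate word sequence, $P_{\mathrm{CTC}}$ the acoustic model's probability, $P_{\mathrm{LM}}$ the 5-gram language model's probability, and $|W|$ the number of words, a term that offsets the language model's bias toward short outputs.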
Q: What are the recommended use cases?
This model suits Farsi speech transcription tasks, particularly applications that require high accuracy and real-time processing. It handles both batch and single-file transcription, so it fits a range of ASR applications.
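Batch transcription requires clips of different lengths to be padded to a common length, with an attention mask marking which samples are real audio. With Transformers, the feature extractor typically does this when called with `padding=True`. The sketch below shows the idea in plain Python; `pad_batch` and its `pad_value` parameter are hypothetical names, not part of the model's API.

```python
def pad_batch(waveforms, pad_value=0.0):
    """Right-pad waveforms to equal length and build attention masks."""
    max_len = max(len(w) for w in waveforms)
    input_values, attention_mask = [], []
    for w in waveforms:
        n_pad = max_len - len(w)
        # Real samples first, then padding up to the longest clip.
        input_values.append(list(w) + [pad_value] * n_pad)
        # 1 marks real audio, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(w) + [0] * n_pad)
    return input_values, attention_mask


# Two toy "clips" of different lengths (real input would be 16 kHz samples).
values, mask = pad_batch([[0.1, 0.2, 0.3], [0.5]])
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```

A single-file transcription is the special case of a batch of one, where no padding is added and the mask is all ones.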