wav2vec2-large-xlsr-persian-v2
| Property | Value |
|---|---|
| Model Type | Speech Recognition |
| Base Architecture | Wav2Vec2-Large-XLSR-53 |
| Language | Persian (Farsi) |
| Performance | 31.92% WER on Common Voice test set |
| Author | m3hrdadfi |
| Model Link | Hugging Face |
What is wav2vec2-large-xlsr-persian-v2?
This model is a fine-tuned version of Facebook's wav2vec2-large-xlsr-53 for Persian automatic speech recognition (ASR). It takes audio sampled at 16 kHz as input and transcribes it to Persian text, reaching a word error rate (WER) of 31.92% on the Common Voice test set.
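Because the model expects 16 kHz input, audio recorded at another sample rate has to be resampled first. The following is a minimal sketch of that step using plain linear interpolation; the function name `resample_linear` is hypothetical and not part of the model's code. It applies no anti-aliasing filter, so a real pipeline would more likely rely on a dedicated audio library.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal by linear interpolation (illustrative only).

    Assumption: `samples` is a sequence of floats for a single channel.
    No low-pass filtering is applied, so downsampling may alias.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # distance between output samples, in input samples
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, `resample_linear(audio, 48_000)` would turn a 48 kHz recording into a 16 kHz one with a third as many samples.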
Implementation Details
The model uses the Wav2Vec2 architecture and was fine-tuned on the Common Voice Persian dataset. It includes a Persian-specific text normalization and preprocessing step that maps character variants to a consistent form and handles special cases unique to Persian.
- Built on the wav2vec2-large-xlsr-53 architecture
- Includes specialized Persian text normalization
- Processes 16 kHz audio input
- Implements CTC-based speech recognition
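To make the CTC step concrete, here is a minimal sketch of greedy CTC decoding: take the best-scoring token per frame, merge consecutive repeats, then drop blanks. The vocabulary, the blank index of 0, and the `|` word delimiter are assumptions chosen for illustration and may not match this model's tokenizer.

```python
def ctc_greedy_decode(frame_scores, vocab, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding over per-frame score vectors.

    Assumptions: `frame_scores` is a list of score lists (one per frame),
    `vocab` maps token index to a character, and `blank_id` marks the CTC blank.
    """
    # Pick the highest-scoring token index for every frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]

    chars = []
    prev = None
    for idx in best_ids:
        # Emit a token only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank_id:
            chars.append(vocab[idx])
        prev = idx

    # Turn the delimiter token into spaces between words.
    return "".join(chars).replace(word_delimiter, " ").strip()
```

A blank between two identical tokens is what lets the decoder keep a doubled letter instead of merging it into one.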
Core Capabilities
- Direct speech-to-text transcription without a language model
- Handles Persian-specific character variations and mappings
- Robust performance on natural speech input
- Comprehensive text normalization pipeline
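As a rough illustration of what Persian character mapping and normalization can involve, the sketch below unifies a few Arabic code points with their Persian counterparts, strips diacritics, and collapses whitespace. The mapping table and the name `normalize_persian` are hypothetical; the model's actual normalization rules are more extensive and are not reproduced here.

```python
import re

# Hypothetical, deliberately small mapping; not the model's real table.
_CHAR_TABLE = str.maketrans({
    "\u0643": "\u06a9",  # Arabic kaf -> Persian keheh
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0649": "\u06cc",  # alef maksura -> Persian yeh
    "\u0640": "",        # tatweel (kashida) is removed
})
_DIACRITICS = re.compile("[\u064b-\u0652]")  # fathatan through sukun
_WHITESPACE = re.compile(r"\s+")


def normalize_persian(text):
    """Map character variants, drop diacritics, and tidy whitespace."""
    text = text.translate(_CHAR_TABLE)
    text = _DIACRITICS.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()
```

Applying the same normalization to both reference transcripts and model output keeps spelling variants from being counted as recognition errors.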
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically optimized for Persian speech recognition, with a comprehensive character mapping system and a specialized text normalization pipeline. It achieves a WER of 31.92% on the Common Voice test set, which corresponds to about 32 word-level errors per 100 reference words.
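For readers unfamiliar with the metric, WER is the word-level edit distance between the reference and the hypothesis (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below shows one common way to compute it; the function name is hypothetical, and this is not the evaluation script used to produce the figure above.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.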
Q: What are the recommended use cases?
The model is suited to Persian speech recognition applications such as transcription services, voice assistants, and automated subtitling. It expects 16 kHz audio input, so recordings at other sample rates need to be resampled first, and it can handle a range of Persian dialects and accents.