Farsi CommonVoice BLSTM
Property | Value |
---|---|
License | CC-BY-4.0 |
Framework | ESPnet |
Paper | ESPnet: End-to-End Speech Processing Toolkit |
Language | Persian (Farsi) |
What is farsi_commonvoice_blstm?
This is an automatic speech recognition (ASR) model for Persian, built with the ESPnet framework and trained on the CommonVoice dataset. It uses a VGG-RNN encoder with bidirectional LSTM (BLSTM) layers. On test data it reaches 91.4% word accuracy and 97.2% character accuracy (percent correct); these figures are accuracies, not error rates.
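Because accuracy and error-rate figures are easy to confuse, here is a minimal sketch of how word and character error rates are commonly computed from an edit distance. It is not ESPnet's scoring code; the function names are hypothetical, and dropping spaces before computing CER is an assumption made for simplicity.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance divided by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref, hyp):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(ref.split(), hyp.split())


def cer(ref, hyp):
    """Character error rate: tokens are characters, spaces dropped (assumption)."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


if __name__ == "__main__":
    reference = "سلام دنیا"
    hypothesis = "سلام دنی"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

An error rate counts insertions as well as substitutions and deletions, so it can exceed 100% minus the percent-correct figure. This is why a 97.2% character accuracy and a 3.6% CER can describe the same test run.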
Implementation Details
The encoder pairs a VGG-style convolutional front end with bidirectional LSTM layers, and the model is trained with a hybrid CTC-attention objective. Key technical specifications include:
- 4 LSTM layers with 1024 hidden units
- Bidirectional processing with projection layers
- Location-based attention mechanism
- SpecAugment data augmentation
- Global MVN normalization
- CTC weight of 0.5
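Two items in the list above are simple enough to sketch directly: global mean-variance normalization (MVN) and the CTC weight. The code below is an illustrative pure-Python sketch, not ESPnet's implementation; the function names are hypothetical, and features are assumed to be lists of frames, each frame a list of floats.

```python
from statistics import fmean, pstdev


def global_mvn_stats(utterances):
    """Per-dimension mean and standard deviation over every frame of every utterance."""
    frames = [frame for utt in utterances for frame in utt]
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    stds = [max(pstdev(d), 1e-20) for d in dims]  # floor avoids division by zero
    return means, stds


def apply_global_mvn(utt, means, stds):
    """Normalize one utterance with the shared (global) statistics."""
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)] for frame in utt]


def joint_loss(loss_ctc, loss_att, ctc_weight=0.5):
    """Hybrid CTC-attention objective: a weighted sum of the two losses."""
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


if __name__ == "__main__":
    # Two toy "utterances" with 2-dimensional features.
    data = [[[1.0, 10.0], [3.0, 30.0]], [[5.0, 50.0]]]
    means, stds = global_mvn_stats(data)
    print(apply_global_mvn(data[0], means, stds))
    print(joint_loss(loss_ctc=2.0, loss_att=4.0))
```

With a CTC weight of 0.5, the CTC and attention losses contribute equally to the training objective. "Global" means the statistics are computed once over the training set and reused for every utterance, rather than recomputed per utterance.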
Core Capabilities
- Persian (Farsi) speech-to-text transcription
- 3.6% character error rate (CER) on test data
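- Transcribes complete utterances; the bidirectional encoder reads the whole input, so low-latency streaming depends on how audio is segmented and on the hardware used
- Trained on crowd-sourced CommonVoice recordings from many speakers; coverage of specific Persian dialects and accents is not quantified here
- Runs with the ESPnet toolkit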
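As a rough illustration of the ESPnet integration, the sketch below loads a pretrained model through ESPnet2's `Speech2Text` inference class and transcribes one audio file. Several things here are assumptions rather than facts from this card: the model tag `espnet/farsi_commonvoice_blstm`, that `espnet`, `espnet_model_zoo`, and `soundfile` are installed, and that the input is mono audio at the sampling rate the model was trained on. The `transcribe` helper is hypothetical.

```python
def transcribe(wav_path, model_tag="espnet/farsi_commonvoice_blstm"):
    """Return the best hypothesis for one audio file (sketch; model tag is assumed)."""
    # Imported lazily so this module can be loaded without ESPnet installed.
    import soundfile as sf
    from espnet2.bin.asr_inference import Speech2Text

    # Downloads and caches the model on first use (requires espnet_model_zoo).
    speech2text = Speech2Text.from_pretrained(model_tag)

    # Assumes mono audio at the model's training sample rate.
    speech, _sample_rate = sf.read(wav_path)

    # The call returns an n-best list; each entry starts with the decoded text.
    nbest = speech2text(speech)
    text, *_ = nbest[0]
    return text


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

Decoding options such as beam size or CTC weight at inference time are left at their defaults here. Check the ESPnet documentation for the arguments your installed version accepts.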
Frequently Asked Questions
Q: What makes this model unique?
This model focuses specifically on Persian ASR. It uses a hybrid CTC-attention architecture with BLSTM layers and reports a 3.6% character error rate on CommonVoice test data.
Q: What are the recommended use cases?
The model suits Persian speech transcription tasks such as automated subtitling, voice command systems, and other speech-to-text applications that need accurate Farsi output.