farsi_commonvoice_blstm

Maintained By
espnet

Farsi CommonVoice BLSTM

Property    Value
License     CC-BY-4.0
Framework   ESPnet
Paper       ESPnet: End-to-End Speech Processing Toolkit
Language    Persian (Farsi)

What is farsi_commonvoice_blstm?

This is an automatic speech recognition (ASR) model for the Persian language, built with the ESPnet framework and trained on the CommonVoice dataset. It uses a VGG-RNN encoder (a VGG-style convolutional front end followed by bidirectional LSTM layers). On the test set it recognizes 91.4% of words and 97.2% of characters correctly. These are correct-recognition rates, not error rates; the corresponding character error rate (CER) is 3.6%.
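The 91.4% and 97.2% figures read most naturally as the "Corr" column of the sclite-style scoring tables that ESPnet recipes print: the share of reference words or characters recognized correctly. The minimal pure-Python sketch below (the names `align` and `AlignmentCounts` are hypothetical, not ESPnet functions) shows how that rate differs from the error rate. With N = C + S + D reference tokens, Corr = C / N, while WER or CER = (S + D + I) / N. Because insertions only enter the error rate, the error rate can exceed 100 - Corr.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AlignmentCounts:
    correct: int
    substitutions: int
    deletions: int
    insertions: int

    @property
    def ref_len(self) -> int:
        # N: number of reference tokens = C + S + D
        return self.correct + self.substitutions + self.deletions

    def corr_rate(self) -> float:
        """Percent of reference tokens recognized correctly (insertions ignored)."""
        return 100.0 * self.correct / self.ref_len

    def error_rate(self) -> float:
        """WER or CER in percent: (S + D + I) / N."""
        errors = self.substitutions + self.deletions + self.insertions
        return 100.0 * errors / self.ref_len


def align(ref: Sequence[str], hyp: Sequence[str]) -> AlignmentCounts:
    """Levenshtein alignment with a backtrace that counts C, S, D and I."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(diag, dp[i - 1][j] + 1, dp[i][j - 1] + 1)

    # Walk back from the bottom-right cell, classifying each step
    c = s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                c += 1
            else:
                s += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            d += 1  # reference token missing from the hypothesis
            i -= 1
        else:
            ins += 1  # extra token in the hypothesis
            j -= 1
    return AlignmentCounts(c, s, d, ins)


if __name__ == "__main__":
    counts = align("a b c d".split(), "a x c d e".split())
    print(counts.corr_rate(), counts.error_rate())  # 75.0 50.0
```

For CER, pass character sequences, for example `align(list(ref_text), list(hyp_text))`; whether spaces are scored as tokens depends on the scoring setup.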

Implementation Details

The encoder places VGG-style convolutional layers in front of bidirectional LSTM layers, and training combines CTC and attention objectives. Key technical specifications include:

  • 4 LSTM layers with 1024 hidden units each
  • Bidirectional LSTM layers, each followed by a projection layer
  • Location-based attention mechanism
  • SpecAugment data augmentation
  • Global mean-variance normalization (MVN) of input features
  • CTC weight of 0.5
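Two of the listed preprocessing steps can be sketched in a few lines of pure Python. Global MVN estimates one mean and one standard deviation per feature dimension over every frame of the training set, then applies those same statistics to all utterances. SpecAugment masks random bands of frequency bins and random spans of frames during training. The function names, mask counts and widths, and the use of 0.0 as the mask value are illustrative assumptions, not this model's configuration, and SpecAugment's time-warping step is omitted.

```python
import math
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]  # frames x feature bins, e.g. log-mel filterbank features


def fit_global_mvn(utterances: List[Matrix], eps: float = 1e-10) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and std over all frames of all training utterances."""
    dim = len(utterances[0][0])
    count = 0
    sums = [0.0] * dim
    sq_sums = [0.0] * dim
    for utt in utterances:
        for frame in utt:
            count += 1
            for k, v in enumerate(frame):
                sums[k] += v
                sq_sums[k] += v * v
    mean = [s / count for s in sums]
    # Clamp tiny negative variances from rounding and keep std away from zero
    std = [max(math.sqrt(max(sq / count - m * m, 0.0)), eps) for sq, m in zip(sq_sums, mean)]
    return mean, std


def apply_global_mvn(utt: Matrix, mean: List[float], std: List[float]) -> Matrix:
    """Normalize every frame of one utterance with the shared global statistics."""
    return [[(v - m) / s for v, m, s in zip(frame, mean, std)] for frame in utt]


def spec_augment(
    feats: Matrix,
    num_freq_masks: int = 2,
    max_freq_width: int = 5,
    num_time_masks: int = 2,
    max_time_width: int = 10,
    rng: Optional[random.Random] = None,
) -> Matrix:
    """Frequency and time masking (no time warping); returns a masked copy."""
    if rng is None:
        rng = random.Random()
    out = [row[:] for row in feats]  # copy so the caller's features are untouched
    n_frames, n_bins = len(out), len(out[0])
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_bins))
        start = rng.randint(0, n_bins - width)
        for row in out:
            for k in range(start, start + width):
                row[k] = 0.0
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_frames))
        start = rng.randint(0, n_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * n_bins
    return out
```

Because global MVN uses training-set statistics, the same `mean` and `std` are reused unchanged at inference time, whereas masking is applied only during training.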

Core Capabilities

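  • Persian speech recognition trained on CommonVoice
  • 3.6% CER on the test set
  • Utterance-level (offline) transcription: the bidirectional encoder needs the complete utterance before it produces output, so the model is not designed for streaming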
  • Trained on crowd-sourced CommonVoice recordings from many different speakers
  • Runs within the ESPnet toolkit
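What "real time" can mean for a BLSTM model is limited by its bidirectionality: each frame's output depends on frames that come after it, so decoding can only start once the whole utterance is available. The toy scalar recurrence below is a hypothetical stand-in (it is not an LSTM and not ESPnet code) that makes this dependency visible: changing only the last input frame changes the backward state at the first frame.

```python
import math
from typing import List, Tuple


def toy_bidirectional(xs: List[float], a: float = 0.5) -> List[Tuple[float, float]]:
    """Toy scalar bidirectional recurrence (NOT an LSTM): (forward, backward) state per frame."""
    n = len(xs)
    fwd = [0.0] * n
    bwd = [0.0] * n
    h = 0.0
    for t in range(n):  # left-to-right pass: fwd[t] depends on xs[0..t]
        h = math.tanh(a * h + xs[t])
        fwd[t] = h
    g = 0.0
    for t in reversed(range(n)):  # right-to-left pass: bwd[t] depends on xs[t..n-1]
        g = math.tanh(a * g + xs[t])
        bwd[t] = g
    return list(zip(fwd, bwd))


if __name__ == "__main__":
    utt = [0.1, 0.2, 0.3, 0.4]
    changed = utt[:-1] + [-0.9]  # alter only the LAST frame
    before, after = toy_bidirectional(utt), toy_bidirectional(changed)
    # Forward state at frame 0 is unchanged; backward state at frame 0 is not,
    # so frame 0's full output cannot be finalized before the last frame arrives.
    print(before[0][0] == after[0][0], before[0][1] == after[0][1])  # True False
```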

Frequently Asked Questions

Q: What makes this model unique?

This model focuses specifically on Persian ASR and uses a hybrid CTC/attention architecture with BLSTM encoder layers; it reports a 3.6% character error rate on the CommonVoice test set.
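The CTC weight of 0.5 sets the balance between the two branches of the hybrid model. In hybrid CTC/attention training as formulated in ESPnet, the objective is L = λ·L_ctc + (1 - λ)·L_att, so λ = 0.5 weights the CTC and attention losses equally, while the CTC branch encourages monotonic alignments between audio frames and output tokens. The sketch below uses hypothetical function names, takes precomputed loss values instead of real network outputs, and assumes blank index 0 for the CTC best-path collapse it also illustrates.

```python
from typing import List, Sequence

BLANK_ID = 0  # assumption: CTC blank at index 0, as is conventional in ESPnet token lists


def hybrid_ctc_attention_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.5) -> float:
    """Multi-task objective: ctc_weight * L_ctc + (1 - ctc_weight) * L_att."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


def ctc_best_path(frame_labels: Sequence[int], blank: int = BLANK_ID) -> List[int]:
    """CTC best-path collapse: merge consecutive repeats, then drop blanks."""
    tokens: List[int] = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            tokens.append(label)
        prev = label
    return tokens


if __name__ == "__main__":
    # With ctc_weight = 0.5 both branches contribute equally.
    print(hybrid_ctc_attention_loss(2.0, 4.0))  # 3.0
    # A blank between two identical labels keeps them as separate tokens.
    print(ctc_best_path([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```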

Q: What are the recommended use cases?

The model suits Persian speech transcription tasks, including automated subtitling, voice command systems, and other speech-to-text applications that need accurate Farsi recognition.
