wav2vec2-large-xlsr-53-persian

Maintained by: jonatasgrosman

Property     Value
License      Apache 2.0
Author       Jonatas Grosman
Downloads    297,168
Task         Automatic Speech Recognition

What is wav2vec2-large-xlsr-53-persian?

This is a speech recognition model for Persian, fine-tuned from Facebook's pretrained wav2vec2-large-xlsr-53 model. It was trained on the Common Voice 6.1 dataset and reports a Word Error Rate (WER) of 30.12% and a Character Error Rate (CER) of 7.37%. Input speech must be sampled at 16 kHz.
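To make the two reported metrics concrete, here is a minimal sketch of how WER and CER are commonly computed from an edit distance. The function names are illustrative, and the evaluation script behind the figures above may normalize text differently (for example in how it treats punctuation or spaces), so this is not a way to reproduce those exact numbers.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length.

    Assumption: spaces count as characters here; some tools strip them first.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Over a whole test set, the usual convention is to sum the edits and the reference lengths across all utterances before dividing, rather than averaging per-utterance rates.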

Implementation Details

The model uses the Wav2Vec2ForCTC architecture together with a processor that prepares Persian audio input. It is implemented in PyTorch and can be used through the HuggingSound library or a custom inference script.

  • Built on the facebook/wav2vec2-large-xlsr-53 base model
  • Expects audio sampled at 16 kHz
  • Includes a preprocessing pipeline for audio input
  • Supports batch processing for efficient inference
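The HuggingSound route mentioned above can be sketched roughly as follows. The Hub ID is an assumption assembled from the maintainer and model names on this page, and `wav_sample_rate` and `transcribe_files` are hypothetical helpers, not part of any library. The 16 kHz check is a conservative guard for WAV files; the library may handle resampling on its own.

```python
import wave
from typing import List

# Assumption: Hub ID built from the maintainer and model names on this page.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-persian"
EXPECTED_RATE = 16_000  # the model expects 16 kHz input


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a PCM WAV header using only the stdlib."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def transcribe_files(paths: List[str], batch_size: int = 4) -> List[str]:
    """Transcribe WAV files in batches (downloads the model on first use)."""
    for path in paths:
        rate = wav_sample_rate(path)
        if rate != EXPECTED_RATE:
            raise ValueError(f"{path}: expected {EXPECTED_RATE} Hz, got {rate} Hz")

    # Imported lazily so the helper above works without huggingsound installed.
    from huggingsound import SpeechRecognitionModel

    model = SpeechRecognitionModel(MODEL_ID)
    results = model.transcribe(paths, batch_size=batch_size)
    return [result["transcription"] for result in results]
```

Calling `transcribe_files(["clip1.wav", "clip2.wav"])` with real 16 kHz recordings would return one transcription string per file; the file names here are placeholders.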

Core Capabilities

  • Direct speech-to-text transcription without a language model
  • Handles various Persian dialects and accents
  • Supports both isolated word and continuous speech recognition
  • Efficient batch processing for multiple audio files
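Transcription without a language model, as listed above, typically means greedy CTC decoding: take the most likely token at each frame, collapse consecutive repeats, and drop the blank token. The sketch below illustrates that idea in plain Python with a toy vocabulary. The function name, the vocabulary, and the blank index are all hypothetical, and this is not the model's actual decoding code.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    logits: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,
) -> str:
    """Greedy CTC decoding over per-frame scores.

    logits: one row of scores per audio frame, one column per vocabulary entry.
    vocab: token string for each column (toy vocabulary, assumed for illustration).
    blank_id: index of the CTC blank token (assumed to be 0 here).
    """
    # Step 1: pick the highest-scoring token for every frame.
    best_ids: List[int] = [
        max(range(len(frame)), key=frame.__getitem__) for frame in logits
    ]

    # Step 2: collapse consecutive repeats, then drop blanks.
    decoded: List[str] = []
    previous: Optional[int] = None
    for token_id in best_ids:
        if token_id != previous and token_id != blank_id:
            decoded.append(vocab[token_id])
        previous = token_id
    return "".join(decoded)
```

A blank between two identical tokens is what allows a doubled letter to survive the collapse step. In Wav2Vec2-style tokenizers the padding token usually plays the role of the blank and a separate delimiter token marks word boundaries, so a real pipeline would also map that delimiter to a space.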

Frequently Asked Questions

Q: What makes this model unique?

The model is presented as outperforming other available Persian speech recognition models, with a lower WER and CER. It was evaluated on the Common Voice dataset and is described as performing robustly across various speaking styles.

Q: What are the recommended use cases?

The model is suited to Persian speech transcription tasks such as automated subtitling, voice command systems, and speech analytics. It is described as particularly effective for applications that need real-time transcription of Persian speech.
