wav2vec2-xls-r-300m-cv7-turkish
Property | Value |
---|---|
License | CC-BY-4.0 |
Language | Turkish |
Downloads | 281,435 |
Framework | PyTorch |
What is wav2vec2-xls-r-300m-cv7-turkish?
This is an automatic speech recognition (ASR) model for Turkish, fine-tuned from Facebook's pretrained wav2vec2-xls-r-300m model. It reaches a Word Error Rate (WER) of 8.62% on the Common Voice 7 test set.
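For readers unfamiliar with the metric, the following is a minimal sketch of how WER (and the related Character Error Rate) can be computed from a reference transcript and a model hypothesis. The function names are illustrative, the whitespace tokenization is an assumption, and this CER counts spaces as characters. The scores reported for this model may have been computed with a different normalization or tool.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One of four reference words is missing from the hypothesis.
    print(wer("bugün hava çok güzel", "bugün hava güzel"))
    # One of five characters differs.
    print(cer("güzel", "guzel"))
```

A WER of 8.62% therefore means roughly 8 to 9 word-level edits per 100 reference words, averaged over the test set.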
Implementation Details
The model was trained on a combination of the Common Voice 7.0 TR and MediaSpeech datasets, with custom preprocessing and Turkish-specific text processing. It also uses an N-gram language model, trained with KenLM on Turkish Wikipedia articles, to improve recognition.
- Built on PyTorch 1.10.1 and Transformers 4.16.0
- Freezes the feature extractor during fine-tuning and applies dropout for regularization
- Uses custom Turkish text processing via the unicode_tr package
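Turkish-specific text processing matters mainly because of the dotted and dotless "i": Python's built-in `str.lower()` maps `I` to `i`, whereas Turkish expects `ı`. The sketch below shows the idea in plain Python. It is a stand-in written for illustration, not the unicode_tr API and not the model's actual preprocessing code, and the function names are hypothetical.

```python
# Turkish has two distinct letter pairs: I/ı (dotless) and İ/i (dotted).
# Handle those four characters first, then fall back to the default casing.
_LOWER_MAP = {ord("I"): "ı", ord("İ"): "i"}
_UPPER_MAP = {ord("i"): "İ", ord("ı"): "I"}


def turkish_lower(text):
    """Lowercase text following Turkish casing rules."""
    return text.translate(_LOWER_MAP).lower()


def turkish_upper(text):
    """Uppercase text following Turkish casing rules."""
    return text.translate(_UPPER_MAP).upper()


if __name__ == "__main__":
    print(turkish_lower("ISPARTA"))   # dotless ı at the start
    print(turkish_lower("İSTANBUL"))  # plain dotted i at the start
    print(turkish_upper("istanbul"))  # dotted capital İ at the start
```

Applying the same casing rule to training transcripts and to evaluation references keeps labels consistent, which is the usual motivation for this kind of step.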
Core Capabilities
- Turkish speech recognition with a 2.26% Character Error Rate (CER)
- Optimized for both short and long-form speech input
- Supports chunked processing with configurable stride lengths
- Robust performance across varied speech conditions
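Chunked processing with strides can be pictured as follows: long audio is cut into overlapping windows, each window is transcribed with extra context on both sides, and only the central part of each window is kept when the results are stitched together. The sketch below computes such windows over sample indices. It illustrates the general idea under simplified assumptions and is not the library's implementation; `chunk_windows` and its parameters are hypothetical names.

```python
def chunk_windows(num_samples, chunk_len, stride_left, stride_right):
    """Return (start, end, keep_start, keep_end) tuples covering the audio.

    Each chunk spans [start, end). Only [keep_start, keep_end) is kept from
    its output; the rest serves as context shared with neighbouring chunks.
    """
    step = chunk_len - stride_left - stride_right
    if step <= 0:
        raise ValueError("chunk_len must exceed stride_left + stride_right")

    windows = []
    start = 0
    while True:
        end = min(start + chunk_len, num_samples)
        is_first = start == 0
        is_last = end == num_samples
        # The first chunk has no left neighbour and the last has no right one,
        # so those edges are kept in full.
        keep_start = start if is_first else start + stride_left
        keep_end = end if is_last else end - stride_right
        windows.append((start, end, keep_start, keep_end))
        if is_last:
            break
        start += step
    return windows


if __name__ == "__main__":
    for window in chunk_windows(100, 40, 10, 10):
        print(window)
```

With these toy numbers the kept regions are [0, 30), [30, 50), [50, 70) and [70, 100), so every sample is transcribed exactly once while each chunk still sees surrounding context. Longer strides give more context per chunk at the cost of more overlapping computation.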
Frequently Asked Questions
Q: What makes this model unique?
The model combines the wav2vec2 architecture with Turkish-specific adaptations, namely custom preprocessing and an N-gram language model trained on Turkish Wikipedia. With these, it reports an 8.62% WER on the Common Voice 7 test set.
Q: What are the recommended use cases?
This model is suited to Turkish speech recognition applications, particularly those that require high accuracy, such as transcription services, voice command systems, and automated subtitle generation for Turkish content.
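For a transcription use case, a minimal sketch with the Transformers `pipeline` API might look like the following. Several things here are assumptions rather than facts from this page: the Hub repository ID (the owner prefix is a guess, so substitute the actual path), the audio file name, and the chunk and stride values, which are placeholders rather than recommended settings. The `chunking_options` helper and its sanity check are hypothetical additions for illustration.

```python
# Assumed Hub ID: the owner prefix is not stated on this page.
MODEL_ID = "mpoyraz/wav2vec2-xls-r-300m-cv7-turkish"


def chunking_options(chunk_length_s=10.0, stride_length_s=(4.0, 2.0)):
    """Build keyword arguments for chunked inference (values in seconds)."""
    left, right = stride_length_s
    # Our own sanity check: the strides must leave part of each chunk to keep.
    if left + right >= chunk_length_s:
        raise ValueError("strides must sum to less than the chunk length")
    return {"chunk_length_s": chunk_length_s, "stride_length_s": (left, right)}


def transcribe(audio_path, model_id=MODEL_ID, **chunk_opts):
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, **chunking_options(**chunk_opts))
    return result["text"]


if __name__ == "__main__":
    # "sample_tr.wav" is a placeholder path; decoding files requires ffmpeg.
    print(transcribe("sample_tr.wav", chunk_length_s=10.0))
```

Whether the N-gram language model is applied during decoding depends on the repository contents and the installed packages, so it is worth verifying before comparing results against the reported scores.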