Phi-4-mm-inst-asr-turkish-3
Property | Value |
---|---|
Base Model | microsoft/Phi-4-multimodal-instruct |
Author | ysdede |
Training Data | 1300 hours of Turkish audio |
Model Hub | Hugging Face |
What is Phi-4-mm-inst-asr-turkish-3?
Phi-4-mm-inst-asr-turkish-3 is a specialized automatic speech recognition (ASR) model fine-tuned specifically for Turkish language processing. Built upon Microsoft's Phi-4-multimodal-instruct architecture, this model demonstrates significant improvements in Turkish speech recognition capabilities, showing remarkable progress in both Word Error Rate (WER) and Character Error Rate (CER) metrics.
Implementation Details
The model employs a language-specific fine-tuning approach, utilizing prompts like "Transcribe the Turkish audio" for optimal performance. The implementation requires loading generation configuration and processor from the base model to ensure proper functionality during inference.
- Improved WER from 153.84 to 64.76 after fine-tuning
- Reduced CER from 82.57 to 29.85 post-training
- Supports both language-agnostic and Turkish-specific prompting
Core Capabilities
- Specialized Turkish speech recognition
- Flexible prompt handling for optimal performance
- Significant reduction in error rates compared to base model
- Compatible with standard ASR workflows
Frequently Asked Questions
Q: What makes this model unique?
This model uniquely combines the power of Phi-4-multimodal-instruct with specialized Turkish language capabilities, achieving substantial improvements in ASR performance through targeted fine-tuning on a large Turkish audio dataset.
Q: What are the recommended use cases?
The model is particularly suited for Turkish speech recognition tasks, transcription services, and applications requiring accurate Turkish audio-to-text conversion. It performs optimally when using language-specific prompting.