Phi-4-mm-inst-asr-turkish
Property | Value |
---|---|
Base Model | microsoft/Phi-4-multimodal-instruct |
Training Data | 600-hour Turkish audio dataset |
Author | ysdede |
Model Link | Hugging Face |
What is Phi-4-mm-inst-asr-turkish?
Phi-4-mm-inst-asr-turkish is a specialized fine-tuned version of Microsoft's Phi-4-multimodal-instruct model, specifically optimized for Turkish speech recognition. The model was trained on a substantial 600-hour Turkish audio dataset for one epoch, achieving significant improvements in speech recognition accuracy.
Implementation Details
The model employs a fine-tuning approach using the prompt "Transcribe the Turkish audio". It demonstrates remarkable improvement in performance metrics, with the Word Error Rate (WER) reducing from 127.29 to 47.57 and Character Error Rate (CER) improving from 78.22 to 20.52. The training loss showed significant improvement, decreasing from 1.423 to 0.176.
- Learning rate: 1e-05
- Batch size: 4 (training), 8 (evaluation)
- Optimizer: AdamW with betas=(0.9,0.95)
- Linear learning rate scheduler with 5000 warmup steps
- Native AMP mixed precision training
Core Capabilities
- Specialized Turkish speech recognition
- Improved accuracy with source language specification
- Reduced hallucination rates
- Significant WER and CER improvements
Frequently Asked Questions
Q: What makes this model unique?
The model's specialization in Turkish speech recognition and its significant performance improvements make it stand out. The reduction in WER by nearly 63% demonstrates its effectiveness for Turkish ASR tasks.
Q: What are the recommended use cases?
The model is specifically designed for Turkish speech transcription tasks. It performs best when the source language is specified during inference, making it ideal for applications requiring Turkish audio transcription.