# whisper-SV
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Framework | PyTorch 1.13.0 |
| Language | Swedish |
## What is whisper-SV?

Whisper-SV is a Swedish speech recognition model built on OpenAI's whisper-small architecture and fine-tuned on the Common Voice 11.0 dataset to improve performance on Swedish. The model is implemented with the Transformers library and was trained with native AMP (Automatic Mixed Precision).
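A minimal sketch of what native AMP training looks like in PyTorch; the tiny linear model, random tensors, and MSE loss are illustrative stand-ins, not the actual Whisper fine-tuning loop:

```python
import torch

# Illustrative stand-in model; the real run fine-tunes whisper-small.
model = torch.nn.Linear(80, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-05, betas=(0.9, 0.999))

use_cuda = torch.cuda.is_available()
# GradScaler rescales the loss so fp16 gradients do not underflow;
# it degrades to a no-op when CUDA (and hence fp16 autocast) is unavailable.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(4, 80)
target = torch.randn(4, 10)

with torch.autocast(device_type="cuda" if use_cuda else "cpu", enabled=use_cuda):
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()  # backward on the (possibly) scaled loss
scaler.step(optimizer)         # unscales gradients, then optimizer.step()
scaler.update()                # adjusts the scale factor for the next step
```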
## Implementation Details

The model was trained with a learning rate of 1e-05 and a total train batch size of 16, using the Adam optimizer with betas=(0.9, 0.999) and a linear learning-rate scheduler with 500 warmup steps.
- Gradient accumulation steps: 2
- Training steps: 200
- Evaluation batch size: 8
- Seed: 42
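The hyperparameters above can be sanity-checked with a little arithmetic. Note that the 200 training steps end before the 500 warmup steps complete, so the learning rate never reaches its 1e-05 peak; the per-device batch size of 8 below is an inference from total batch 16 divided by 2 accumulation steps, not a value stated on the card:

```python
base_lr = 1e-05
warmup_steps = 500
grad_accum_steps = 2
total_train_batch = 16

# With gradient accumulation of 2, the per-device batch size works out to 8.
per_device_batch = total_train_batch // grad_accum_steps

def lr_during_warmup(step):
    # Linear warmup: the LR ramps from 0 toward base_lr over warmup_steps.
    return base_lr * min(step, warmup_steps) / warmup_steps

print(per_device_batch)       # 8
print(lr_during_warmup(200))  # 4e-06: the 200-step run ends mid-warmup
```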
## Core Capabilities
- Swedish speech recognition
- Integration with HuggingFace's ASR pipeline
- Support for TensorBoard logging
- Inference endpoint compatibility
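A sketch of the ASR-pipeline integration listed above. The repo id `whisper-SV` is a placeholder (the model's actual Hugging Face Hub id is not stated here), and the audio file path is illustrative:

```python
from transformers import pipeline

def build_swedish_asr(model_id="whisper-SV"):
    # The "automatic-speech-recognition" task bundles feature extraction,
    # generation, and decoding; model_id is a placeholder Hub id.
    return pipeline("automatic-speech-recognition", model=model_id)

if __name__ == "__main__":
    asr = build_swedish_asr()
    # Transcribe a local audio file (path is illustrative).
    result = asr("sample_swedish.wav")
    print(result["text"])
```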
## Frequently Asked Questions
Q: What makes this model unique?
This model specializes in Swedish: it is fine-tuned specifically on Swedish speech data while retaining the robust whisper-small architecture.
Q: What are the recommended use cases?
The model is particularly suited for Swedish automatic speech recognition tasks, transcription services, and applications requiring Swedish language audio processing.