wavlm-libri-clean-100h-base-plus
| Property | Value |
|---|---|
| Downloads | 769,579 |
| Framework | PyTorch 1.9.0 |
| Training Data | LibriSpeech ASR - CLEAN |
| Best WER | 6.83% |
What is wavlm-libri-clean-100h-base-plus?
This is a fine-tuned version of Microsoft's WavLM-base-plus model, optimized for speech recognition on the LibriSpeech ASR clean dataset. It achieves a Word Error Rate (WER) of 6.83% on the evaluation set.
Implementation Details
The model was trained across 8 GPUs with Native AMP (Automatic Mixed Precision). It uses the Adam optimizer (betas=0.9,0.999, epsilon=1e-08) with a linear learning-rate scheduler and 500 warmup steps.
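The linear schedule with warmup mentioned above can be sketched as a plain function (a minimal sketch; the total step count is hypothetical, since the card only reports epochs, and the real run would use its framework's equivalent scheduler):

```python
def linear_schedule_with_warmup(step, base_lr=3e-4, warmup_steps=500, total_steps=5000):
    """Linear warmup to base_lr, then linear decay toward zero.

    total_steps is illustrative: the actual value depends on dataset
    size, effective batch size, and the 3 training epochs.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr at warmup_steps down to 0 at total_steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 1.0 - progress)
```

At step 250 this yields half the base rate (1.5e-4), reaches the full 3e-4 at step 500, then decays linearly for the remainder of training.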
- Effective batch size: 32 (4 per GPU × 8 GPUs)
- Learning rate: 0.0003
- Training duration: 3 epochs
- Validation loss: 0.0819
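Pieced together, the reported hyperparameters correspond to a configuration along these lines (a hypothetical reconstruction; the key names mirror the Hugging Face `TrainingArguments` convention and are not taken from the original training script):

```python
# Hypothetical reconstruction of the training setup from the values
# reported above; key names follow Hugging Face TrainingArguments.
training_config = {
    "learning_rate": 3e-4,
    "per_device_train_batch_size": 4,
    "num_gpus": 8,                  # multi-GPU distribution
    "num_train_epochs": 3,
    "warmup_steps": 500,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "fp16": True,                   # Native AMP
    "lr_scheduler_type": "linear",
}

# The effective batch size is per-device batch size times GPU count.
effective_batch_size = (training_config["per_device_train_batch_size"]
                        * training_config["num_gpus"])  # 4 * 8 = 32
```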
Core Capabilities
- Automatic Speech Recognition
- Multi-GPU training support
- TensorBoard integration
- Inference endpoint compatibility
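Speech models fine-tuned on LibriSpeech in this way typically carry a CTC head, whose per-frame predictions are turned into text by collapsing repeated tokens and dropping blanks. A toy sketch of that greedy decoding step (the vocabulary and blank id here are illustrative, not the model's actual tokenizer):

```python
def greedy_ctc_decode(token_ids, id_to_char, blank_id=0):
    """Greedy CTC decoding: collapse repeats, then drop blanks.

    token_ids: per-frame argmax ids from a CTC head.
    id_to_char: toy vocabulary mapping (hypothetical, for illustration).
    """
    chars = []
    prev = None
    for t in token_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the CTC blank symbol.
        if t != prev and t != blank_id:
            chars.append(id_to_char[t])
        prev = t
    return "".join(chars)

# Toy vocabulary: 0 is reserved as the CTC blank.
vocab = {1: "c", 2: "a", 3: "t"}
print(greedy_ctc_decode([1, 1, 0, 2, 2, 0, 3], vocab))  # -> cat
```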
Frequently Asked Questions
Q: What makes this model unique?
Its training run shows a steady improvement from an initial WER of 100% to a final 6.83%, achieved through the tuned optimizer and schedule described above together with Native AMP training.
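WER itself is the word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal implementation:

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion: 1/6
```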
Q: What are the recommended use cases?
The model is designed for clean-speech recognition, making it well suited to high-accuracy transcription of clear audio input: audiobook transcription, meeting recordings, and other controlled-environment scenarios.