wavlm-libri-clean-100h-base-plus

patrickvonplaten

Fine-tuned WavLM speech recognition model achieving a 6.83% word error rate (WER) on the LibriSpeech ASR clean evaluation set. Trained across multiple GPUs with Native AMP and a linear learning rate schedule.

| Property | Value |
|---|---|
| Downloads | 769,579 |
| Framework | PyTorch 1.9.0 |
| Training Data | LibriSpeech ASR - CLEAN |
| Best WER | 6.83% |

What is wavlm-libri-clean-100h-base-plus?

This is a fine-tuned version of Microsoft's WavLM-base-plus model, specifically optimized for speech recognition tasks using the LibriSpeech ASR clean dataset. The model demonstrates impressive performance with a Word Error Rate (WER) of just 6.83% on the evaluation set.
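For illustration, the checkpoint can be loaded for inference with Hugging Face `transformers` along these lines. This is a hedged sketch: the model id `patrickvonplaten/wavlm-libri-clean-100h-base-plus` is assumed from the card's title and author, and greedy CTC decoding is used as the simplest decoding strategy.

```python
# Sketch: transcribing a 16 kHz mono waveform with the fine-tuned WavLM model.
# Model id is assumed from this card; adjust if the hub path differs.
import torch
from transformers import AutoProcessor, WavLMForCTC

MODEL_ID = "patrickvonplaten/wavlm-libri-clean-100h-base-plus"

def transcribe(waveform, sampling_rate=16_000, model_id=MODEL_ID):
    """Greedy CTC decoding of a mono float waveform sampled at 16 kHz."""
    processor = AutoProcessor.from_pretrained(model_id)
    model = WavLMForCTC.from_pretrained(model_id)
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits          # (batch, time, vocab)
    predicted_ids = torch.argmax(logits, dim=-1)  # greedy per-frame argmax
    return processor.batch_decode(predicted_ids)[0]
```

In practice you would load the processor and model once and reuse them across calls rather than reloading per utterance.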

Implementation Details

The model was trained across 8 GPUs with Native AMP (automatic mixed precision). It uses the Adam optimizer (betas=(0.9, 0.999), epsilon=1e-08) and a linear learning rate scheduler with 500 warmup steps.

  • Total batch size: 32 (4 per GPU)
  • Learning rate: 0.0003
  • Training duration: 3 epochs
  • Validation loss: 0.0819
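The linear schedule with warmup described above can be sketched as a plain function. This is illustrative, not the training script: the base learning rate (0.0003) and warmup steps (500) come from the card, while `total_steps` is a placeholder assumption since the card does not state the total number of optimizer steps.

```python
# Illustrative linear warmup + linear decay schedule, as described in the card.
# total_steps is an assumed placeholder; the card does not report it.
def linear_lr(step, base_lr=3e-4, warmup_steps=500, total_steps=10_000):
    """LR rises linearly to base_lr over warmup_steps, then decays linearly to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0, total_steps - step) / (total_steps - warmup_steps)
```

With these values the learning rate peaks at 0.0003 at step 500 and reaches zero at the final step.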

Core Capabilities

  • Automatic Speech Recognition
  • Multi-GPU training support
  • TensorBoard integration
  • Inference endpoint compatibility

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient training progression: the word error rate fell from an initial 100% to a final 6.83% over just 3 epochs, achieved through careful hyperparameter tuning and training techniques such as Native AMP.
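For reference, the WER metric quoted throughout this card is the word-level edit distance divided by the number of reference words. A minimal self-contained implementation (libraries like `jiwer` offer the same computation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] holds the edit distance between ref[:i] and hyp[:j] (rolling row).
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            prev_diag, d[j] = d[j], min(
                d[j] + 1,             # deletion
                d[j - 1] + 1,         # insertion
                prev_diag + (r != h)  # substitution (or match)
            )
    return d[-1] / len(ref)
```

For example, `wer("the cat sat", "the cat sit")` yields one substitution over three reference words, i.e. about 33% WER; a perfect transcript scores 0.0.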

Q: What are the recommended use cases?

The model is specifically designed for clean speech recognition tasks, making it ideal for applications requiring high-accuracy transcription of clear audio input, such as audiobook transcription, meeting recordings, and other controlled environment speech recognition scenarios.
