wavlm-libri-clean-100h-base-plus

Maintained By
patrickvonplaten

Property       Value
Downloads      769,579
Framework      PyTorch 1.9.0
Training Data  LibriSpeech ASR - CLEAN
Best WER       6.83%

What is wavlm-libri-clean-100h-base-plus?

This is a fine-tuned version of Microsoft's WavLM-base-plus model, optimized for automatic speech recognition on the 100-hour clean subset of the LibriSpeech ASR corpus. It reaches a Word Error Rate (WER) of 6.83% on the evaluation set.

Implementation Details

The model was trained with multi-GPU distribution across 8 devices using Native AMP (Automatic Mixed Precision). It employs the Adam optimizer (betas=(0.9, 0.999), epsilon=1e-08) and a linear learning-rate scheduler with 500 warmup steps; a configuration sketch follows the list below.

  • Total batch size: 32 (4 per GPU)
  • Learning rate: 0.0003
  • Training duration: 3 epochs
  • Validation loss: 0.0819
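For orientation, here is a minimal sketch of how these hyperparameters would map onto the Hugging Face Trainer API. Only the values above come from this card; the output directory and the argument mapping itself are assumptions, and the original training script may have differed:

```python
from transformers import TrainingArguments

# Sketch reconstructing the reported setup; hyperparameter values are from
# the card, everything else (e.g. output_dir) is an assumption.
training_args = TrainingArguments(
    output_dir="wavlm-libri-clean-100h-base-plus",  # assumed output path
    per_device_train_batch_size=4,  # 4 per GPU x 8 GPUs = 32 total
    learning_rate=3e-4,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_steps=500,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,  # Native AMP mixed-precision training
)
```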

Core Capabilities

  • Automatic Speech Recognition (see the inference example below)
  • Multi-GPU training support
  • TensorBoard integration
  • Inference endpoint compatibility
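As a usage sketch, the checkpoint can be loaded through the transformers pipeline API. The repo id below is inferred from the maintainer and model name on this card, and the audio filename is a placeholder:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint for automatic speech recognition.
asr = pipeline(
    "automatic-speech-recognition",
    model="patrickvonplaten/wavlm-libri-clean-100h-base-plus",
)

# "sample.flac" is a placeholder path; LibriSpeech audio is 16 kHz mono.
print(asr("sample.flac")["text"])
```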

Frequently Asked Questions

Q: What makes this model unique?

This model converges efficiently during fine-tuning, improving from an initial WER of 100% to a final 6.83% through careful hyperparameter tuning and Native AMP mixed-precision training.
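For context, WER counts word-level substitutions, insertions, and deletions against a reference transcript, normalized by the reference length. A minimal illustration with the Hugging Face evaluate library (the example sentences are made up):

```python
import evaluate

# WER = (substitutions + insertions + deletions) / reference word count
wer = evaluate.load("wer")
score = wer.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat sat on a mat"],
)
print(score)  # 1 substitution over 6 reference words -> ~0.167
```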

Q: What are the recommended use cases?

The model is designed for clean-speech recognition, making it well suited to high-accuracy transcription of clear audio input, such as audiobooks, meeting recordings, and other controlled-environment scenarios.
