wavlm-libri-clean-100h-base-plus

patrickvonplaten

Fine-tuned WavLM speech recognition model achieving a 6.83% word error rate (WER) on the LibriSpeech ASR clean evaluation set. Trained across multiple GPUs with Native AMP and a linear learning rate schedule.

| Property | Value |
|---|---|
| Downloads | 769,579 |
| Framework | PyTorch 1.9.0 |
| Training Data | LibriSpeech ASR - CLEAN |
| Best WER | 6.83% |

What is wavlm-libri-clean-100h-base-plus?

This is a fine-tuned version of Microsoft's WavLM-base-plus model, specifically optimized for speech recognition tasks using the LibriSpeech ASR clean dataset. The model demonstrates impressive performance with a Word Error Rate (WER) of just 6.83% on the evaluation set.
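For illustration, the checkpoint can be loaded for inference with Hugging Face `transformers` along these lines. This is a hedged sketch: the model id `patrickvonplaten/wavlm-libri-clean-100h-base-plus` is assumed from the card's title and author, and greedy CTC decoding is used as the simplest decoding strategy.

```python
# Sketch: transcribing a 16 kHz mono waveform with the fine-tuned WavLM model.
# Model id is assumed from this card; adjust if the hub path differs.
import torch
from transformers import AutoProcessor, WavLMForCTC

MODEL_ID = "patrickvonplaten/wavlm-libri-clean-100h-base-plus"

def transcribe(waveform, sampling_rate=16_000, model_id=MODEL_ID):
    """Greedy CTC decoding of a mono float waveform sampled at 16 kHz."""
    processor = AutoProcessor.from_pretrained(model_id)
    model = WavLMForCTC.from_pretrained(model_id)
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits          # (batch, time, vocab)
    predicted_ids = torch.argmax(logits, dim=-1)  # greedy per-frame argmax
    return processor.batch_decode(predicted_ids)[0]
```

In practice you would load the processor and model once and reuse them across calls rather than reloading per utterance.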

Implementation Details

The model was trained across 8 GPUs with Native AMP (automatic mixed precision). It uses the Adam optimizer (betas=(0.9, 0.999), epsilon=1e-08) and a linear learning rate scheduler with 500 warmup steps.

  • Total batch size: 32 (4 per GPU)
  • Learning rate: 0.0003
  • Training duration: 3 epochs
  • Validation loss: 0.0819
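The linear schedule with warmup described above can be sketched as a plain function. This is illustrative, not the training script: the base learning rate (0.0003) and warmup steps (500) come from the card, while `total_steps` is a placeholder assumption since the card does not state the total number of optimizer steps.

```python
# Illustrative linear warmup + linear decay schedule, as described in the card.
# total_steps is an assumed placeholder; the card does not report it.
def linear_lr(step, base_lr=3e-4, warmup_steps=500, total_steps=10_000):
    """LR rises linearly to base_lr over warmup_steps, then decays linearly to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0, total_steps - step) / (total_steps - warmup_steps)
```

With these values the learning rate peaks at 0.0003 at step 500 and reaches zero at the final step.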

Core Capabilities

  • Automatic Speech Recognition
  • Multi-GPU training support
  • TensorBoard integration
  • Inference endpoint compatibility

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient training progression: the word error rate fell from an initial 100% to a final 6.83% over just 3 epochs, achieved through careful hyperparameter tuning and training techniques such as Native AMP.
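For reference, the WER metric quoted throughout this card is the word-level edit distance divided by the number of reference words. A minimal self-contained implementation (libraries like `jiwer` offer the same computation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] holds the edit distance between ref[:i] and hyp[:j] (rolling row).
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            prev_diag, d[j] = d[j], min(
                d[j] + 1,             # deletion
                d[j - 1] + 1,         # insertion
                prev_diag + (r != h)  # substitution (or match)
            )
    return d[-1] / len(ref)
```

For example, `wer("the cat sat", "the cat sit")` yields one substitution over three reference words, i.e. about 33% WER; a perfect transcript scores 0.0.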

Q: What are the recommended use cases?

The model is specifically designed for clean speech recognition tasks, making it ideal for applications requiring high-accuracy transcription of clear audio input, such as audiobook transcription, meeting recordings, and other controlled environment speech recognition scenarios.
