wav2vec2-large-robust-ft-libritts-voxpopuli

Maintained By
jbetker

Property      Value
Author        jbetker
Downloads     632,156
Architecture  wav2vec2-large
Base Model    facebook/wav2vec2-large-robust-ft-libri-960h

What is wav2vec2-large-robust-ft-libritts-voxpopuli?

This is a speech recognition model built on the wav2vec2-large architecture and designed to generate transcriptions that include punctuation. It is a fine-tuned version of facebook/wav2vec2-large-robust-ft-libri-960h, trained on the LibriTTS and VoxPopuli datasets so that its output carries punctuation marks rather than plain unpunctuated text.

Implementation Details

The model is built on the wav2vec2-large architecture and achieves a Word Error Rate (WER) of 4.45% on the LibriSpeech validation set, slightly higher than the 4.3% of its baseline model. It uses a custom vocabulary that includes punctuation marks, which makes it particularly useful for producing transcripts to train Text-to-Speech (TTS) models.

  • Fine-tuned on clean audio from LibriTTS and VoxPopuli datasets
  • Custom vocabulary with punctuation support
  • Compatible with the Transformers library and PyTorch
  • Optimized for clean audio processing
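
To illustrate what a vocabulary with punctuation means for the output, here is a minimal sketch of greedy CTC decoding, the usual way wav2vec2-style models turn per-frame predictions into text. The toy vocabulary, the use of "<pad>" as the CTC blank, and "|" as the word delimiter are assumptions based on common wav2vec2 conventions; the model's real vocabulary is not listed here.

```python
from itertools import groupby

# Hypothetical toy vocabulary (NOT the model's real one). It follows the
# common wav2vec2 convention: "<pad>" is the CTC blank, "|" separates words.
VOCAB = ["<pad>", "|", "h", "i", ",", "y", "o", "u", "."]
BLANK_ID = 0
WORD_DELIMITER = "|"


def greedy_ctc_decode(frame_ids, vocab=VOCAB, blank_id=BLANK_ID):
    """Collapse repeated frame predictions, drop blanks, and join characters."""
    # Step 1: merge consecutive duplicates (one character spans many frames).
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove the blank token, which only separates repeated characters.
    chars = [vocab[token_id] for token_id in collapsed if token_id != blank_id]
    # Step 3: turn the word delimiter into spaces.
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


# Per-frame argmax ids as a CTC head might emit them for "hi, you."
frames = [2, 2, 0, 3, 4, 1, 1, 5, 6, 0, 7, 8]
print(greedy_ctc_decode(frames))  # hi, you.
```

Because "," and "." are ordinary entries in the vocabulary, they pass through decoding like any other character; a vocabulary without them could never emit punctuation.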

Core Capabilities

  • High-accuracy speech transcription with punctuation
  • Excellent performance on clean audio sources
  • Specialized for TTS model training
  • 4.45% WER on the LibriSpeech validation set
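
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below is a plain-Python illustration of that standard definition, not the evaluation script behind the figures above. The strip_punctuation helper is an assumption added for illustration, since this page does not say whether punctuation was removed before scoring.

```python
import string


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def strip_punctuation(text: str) -> str:
    """Illustrative normalizer (an assumption): lowercase, drop ASCII punctuation."""
    return text.translate(str.maketrans("", "", string.punctuation)).lower()


reference = "Hello, world."
hypothesis = "hello world"
print(word_error_rate(reference, hypothesis))  # 1.0
print(word_error_rate(strip_punctuation(reference), strip_punctuation(hypothesis)))  # 0.0
```

The two printed values show why normalization matters when comparing a punctuating model against an unpunctuated baseline: scored on raw strings, every word with attached punctuation counts as an error.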

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its ability to generate transcriptions with accurate punctuation, which is crucial for TTS applications. The custom vocabulary and specialized training on LibriTTS and VoxPopuli datasets make it particularly effective for clean audio transcription tasks.

Q: What are the recommended use cases?

The model is best suited for:

  • Generating transcriptions for TTS model training
  • Clean audio transcription tasks requiring punctuation
  • Applications where prosody and punctuation accuracy are crucial

Note that it may not perform optimally on noisy audio sources.
