sew-d-tiny-100k-ft-ls100h

Maintained By
asapp

SEW-D Tiny Speech Recognition Model

Property: Value
Parameter Count: 24.1M
License: Apache 2.0
Paper: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
WER (LibriSpeech Clean): 10.47%
WER (LibriSpeech Other): 22.73%
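
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The following is a minimal sketch of that calculation; the function name `word_error_rate` and the sample sentences are illustrative and are not taken from the model's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one of six reference words is missing.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER: {wer:.2%}")
```

A WER of 10.47% therefore corresponds to roughly one word-level error per ten reference words.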

What is sew-d-tiny-100k-ft-ls100h?

SEW-D tiny is a speech recognition model developed by ASAPP Research that aims to balance recognition accuracy against computational cost. It expects speech audio sampled at 16kHz and is optimized for faster inference while keeping word error rates competitive.

Implementation Details

The model implements SEW-D, a variant of the SEW (Squeezed and Efficient Wav2vec) architecture, for which the paper reports a 1.9x inference speedup compared to wav2vec 2.0. It is trained with a CTC objective for speech recognition and is implemented in PyTorch, with model weights stored in safetensors format.

  • Optimized for 16kHz audio input
  • Built on transformer architecture
  • Uses CTC loss for sequence modeling
  • Supports batch processing for efficient inference
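
To illustrate how a CTC-based model turns per-frame predictions into text, here is a minimal sketch of greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, then drop the blank token. The vocabulary, the blank index, and the frame sequence are hypothetical and chosen only for illustration; they are not the model's actual vocabulary.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then remove CTC blanks."""
    # Step 1: merge runs of identical consecutive token ids.
    collapsed = [token for token, _ in groupby(frame_ids)]
    # Step 2: drop the blank token, which only separates repeated characters.
    return "".join(id_to_char[token] for token in collapsed if token != blank_id)


# Hypothetical vocabulary: id 0 is the blank.
vocab = {0: "", 1: "h", 2: "e", 3: "l", 4: "o"}
# Hypothetical per-frame argmax ids; the blank between the two runs of 3
# is what allows the double "l" to survive the collapse step.
frames = [1, 1, 0, 2, 2, 3, 0, 3, 3, 4, 0]
print(ctc_greedy_decode(frames, vocab))
```

Collapsing before removing blanks matters: reversing the order would merge the two "l" runs into one.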

Core Capabilities

  • Automatic Speech Recognition with a WER of 10.47% (LibriSpeech Clean) and 22.73% (LibriSpeech Other)
  • Efficient inference with reduced computational overhead
  • Support for English language processing
  • Fine-tuning capability for downstream tasks

Frequently Asked Questions

Q: What makes this model unique?

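SEW-D tiny stands out for its balance between accuracy and efficiency. The accompanying paper reports that the SEW architecture achieves a 13.5% relative reduction in word error rate together with a 1.9x inference speedup compared to wav2vec 2.0; these figures are reported for the architecture in the paper rather than measured for this checkpoint specifically.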
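
A relative reduction is measured against the baseline error rate, not in absolute percentage points. The sketch below shows the arithmetic; the baseline and improved WER values are hypothetical numbers picked to produce a 13.5% relative reduction and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values: a baseline at 10.00% WER and a new system at 8.65% WER.
reduction = relative_wer_reduction(10.00, 8.65)
print(f"Relative WER reduction: {reduction:.1%}")
```

In this hypothetical case the absolute improvement is only 1.35 percentage points, even though the relative reduction is 13.5%.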

Q: What are the recommended use cases?

The model is ideal for automatic speech recognition tasks, particularly when computational efficiency is important. It can be fine-tuned for specific applications including speaker identification, intent classification, and emotion recognition.
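
For the speech recognition use case, a minimal inference sketch with the Hugging Face `transformers` library might look like the following. It assumes the checkpoint is published on the Hub under the ID `asapp/sew-d-tiny-100k-ft-ls100h` (inferred from the maintainer and model name above), that `torch` and `transformers` are installed, and that the input is already a list of 1-D float waveforms sampled at 16kHz. The function name `transcribe` is illustrative.

```python
MODEL_ID = "asapp/sew-d-tiny-100k-ft-ls100h"  # assumed Hub ID, see lead-in
EXPECTED_SAMPLE_RATE = 16_000                 # the model expects 16kHz audio


def transcribe(waveforms, sampling_rate=EXPECTED_SAMPLE_RATE):
    """Transcribe a batch of 1-D float waveforms (e.g. NumPy arrays)."""
    if sampling_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample the audio first"
        )

    # Imported lazily so the sample-rate check works without the heavy dependencies.
    import torch
    from transformers import SEWDForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = SEWDForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # padding=True pads shorter clips so the batch forms one tensor.
    inputs = processor(
        waveforms, sampling_rate=sampling_rate, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy CTC decoding: most likely token per frame, then collapse and de-blank.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)
```

Loading the processor and model inside the function keeps the sketch short; a long-running service would typically load them once and reuse them across calls.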
