librispeech_100_e_branchformer

Maintained by: pyf98

LibriSpeech-100 E-Branchformer ASR Model

Property   | Value
-----------|---------------------
Author     | pyf98
License    | CC-BY-4.0
Paper      | E-Branchformer Paper
Framework  | ESPnet

What is librispeech_100_e_branchformer?

This is an automatic speech recognition (ASR) model implementing the E-Branchformer architecture, trained on the 100-hour train-clean-100 subset of LibriSpeech. E-Branchformer runs a self-attention branch and a convolutional gating MLP (cgMLP) branch in parallel, capturing global and local context in the speech signal respectively.

Implementation Details

The model stacks 12 E-Branchformer encoder blocks, each pairing a self-attention branch with a cgMLP branch. Key specifications include an output size of 256, 4 attention heads, and 1024 linear units. Decoding is joint CTC/attention with a CTC weight of 0.3. A minimal sketch of one encoder block follows the list below.

  • Encoder: E-Branchformer with 12 blocks
  • Decoder: Transformer with 6 blocks
  • Frontend: STFT with a 512-point FFT and a 400-sample (25 ms at 16 kHz) window
  • SpecAugment: Time warping and masking enabled
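
To make the block structure concrete, here is a minimal PyTorch sketch of one E-Branchformer encoder block using the dimensions listed above (output size 256, 4 attention heads, 1024 linear units). It illustrates the parallel-branch idea only and is not ESPnet's actual implementation: the real encoder also includes macaron-style feed-forward layers and positional encodings, which are omitted here, and the kernel sizes are assumptions.

```python
# Minimal sketch of one E-Branchformer encoder block (illustrative only;
# not ESPnet's implementation). Macaron feed-forward layers and positional
# encodings are omitted; kernel sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CgMLP(nn.Module):
    """Convolutional gating MLP branch: captures local context."""

    def __init__(self, d_model: int = 256, d_hidden: int = 1024, kernel: int = 31):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        # Depthwise conv over time produces a gate from half of the features.
        self.dw_conv = nn.Conv1d(d_hidden // 2, d_hidden // 2, kernel,
                                 padding=kernel // 2, groups=d_hidden // 2)
        self.down = nn.Linear(d_hidden // 2, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        a, b = F.gelu(self.up(x)).chunk(2, dim=-1)
        gate = self.dw_conv(b.transpose(1, 2)).transpose(1, 2)
        return self.down(a * gate)


class EBranchformerBlock(nn.Module):
    """Parallel attention (global) and cgMLP (local) branches, merged."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, d_hidden: int = 1024):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp_norm = nn.LayerNorm(d_model)
        self.cgmlp = CgMLP(d_model, d_hidden)
        # Enhanced merge: depthwise conv over the concatenated branch
        # outputs, then a linear projection back to the model dimension.
        self.merge_conv = nn.Conv1d(2 * d_model, 2 * d_model, 31,
                                    padding=15, groups=2 * d_model)
        self.merge_proj = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        g = self.attn_norm(x)
        g, _ = self.attn(g, g, g)            # global branch: self-attention
        l = self.cgmlp(self.mlp_norm(x))     # local branch: cgMLP
        cat = torch.cat([g, l], dim=-1)
        cat = cat + self.merge_conv(cat.transpose(1, 2)).transpose(1, 2)
        return x + self.merge_proj(cat)      # residual connection


x = torch.randn(2, 100, 256)                 # (batch, frames, features)
print(EBranchformerBlock()(x).shape)         # torch.Size([2, 100, 256])
```

The full encoder stacks 12 such blocks; the depthwise-convolution merge is what distinguishes E-Branchformer from the original Branchformer's simpler merging.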

Core Capabilities

  • 94.4% word accuracy on LibriSpeech test-clean (roughly 5.6% WER; a sketch of how such scores are computed follows this list)
  • 85.0% word accuracy on the more challenging test-other set (roughly 15.0% WER)
  • Effective handling of both clean and noisy speech
  • Robust performance across different speaking styles
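
The accuracy figures above are word-level accuracy, i.e., 100% minus the word error rate (WER) that ASR benchmarks conventionally report. As a quick illustration of how such a score is computed, here is a hedged sketch using the third-party jiwer package (an assumption for illustration; this model card does not mention it):

```python
# Hedged illustration: word error rate with the third-party `jiwer`
# package (pip install jiwer). The transcripts below are made up.
import jiwer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

wer = jiwer.wer(reference, hypothesis)  # two substitutions out of nine words
print(f"WER: {wer:.1%}, word accuracy: {1 - wer:.1%}")
```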

Frequently Asked Questions

Q: What makes this model unique?

The E-Branchformer architecture runs self-attention and cgMLP branches in parallel and combines them with an enhanced merge module: the branch outputs are concatenated, refined by a depthwise convolution, and projected back to the model dimension. The encoder block sketch under Implementation Details above illustrates this merging step.

Q: What are the recommended use cases?

This model is ideal for English speech recognition tasks, particularly in scenarios requiring high accuracy on clean speech while maintaining reasonable performance on more challenging audio conditions.
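
For a quick way to try the model, here is a hedged usage sketch with ESPnet's Speech2Text inference API. The Hugging Face model ID and the beam size are assumptions inferred from this card, not verified values; the CTC weight matches the 0.3 stated above:

```python
# Hedged usage sketch (pip install espnet espnet_model_zoo soundfile).
# The model ID below is an assumption inferred from this card's name.
import soundfile as sf
from espnet2.bin.asr_inference import Speech2Text

speech2text = Speech2Text.from_pretrained(
    "pyf98/librispeech_100_e_branchformer",  # hypothetical model ID
    ctc_weight=0.3,  # joint CTC/attention decoding, as stated above
    beam_size=10,    # assumed decoding setting
)

speech, rate = sf.read("utterance.wav")  # 16 kHz mono audio expected
text, tokens, token_ids, hyp = speech2text(speech)[0]
print(text)
```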
