s2t-medium-mustc-multilingual-st
Property | Value |
---|---|
Author | Facebook |
License | MIT |
Paper | Research Paper |
Supported Languages | English, German, Dutch, Spanish, French, Italian, Portuguese, Romanian, Russian |
What is s2t-medium-mustc-multilingual-st?
s2t-medium-mustc-multilingual-st is a Speech to Text Transformer (S2T) model for end-to-end multilingual speech translation. Developed by Facebook, it translates English speech directly into text in eight European languages: German, Dutch, Spanish, French, Italian, Portuguese, Romanian, and Russian. It uses a transformer-based sequence-to-sequence architecture and generates the translation autoregressively, without producing an intermediate English transcript.
Implementation Details
The model uses a convolutional downsampler that reduces the length of the speech input by 75% (a factor of four) before it reaches the encoder. It is trained on the MuST-C dataset, which contains hundreds of hours of TED Talk recordings with corresponding translations.
- Uses 80-channel log mel-filter bank features for speech processing
- Implements SpecAugment for improved robustness
- Uses a SentencePiece vocabulary of 10,000 tokens
- Supports autoregressive generation with forced language ID tokens
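To make the 75% reduction concrete, the sketch below computes how many frames are left once the filter bank features have passed through the downsampler. It assumes two stride-2 convolutions, each mapping n frames to (n - 1) // 2 + 1 frames, and a 10 ms frame shift for the features. Neither value is stated above, and the function name is hypothetical, so treat this as an illustration rather than the model's exact configuration.

```python
def downsampled_length(num_frames: int, num_conv_layers: int = 2) -> int:
    """Number of frames left after the convolutional downsampler.

    Assumption: each layer is a stride-2 convolution that maps
    n frames to (n - 1) // 2 + 1 frames.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be positive")
    length = num_frames
    for _ in range(num_conv_layers):
        length = (length - 1) // 2 + 1
    return length


# Assumption: 10 ms frame shift, so 10 s of audio is about 1000 frames.
frames = 1000
encoder_steps = downsampled_length(frames)
print(frames, "->", encoder_steps)  # 1000 -> 250
```

Under these assumptions the encoder attends over a quarter as many positions as there are input frames, which is where the efficiency gain comes from, since self-attention cost grows quadratically with sequence length.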
Core Capabilities
- Direct speech-to-text translation for 8 language pairs (English into each of the other supported languages)
- BLEU scores ranging from 16.0 (En-Ru) to 34.9 (En-Fr)
- Efficient processing through convolutional downsampling
- Support for utterance-level CMVN normalization
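Utterance-level CMVN (cepstral mean and variance normalization) rescales each feature channel using statistics computed over a single utterance. A minimal pure-Python sketch follows. The function name, the use of population variance, and the small `eps` guard are assumptions made for illustration, not details taken from the model's preprocessing code.

```python
import math


def utterance_cmvn(features, eps=1e-8):
    """Normalize each channel to zero mean and unit variance over one utterance.

    features: list of frames, each frame a list of per-channel values.
    Assumption: population variance, plus eps to avoid division by zero.
    """
    if not features:
        return []
    num_frames = len(features)
    channels = list(zip(*features))  # transpose: one sequence per channel
    means = [sum(ch) / num_frames for ch in channels]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in ch) / num_frames)
        for ch, m in zip(channels, means)
    ]
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in features
    ]


# Toy input: 3 frames with 2 channels (the real features have 80 channels).
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = utterance_cmvn(toy)
```

Because the statistics come from the utterance itself, no global normalization table is needed at inference time, and differences in recording level between utterances are reduced.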
Frequently Asked Questions
Q: What makes this model unique?
The model translates speech directly into text in multiple languages without producing an intermediate transcription. It is pre-trained on multilingual ASR, and its BLEU scores range from 16.0 (En-Ru) to 34.9 (En-Fr).
Q: What are the recommended use cases?
The model is suited to translating English speech into the eight supported European languages. It fits TED Talks, educational content, and other spoken presentations best, since these resemble its training data. Its highest scores are for French (34.9 BLEU) and Portuguese (31.1 BLEU).
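For this use case, a minimal sketch with the Hugging Face `transformers` library might look like the following. Several things here are assumptions rather than details from this page: the Hub ID `facebook/s2t-medium-mustc-multilingual-st`, the two-letter language codes, the 16 kHz mono input, and the helper name `translate_english_speech`. The forced language ID token is passed as `forced_bos_token_id`, which makes the decoder start in the requested target language. Running it requires `transformers`, `torch`, and `sentencepiece`, and possibly `torchaudio` depending on the installed version.

```python
# Assumption: ISO 639-1 codes for the eight target languages listed above.
TARGET_LANGUAGES = ("de", "nl", "es", "fr", "it", "pt", "ro", "ru")

# Assumption: the checkpoint is published on the Hugging Face Hub under this ID.
MODEL_ID = "facebook/s2t-medium-mustc-multilingual-st"


def translate_english_speech(waveform, target_lang, sampling_rate=16_000):
    """Translate one English utterance into the target language.

    waveform: 1-D sequence of float audio samples (mono).
    """
    if target_lang not in TARGET_LANGUAGES:
        raise ValueError(f"unsupported target language: {target_lang!r}")

    # Imported here so the heavy dependencies are only needed when translating.
    from transformers import (
        Speech2TextForConditionalGeneration,
        Speech2TextProcessor,
    )

    processor = Speech2TextProcessor.from_pretrained(MODEL_ID)
    model = Speech2TextForConditionalGeneration.from_pretrained(MODEL_ID)

    # Extract filter bank features and the matching attention mask.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")

    # Force the first generated token to be the target language ID.
    generated_ids = model.generate(
        inputs["input_features"],
        attention_mask=inputs["attention_mask"],
        forced_bos_token_id=processor.tokenizer.lang_code_to_id[target_lang],
    )
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling `translate_english_speech(samples, "fr")` with a list or array of audio samples would return the French translation as a string. In a real application the processor and model would be loaded once and reused across calls rather than reloaded each time.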