s2t-medium-mustc-multilingual-st

| Property | Value |
|---|---|
| Author | Facebook |
| License | MIT |
| Paper | Research Paper |
| Supported Languages | English, German, Dutch, Spanish, French, Italian, Portuguese, Romanian, Russian |

What is s2t-medium-mustc-multilingual-st?

s2t-medium-mustc-multilingual-st is a Speech-to-Text Transformer (S2T) model for end-to-end multilingual speech translation. Developed by Facebook, it translates English speech directly into text in 8 European languages: German, Dutch, Spanish, French, Italian, Portuguese, Romanian, and Russian. It uses a transformer-based sequence-to-sequence architecture with a convolutional front end that shortens the speech input before it reaches the encoder.

Implementation Details

The model employs a convolutional downsampler that reduces speech input length by 75% before processing through the encoder. It's trained on the MuST-C dataset, which contains hundreds of hours of TED Talk recordings with corresponding translations.
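
The 75% length reduction can be illustrated with a little arithmetic. The sketch below assumes the downsampler is a stack of two stride-2 convolutions with odd kernels and "same"-style padding; that layer count and stride are assumptions for illustration and are not stated above, and `downsampled_length` is a hypothetical helper name.

```python
def downsampled_length(num_frames: int, num_conv_layers: int = 2, stride: int = 2) -> int:
    """Estimate the encoder input length after convolutional downsampling.

    Assumes each layer uses an odd kernel with padding of kernel // 2, so the
    output length of one layer is floor((L - 1) / stride) + 1.
    """
    length = num_frames
    for _ in range(num_conv_layers):
        length = (length - 1) // stride + 1
    return length


if __name__ == "__main__":
    frames = 1000  # e.g. roughly 10 seconds of audio at a 10 ms frame shift
    reduced = downsampled_length(frames)
    print(frames, "->", reduced, f"({1 - reduced / frames:.0%} shorter)")
```

Under these assumptions, two halvings give a sequence one quarter of the original length, which matches the 75% figure.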

  • Uses 80-channel log mel-filter bank features for speech processing
  • Implements SpecAugment for improved robustness
  • Utilizes a 10,000-size SentencePiece vocabulary
  • Supports autoregressive generation with forced language ID tokens
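
As a rough sketch of how the forced language ID token is used in practice, the example below loads the model through the Hugging Face `transformers` library and forces the first generated token to the target language. It assumes the checkpoint is published as `facebook/s2t-medium-mustc-multilingual-st`, that the two-letter codes in `TARGET_LANGS` match the tokenizer's language codes, and that the input is a mono waveform sampled at 16 kHz; `translate` is a hypothetical helper, not part of any library.

```python
MODEL_ID = "facebook/s2t-medium-mustc-multilingual-st"  # assumed Hub identifier
TARGET_LANGS = ("de", "nl", "es", "fr", "it", "pt", "ro", "ru")  # assumed codes


def translate(waveform, target_lang: str, sampling_rate: int = 16_000) -> str:
    """Translate one English utterance into `target_lang` (hypothetical helper)."""
    if target_lang not in TARGET_LANGS:
        raise ValueError(f"unsupported target language: {target_lang!r}")

    # Imported lazily so the language check above runs without the heavy
    # dependencies (transformers, torch, torchaudio, sentencepiece).
    from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

    processor = Speech2TextProcessor.from_pretrained(MODEL_ID)
    model = Speech2TextForConditionalGeneration.from_pretrained(MODEL_ID)

    # The processor extracts filter bank features from the raw waveform.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")

    # forced_bos_token_id makes the decoder start with the language ID token,
    # which selects the output language.
    generated_ids = model.generate(
        inputs["input_features"],
        attention_mask=inputs["attention_mask"],
        forced_bos_token_id=processor.tokenizer.lang_code_to_id[target_lang],
    )
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling `translate(waveform, "fr")` with a 1-D float array of audio samples would return the French translation as a string. Reloading the model on every call is wasteful; a real application would load it once and reuse it.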

Core Capabilities

  • Direct speech-to-text translation for 8 language pairs (English into each of the 8 target languages)
  • BLEU scores ranging from 16.0 (En-Ru) to 34.9 (En-Fr)
  • Efficient processing through convolutional downsampling
  • Support for utterance-level CMVN normalization
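
Utterance-level CMVN (cepstral mean and variance normalization) standardizes each feature dimension using statistics computed from a single utterance only. The sketch below shows the idea in plain Python for readability; it is an illustration rather than the model's own preprocessing code, and `utterance_cmvn` and its `eps` parameter are hypothetical names.

```python
import math


def utterance_cmvn(frames, eps=1e-10):
    """Normalize each feature dimension to zero mean and unit variance.

    `frames` is a list of frames for one utterance, each frame being a list
    of feature values (for this model, 80 filter bank channels per frame).
    """
    num_frames = len(frames)
    dim = len(frames[0])

    # Per-dimension mean over all frames of this utterance.
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]

    # Per-dimension (population) standard deviation.
    stds = [
        math.sqrt(sum((frame[d] - means[d]) ** 2 for frame in frames) / num_frames)
        for d in range(dim)
    ]

    # eps avoids division by zero when a dimension is constant.
    return [
        [(frame[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for frame in frames
    ]


if __name__ == "__main__":
    toy_utterance = [[1.0, 10.0], [3.0, 30.0]]  # 2 frames, 2 feature dims
    print(utterance_cmvn(toy_utterance))
```

Because the statistics come from the utterance itself, no global normalization table has to be stored or shipped with the model.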

Frequently Asked Questions

Q: What makes this model unique?

The model translates speech directly into text in multiple languages, with no intermediate transcription step. It also benefits from pre-training on multilingual ASR, and its reported BLEU scores range from 16.0 (En-Ru) to 34.9 (En-Fr).

Q: What are the recommended use cases?

The model is suited to translating English speech into multiple European languages, for example TED Talks, educational content, and other spoken presentations. Its reported scores are highest for French (34.9 BLEU) and Portuguese (31.1 BLEU).
