xm_transformer_s2ut_en-hk

facebook

A speech-to-speech translation model for English-to-Hokkien conversion, built on the fairseq framework. It translates speech directly, without an intermediate text step, and was trained on TED-domain data.

Property	Value
License	CC-BY-NC-4.0
Framework	Fairseq
Task Type	Speech-to-Speech Translation
Dataset	MuST-C

What is xm_transformer_s2ut_en-hk?

xm_transformer_s2ut_en-hk is a speech-to-speech translation model that converts English speech directly into Hokkien speech. Built by Facebook with the fairseq framework, it uses a speech-to-unit translation (S2UT) architecture with a single-pass decoder. The model was trained on supervised data from the TED domain and on weakly supervised data from the TED and audiobook domains.

Implementation Details

The model maps English speech directly to a sequence of discrete Hokkien speech units, rather than chaining separate speech recognition, text translation, and speech synthesis stages. The facebook/unit_hifigan_HK_layer12.km2500_frame_TAT-TTS vocoder then converts those units into a waveform. Input audio must be 16000 Hz, mono channel.

Direct speech-to-speech translation without intermediate text representation
Trained on supervised TED-domain data plus weakly supervised TED and audiobook data
Uses a unit HiFi-GAN vocoder for speech synthesis
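
The flow described in this section, the translation model followed by the vocoder, can be sketched with fairseq's hub interfaces. This is a minimal sketch under stated assumptions, not a verified recipe: it assumes fairseq, torchaudio, and huggingface_hub are installed, that both checkpoints can be downloaded from the Hugging Face Hub, and that the vocoder repository contains model.pt and config.json. The function name translate_english_to_hokkien is illustrative and not part of any library; check every call against your installed versions.

```python
MODEL_ID = "facebook/xm_transformer_s2ut_en-hk"
VOCODER_ID = "facebook/unit_hifigan_HK_layer12.km2500_frame_TAT-TTS"


def translate_english_to_hokkien(audio_path, cache_dir=None):
    """Translate a 16 kHz mono English recording into Hokkien speech.

    Hypothetical helper. Returns (waveform, sample_rate).
    Heavy imports are local so this file loads without fairseq installed.
    """
    import json

    import torchaudio
    from fairseq import hub_utils
    from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
    from fairseq.models.speech_to_text.hub_interface import S2THubInterface
    from fairseq.models.text_to_speech import CodeHiFiGANVocoder
    from fairseq.models.text_to_speech.hub_interface import VocoderHubInterface
    from huggingface_hub import snapshot_download

    # Stage 1: English speech -> discrete Hokkien units (no intermediate text).
    models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
        MODEL_ID,
        arg_overrides={"config_yaml": "config.yaml", "task": "speech_to_text"},
        cache_dir=cache_dir,
    )
    model = models[0].cpu()
    cfg["task"].cpu = True  # run on CPU; adjust if a GPU is available
    generator = task.build_generator([model], cfg)

    audio, _ = torchaudio.load(audio_path)  # expects 16 kHz mono input
    sample = S2THubInterface.get_model_input(task, audio)
    units = S2THubInterface.get_prediction(task, model, generator, sample)

    # Stage 2: discrete units -> waveform via the unit HiFi-GAN vocoder.
    vocoder_dir = snapshot_download(
        VOCODER_ID, cache_dir=cache_dir, library_name="fairseq"
    )
    loaded = hub_utils.from_pretrained(
        vocoder_dir,
        "model.pt",          # assumed checkpoint file name
        ".",
        archive_map=CodeHiFiGANVocoder.hub_models(),
        config_yaml="config.json",  # assumed config file name
        fp16=False,
        is_vocoder=True,
    )
    with open(f"{loaded['args']['data']}/config.json") as f:
        vocoder_cfg = json.load(f)

    vocoder = CodeHiFiGANVocoder(loaded["args"]["model_path"][0], vocoder_cfg)
    tts_model = VocoderHubInterface(vocoder_cfg, vocoder)
    tts_sample = tts_model.get_model_input(units)
    wav, sample_rate = tts_model.get_prediction(tts_sample)
    return wav, sample_rate
```

Calling translate_english_to_hokkien("talk_16k_mono.wav"), where the file name is a placeholder, would return the synthesized waveform and its sample rate. Both checkpoints are downloaded on first use, so the first call is expected to be slow.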

Core Capabilities

Direct English to Hokkien speech translation
High-quality voice synthesis through unit HiFiGAN integration
Accepts 16 kHz mono audio input
Efficient single-pass decoding architecture
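
Because the model expects 16 kHz mono audio, it can help to check a recording before running inference. The following is a small sketch using only Python's standard wave module; the function name check_wav_format is illustrative, and the check covers uncompressed WAV files only.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # Hz, the input rate stated for this model
TARGET_CHANNELS = 1          # mono


def check_wav_format(path):
    """Return a list of format problems; an empty list means 16 kHz mono."""
    problems = []
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    if rate != TARGET_SAMPLE_RATE:
        problems.append(
            f"sample rate is {rate} Hz, expected {TARGET_SAMPLE_RATE} Hz"
        )
    if channels != TARGET_CHANNELS:
        problems.append(f"{channels} channels found, expected mono")
    return problems
```

A file that fails the check would need to be resampled or downmixed with an audio tool of your choice before it is passed to the model.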

Frequently Asked Questions

Q: What makes this model unique?

The model translates English speech directly into Hokkien speech without an intermediate text representation. Skipping the text stage avoids a cascade of separate recognition, translation, and synthesis systems, which can reduce latency and the errors that accumulate between stages.

Q: What are the recommended use cases?

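The model is best suited to English-to-Hokkien translation of speech that resembles its training data, such as TED-style talks and educational content. It can also be applied to general speech translation where spoken output is needed, though quality outside those domains may vary. The CC-BY-NC-4.0 license restricts it to non-commercial use.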