xm_transformer_unity_en-hk
Property | Value |
---|---|
License | cc-by-nc-4.0 |
Framework | Fairseq |
Task Type | Speech-to-Speech Translation |
Dataset | MuST-C |
What is xm_transformer_unity_en-hk?
xm_transformer_unity_en-hk is a speech-to-speech translation model released by Facebook that translates English speech directly into Hokkien speech. It uses UnitY, a two-pass decoder, and was trained on supervised data from the TED domain plus weakly supervised data from the TED and Audiobook domains.
Implementation Details
The model works in two stages. The translation model takes English audio and predicts discrete speech units for the Hokkien output; a separate vocoder, facebook/unit_hifigan_HK_layer12.km2500_frame_TAT-TTS, then synthesizes a waveform from those units. Input audio must be 16000 Hz, mono channel. The model is implemented in the Fairseq framework.
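- Two-pass decoder architecture (UnitY)
- Speech synthesis with a unit-based HiFi-GAN vocoder, loaded as a separate model
- Training data drawn from the TED and Audiobook domains
- Direct speech-to-speech translation in a single model, rather than a cascade of separate recognition, translation, and synthesis systems (the first decoder pass predicts text internally; the second predicts speech units)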
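The two stages described above can be sketched roughly as follows. This is a minimal sketch adapted from the usage pattern commonly published for Fairseq speech-to-speech models on the Hugging Face Hub, not a verified recipe: it assumes `fairseq`, `torchaudio`, and `huggingface_hub` are installed, that the checkpoint ships a `config.yaml` and the vocoder ships `model.pt` and `config.json`, and it has not been checked against a specific Fairseq release. The function name `translate_english_to_hokkien` is hypothetical.

```python
import json
import os

MODEL_ID = "facebook/xm_transformer_unity_en-hk"
VOCODER_ID = "facebook/unit_hifigan_HK_layer12.km2500_frame_TAT-TTS"


def translate_english_to_hokkien(wav_path):
    """Translate a 16 kHz mono English recording into Hokkien speech.

    Returns (waveform, sample_rate). Heavy dependencies are imported lazily,
    so defining this function does not itself require fairseq.
    """
    import torchaudio
    from fairseq import hub_utils
    from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
    from fairseq.models.speech_to_text.hub_interface import S2THubInterface
    from fairseq.models.text_to_speech import CodeHiFiGANVocoder
    from fairseq.models.text_to_speech.hub_interface import VocoderHubInterface
    from huggingface_hub import snapshot_download

    cache_dir = os.getenv("HUGGINGFACE_HUB_CACHE")  # None falls back to the default

    # Stage 1: English speech -> discrete Hokkien speech units.
    models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
        MODEL_ID,
        arg_overrides={"config_yaml": "config.yaml", "task": "speech_to_text"},
        cache_dir=cache_dir,
    )
    model = models[0].cpu()  # assumption: CPU inference
    cfg["task"].cpu = True
    generator = task.build_generator([model], cfg)

    audio, _ = torchaudio.load(wav_path)  # expected: 16000 Hz, mono
    sample = S2THubInterface.get_model_input(task, audio)
    units = S2THubInterface.get_prediction(task, model, generator, sample)

    # Stage 2: speech units -> waveform, using the separate HiFi-GAN vocoder.
    vocoder_dir = snapshot_download(
        VOCODER_ID, cache_dir=cache_dir, library_name="fairseq"
    )
    loaded = hub_utils.from_pretrained(
        vocoder_dir,
        "model.pt",
        ".",
        archive_map=CodeHiFiGANVocoder.hub_models(),
        config_yaml="config.json",
        fp16=False,
        is_vocoder=True,
    )
    with open(os.path.join(loaded["args"]["data"], "config.json")) as f:
        vocoder_cfg = json.load(f)
    vocoder = CodeHiFiGANVocoder(loaded["args"]["model_path"][0], vocoder_cfg)
    tts_model = VocoderHubInterface(vocoder_cfg, vocoder)
    wav, sample_rate = tts_model.get_prediction(tts_model.get_model_input(units))
    return wav, sample_rate
```

Calling `translate_english_to_hokkien("/path/to/an/audio/file")` would download both checkpoints on first use, so expect the first call to be slow.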
Core Capabilities
- Direct English to Hokkien speech translation
- Hokkien speech synthesis through a dedicated unit-based HiFi-GAN vocoder
- Accepts 16 kHz mono channel audio as input
- Trained on both supervised and weakly supervised data
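Because the model expects 16 kHz mono input, it can help to check a file before passing it in. The following is a small sketch using only Python's standard `wave` module; it assumes the input is an uncompressed WAV file, and the helper name `check_wav_format` is hypothetical rather than part of Fairseq.

```python
import wave

EXPECTED_RATE_HZ = 16000  # sample rate the model expects
EXPECTED_CHANNELS = 1     # mono


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()

    problems = []
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

A file that fails this check would need to be resampled or downmixed with an audio tool of your choice before translation.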
Frequently Asked Questions
Q: What makes this model unique?
This model performs direct speech-to-speech translation from English to Hokkien, a language pair with traditionally limited resources. It combines the UnitY two-pass decoder with a vocoder trained specifically for Hokkien speech.
Q: What are the recommended use cases?
The model is best suited to English-to-Hokkien translation of content that resembles its training data: TED-talk style speech and audiobooks. It fits scenarios where the output needs to be spoken Hokkien rather than written text.