xeus

Maintained By
espnet

XEUS: Cross-lingual Encoder for Universal Speech

Property       Value
Parameters     577M
License        CC-BY-NC-SA-4.0
Architecture   E-Branchformer
Paper          Link to Paper

What is XEUS?

XEUS is a large-scale multilingual speech encoder developed by Carnegie Mellon University's WAVLab that covers more than 4000 languages. It was pre-trained with HuBERT-style masked prediction on more than 1 million hours of publicly available speech data and uses the E-Branchformer architecture.
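
HuBERT-style masked prediction hides spans of input frames and trains the encoder to predict discrete target tokens at the hidden positions. The sketch below shows those two pieces in plain Python under simplifying assumptions: the span-sampling rule (each frame starts a span with probability `mask_prob`), the helper names `span_mask` and `masked_prediction_loss`, and the random logits and targets are all hypothetical and illustrative, not XEUS's or ESPnet's actual code. In the real model the targets would be the WavLabLM-derived tokens and the logits would come from the E-Branchformer encoder.

```python
import math
import random
from typing import List, Sequence


def span_mask(num_frames: int, mask_prob: float, mask_length: int,
              rng: random.Random) -> List[bool]:
    """Per-frame boolean mask: each frame is picked as a span start with
    probability mask_prob, and each span covers mask_length frames
    (clipped at the end of the utterance). Simplified, hypothetical rule."""
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + mask_length, num_frames)):
                mask[t] = True
    return mask


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one frame's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def masked_prediction_loss(logits: Sequence[Sequence[float]],
                           targets: Sequence[int],
                           mask: Sequence[bool]) -> float:
    """Mean cross-entropy between predicted token distributions and the
    discrete target tokens, computed over masked frames only."""
    losses = [-log_softmax(frame)[target]
              for frame, target, masked in zip(logits, targets, mask)
              if masked]
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    num_frames, vocab_size = 50, 8
    # Illustrative masking values, not XEUS defaults.
    mask = span_mask(num_frames, mask_prob=0.065, mask_length=10, rng=rng)
    # Random stand-ins for encoder outputs and WavLabLM-style token targets.
    logits = [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
              for _ in range(num_frames)]
    targets = [rng.randrange(vocab_size) for _ in range(num_frames)]
    print(sum(mask), "masked frames; loss =",
          masked_prediction_loss(logits, targets, mask))
```

Restricting the loss to masked frames means the encoder must infer the hidden content from the surrounding context instead of copying its input.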

Implementation Details

The model's main technical features are:

  • Trained with masked prediction of discrete speech tokens extracted from WavLabLM
  • Augments input speech with acoustic noise and reverberation during training to improve robustness
  • Supports Flash Attention for faster, more memory-efficient attention computation
  • Allows the masking settings to be adjusted for fine-tuning
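
The noise and reverberation augmentation listed above can be illustrated with two standard operations: mixing a noise clip into the speech at a chosen signal-to-noise ratio (SNR), and convolving the speech with a room impulse response (RIR). This is a minimal plain-Python sketch assuming mono float waveforms at the same sample rate; the function names and SNR handling are hypothetical, and the source does not specify XEUS's actual noise sources, SNR range, or RIRs.

```python
import math
from typing import List, Sequence


def power(signal: Sequence[float]) -> float:
    """Mean squared amplitude of a waveform."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(speech: Sequence[float], noise: Sequence[float],
                     snr_db: float) -> List[float]:
    """Add noise to speech, scaled so that the speech-to-noise power
    ratio equals snr_db decibels."""
    # Repeat (or crop) the noise clip so it matches the speech length.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech, p_noise = power(speech), power(noise)
    if p_noise == 0.0:
        return list(speech)  # silent noise clip: nothing to add
    # Solve p_speech / (scale**2 * p_noise) == 10 ** (snr_db / 10) for scale.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def add_reverb(speech: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Convolve speech with a room impulse response, keeping the output
    the same length as the input (the reverberant tail is truncated)."""
    out = []
    for t in range(len(speech)):
        acc = 0.0
        for k in range(min(len(rir), t + 1)):
            acc += rir[k] * speech[t - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    # 16000 samples, i.e. 1 s at an assumed 16 kHz sample rate.
    speech = [math.sin(0.05 * t) for t in range(16000)]
    noise = [math.cos(0.37 * t) + 0.5 * math.sin(1.3 * t) for t in range(4000)]
    rir = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]  # toy impulse response
    augmented = add_reverb(add_noise_at_snr(speech, noise, snr_db=10.0), rir)
    print(len(augmented))
```

Real augmentation pipelines typically sample the noise clip, SNR, and RIR at random per utterance; fixed values are used here only to keep the example deterministic.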

Core Capabilities

  • Tops the ML-SUPERB multilingual speech recognition leaderboard
  • Outperforms MMS, w2v-BERT 2.0, and XLS-R on that benchmark
  • Sets a new state of the art on 4 tasks in the monolingual SUPERB benchmark
  • Provides robust speech representations across thousands of languages

Frequently Asked Questions

Q: What makes this model unique?

XEUS stands out for its broad language coverage (more than 4000 languages) and its strong performance across multiple speech processing tasks. Its combination of the E-Branchformer architecture, HuBERT-style masked prediction, and noise and reverberation augmentation makes it well suited to universal speech processing.

Q: What are the recommended use cases?

XEUS must be fine-tuned before it can be used for downstream tasks such as speech recognition or translation. Its hidden states can also be clustered with k-means to produce semantic speech tokens. The model is particularly valuable for multilingual applications and for research on low-resource languages.
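
Semantic tokenization with k-means works by fitting centroids to frame-level hidden states and then replacing each frame with the ID of its nearest centroid. The sketch below is a plain-Python Lloyd's k-means under stated assumptions: the frame vectors are toy stand-ins for XEUS hidden states (the source does not say which layer to use or how many clusters), and the function names `kmeans` and `tokenize` are hypothetical. On real feature sets a library implementation such as scikit-learn's `KMeans` or `MiniBatchKMeans` would be more practical.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to the frame (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda k: sq_dist(frame, centroids[k]))


def kmeans(frames: Sequence[Vector], k: int, iters: int = 20,
           seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means with random initialization; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(list(frames), k)]
    for _ in range(iters):
        # Assignment step: group frames by their nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for f in frames:
            clusters[nearest(f, centroids)].append(f)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                dim = len(members[0])
                centroids[j] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


def tokenize(frames: Sequence[Vector], centroids: Sequence[Vector],
             dedup: bool = False) -> List[int]:
    """Map each frame to its nearest centroid ID; optionally merge repeats."""
    ids = [nearest(f, centroids) for f in frames]
    if dedup:
        ids = [t for i, t in enumerate(ids) if i == 0 or t != ids[i - 1]]
    return ids


if __name__ == "__main__":
    # Toy 2-D "hidden states": two well-separated groups of frames.
    frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
    centroids = kmeans(frames, k=2)
    print(tokenize(frames, centroids), tokenize(frames, centroids, dedup=True))
```

The optional `dedup` step merges consecutive repeated IDs, a common post-processing choice for discrete speech units; whether to apply it depends on the downstream task.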
