# mHuBERT-147
| Property | Value |
|---|---|
| Parameter Count | 94.4M |
| License | CC-BY-NC-SA-4.0 |
| Paper | arXiv |
| Languages Supported | 147 |
| Training Data | 90K hours |
## What is mHuBERT-147?
mHuBERT-147 is a compact yet powerful multilingual speech model covering 147 languages, obtained after a third iteration of HuBERT pre-training. Unlike standard HuBERT models, it assigns its discrete speech-unit targets with a faiss IVF index rather than exhaustive k-means, and it balances its training mix with two-level up-sampling over both languages and data sources.
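To make the two-level up-sampling concrete, here is a minimal sketch of temperature-based sampling, first over languages and then over data sources within the chosen language. The function name, the exponent `alpha=0.5`, and the hour counts are illustrative assumptions, not the exact scheme from the paper.

```python
import numpy as np

def upsampling_probs(hours, alpha=0.5):
    """Temperature-based up-sampling: p_i is proportional to hours_i**alpha.
    alpha < 1 flattens the distribution, boosting low-resource entries."""
    h = np.asarray(hours, dtype=np.float64)
    p = h ** alpha
    return p / p.sum()

# Level 1: sample a language in proportion to its flattened hour count.
lang_hours = {"en": 4000.0, "sw": 120.0, "gn": 5.0}   # illustrative numbers
lang_p = upsampling_probs(list(lang_hours.values()))

# Level 2: within the sampled language, sample a data source the same way.
source_hours = {"CommonVoice": 80.0, "VoxPopuli": 40.0}
source_p = upsampling_probs(list(source_hours.values()))

rng = np.random.default_rng(0)
lang = rng.choice(list(lang_hours), p=lang_p)
print(lang, dict(zip(source_hours, source_p.round(3))))
```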
## Implementation Details
The model uses the HuBERT base architecture (94.4M parameters) with K=1000 clusters for speech-unit discretization. It was trained on 90K hours of open-license speech data, making it one of the most comprehensive multilingual speech models available.
- Trained with the Fairseq framework using multilingual batching
- Uses an `OPQ16_64,IVF1000_HNSW32,PQ16x4fsr` faiss index to assign discrete speech units during pre-training (see the sketch after this list)
- Available in both Fairseq and HuggingFace implementations
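As referenced above, here is a minimal sketch of building that faiss index and reading discrete units from it. The 768-dimensional features, the random stand-in data, and the reading of the IVF cell ID as the speech unit (plausible given that IVF1000 matches K=1000) are all assumptions; only the factory string comes from the model card.

```python
import numpy as np
import faiss

d = 768  # assumed HuBERT-base feature dimension
feats = np.random.rand(20000, d).astype("float32")  # stand-in for real features

# Build the index from the factory string quoted above.
index = faiss.index_factory(d, "OPQ16_64,IVF1000_HNSW32,PQ16x4fsr")
index.train(feats)  # trains the OPQ transform, IVF centroids, and PQ codes

# Assumption: the 1000 IVF cells line up with the K=1000 speech units,
# so a frame's discrete unit is the IVF cell its OPQ-transformed
# feature falls into.
pre = faiss.downcast_index(index)                    # IndexPreTransform wrapper
opq = faiss.downcast_VectorTransform(pre.chain.at(0))
xt = opq.apply_py(feats[:5])                         # apply the OPQ transform first
ivf = faiss.extract_index_ivf(index)
_, units = ivf.quantizer.search(xt, 1)
print(units.ravel())                                 # discrete unit IDs in [0, 1000)
```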
## Core Capabilities
- Achieves SOTA performance on multiple ML-SUPERB benchmark settings
- Excels in language identification tasks
- Supports low-resource languages
- Enables cross-lingual speech processing
## Frequently Asked Questions
Q: What makes this model unique?
Its combination of broad coverage (147 languages) and a compact 94.4M-parameter footprint sets it apart. It also introduces a novel training recipe: two-level language and data-source up-sampling together with faiss IVF discrete speech units.
Q: What are the recommended use cases?
The model is particularly well-suited for multilingual speech processing tasks, language identification, and speech representation learning across diverse languages, including low-resource ones.
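For the representation-learning use case, here is a minimal HuggingFace sketch. The checkpoint id `utter-project/mHuBERT-147` and the availability of a bundled feature-extractor config are assumptions; verify them on the Hub before use.

```python
import torch
from transformers import AutoFeatureExtractor, HubertModel

# Assumed checkpoint id; substitute the actual mHuBERT-147 repo if it differs.
model_id = "utter-project/mHuBERT-147"
extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertModel.from_pretrained(model_id)
model.eval()

waveform = torch.randn(16000)  # 1 s of 16 kHz audio as a stand-in
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, frames, 768)
print(hidden.shape)
```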