# mHuBERT-147
| Property | Value |
|---|---|
| Parameter Count | 94.4M |
| License | CC-BY-NC-SA-4.0 |
| Paper | arXiv |
| Languages Supported | 147 |
| Training Data | 90K hours |
## What is mHuBERT-147?
mHuBERT-147 is a compact yet powerful multilingual speech model covering 147 languages, obtained after a third iteration of HuBERT pre-training. Unlike standard HuBERT models, it assigns its discrete speech-unit targets with a faiss IVF index rather than exhaustive k-means, and it balances its training mix with two-level up-sampling over both languages and data sources.
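To make the two-level up-sampling concrete, here is a minimal sketch of temperature-based sampling, first over languages and then over data sources within the chosen language. The function name, the exponent `alpha=0.5`, and the hour counts are illustrative assumptions, not the exact scheme from the paper.

```python
import numpy as np

def upsampling_probs(hours, alpha=0.5):
    """Temperature-based up-sampling: p_i is proportional to hours_i**alpha.
    alpha < 1 flattens the distribution, boosting low-resource entries."""
    h = np.asarray(hours, dtype=np.float64)
    p = h ** alpha
    return p / p.sum()

# Level 1: sample a language in proportion to its flattened hour count.
lang_hours = {"en": 4000.0, "sw": 120.0, "gn": 5.0}   # illustrative numbers
lang_p = upsampling_probs(list(lang_hours.values()))

# Level 2: within the sampled language, sample a data source the same way.
source_hours = {"CommonVoice": 80.0, "VoxPopuli": 40.0}
source_p = upsampling_probs(list(source_hours.values()))

rng = np.random.default_rng(0)
lang = rng.choice(list(lang_hours), p=lang_p)
print(lang, dict(zip(source_hours, source_p.round(3))))
```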
## Implementation Details
The model uses the HuBERT base architecture (94.4M parameters) with K=1000 clusters for speech-unit discretization. It was trained on 90K hours of open-license speech data, making it one of the most comprehensive multilingual speech models available.
- Trained with the Fairseq framework using multilingual batching
- Uses an `OPQ16_64,IVF1000_HNSW32,PQ16x4fsr` faiss index to assign discrete speech units during pre-training (see the sketch after this list)
- Available in both Fairseq and HuggingFace implementations
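As referenced above, here is a minimal sketch of building that faiss index and reading discrete units from it. The 768-dimensional features, the random stand-in data, and the reading of the IVF cell ID as the speech unit (plausible given that IVF1000 matches K=1000) are all assumptions; only the factory string comes from the model card.

```python
import numpy as np
import faiss

d = 768  # assumed HuBERT-base feature dimension
feats = np.random.rand(20000, d).astype("float32")  # stand-in for real features

# Build the index from the factory string quoted above.
index = faiss.index_factory(d, "OPQ16_64,IVF1000_HNSW32,PQ16x4fsr")
index.train(feats)  # trains the OPQ transform, IVF centroids, and PQ codes

# Assumption: the 1000 IVF cells line up with the K=1000 speech units,
# so a frame's discrete unit is the IVF cell its OPQ-transformed
# feature falls into.
pre = faiss.downcast_index(index)                    # IndexPreTransform wrapper
opq = faiss.downcast_VectorTransform(pre.chain.at(0))
xt = opq.apply_py(feats[:5])                         # apply the OPQ transform first
ivf = faiss.extract_index_ivf(index)
_, units = ivf.quantizer.search(xt, 1)
print(units.ravel())                                 # discrete unit IDs in [0, 1000)
```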
## Core Capabilities
- Achieves SOTA performance on multiple ML-SUPERB benchmark settings
- Excels in language identification tasks
- Supports low-resource languages
- Enables cross-lingual speech processing
## Frequently Asked Questions
Q: What makes this model unique?
Its combination of broad coverage (147 languages) and a compact 94.4M-parameter footprint sets it apart. It also introduces a novel training recipe: two-level language and data-source up-sampling together with faiss IVF discrete speech units.
Q: What are the recommended use cases?
The model is particularly well-suited for multilingual speech processing tasks, language identification, and speech representation learning across diverse languages, including low-resource ones.
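For the representation-learning use case, here is a minimal HuggingFace sketch. The checkpoint id `utter-project/mHuBERT-147` and the availability of a bundled feature-extractor config are assumptions; verify them on the Hub before use.

```python
import torch
from transformers import AutoFeatureExtractor, HubertModel

# Assumed checkpoint id; substitute the actual mHuBERT-147 repo if it differs.
model_id = "utter-project/mHuBERT-147"
extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertModel.from_pretrained(model_id)
model.eval()

waveform = torch.randn(16000)  # 1 s of 16 kHz audio as a stand-in
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, frames, 768)
print(hidden.shape)
```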