hubert-base-ls960

facebook

HuBERT base model for self-supervised speech representation learning, trained on LibriSpeech. Features 16kHz audio processing and BERT-like prediction architecture.

Property    Value
Developer   Facebook
License     Apache 2.0
Paper       View Research Paper
Downloads   97,098

What is hubert-base-ls960?

HuBERT (Hidden-Unit BERT) is a self-supervised speech representation learning model developed by Facebook. This base model is pretrained on LibriSpeech speech audio sampled at 16kHz. Because speech is a continuous signal with no predefined vocabulary of units, HuBERT uses an offline clustering step to assign discrete target labels to audio frames, then learns by predicting those labels for masked portions of the input.
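
As a rough illustration of the offline clustering idea, the sketch below turns continuous feature frames into discrete labels. It assumes plain k-means as the clustering method and uses made-up 2-D toy frames; the function names `kmeans` and `assign` are hypothetical, and nothing here reflects the cluster count or features used to train this model.

```python
def assign(frame, centroids):
    """Index of the nearest centroid (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(frame, c)) for c in centroids]
    return dists.index(min(dists))

def kmeans(frames, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k frames."""
    centroids = [list(f) for f in frames[:k]]
    for _ in range(iters):
        labels = [assign(f, centroids) for f in frames]
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# Toy stand-in for acoustic features: two well-separated groups of frames.
frames = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
          [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]]

centroids = kmeans(frames, k=2)
# One discrete "hidden unit" label per frame, usable as a prediction target.
targets = [assign(f, centroids) for f in frames]
```

The clustering runs once, before training, so the labels stay fixed while the model learns to predict them.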

Implementation Details

The model uses a BERT-like architecture adapted for speech processing. An offline clustering step provides aligned target labels, and the prediction loss is applied over the masked regions. The model expects speech input sampled at 16kHz. Because it was pretrained on audio alone, it does not include a tokenizer.

  • Utilizes unsupervised clustering for label generation
  • Implements masked prediction similar to BERT
  • Combines acoustic and language modeling capabilities
  • Supports fine-tuning for specific speech recognition tasks
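
The masked prediction loss described above can be sketched in a few lines. This is a minimal illustration, assuming a cross-entropy loss averaged over masked frames; the function name `masked_prediction_loss` and the toy logits are hypothetical and are not taken from the model's training code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masked_prediction_loss(frame_logits, targets, mask):
    """Mean cross-entropy over masked frames; unmasked frames are skipped."""
    losses = [
        -math.log(softmax(logits)[target])
        for logits, target, is_masked in zip(frame_logits, targets, mask)
        if is_masked
    ]
    return sum(losses) / len(losses) if losses else 0.0

# Four frames, three cluster labels; frames 1 and 2 are masked.
targets = [0, 2, 1, 0]
mask = [False, True, True, False]
frame_logits = [
    [9.0, 0.0, 0.0],  # unmasked: does not contribute
    [0.0, 0.0, 0.0],  # masked, uniform prediction
    [0.0, 0.0, 0.0],  # masked, uniform prediction
    [0.0, 9.0, 0.0],  # unmasked and wrong: still does not contribute
]

loss = masked_prediction_loss(frame_logits, targets, mask)
```

Because only masked frames are scored, the model has to infer the hidden labels from surrounding context rather than from the frame itself.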

Core Capabilities

  • Speech representation learning
  • Feature extraction from audio inputs
  • Support for speech recognition after fine-tuning
  • Processing of 16kHz audio samples
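
To make the 16kHz requirement concrete, the sketch below checks an input's sample rate and estimates how many feature frames a waveform would produce. The `(kernel, stride)` pairs are an assumption (a wav2vec 2.0-style convolutional feature encoder) and are not stated on this page; the helper names are hypothetical.

```python
# Assumed (kernel, stride) pairs of the convolutional feature encoder.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
EXPECTED_SAMPLE_RATE = 16_000

def check_sample_rate(sample_rate):
    """Raise if the audio was not sampled at the rate the model expects."""
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample before passing it to the model"
        )

def num_output_frames(num_samples):
    """Frames left after applying each unpadded convolution in turn."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            return 0  # input too short to yield a single frame
        length = (length - kernel) // stride + 1
    return length

check_sample_rate(16_000)  # passes silently
# Frames for one second of audio under the assumed layer configuration.
frames_per_second = num_output_frames(EXPECTED_SAMPLE_RATE)
```

Audio recorded at another rate should be resampled to 16kHz first, since the frame timing depends on the rate the model was trained with.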

Frequently Asked Questions

Q: What makes this model unique?

HuBERT handles continuous speech input through offline clustering, which lets it learn acoustic and language patterns together from unlabeled audio. According to the HuBERT paper, it matches or improves on wav2vec 2.0 performance on the LibriSpeech and Libri-light benchmarks.

Q: What are the recommended use cases?

The model is best suited to speech recognition after fine-tuning with labeled data. Without fine-tuning, it can serve as a feature extractor whose speech representations feed other downstream speech processing tasks.
