Metis

amphion

Metis is a foundation speech generation model built on masked generative pre-training. Through fine-tuning, it supports TTS, voice conversion, speech enhancement, and more, in some cases with fewer than 20M trainable parameters.

Property     Value
Author       amphion
Model URL    https://huggingface.co/amphion/Metis
Paper        arXiv:2502.03128

What is Metis?

Metis is a foundation model for unified speech generation. It is pre-trained with masked generative modeling on large-scale unlabeled speech data and then fine-tuned for each downstream task, so a single pre-trained model can serve many speech generation tasks, in some cases with fewer than 20M trainable parameters.
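Adapting a large pre-trained model with only a few million trainable parameters usually means freezing most of its weights and training a small add-on. One widely used technique for this is low-rank adaptation (LoRA); this card does not state which method Metis uses, so treat the following as a generic illustration. The sketch counts LoRA parameters for a made-up transformer; the function names, layer count, widths, and rank are all hypothetical and are not Metis's real configuration.

```python
# Hypothetical illustration: trainable-parameter count for low-rank adapters (LoRA).
# None of the dimensions below are Metis's real configuration.

def lora_trainable_params(num_layers: int, matrices_per_layer: int,
                          d_in: int, d_out: int, rank: int) -> int:
    """Parameters added when each d_out x d_in weight W is frozen and a trainable
    update B @ A is learned, with A of shape rank x d_in and B of shape d_out x rank."""
    per_matrix = rank * d_in + d_out * rank
    return num_layers * matrices_per_layer * per_matrix


def full_params(num_layers: int, matrices_per_layer: int, d_in: int, d_out: int) -> int:
    """Parameters that would be updated if the same matrices were fine-tuned in full."""
    return num_layers * matrices_per_layer * d_in * d_out


# Hypothetical transformer: 16 layers, 4 adapted 1024 x 1024 projections per layer.
lora = lora_trainable_params(16, 4, 1024, 1024, rank=32)
full = full_params(16, 4, 1024, 1024)
print(f"LoRA: {lora:,} trainable vs. full: {full:,}")
```

With these placeholder numbers the adapters add 4,194,304 trainable parameters, one sixteenth of the 67,108,864 weights in the adapted matrices; the ratio is 2 × rank / width, so a smaller rank shrinks the trainable budget linearly.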

Implementation Details

The model architecture consists of three components: a Semantic Codec that converts speech into semantic tokens, an Acoustic Codec that encodes speech into acoustic tokens and reconstructs waveforms from them, and a Semantic2Acoustic model that predicts acoustic tokens from semantic tokens. Metis uses two discrete speech representations: SSL tokens, derived from self-supervised (SSL) speech features, and acoustic tokens, quantized directly from waveforms. Masked generative pre-training is performed on the SSL tokens, without any additional conditions, using 300K hours of diverse speech data.
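The three components above can be wired together roughly as follows. This is a minimal sketch, not the Amphion API: the class, method, and field names are hypothetical, each codec and the Semantic2Acoustic model are passed in as plain functions, and conditioning Semantic2Acoustic on acoustic tokens of a speech prompt (to carry the speaker's voice) is an assumption, since the description above only says it predicts acoustic tokens from semantic inputs.

```python
from dataclasses import dataclass
from typing import Callable, List

Tokens = List[int]
Waveform = List[float]


@dataclass
class ThreeStagePipeline:
    """Hypothetical wiring of the three Metis components; not the Amphion API."""
    semantic_encode: Callable[[Waveform], Tokens]             # Semantic Codec: speech -> semantic (SSL) tokens
    acoustic_encode: Callable[[Waveform], Tokens]             # Acoustic Codec encoder: speech -> acoustic tokens
    acoustic_decode: Callable[[Tokens], Waveform]             # Acoustic Codec decoder: acoustic tokens -> waveform
    semantic_to_acoustic: Callable[[Tokens, Tokens], Tokens]  # Semantic2Acoustic: (semantic, prompt acoustic) -> acoustic

    def render(self, semantic: Tokens, prompt_speech: Waveform) -> Waveform:
        """Turn semantic tokens into audio; the prompt conditioning is an assumption."""
        prompt_acoustic = self.acoustic_encode(prompt_speech)
        acoustic = self.semantic_to_acoustic(semantic, prompt_acoustic)
        return self.acoustic_decode(acoustic)

    def resynthesize(self, speech: Waveform, prompt_speech: Waveform) -> Waveform:
        """Speech -> semantic tokens -> acoustic tokens -> waveform."""
        return self.render(self.semantic_encode(speech), prompt_speech)


# Toy stand-ins so the sketch runs end to end; they do no real signal processing.
toy = ThreeStagePipeline(
    semantic_encode=lambda wav: [int(x * 10) % 8 for x in wav],
    acoustic_encode=lambda wav: [0 for _ in wav],
    acoustic_decode=lambda codes: [float(c) for c in codes],
    semantic_to_acoustic=lambda sem, prompt: [s + len(prompt) for s in sem],
)
print(toy.resynthesize([0.5, 0.25], prompt_speech=[0.0, 0.0]))  # [7.0, 4.0]
```

Keeping each stage behind a function boundary mirrors the split described above: in this sketch, anything that produces semantic tokens can reuse the same Semantic2Acoustic model and Acoustic Codec to get audio.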

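  • Masked generative pre-training on unlabeled speech
  • Efficient fine-tuning for adapting to downstream tasks
  • Support for multiple speech generation tasks
  • Low adaptation cost: some tasks are fine-tuned with fewer than 20M trainable parameters or far less task-specific training data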
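Masked generative pre-training can be illustrated by how one training example is built from a token sequence: a random fraction of positions is replaced by a mask token, and the model learns to predict the original tokens at those positions. The sketch below assumes a MaskGIT-style cosine schedule for the mask ratio and uses a hypothetical MASK_ID; the exact schedule and token vocabulary Metis uses are not given here, and the random integers merely stand in for SSL tokens.

```python
import math
import random
from typing import List, Tuple

MASK_ID = -1  # hypothetical id for the [MASK] token in this sketch


def sample_mask_count(num_tokens: int, rng: random.Random) -> int:
    """Draw t ~ U(0, 1) and mask ceil(cos(pi/2 * t) * N) tokens (assumed cosine schedule)."""
    ratio = math.cos(math.pi / 2 * rng.random())
    return min(num_tokens, max(1, math.ceil(ratio * num_tokens)))


def mask_tokens(tokens: List[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Return (model input with some tokens replaced by MASK_ID, sorted masked positions)."""
    k = sample_mask_count(len(tokens), rng)
    positions = sorted(rng.sample(range(len(tokens)), k))
    inputs = list(tokens)
    for i in positions:
        inputs[i] = MASK_ID
    return inputs, positions


def masked_targets(tokens: List[int], positions: List[int]) -> List[Tuple[int, int]]:
    """The loss is computed only at masked positions: pairs of (position, true token)."""
    return [(i, tokens[i]) for i in positions]


rng = random.Random(0)
ssl_tokens = [rng.randrange(1024) for _ in range(12)]  # stand-in for a real SSL token sequence
inputs, positions = mask_tokens(ssl_tokens, rng)
print(inputs)
print(masked_targets(ssl_tokens, positions))
```

Because the mask ratio is drawn afresh for each example, training covers everything from lightly masked to fully masked sequences; in masked generative models of this kind, that is what allows generation to start from an all-masked sequence and fill it in.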

Core Capabilities

  • Zero-shot text-to-speech synthesis
  • Voice conversion
  • Target speaker extraction
  • Speech enhancement
  • Lip-to-speech generation
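One way to picture how a single pre-trained model covers the tasks listed above is as a table of fine-tuning targets that differ only in their conditioning inputs. This is an illustration under stated assumptions: the class and field names are hypothetical, the condition names are generic descriptions of each task rather than field names from the Amphion code, and the shared `generates` value reflects the assumption that every task reuses the pre-trained SSL-token model, with Semantic2Acoustic and the Acoustic Codec producing the waveform afterwards.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaskSpec:
    """One hypothetical fine-tuning target: what the model conditions on and what it generates."""
    name: str
    conditions: Tuple[str, ...]    # generic descriptions, not Amphion field names
    generates: str = "SSL tokens"  # assumption: every task reuses the pre-trained SSL-token model


TASKS = (
    TaskSpec("zero-shot TTS", ("text", "speech prompt")),
    TaskSpec("voice conversion", ("source speech", "reference speaker speech")),
    TaskSpec("target speaker extraction", ("speech mixture", "target speaker enrollment speech")),
    TaskSpec("speech enhancement", ("noisy speech",)),
    TaskSpec("lip-to-speech", ("silent lip video",)),
)

for task in TASKS:
    print(f"{task.name}: {', '.join(task.conditions)} -> {task.generates}")
```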

Frequently Asked Questions

Q: What makes this model unique?

Metis handles multiple speech generation tasks through a single pre-trained model. Across five tasks (zero-shot TTS, voice conversion, target speaker extraction, speech enhancement, and lip-to-speech), it outperforms state-of-the-art task-specific or multi-task systems, even with fewer than 20M trainable parameters or 300 times less training data.

Q: What are the recommended use cases?

The model suits applications that require speech generation, enhancement, or conversion: text-to-speech systems, voice conversion, speech enhancement in noisy environments, target speaker extraction, and generating speech from lip movements.
