Metis

Maintained By
amphion

Property: Value
Author: amphion
Model URL: https://huggingface.co/amphion/Metis
Paper: arXiv:2502.03128

What is Metis?

Metis is a foundation model for unified speech generation, built with masked generative pre-training on large-scale unlabeled speech data. A single pre-trained model can be adapted to multiple speech generation tasks through fine-tuning, with fewer than 20M trainable parameters needed for that adaptation.

Implementation Details

The model architecture consists of three key components: a Semantic Codec that converts speech to semantic tokens, an Acoustic Codec that handles acoustic tokens and waveform reconstruction, and a Semantic2Acoustic component that predicts acoustic tokens from semantic inputs. Metis uses two discrete speech representations: SSL tokens (derived from self-supervised speech features) and acoustic tokens. Pre-training was carried out on 300K hours of diverse speech data.
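To make the data flow between the three components concrete, here is a minimal sketch of how they might be chained. It is not the Amphion API: the class name, the method name, and the three toy functions are hypothetical stand-ins that only illustrate the order speech -> semantic tokens -> acoustic tokens -> waveform.

```python
from dataclasses import dataclass
from typing import Callable, List

Tokens = List[int]
Waveform = List[float]


@dataclass
class SpeechPipeline:
    """Hypothetical wiring of the three components (not the real Metis classes)."""
    semantic_encode: Callable[[Waveform], Tokens]     # Semantic Codec: speech -> semantic tokens
    semantic_to_acoustic: Callable[[Tokens], Tokens]  # Semantic2Acoustic: semantic -> acoustic tokens
    acoustic_decode: Callable[[Tokens], Waveform]     # Acoustic Codec: acoustic tokens -> waveform

    def resynthesize(self, wav: Waveform) -> Waveform:
        semantic = self.semantic_encode(wav)
        acoustic = self.semantic_to_acoustic(semantic)
        return self.acoustic_decode(acoustic)


# Toy stand-ins so the sketch runs; real components are neural networks.
def toy_semantic_encode(wav: Waveform) -> Tokens:
    return [int(abs(x) * 10) % 8 for x in wav]


def toy_semantic_to_acoustic(tokens: Tokens) -> Tokens:
    return [t * 2 for t in tokens]


def toy_acoustic_decode(tokens: Tokens) -> Waveform:
    return [t / 20.0 for t in tokens]


pipeline = SpeechPipeline(toy_semantic_encode, toy_semantic_to_acoustic, toy_acoustic_decode)
out = pipeline.resynthesize([0.125, -0.25, 0.5])
```

In this sketch each stage only sees the output of the previous one, which mirrors the description above: the semantic tokens are the interface between the task-specific generation step and the acoustic reconstruction step.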

  • Masked generative pre-training approach
  • Efficient fine-tuning capability for task adaptation
  • Support for multiple speech generation tasks
  • Compact model size with high performance
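The masked generative pre-training idea listed above can be illustrated with a small sketch of the data side of one training step: a random subset of discrete tokens is replaced by a mask symbol, and the model is trained to predict the original tokens at those positions. The mask id, the cosine schedule, and the function names are assumptions made for illustration; the source does not specify the schedule Metis uses.

```python
import math
import random

MASK = -1  # hypothetical mask token id, chosen so it cannot collide with real tokens


def cosine_mask_ratio(u: float) -> float:
    """Map u in [0, 1] to a mask ratio with a cosine schedule (an assumed choice)."""
    return math.cos(0.5 * math.pi * u)


def mask_tokens(tokens, ratio, rng):
    """Replace roughly `ratio` of the tokens with MASK; always mask at least one."""
    n_mask = max(1, round(ratio * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    return masked, sorted(positions)


rng = random.Random(0)
tokens = [5, 3, 7, 1, 4, 2, 6, 0]          # toy sequence of discrete speech tokens
ratio = cosine_mask_ratio(rng.random())    # sample a mask ratio for this step
masked, positions = mask_tokens(tokens, ratio, rng)

# The training targets are the original tokens at the masked positions.
targets = [tokens[i] for i in positions]
```

A model trained this way sees many different mask ratios, so at inference time it can start from a fully masked sequence and fill it in over several steps.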

Core Capabilities

  • Zero-shot text-to-speech synthesis
  • Voice conversion
  • Target speaker extraction
  • Speech enhancement
  • Lip-to-speech generation

Frequently Asked Questions

Q: What makes this model unique?

Metis handles multiple speech generation tasks through a single pre-trained model. After fine-tuning, it achieves state-of-the-art results while using significantly fewer trainable parameters and less task-specific training data than task-specific systems.

Q: What are the recommended use cases?

The model suits applications that require speech generation, enhancement, or conversion. It can be used for text-to-speech systems, voice conversion, target speaker extraction, speech enhancement in noisy environments, and generating speech from lip movements.
