metavoice-1B-v0.1

metavoiceio

A 1.2B parameter Text-to-Speech model trained on 100K hours of speech, featuring emotional speech synthesis, voice cloning, and zero-shot capabilities.

Property        Value
Model Size      1.2B parameters
License         Apache 2.0
Language        English
Training Data   100K hours of speech

What is metavoice-1B-v0.1?

MetaVoice-1B-v0.1 is a text-to-speech model that generates natural, emotionally expressive English speech. It has 1.2 billion parameters and was trained on 100,000 hours of speech data. It also supports voice cloning from small amounts of audio: about 1 minute of training data, or a 30-second reference for zero-shot cloning.

Implementation Details

The model uses a multi-stage architecture: a causal GPT model predicts EnCodec tokens, a non-causal transformer predicts the remaining token hierarchies, and multi-band diffusion generates the waveform from those tokens. KV-caching is supported via Flash Decoding, and inputs can be batched for efficient inference.
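To illustrate why KV-caching helps autoregressive decoding, here is a minimal sketch in plain Python. It is not Flash Decoding and not MetaVoice's implementation: the names `attend`, `KVCache`, and `causal_attention_full` are hypothetical, and queries, keys, and values are passed in directly rather than produced by learned projections. It only shows that reusing cached keys and values gives the same output as recomputing attention over the whole prefix at every step.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[i] for w, v in zip(weights, values)) / total
        for i in range(dim)
    ]


class KVCache:
    """Hypothetical cache: stores past keys/values so each step only adds one."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Append the new position, then attend over everything seen so far.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


def causal_attention_full(queries, keys, values):
    """Reference version: re-reads the full prefix at every position."""
    return [
        attend(queries[t], keys[: t + 1], values[: t + 1])
        for t in range(len(queries))
    ]
```

With the cache, each decoding step handles only the newest position instead of rebuilding keys and values for the entire prefix, which is where the saving comes from in a real model.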

  • Custom BPE tokenizer with 512 tokens
  • Two-hierarchy prediction system with flattened interleaving
  • Condition-free sampling for enhanced cloning
  • DeepFilterNet for artifact cleanup
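Two of the items above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MetaVoice's code: the function names are hypothetical, the `offset` of 1024 assumes each hierarchy uses a 1024-entry codebook and that the second hierarchy is shifted into its own id range, and the guidance formula assumes condition-free sampling follows the usual classifier-free guidance pattern of blending conditioned and unconditioned logits.

```python
def interleave(first, second, offset=1024):
    """Flatten two token hierarchies into one sequence: a0, b0, a1, b1, ...

    `offset` (an assumption) shifts second-hierarchy ids so the two
    hierarchies do not collide in a shared vocabulary.
    """
    if len(first) != len(second):
        raise ValueError("both hierarchies must have the same number of frames")
    flat = []
    for a, b in zip(first, second):
        flat.append(a)
        flat.append(b + offset)
    return flat


def deinterleave(flat, offset=1024):
    """Inverse of `interleave`: recover the two hierarchies."""
    if len(flat) % 2 != 0:
        raise ValueError("flattened sequence must have even length")
    first = flat[0::2]
    second = [token - offset for token in flat[1::2]]
    return first, second


def guided_logits(cond, uncond, guidance_scale):
    """Classifier-free guidance style blend (assumed form).

    A scale of 1.0 returns the conditioned logits unchanged; larger
    values push further toward the speaker-conditioned prediction.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]
```

For example, `interleave([1, 2, 3], [4, 5, 6], offset=10)` yields `[1, 14, 2, 15, 3, 16]`, so a single causal model can predict both hierarchies as one token stream.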

Core Capabilities

  • Emotional speech synthesis with natural rhythm and tone
  • Voice cloning via finetuning with as little as 1 minute of training data
  • Zero-shot cloning for American and British voices from a 30-second reference clip
  • Long-form synthesis support
  • Batch processing of varying text lengths
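Batching texts of different lengths usually means padding token sequences to a common length and tracking which positions are real. The sketch below shows that general pattern in plain Python. It is not taken from MetaVoice: `pad_batch` and `pad_id` are hypothetical names, and the actual library may batch inputs differently.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token id lists to equal length and build a 0/1 mask.

    Returns (padded, mask), where mask[i][j] is 1 for a real token
    and 0 for padding.
    """
    max_len = max((len(seq) for seq in sequences), default=0)
    padded = []
    mask = []
    for seq in sequences:
        pad = max_len - len(seq)
        padded.append(list(seq) + [pad_id] * pad)
        mask.append([1] * len(seq) + [0] * pad)
    return padded, mask
```

The mask lets downstream attention layers ignore padded positions, so short and long prompts can share one forward pass.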

Frequently Asked Questions

Q: What makes this model unique?

Two things set the model apart: voice cloning from as little as 1 minute of training data, and zero-shot cloning for American and British voices. It is also described as preserving emotional rhythm and tone without hallucinations, which matters for production use.

Q: What are the recommended use cases?

The model suits applications that need emotionally expressive text-to-speech, voice cloning services, or long-form content generation. It is a particularly good fit for projects that must adapt to a new voice quickly with minimal training data.
