OuteTTS-0.2-500M

OuteAI

OuteTTS-0.2-500M: A 500M-parameter multilingual text-to-speech model supporting English, Chinese, Japanese, and Korean, built on Qwen-2.5-0.5B with improved voice cloning.

Property              Value
Parameter Count       500M
Base Model            Qwen-2.5-0.5B
License               CC BY-NC 4.0
Supported Languages   English, Chinese, Japanese, Korean
Tensor Type           BF16

What is OuteTTS-0.2-500M?

OuteTTS-0.2-500M is a multilingual text-to-speech model that improves on its predecessor, notably in prompt following and output coherence. It is built on Qwen-2.5-0.5B and generates speech from audio prompts without any architectural modifications to the foundation model. It was trained on over 5 billion audio prompt tokens drawn from multiple datasets.

Implementation Details

The model uses WavTokenizer to represent audio as tokens and CTC forced alignment to align text with audio. It is available in both Hugging Face and GGUF implementations, with optional bfloat16 precision and flash attention for better performance.

  • Built on Qwen-2.5-0.5B architecture
  • Trained on multiple datasets including Emilia-Dataset, LibriTTS-R, and Multilingual LibriSpeech
  • Supports context length of 4096 tokens (~54 seconds of audio)
  • Implements advanced voice cloning capabilities
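The context-length figure in the list above can be checked with a little arithmetic. The sketch below assumes the audio tokenizer emits roughly 75 tokens per second of audio; that rate is not stated in this card, but it is consistent with 4096 tokens covering about 54 seconds. The function name, its defaults, and the `reserved_tokens` parameter (tokens that might be taken up by the text prompt or a speaker reference, if those share the same window) are illustrative assumptions, not part of any OuteTTS API.

```python
def max_audio_seconds(
    context_tokens: int = 4096,
    tokens_per_second: float = 75.0,  # assumed rate, not stated in the card
    reserved_tokens: int = 0,         # hypothetical: tokens used by text or speaker prompt
) -> float:
    """Estimate how many seconds of audio fit in the remaining context window."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    # Never report a negative budget if the reservation exceeds the window.
    available = max(context_tokens - reserved_tokens, 0)
    return available / tokens_per_second


if __name__ == "__main__":
    # Full window available for audio.
    print(round(max_audio_seconds(), 1))  # 54.6
    # Same window with 1000 tokens set aside for prompts.
    print(round(max_audio_seconds(reserved_tokens=1000), 1))
```

If prompts do share the window, the practical limit for a single generation would be shorter than the headline figure, which is why the reservation is modeled explicitly here.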

Core Capabilities

  • High-quality multilingual speech synthesis
  • Voice cloning with speaker profile support
  • Temperature-controlled speech generation
  • Support for four languages with primary focus on English
  • Improved prompt following and output coherence
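To illustrate what temperature control means for a token-based generator, here is a minimal sketch of generic temperature sampling. It is not OuteTTS code: the function names and the toy logits are made up for illustration, and the model's actual sampling settings are not described in this card. The idea is only that logits are divided by the temperature before the softmax, so lower values make the output more deterministic and higher values make it more varied.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
    rng = random.Random(0)
    for t in (0.1, 1.0, 2.0):
        probs = softmax_with_temperature(toy_logits, t)
        print(t, [round(p, 3) for p in probs], sample_token(toy_logits, t, rng))
```

At a temperature of 0.1 nearly all probability mass falls on the highest-scoring token, while at 2.0 the three candidates are much closer to equally likely.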

Frequently Asked Questions

Q: What makes this model unique?

The model performs voice cloning without architectural changes to the foundation model, and it combines this with support for four languages and a choice of Hugging Face or GGUF implementations.

Q: What are the recommended use cases?

This model suits applications that need natural speech synthesis, voice cloning, or multilingual support, such as content creation, accessibility tools, and educational applications. It is released under CC BY-NC 4.0, so commercial use requires appropriate licensing.
