fish-speech-1.2


fishaudio

A multilingual text-to-speech model supporting English, Chinese, and Japanese, trained on 300k hours of audio data and released under a non-commercial license.

Property        Value
License         CC-BY-NC-SA-4.0
Languages       English, Chinese, Japanese
Training data   300k hours of audio
GitHub          Fish Speech GitHub

What is fish-speech-1.2?

Fish Speech V1.2 is a multilingual text-to-speech (TTS) model developed by fishaudio. It was trained on a dataset of 300,000 hours of English, Chinese, and Japanese audio. The model uses a Transformer architecture with a dual autoregressive (dual_ar) design for speech generation.

Implementation Details

The model uses a Transformer architecture designed specifically for multilingual text-to-speech synthesis. Its goal is to preserve natural speech patterns across languages while producing high-quality audio.

  • Transformer-based architecture for efficient sequence processing
  • Dual autoregressive (dual_ar) design for improved speech quality
  • Trained on 300k hours of multilingual audio
  • Licensed for non-commercial use under CC-BY-NC-SA-4.0
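The card does not explain what dual_ar means in practice. A common reading, assumed here, is a two-level autoregressive decoder: a "slow" Transformer steps over audio frames, and a "fast" Transformer steps over the codebook tokens inside each frame. The toy sketch below shows only that control flow. `dual_ar_decode`, `toy_slow`, `toy_fast`, the EOS stop rule, and all constants are hypothetical stand-ins, not fish-speech's API or weights, and the real model may split work between the two levels differently.

```python
from typing import Callable, List, Sequence

Frame = List[int]  # one token per codebook for a single audio frame


def dual_ar_decode(
    text_tokens: Sequence[int],
    slow_step: Callable[[Sequence[int], List[Frame]], int],
    fast_step: Callable[[int, Frame], int],
    num_codebooks: int,
    max_frames: int,
    eos_token: int,
) -> List[Frame]:
    """Toy two-level ("dual") autoregressive decoding loop (hypothetical)."""
    frames: List[Frame] = []
    for _ in range(max_frames):
        # Slow level: one step per frame, conditioned on the text and all earlier frames.
        hidden = slow_step(text_tokens, frames)
        # Fast level: one step per codebook, conditioned on the slow state
        # and on the codebook tokens already emitted for this frame.
        frame: Frame = []
        for _ in range(num_codebooks):
            frame.append(fast_step(hidden, frame))
        # Hypothetical stop rule: EOS in the first codebook ends generation.
        if frame[0] == eos_token:
            break
        frames.append(frame)
    return frames


# Deterministic stand-ins for the two Transformers (hypothetical, illustration only).
VOCAB = 16
EOS = 0


def toy_slow(text: Sequence[int], frames: List[Frame]) -> int:
    # Summarises the text and how many frames exist so far into one integer "state".
    return (sum(text) + 7 * len(frames)) % 97


def toy_fast(hidden: int, partial: Frame) -> int:
    # Next codebook token depends on the slow state and this frame's earlier tokens.
    return (hidden + 3 * len(partial) + sum(partial)) % VOCAB


frames = dual_ar_decode([1, 2, 3], toy_slow, toy_fast,
                        num_codebooks=4, max_frames=10, eos_token=EOS)
print(len(frames), frames[0])
```

With the text `[1, 2, 3]`, four codebooks, and a ten-frame cap, the loop emits six frames before the toy EOS token appears in the first codebook. The point of the split is that the slow step runs once per frame rather than once per token.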

Core Capabilities

  • Multilingual speech synthesis in English, Chinese, and Japanese
  • High-quality voice generation with natural intonation
  • Cross-lingual voice synthesis
  • Efficient speech generation
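The model covers exactly three languages, so callers may want to check input text before synthesis. The following is a hypothetical pre-flight helper, not part of fish-speech. `scripts_in` and `likely_languages` are illustrative names, and the Unicode-block heuristic is an assumption. It cannot tell Chinese from Japanese when a string contains only Han characters (it guesses Chinese), it treats any basic Latin letters as English, and it ignores full-width Latin and CJK extension blocks.

```python
from typing import Set


def scripts_in(text: str) -> Set[str]:
    """Return the coarse scripts present in `text` (hypothetical heuristic)."""
    found: Set[str] = set()
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:           # Hiragana and Katakana blocks
            found.add("kana")
        elif 0x4E00 <= cp <= 0x9FFF:         # CJK Unified Ideographs (Han)
            found.add("han")
        elif ch.isascii() and ch.isalpha():  # basic Latin letters
            found.add("latin")
    return found


def likely_languages(text: str) -> Set[str]:
    """Guess which of the three supported languages appear (hypothetical)."""
    scripts = scripts_in(text)
    langs: Set[str] = set()
    if "latin" in scripts:
        langs.add("en")
    if "kana" in scripts:
        langs.add("ja")  # kana implies Japanese; any Han is then read as Japanese
    elif "han" in scripts:
        langs.add("zh")  # Han without kana: assume Chinese
    return langs


print(likely_languages("Hello 世界"))
```

An empty result means the text contains none of the scripts the heuristic recognizes, which is a cue to inspect the input before sending it to the model.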

Frequently Asked Questions

Q: What makes this model unique?

Fish Speech V1.2 stands out for its training on 300,000 hours of multilingual data and its ability to produce high-quality output in three major languages (English, Chinese, and Japanese). Its dual_ar design and Transformer architecture are intended to deliver strong speech synthesis quality.

Q: What are the recommended use cases?

The model suits non-commercial applications that need high-quality multilingual text-to-speech, such as educational content, personal projects, and research. Because it is licensed under CC-BY-NC-SA-4.0, it cannot be used for commercial purposes, and any adaptations you distribute must be shared under the same license.
