Vokan

ShoukanLabs

A fine-tuned StyleTTS2 text-to-speech model trained on 672 speakers from the AniSpeech, VCTK, and LibriTTS-R datasets. Optimized for expressive zero-shot speech synthesis across diverse accents.

Property	Value
Author	ShoukanLabs
License	MIT
Base Architecture	StyleTTS2
Training Resources	300h on H100, 600h on 3090

What is Vokan?

Vokan is a fine-tuned StyleTTS2 model built for expressive zero-shot text-to-speech synthesis. It was trained on 672 speakers drawn from AniSpeech, VCTK, and LibriTTS-R, totaling over 6 days' worth of audio.
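To put the dataset figures in perspective, the short calculation below converts "over 6 days" into hours and derives a rough per-speaker average. It is only a back-of-the-envelope sketch: it treats 6 days as a lower bound and assumes audio is spread evenly across speakers, which is unlikely to hold in practice.

```python
# Figures taken from the model description; 6 days is a lower bound ("over 6 days").
AUDIO_DAYS = 6
NUM_SPEAKERS = 672

audio_hours = AUDIO_DAYS * 24

# Assumption for illustration only: audio is split evenly across speakers.
avg_minutes_per_speaker = audio_hours * 60 / NUM_SPEAKERS

print(f"Total audio: at least {audio_hours} hours")
print(f"Average per speaker: about {avg_minutes_per_speaker:.1f} minutes")
```

Under these assumptions the total comes to at least 144 hours, or roughly 12.9 minutes of audio per speaker on average.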

Implementation Details

Vokan builds on the StyleTTS2 architecture. Training took 300 hours on an NVIDIA H100 GPU plus a further 600 hours on an NVIDIA RTX 3090, which gives a sense of the compute needed to fine-tune a speech model of this kind.
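The training figures can be summed into a single compute budget, as in the sketch below. It assumes one GPU of each type and sequential runs; the source does not say whether the two runs overlapped, so the wall-clock figure is an upper bound under that assumption.

```python
# GPU hours as listed in the model description.
gpu_hours = {"H100": 300, "3090": 600}

total_gpu_hours = sum(gpu_hours.values())

# Assumption: a single GPU of each type, with the runs executed back to back.
total_gpu_days = total_gpu_hours / 24

for gpu, hours in gpu_hours.items():
    print(f"{gpu}: {hours} h ({hours / 24:.1f} days)")
print(f"Total: {total_gpu_hours} GPU hours ({total_gpu_days:.1f} days if run sequentially)")
```

Note that an hour on an H100 and an hour on a 3090 represent very different amounts of computation, so the 900-hour total is a measure of time spent rather than of work done.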

Trained on multiple datasets for enhanced diversity
Optimized for zero-shot performance
Incorporates various accents and speaking styles
Designed as a robust base model for further fine-tuning

Core Capabilities

High-quality zero-shot speech synthesis
Diverse accent and speaker characteristic reproduction
Expressive and natural-sounding output
Versatile base for custom fine-tuning
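Zero-shot synthesis conditions the model on a short reference recording of the target voice, so it can help to inspect that clip before passing it to an inference script. The sketch below uses only Python's standard `wave` module. The helper name `describe_reference_clip` and the 24 kHz expected sample rate are assumptions made for illustration; check the configuration shipped with the checkpoint for the actual value.

```python
import wave


def describe_reference_clip(path, expected_rate=24000):
    """Summarize a PCM WAV file that will serve as a zero-shot voice reference.

    expected_rate is an assumption (24 kHz); confirm it against the model config.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,
        "matches_expected_rate": rate == expected_rate,
        "is_mono": channels == 1,
    }
```

A clip that fails either check can be resampled or downmixed with any audio tool before use. This helper only reports the properties and does not modify the file.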

Frequently Asked Questions

Q: What makes this model unique?

Vokan stands out for the breadth of speakers it was trained on and for its focus on expressiveness in zero-shot use. It was trained on less data than the original StyleTTS2, and relies on a curated, diverse mix of speakers and accents to make up for the smaller volume.

Q: What are the recommended use cases?

The model is well suited to applications that need expressive text-to-speech, especially where natural-sounding, authentic output matters. It also works well as a base model for further fine-tuning.