Llasa-1B-multi-speakers-genshin-zh-en-ja-ko

HKUSTAudio

Multi-speaker text-to-speech model based on LLaMA architecture supporting Chinese, English, Japanese & Korean voices from Genshin Impact, developed by HKUSTAudio.

Property	Value
Author	HKUSTAudio
Model Size	1B parameters
Model URL	Hugging Face

What is Llasa-1B-multi-speakers-genshin-zh-en-ja-ko?

Llasa-1B is an advanced text-to-speech synthesis model based on the LLaMA architecture, specifically designed to handle multiple languages including Chinese, English, Japanese, and Korean. The model specializes in generating character voices from Genshin Impact, demonstrating the capability to scale both training and inference compute efficiently.

Implementation Details

The model implements the LLaSA (LLaMA-based Speech Synthesis) framework, which focuses on optimizing computational resources during both training and inference phases. Updated in 2025, it includes specific finetune instructions for customization.

Built on LLaMA architecture for efficient scaling
Supports four major Asian languages
Optimized for character voice synthesis
Includes custom finetune capabilities

Core Capabilities

Multi-language text-to-speech synthesis
Character voice recreation from Genshin Impact
Efficient compute scaling during training and inference
Support for fine-tuning and customization

Frequently Asked Questions

Q: What makes this model unique?

This model uniquely combines LLaMA's architecture with multi-language speech synthesis capabilities, specifically optimized for character voices. Its ability to handle four different languages while maintaining voice quality makes it particularly valuable for game and entertainment applications.

Q: What are the recommended use cases?

The model is ideal for game development, content creation, and applications requiring multilingual voice synthesis, particularly those needing authentic-sounding character voices. It's especially suited for projects requiring Asian language support and game-style voice generation.