Llasa-1B

Maintained By
HKUSTAudio

Property         Value
Developer        HKUSTAudio
License          CC BY-NC 4.0
Training Data    250,000 hours Chinese-English speech
Codebook Size    65,536 tokens

What is Llasa-1B?

Llasa-1B is a text-to-speech (TTS) model built on the LLaMA language model architecture. It adds speech capability by incorporating XCodec2 codebook tokens, so the same model that reads text can generate speech in both Chinese and English. It was trained on 250,000 hours of Chinese-English speech.

Implementation Details

The architecture extends the base LLaMA-1B model with speech tokens from the XCodec2 codebook. Speech can be generated in two ways: directly from input text, or from input text plus a speech prompt whose voice the output should follow. The second mode is what enables voice cloning.

  • Built on LLaMA architecture with speech token integration
  • Uses XCodec2 codebook with 65,536 unique speech tokens
  • Produces 16 kHz audio output
  • Supports synthesis from text alone and synthesis conditioned on a speech prompt
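
The speech token integration listed above can be illustrated with a small sketch that converts between codebook indices and text-style tokens. The 65,536-entry codebook size comes from this page; the `<|s_N|>` token spelling and both helper names are assumptions made for illustration, so check the model's tokenizer vocabulary for the real spelling before relying on it.

```python
import re

CODEBOOK_SIZE = 65_536  # XCodec2 codebook size stated on this page

# Assumed token spelling "<|s_N|>"; verify against the actual tokenizer vocabulary.
_SPEECH_TOKEN_RE = re.compile(r"<\|s_(\d+)\|>")


def _check_index(index: int) -> None:
    """Reject indices that fall outside the codebook."""
    if not 0 <= index < CODEBOOK_SIZE:
        raise ValueError(f"codebook index {index} is outside [0, {CODEBOOK_SIZE})")


def ids_to_speech_tokens(ids):
    """Turn codebook indices into speech-token strings (hypothetical helper)."""
    tokens = []
    for index in ids:
        _check_index(index)
        tokens.append(f"<|s_{index}|>")
    return tokens


def speech_tokens_to_ids(text):
    """Extract codebook indices from generated text (hypothetical helper)."""
    ids = [int(match) for match in _SPEECH_TOKEN_RE.findall(text)]
    for index in ids:
        _check_index(index)
    return ids
```

In a pipeline of this shape, the indices recovered by `speech_tokens_to_ids` would be handed to the codec's decoder to produce a waveform; that decoding step is not shown here.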

Core Capabilities

  • Direct text-to-speech synthesis in Chinese and English
  • Voice cloning through speech prompts
  • High-quality speech generation with controllable parameters
  • Flexible deployment with adjustable inference settings
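
The difference between the two synthesis modes can be sketched as a difference in how the model input is assembled. This is a minimal illustration under stated assumptions, not the project's actual code: `SynthesisRequest` and `build_request` are hypothetical names, the `<|s_N|>` token spelling is assumed, and the idea that voice cloning works by letting the model continue from the prompt's speech tokens is an assumption about how prompt-based generation is wired.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SynthesisRequest:
    """Hypothetical container for what would be passed to the language model."""
    text: str           # text the model is conditioned on
    speech_prefix: str  # speech tokens the generation starts from ("" if none)


def build_request(
    target_text: str,
    prompt_text: Optional[str] = None,
    prompt_codes: Optional[Sequence[int]] = None,
) -> SynthesisRequest:
    """Assemble a text-only or a speech-prompted request (illustrative only)."""
    if (prompt_text is None) != (prompt_codes is None):
        raise ValueError("prompt_text and prompt_codes must be given together")

    if prompt_codes is None:
        # Text-only mode: the model chooses the voice on its own.
        return SynthesisRequest(text=target_text, speech_prefix="")

    # Speech-prompt mode (assumed mechanism): prepend the prompt transcript to
    # the target text and seed generation with the prompt's speech tokens, so
    # the continuation keeps the prompt speaker's voice.
    prefix = "".join(f"<|s_{code}|>" for code in prompt_codes)
    return SynthesisRequest(text=f"{prompt_text} {target_text}", speech_prefix=prefix)
```

For example, `build_request("Good morning.")` describes plain synthesis, while `build_request("Good morning.", "Hello there.", codes)` describes cloning the voice of a recording whose transcript is "Hello there." and whose codec indices are `codes`.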

Frequently Asked Questions

Q: What makes this model unique?

Llasa-1B combines a LLaMA language model with XCodec2 speech tokenization, so a single model handles both text understanding and speech generation. It covers both Chinese and English and can optionally clone a voice from a speech prompt.

Q: What are the recommended use cases?

The model is ideal for applications requiring high-quality text-to-speech conversion in Chinese or English, particularly when voice consistency or cloning is needed. However, commercial use is prohibited under the CC BY-NC 4.0 license.
