# YarnGPT2
| Property | Value |
|---|---|
| Developer | saheedniyi |
| Model Type | Text-to-Speech (TTS) |
| Base Model | HuggingFaceTB/SmolLM2-360M |
| Training Infrastructure | 1x A100 GPU |
| Repository | Hugging Face |
## What is YarnGPT2?

YarnGPT2 is a text-to-speech model designed to synthesize Nigerian-accented speech. It uses pure language modeling, with no external adapters or complex architectures, making it a streamlined solution for generating natural, culturally authentic speech in Nigerian-accented English and in Yoruba, Igbo, and Hausa.
## Implementation Details

The model was trained using PyTorch on publicly available Nigerian movies, podcasts, and open-source Nigerian-related audio data. It employs a WavTokenizer for audio processing, with audio files resampled to 24 kHz. Training ran for 5 epochs with a batch size of 4, using the AdamW optimizer and a linear learning-rate schedule with warmup.
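The model card states only the schedule family, not its hyperparameters; a minimal sketch of a linear schedule with warmup as conventionally implemented (e.g. by `transformers`' `get_linear_schedule_with_warmup`), where the warmup step count, total step count, and peak learning rate below are assumed values for illustration:

```python
def linear_schedule_with_warmup(step, warmup_steps, total_steps):
    """Learning-rate scale factor at a given optimizer step:
    ramps linearly from 0 to 1 over warmup, then decays linearly to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Assumed numbers, not from the model card: 500 warmup steps out of 50,000 total
total_steps = 50_000
warmup_steps = 500
peak_lr = 3e-4  # assumed peak learning rate

lrs = [peak_lr * linear_schedule_with_warmup(s, warmup_steps, total_steps)
       for s in range(total_steps)]
```

The warmup phase avoids large, destabilizing updates while AdamW's moment estimates are still noisy; the linear decay then anneals the step size toward zero by the end of training.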
- Sampling rate: 24 kHz
- Multiple voice options for each supported language
- Temperature and repetition penalty controls for output customization
- Integrated with standard transformers pipeline
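The model card exposes temperature and repetition penalty as generation controls but does not document their internals; a minimal sketch of how these two knobs are conventionally applied to next-token logits (following the `transformers` convention, where a penalty > 1 shrinks positive logits of already-generated tokens and amplifies negative ones):

```python
import math

def apply_sampling_controls(logits, generated_ids, temperature=1.0,
                            repetition_penalty=1.0):
    """Adjust next-token logits: penalize tokens already generated,
    then divide by temperature (lower temperature = sharper distribution)."""
    adjusted = list(logits)
    for tok in set(generated_ids):
        # transformers-style penalty: push penalized logits toward less likely
        if adjusted[tok] > 0:
            adjusted[tok] /= repetition_penalty
        else:
            adjusted[tok] *= repetition_penalty
    return [x / temperature for x in adjusted]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Token 0 was already generated, so a penalty > 1 lowers its probability
logits = [2.0, 1.0, 0.5]
probs = softmax(apply_sampling_controls(logits, [0],
                                        temperature=0.8,
                                        repetition_penalty=1.2))
```

In speech-token generation, the repetition penalty discourages the model from looping on the same audio codes, while temperature trades off prosodic variety against stability.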
## Core Capabilities
- Nigerian-accented English synthesis
- Native language support for Yoruba, Igbo, and Hausa
- Multiple voice options per language
- High-quality, natural-sounding speech output
- Culturally relevant pronunciation and intonation
## Frequently Asked Questions

**Q: What makes this model unique?**
YarnGPT2 stands out for its specialized focus on Nigerian languages and accents, offering a comprehensive solution for generating authentic Nigerian speech without complex architectural requirements.
**Q: What are the recommended use cases?**

The model is well suited to generating Nigerian-accented English speech for experimental purposes, content localization, and educational applications. It is not suitable for languages or accents outside its training scope.