Fish Agent V0.1 3B
Property | Value |
---|---|
Model Size | 3B parameters |
License | CC-BY-NC-SA-4.0 |
Languages Supported | 8 (English, Chinese, German, Japanese, French, Spanish, Korean, Arabic) |
Training Data | ~700,000 hours of audio |
What is fish-agent-v0.1-3b?
Fish Agent V0.1 3B is a Voice-to-Voice model: it takes speech as input and produces speech as output. It is built on Qwen-2.5-3B-Instruct and further trained on 200B voice and text tokens. It captures and generates environmental audio information without relying on conventional semantic encoders or decoders such as Whisper (speech recognition) and CosyVoice (speech synthesis).
Implementation Details
The model uses a semantic-token-free architecture. Audio is handled directly instead of first being converted into semantic tokens, which removes an intermediate stage from the pipeline. Training covered multiple languages, with the heaviest coverage in English and Chinese (roughly 300,000 hours each) and roughly 20,000 hours for each of the six other supported languages.
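The per-language figures above add up to 720,000 hours, slightly more than the 700,000 hours listed in the table, so both are best read as rounded approximations. The short Python sketch below tallies the stated figures and computes each language's share of the training audio. The dictionary and function names are illustrative only and are not part of any Fish Agent tooling.

```python
# Approximate per-language training hours, as stated in the model description.
# These are rounded figures, not exact dataset sizes.
HOURS_PER_LANGUAGE: dict[str, int] = {
    "English": 300_000,
    "Chinese": 300_000,
    "German": 20_000,
    "Japanese": 20_000,
    "French": 20_000,
    "Spanish": 20_000,
    "Korean": 20_000,
    "Arabic": 20_000,
}


def total_hours(hours: dict[str, int]) -> int:
    """Sum the training hours across all languages."""
    return sum(hours.values())


def language_shares(hours: dict[str, int]) -> dict[str, float]:
    """Return each language's fraction of the total training hours."""
    total = total_hours(hours)
    return {lang: h / total for lang, h in hours.items()}


if __name__ == "__main__":
    print(f"Total: {total_hours(HOURS_PER_LANGUAGE):,} hours")  # Total: 720,000 hours
    for lang, share in language_shares(HOURS_PER_LANGUAGE).items():
        # Right-align language names and print the share as a percentage.
        print(f"{lang:>8}: {share:.1%}")
```

English and Chinese together account for about 83% of the stated hours, which is consistent with the stronger coverage of those two languages.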
- Continually pretrained from Qwen-2.5-3B-Instruct
- Trained on 200B voice & text tokens
- Supports both audio-to-audio and text-to-speech generation
Core Capabilities
- Voice-to-Voice interaction that captures and generates environmental audio information
- High-quality text-to-speech generation
- Multilingual support across 8 major languages
- Direct audio processing without semantic token intermediaries
Frequently Asked Questions
Q: What makes this model unique?
Its semantic-token-free architecture and its ability to capture environmental audio information set it apart from conventional voice pipelines, which typically route speech through separate semantic encoders and decoders.
Q: What are the recommended use cases?
The model suits Voice-to-Voice applications, text-to-speech, and multilingual audio processing. Its CC-BY-NC-SA-4.0 license restricts use to non-commercial settings.