fish-agent-v0.1-3b

Maintained By
fishaudio

Fish Agent V0.1 3B

Property: Value
Model Size: 3B parameters
License: CC-BY-NC-SA-4.0
Languages Supported: 8 (English, Chinese, German, Japanese, French, Spanish, Korean, Arabic)
Training Data: 700,000 hours

What is fish-agent-v0.1-3b?

Fish Agent V0.1 3B is a voice-to-voice model: it takes speech as input and produces speech as output. It is built on Qwen-2.5-3B-Instruct and further trained on 200B voice and text tokens. It does not rely on traditional semantic encoders or decoders such as Whisper and CosyVoice, which allows it to process environmental audio information (sound other than the spoken words) that a purely semantic representation can lose.

Implementation Details

The model uses a semantic-token-free architecture: audio is processed directly, without an intermediate semantic-token stage. It was trained on multilingual audio, with the heaviest coverage in English and Chinese (300,000 hours each) and 20,000 hours each for the six other supported languages (German, Japanese, French, Spanish, Korean, and Arabic).
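As a quick sanity check on these figures, the per-language hours can be summed and compared with the total in the property table. The following is a minimal sketch; the names `TRAINING_HOURS`, `TOTAL_HOURS`, and `language_share` are illustrative and not part of any Fish Audio tooling, and the hour counts are simply the figures quoted in this card.

```python
# Training hours per language, as quoted in this model card.
# All names here are illustrative, not part of any official API.
TRAINING_HOURS = {
    "English": 300_000,
    "Chinese": 300_000,
    "German": 20_000,
    "Japanese": 20_000,
    "French": 20_000,
    "Spanish": 20_000,
    "Korean": 20_000,
    "Arabic": 20_000,
}

TOTAL_HOURS = sum(TRAINING_HOURS.values())


def language_share(language: str) -> float:
    """Return the fraction of total training hours for one language."""
    return TRAINING_HOURS[language] / TOTAL_HOURS


if __name__ == "__main__":
    print(f"Total: {TOTAL_HOURS:,} hours")
    for name in TRAINING_HOURS:
        print(f"{name:<9}{language_share(name):6.1%}")
```

The per-language figures sum to 720,000 hours, so the 700,000-hour total in the property table is best read as a rounded figure. English and Chinese together account for roughly five sixths of the data, which is worth keeping in mind when evaluating the model on the other six languages.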

  • Continued pretraining of Qwen-2.5-3B-Instruct
  • Trained on 200B voice & text tokens
  • Supports both audio-to-audio and text-to-speech capabilities

Core Capabilities

  • Voice-to-Voice conversion with environmental audio preservation
  • High-quality text-to-speech generation
  • Multilingual support across 8 major languages
  • Direct audio processing without semantic token intermediaries
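An application that wraps the model may want to reject requests in unsupported languages before doing any audio work. A minimal sketch of such a check follows. The ISO 639-1 codes are standard, but the function name `is_supported` and the idea of gating requests this way are assumptions made for illustration, not behaviour documented for the model.

```python
# The eight languages listed in this card, keyed by ISO 639-1 code.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "zh": "Chinese",
    "de": "German",
    "ja": "Japanese",
    "fr": "French",
    "es": "Spanish",
    "ko": "Korean",
    "ar": "Arabic",
}


def is_supported(language_code: str) -> bool:
    """Check a language tag such as 'en' or 'zh-CN' against the list.

    Only the primary subtag (the part before the first '-' or '_')
    is compared, case-insensitively.
    """
    primary = language_code.replace("_", "-").split("-")[0].strip().lower()
    return primary in SUPPORTED_LANGUAGES
```

For example, `is_supported("zh-CN")` and `is_supported("JA")` both return `True`, while `is_supported("ru")` returns `False`.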

Frequently Asked Questions

Q: What makes this model unique?

Its semantic-token-free architecture and its ability to handle environmental audio information set it apart from voice models that depend on semantic encoders and decoders, since audio is processed directly rather than through an intermediate semantic representation.

Q: What are the recommended use cases?

The model is suited to voice conversion, text-to-speech applications, and multilingual audio processing. Its CC-BY-NC-SA-4.0 license limits use to non-commercial purposes.
