dolphin-small

DataoceanAI

Multilingual ASR model supporting 40+ Eastern languages and Chinese dialects. 372M parameters, 25.2% average WER. Features speech recognition, VAD, segmentation, and LID.

Property	Value
Parameter Count	372M
Model Type	ASR (Automatic Speech Recognition)
Architecture	Joint CTC-Attention with E-Branchformer encoder
License	Apache 2.0
Average WER	25.2%
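
For readers unfamiliar with the metric, WER (word error rate) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that calculation, not the evaluation script used for the figure above. The function name `word_error_rate` is hypothetical, and splitting on whitespace is an assumption: languages written without spaces need a different tokenization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev holds the edit distances computed for the previous reference prefix.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

An average WER of 25.2% therefore corresponds to roughly one word-level error for every four reference words, averaged over the evaluation sets.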

What is dolphin-small?

Dolphin-small is a multilingual ASR model developed by DataoceanAI in collaboration with Tsinghua University. It is designed for Eastern languages: it supports 40 languages across East Asia, South Asia, Southeast Asia, and the Middle East, plus 22 Chinese dialects. It was trained on over 210,000 hours of data.

Implementation Details

The model uses a joint CTC-Attention architecture with an E-Branchformer-based encoder and a standard Transformer decoder. It also uses a two-level language token system to handle linguistic and regional diversity: one token identifies the language and a second token identifies the region.
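
In joint CTC-Attention systems, candidate transcripts are commonly ranked by a weighted sum of the CTC and attention-decoder log-probabilities. The sketch below illustrates that general technique only; it is not taken from the Dolphin code base. The names `joint_score` and `rescore` are hypothetical, and the default weight of 0.3 is an assumed value, not a documented Dolphin setting.

```python
from typing import List, Tuple

# (text, CTC log-probability, attention-decoder log-probability)
Hypothesis = Tuple[str, float, float]


def joint_score(ctc_logp: float, att_logp: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the two log-probabilities; ctc_weight=0.3 is an assumed default."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp


def rescore(hypotheses: List[Hypothesis], ctc_weight: float = 0.3) -> Hypothesis:
    """Return the hypothesis with the highest joint score."""
    return max(hypotheses, key=lambda h: joint_score(h[1], h[2], ctc_weight))


# Made-up scores for illustration only.
candidates = [
    ("hypothesis A", -4.0, -2.0),  # favored by the attention decoder
    ("hypothesis B", -1.0, -3.0),  # favored by CTC
]
best_text, _, _ = rescore(candidates)
print(best_text)
```

Setting `ctc_weight` to 0 ranks by the attention decoder alone and setting it to 1 ranks by CTC alone, so the weight controls how much each branch influences the final choice.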

372M parameters, balancing accuracy against model size
Trained on both proprietary and open-source datasets
Requires FFmpeg to convert input audio to WAV format
Does not support speech translation
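
The FFmpeg requirement above can be met by converting audio ahead of time. The following is a minimal sketch that shells out to the `ffmpeg` command-line tool. The helper names `build_ffmpeg_command` and `convert_to_wav` are hypothetical, and the 16 kHz mono output is an assumption typical of ASR pipelines, not a documented Dolphin requirement.

```python
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_ffmpeg_command(src: PathLike, dst: PathLike, sample_rate: int = 16000) -> List[str]:
    """Build the argument list for converting src to a mono WAV file at dst."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file in any format FFmpeg can decode
        "-ac", "1",               # downmix to one channel (assumption)
        "-ar", str(sample_rate),  # resample; 16 kHz is an assumed default
        str(dst),
    ]


def convert_to_wav(src: PathLike, dst: PathLike, sample_rate: int = 16000) -> Path:
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return Path(dst)


print(build_ffmpeg_command("interview.mp3", "interview.wav"))
```

Calling `convert_to_wav("interview.mp3", "interview.wav")` requires the `ffmpeg` binary to be on the system `PATH`. The source and destination must be different files, because FFmpeg cannot convert a file in place.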

Core Capabilities

Speech Recognition across 40+ languages
Voice Activity Detection (VAD)
Audio Segmentation
Language Identification (LID)
Regional dialect support for 22 Chinese variants

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its focus on Eastern languages and Chinese dialects, combined with a two-level language token system that separates language from region. This makes it well suited to Asian language processing; it reports an average WER of 25.2%.

Q: What are the recommended use cases?

The model suits applications that need multilingual ASR for Eastern languages, such as transcription services, voice assistants, and automated content processing systems. It is particularly relevant for applications that handle Chinese dialects alongside other Asian languages.