dolphin-small

DataoceanAI

Multilingual ASR model supporting 40 Eastern languages and 22 Chinese dialects. 372M parameters, 25.2% average WER. Features speech recognition, VAD, segmentation and LID.

Property          Value
Parameter Count   372M
Model Type        ASR (Automatic Speech Recognition)
Architecture      Joint CTC-Attention with E-Branchformer encoder
License           Apache 2.0
Average WER       25.2%
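The 25.2% figure is a word error rate (WER). As a reminder of what that metric measures, here is a minimal Python sketch: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. Splitting on whitespace is an assumption of this sketch; scripts written without spaces, such as Chinese, are often scored per character instead, and this summary does not say which test sets or tokenization produced the 25.2% average.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()   # assumption: whitespace-delimited words
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # → 0.3333333333333333
```

An average WER across several test sets is then a further aggregation step, whose exact form (per-set mean or pooled over all words) is not specified here.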

What is dolphin-small?

Dolphin-small is a multilingual ASR model developed by DataoceanAI in collaboration with Tsinghua University. It is designed specifically for Eastern languages, supporting 40 languages across East Asia, South Asia, Southeast Asia, and the Middle East, plus 22 Chinese dialects. It was trained on more than 210,000 hours of speech data.

Implementation Details

The model uses a joint CTC-Attention architecture, with an E-Branchformer-based encoder and a standard Transformer decoder. It also introduces a two-level language token system that handles linguistic and regional diversity by pairing a language token with a separate region token.
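The two-level token idea can be sketched as building a decoder prefix from a language token followed by an optional region token. The `<...>` spelling and the codes `zh`, `CN`, `TW`, and `ja` are hypothetical placeholders, not Dolphin's confirmed vocabulary; the sketch only shows that varieties of one language share the first token and differ in the second.

```python
from typing import Optional


def language_prefix(language: str, region: Optional[str] = None) -> list[str]:
    """Return a two-level prefix: a language token, then an optional region token.

    Hypothetical token format for illustration only; not Dolphin's actual vocabulary.
    """
    tokens = [f"<{language}>"]
    if region is not None:
        tokens.append(f"<{region}>")
    return tokens


print(language_prefix("zh", "CN"))  # → ['<zh>', '<CN>']
print(language_prefix("zh", "TW"))  # → ['<zh>', '<TW>']
print(language_prefix("ja"))        # → ['<ja>']
```

Under this layout, related varieties share the language-level token, so what they have in common can be learned jointly, while the region token carries the dialect distinction.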

  • 372M parameters, balancing accuracy against compute cost
  • Trained on both proprietary and open-source datasets
  • Requires FFmpeg to convert input audio to WAV format
  • No speech translation: the model performs transcription only
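The FFmpeg conversion step can be scripted. The sketch below builds and runs an `ffmpeg` command that decodes any supported input to WAV, using only standard `ffmpeg` options (`-y`, `-i`, `-ac`, `-ar`, `-c:a`). The mono, 16 kHz, 16-bit PCM target is an assumption (a common ASR input format), not a requirement stated in this summary, and the helper names `ffmpeg_wav_command` and `convert_to_wav` are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def ffmpeg_wav_command(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that decodes `src` to mono 16-bit PCM WAV at `sample_rate`."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input: any format ffmpeg can decode
        "-ac", "1",               # downmix to a single channel (assumed mono input)
        "-ar", str(sample_rate),  # resample; 16 kHz is an assumption, not from the model card
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM, the usual WAV encoding
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> Path:
    """Run the conversion; requires the ffmpeg binary on PATH."""
    subprocess.run(ffmpeg_wav_command(src, dst), check=True, capture_output=True)
    return Path(dst)


# Show the command without running it:
print(shlex.join(ffmpeg_wav_command("clip.mp3", "clip.wav")))
# → ffmpeg -y -i clip.mp3 -ac 1 -ar 16000 -c:a pcm_s16le clip.wav
```

Keeping command construction separate from execution makes the exact arguments easy to inspect or log before `subprocess.run` raises on a non-zero exit code.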

Core Capabilities

  • Speech recognition across 40 Eastern languages
  • Voice Activity Detection (VAD)
  • Audio Segmentation
  • Language Identification (LID)
  • Support for 22 Chinese dialects

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its focus on Eastern languages and Chinese dialects, combined with its two-level language token system. This focus makes it particularly effective for Asian-language speech processing, and it is reported to achieve state-of-the-art performance on these language families.

Q: What are the recommended use cases?

The model is ideal for applications requiring multilingual ASR capabilities in Eastern languages, such as transcription services, voice assistants, and automated content processing systems. It's particularly valuable for applications dealing with Chinese dialects and various Asian languages.
