SenseVoiceSmall

Maintained by: FunAudioLLM

Property    Value
License     Model License
Languages   English, Chinese, Japanese, Korean
Framework   FunASR

What is SenseVoiceSmall?

SenseVoiceSmall is a compact speech foundation model that combines several speech understanding capabilities in a single architecture. It is designed for multilingual speech recognition, emotion detection, and audio event classification at low computational cost.

Implementation Details

The model uses a non-autoregressive, end-to-end architecture. It is reported to run about 15 times faster than Whisper-Large, taking roughly 70 ms to process 10 seconds of audio. It was trained on more than 400,000 hours of multilingual audio and supports speech recognition, emotion recognition, and audio event detection.
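To put the latency figure in context, the short sketch below converts it into a real-time factor (RTF), a common way to express speech-processing speed. The function name is illustrative, and the inputs are simply the 70 ms and 10 s figures quoted above, not new measurements.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower means faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted in the text: 70 ms of compute for 10 s of audio.
rtf = real_time_factor(0.070, 10.0)
speedup_vs_realtime = 1.0 / rtf

print(f"RTF = {rtf:.4f}, about {speedup_vs_realtime:.0f}x faster than real time")
# prints: RTF = 0.0070, about 143x faster than real time
```

An RTF well below 1.0 means the model finishes long before the audio would finish playing, which is what makes low-latency use plausible. Actual numbers depend on hardware and batch size.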

  • Multilingual ASR accuracy reported to surpass that of Whisper
  • Emotion recognition reported to be competitive with the state of the art
  • Efficient audio event detection for common scenarios
  • Dynamic batching support for optimized processing
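The page does not describe how dynamic batching is implemented. As a rough illustration of the general idea only, the sketch below groups clips so that each batch stays within a total-duration budget instead of a fixed clip count. The function name, the sorting step, and the budget value are assumptions made for this example, not FunASR's actual logic.

```python
from typing import List


def batch_by_duration(durations_s: List[float], max_batch_s: float) -> List[List[int]]:
    """Group clip indices so each batch holds at most max_batch_s seconds of audio.

    Clips are sorted by length first so that similar lengths share a batch,
    which reduces padding. A clip longer than the budget gets its own batch.
    """
    order = sorted(range(len(durations_s)), key=lambda i: durations_s[i])
    batches: List[List[int]] = []
    current: List[int] = []
    current_total = 0.0
    for i in order:
        d = durations_s[i]
        # Close the current batch if adding this clip would exceed the budget.
        if current and current_total + d > max_batch_s:
            batches.append(current)
            current, current_total = [], 0.0
        current.append(i)
        current_total += d
    if current:
        batches.append(current)
    return batches


clips = [4.0, 12.5, 3.0, 30.0, 8.0]  # hypothetical clip durations in seconds
print(batch_by_duration(clips, max_batch_s=20.0))
# prints: [[2, 0, 4], [1], [3]]
```

Budgeting by duration rather than clip count keeps memory use more predictable when clip lengths vary widely.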

Core Capabilities

  • High-accuracy multilingual speech recognition
  • Speech emotion recognition across multiple languages
  • Audio event detection (background music, applause, laughter, and similar events)
  • Voice Activity Detection (VAD) integration
  • Rich transcription with punctuation and inverse text normalization
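Because a single pass can yield language, emotion, and event labels alongside the text, downstream code usually needs to separate the labels from the transcript. The sketch below assumes the raw output marks labels with `<|...|>` tags in front of the text. That tag format, the sample string, and the helper name are assumptions for illustration, so check them against the output you actually receive.

```python
import re
from typing import Dict, List, Union

# Matches tags of the assumed form <|label|>.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def split_rich_transcript(raw: str) -> Dict[str, Union[str, List[str]]]:
    """Separate <|...|> tags from the transcript text."""
    tags = TAG_PATTERN.findall(raw)
    text = TAG_PATTERN.sub("", raw).strip()
    return {"tags": tags, "text": text}


# Hypothetical raw output: language, emotion, event, and ITN tags, then text.
sample = "<|en|><|HAPPY|><|Speech|><|withitn|>Thanks for joining us today."
result = split_rich_transcript(sample)

print(result["tags"])  # prints: ['en', 'HAPPY', 'Speech', 'withitn']
print(result["text"])  # prints: Thanks for joining us today.
```

Keeping the tags as a plain list leaves it to the caller to decide which labels matter, for example filtering on the emotion tag only.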

Frequently Asked Questions

Q: What makes this model unique?

SenseVoiceSmall handles speech recognition, emotion recognition, and audio event detection in one model while keeping inference fast. That combination suits real-world applications that need quick, accurate speech processing in several languages.

Q: What are the recommended use cases?

The model is well suited to applications that need multilingual speech transcription, speech emotion analysis, or audio event detection. It is a particularly good fit where low latency matters, such as real-time transcription services or automated content analysis.
