TinyOctopus
| Property | Value |
|---|---|
| Author | SaraAlthubaiti |
| Model Type | Bilingual Audio Language Model |
| Architecture | Distil-Whisper + DeepSeek 1.5B |
| Model URL | Hugging Face |
What is TinyOctopus?
TinyOctopus is a bilingual Audio Language Model that generates text from audio input in both Arabic and English. It uses Distil-Whisper to encode audio and DeepSeek 1.5B to generate text; a cross-attention projection layer connects the two.
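To make the role of the projection layer concrete, here is a minimal pure-Python sketch of single-head cross-attention that maps encoder frames into the language model's embedding space. It is an illustration under stated assumptions, not the model's actual implementation: the use of learned query vectors, the toy dimensions, the random weights, and every name (`cross_attention_project`, `AUDIO_DIM`, `LLM_DIM`, and so on) are hypothetical and not taken from the model card.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def cross_attention_project(audio_frames, queries, w_k, w_v):
    """Each query attends over all audio frames.

    Returns one vector per query, sized to match the language model's
    embedding width, so the result can be fed to the LLM as a short
    sequence of "audio tokens".
    """
    keys = [matvec(w_k, frame) for frame in audio_frames]
    values = [matvec(w_v, frame) for frame in audio_frames]
    dim = len(queries[0])
    outputs = []
    for query in queries:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, values)) for i in range(dim)]
        )
    return outputs


# Toy sizes chosen for readability; the real widths are much larger.
AUDIO_DIM, LLM_DIM, NUM_FRAMES, NUM_QUERIES = 8, 6, 10, 4

rng = random.Random(0)
audio_frames = random_matrix(NUM_FRAMES, AUDIO_DIM, rng)  # stand-in encoder output
queries = random_matrix(NUM_QUERIES, LLM_DIM, rng)        # hypothetical learned queries
w_k = random_matrix(LLM_DIM, AUDIO_DIM, rng)              # key projection
w_v = random_matrix(LLM_DIM, AUDIO_DIM, rng)              # value projection

audio_tokens = cross_attention_project(audio_frames, queries, w_k, w_v)
print(len(audio_tokens), len(audio_tokens[0]))  # 4 6
```

In this sketch the ten audio frames are compressed to four vectors of the language model's width, which shows how a projection layer can both align feature spaces and shorten the sequence the language model has to read.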
Implementation Details
The architecture has three main components: Distil-Whisper (distil-large-v3) for audio encoding, a trainable cross-attention projection layer for feature alignment, and DeepSeek 1.5B as the core language model. Training data includes QASR (2,000 hours of Arabic speech) and ADI17 (3,000 hours of dialect-specific content). The reported evaluation results are:
- Arabic ASR Performance: 16.00% WER
- English ASR Performance: 4.50% WER
- Translation BLEU Score: 55.05 (GPT-4o)
- Dialect Identification Accuracy: 70.59%
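For readers unfamiliar with the ASR metric above, word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. A WER of 16.00% therefore means roughly 16 word errors per 100 reference words. The following is a minimal sketch of the standard calculation; the function name `word_error_rate` is illustrative, and it assumes whitespace tokenization with no text normalization (the model card does not say which normalization was applied before scoring).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))  # 0.25
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.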
Core Capabilities
- Bilingual Automatic Speech Recognition (ASR)
- Arabic to English Speech Translation
- Arabic Dialect Identification
- Multi-dialect Speech Processing
Frequently Asked Questions
Q: What makes this model unique?
TinyOctopus combines bilingual (Arabic and English) speech recognition with Arabic dialect handling in a compact model. It reports 16.00% WER on Arabic ASR, 4.50% WER on English ASR, and a translation BLEU score of 55.05, while its distilled audio encoder and 1.5B-parameter language model keep it efficient.
Q: What are the recommended use cases?
The model is suited to automatic transcription of Arabic and English speech, Arabic-to-English speech translation, and Arabic dialect identification, in settings such as broadcast media, academic research, and general speech processing applications.