TinyOctopus

Property	Value
Author	SaraAlthubaiti
Model Type	Bilingual Audio Language Model
Architecture	Distil-Whisper + DeepSeek 1.5B
Model URL	Hugging Face

What is TinyOctopus?

TinyOctopus is a bilingual Audio Language Model that takes Arabic or English audio as input and generates text from it. It combines Distil-Whisper as the audio encoder with DeepSeek 1.5B as the text generator, and connects the two through a cross-attention projection layer.

Implementation Details

The architecture has three main components: Distil-Whisper (distil-large-v3) as the audio encoder, a trainable cross-attention projection layer that aligns audio features with the language model, and DeepSeek 1.5B as the core language model. Training data includes QASR (2,000 hours of Arabic speech) and ADI17 (3,000 hours of dialect-specific Arabic speech).
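To make the data flow concrete, here is a minimal Python sketch of one way a cross-attention projection layer can bridge an audio encoder and a language model. It assumes a fixed set of learnable query vectors in the language model's embedding space that attend over the encoder's frame features. The card does not say which query scheme TinyOctopus uses, and the class name `CrossAttentionProjector`, the toy dimensions, and the random initialization are all hypothetical; the real encoder and LLM hidden sizes are much larger.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class CrossAttentionProjector:
    """Hypothetical projector: learnable queries (LLM space) attend over audio frames."""

    def __init__(self, d_audio: int, d_llm: int, n_queries: int, seed: int = 0):
        rng = random.Random(seed)
        self.queries = rand_matrix(n_queries, d_llm, rng)  # n_queries x d_llm
        self.w_k = rand_matrix(d_audio, d_llm, rng)        # audio -> key space
        self.w_v = rand_matrix(d_audio, d_llm, rng)        # audio -> value space
        self.scale = 1.0 / math.sqrt(d_llm)

    def __call__(self, audio_frames: Matrix) -> Matrix:
        keys = matmul(audio_frames, self.w_k)              # T x d_llm
        values = matmul(audio_frames, self.w_v)            # T x d_llm
        scores = matmul(self.queries, transpose(keys))     # n_queries x T
        weights = [softmax([s * self.scale for s in row]) for row in scores]
        return matmul(weights, values)                     # n_queries x d_llm


# Toy usage: 10 encoder frames of width 8 -> 4 soft tokens of width 6.
rng = random.Random(42)
audio_frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(10)]
projector = CrossAttentionProjector(d_audio=8, d_llm=6, n_queries=4)
soft_prompt = projector(audio_frames)
print(len(soft_prompt), len(soft_prompt[0]))  # 4 6
```

The output rows act as audio "tokens" that could be prepended to the text embeddings fed to the language model. Training only such a bridge while keeping the encoder and LLM frozen is a common design, but the card states only that the projection layer is trainable.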

Arabic ASR Performance: 16.00% WER
English ASR Performance: 4.50% WER
Translation BLEU Score: 55.05 (GPT-4o)
Dialect Identification Accuracy: 70.59%
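The ASR figures above are word error rates: the number of word substitutions (S), deletions (D), and insertions (I) needed to turn the model's transcript into the reference, divided by the number of reference words N, i.e. (S + D + I) / N. A 16.00% WER therefore means about 16 word-level errors per 100 reference words. The sketch below computes WER with a standard word-level edit distance; the function name is illustrative, and it splits on whitespace only, whereas reported scores also depend on text normalization (casing, punctuation, Arabic orthographic variants), which the card does not specify.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.2%}")  # WER: 33.33%
```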

Core Capabilities

Bilingual Automatic Speech Recognition (ASR)
Arabic-to-English Speech Translation
Arabic Dialect Identification
Multi-dialect Speech Processing

Frequently Asked Questions

Q: What makes this model unique?

TinyOctopus combines bilingual Arabic/English capabilities with specialized Arabic dialect processing. It achieves competitive results on both ASR and translation while staying compact, pairing a distilled Whisper encoder with a 1.5B-parameter language model.

Q: What are the recommended use cases?

The model is suited to automatic transcription of Arabic and English speech, Arabic-to-English speech translation, and Arabic dialect identification, in settings such as broadcast media, academic research, and general speech processing applications.
