TinyOctopus

Maintained by: SaraAlthubaiti

Author: SaraAlthubaiti
Model Type: Bilingual Audio Language Model
Architecture: Distil-Whisper + DeepSeek 1.5B
Model URL: Hugging Face

What is TinyOctopus?

TinyOctopus is a bilingual Audio Language Model that generates text from Arabic and English speech. It uses Distil-Whisper as the audio encoder and DeepSeek 1.5B as the text generator, and connects the two through a cross-attention projection layer.
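The card does not describe how the cross-attention projection layer is built. The following is a minimal, hypothetical sketch in pure Python of one common design. It assumes a single attention head and a fixed set of learnable query vectors that attend over the audio encoder's output frames, with the result projected to the language model's embedding width. The class name `CrossAttentionProjector`, the toy dimensions, and the query design are illustrative assumptions, not TinyOctopus's actual implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class CrossAttentionProjector:
    """Hypothetical single-head cross-attention projector (not TinyOctopus's code).

    A fixed set of learnable query vectors attends over the audio encoder
    frames, and the attended vectors are projected to the LLM embedding width.
    """

    def __init__(self, enc_dim: int, llm_dim: int, num_queries: int, seed: int = 0):
        rng = random.Random(seed)
        # Learnable queries are compared directly with keys, so no separate W_q is needed.
        self.queries = random_matrix(num_queries, enc_dim, rng)
        self.w_k = random_matrix(enc_dim, enc_dim, rng)    # key projection
        self.w_v = random_matrix(enc_dim, enc_dim, rng)    # value projection
        self.w_out = random_matrix(enc_dim, llm_dim, rng)  # map to LLM width
        self.scale = 1.0 / math.sqrt(enc_dim)

    def __call__(self, frames: Matrix) -> Matrix:
        """frames: (num_frames x enc_dim) -> returns (num_queries x llm_dim)."""
        keys = matmul(frames, self.w_k)
        values = matmul(frames, self.w_v)
        scores = matmul(self.queries, transpose(keys))  # (num_queries x num_frames)
        weights = [softmax([s * self.scale for s in row]) for row in scores]
        attended = matmul(weights, values)              # (num_queries x enc_dim)
        return matmul(attended, self.w_out)


if __name__ == "__main__":
    rng = random.Random(1)
    frames = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(50)]
    projector = CrossAttentionProjector(enc_dim=16, llm_dim=24, num_queries=8)
    out = projector(frames)
    print(len(out), len(out[0]))  # 8 24
```

Because the number of queries is fixed in this sketch, the language model receives the same number of audio embeddings however many encoder frames the clip produces. This is one way to bridge a long audio sequence and a text model. TinyOctopus may align features differently.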

Implementation Details

The architecture has three main components. Distil-Whisper (distil-large-v3) encodes the audio. A trainable cross-attention projection layer aligns the audio features with the language model. DeepSeek 1.5B serves as the core language model. Training data includes QASR (2,000 hours of Arabic speech) and ADI17 (3,000 hours of dialect-specific Arabic speech). The reported evaluation results are:

  • Arabic ASR Performance: 16.00% WER
  • English ASR Performance: 4.50% WER
  • Arabic-to-English Translation BLEU Score: 55.05 (GPT-4o)
  • Dialect Identification Accuracy: 70.59%
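For reference, word error rate (WER) counts the word-level substitutions (S), deletions (D), and insertions (I) needed to turn the model's transcript into the reference. It divides that count by the number of reference words N: WER = (S + D + I) / N. A 16.00% WER therefore corresponds to roughly 16 word errors per 100 reference words. Below is a minimal pure-Python sketch of this standard computation. The helper name `word_error_rate` is hypothetical. The sketch splits on whitespace and applies no text normalization. The card does not state which normalization was used for the reported numbers, and for Arabic in particular, choices such as stripping diacritics can change WER noticeably.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Splits on whitespace only; no normalization (case, punctuation,
    Arabic diacritics) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: minimum edits aligning the first i-1 reference words with the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion (reference word missing)
                curr[j - 1] + 1,                       # insertion (extra hypothesis word)
                prev[j - 1] + (ref_word != hyp_word),  # substitution, or 0 on a match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` returns 2/6 ≈ 0.333: one substitution (sat → sit) and one deletion (the). Because insertions are counted, WER can exceed 100%.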

Core Capabilities

  • Bilingual Automatic Speech Recognition (ASR)
  • Arabic to English Speech Translation
  • Arabic Dialect Identification
  • Multi-dialect Speech Processing

Frequently Asked Questions

Q: What makes this model unique?

TinyOctopus combines bilingual (Arabic and English) ASR, Arabic-to-English speech translation, and Arabic dialect identification in a single compact model. Its efficiency comes from pairing a distilled Whisper encoder (distil-large-v3) with a relatively small 1.5B-parameter language model.

Q: What are the recommended use cases?

The model is suited to transcribing Arabic and English speech, translating Arabic speech into English, and identifying Arabic dialects. Typical settings include broadcast media, academic research, and general speech-processing applications.
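Broadcast recordings are usually much longer than a single encoder input. Whisper-family encoders, including distil-large-v3, process fixed 30-second windows of 16 kHz audio. The card does not say how TinyOctopus handles longer recordings, so the following is only a hypothetical preprocessing sketch. It splits a long recording into overlapping 30-second windows and zero-pads the final one. The function name `chunk_audio` and the 25-second default stride are illustrative choices, not part of the model's release.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000   # Whisper-family encoders expect 16 kHz mono audio
WINDOW_SECONDS = 30.0  # and process fixed 30-second windows


def chunk_audio(
    samples: Sequence[float],
    stride_seconds: float = 25.0,
    sample_rate: int = SAMPLE_RATE,
    window_seconds: float = WINDOW_SECONDS,
) -> List[List[float]]:
    """Split a long recording into fixed-length, overlapping windows.

    Consecutive windows start stride_seconds apart, so neighbours overlap by
    (window_seconds - stride_seconds). The last window is zero-padded.
    """
    window = int(window_seconds * sample_rate)
    stride = int(stride_seconds * sample_rate)
    if not 0 < stride <= window:
        raise ValueError("stride must be positive and no longer than the window")
    chunks: List[List[float]] = []
    start = 0
    while True:
        chunk = list(samples[start:start + window])
        chunk.extend([0.0] * (window - len(chunk)))  # pad the final window with silence
        chunks.append(chunk)
        if start + window >= len(samples):  # this window reaches the end of the audio
            break
        start += stride
    return chunks
```

For a 70-second clip at 16 kHz, the defaults produce three windows starting at 0 s, 25 s, and 50 s. Merging the per-window transcripts across the 5-second overlaps is not shown here, for example by removing words duplicated at the boundaries.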
