Llama-3.1-8B-Omni

Maintained By: ICTNLP

Property         Value
Parameter Count  9.11B
Model Type       Speech-Language Model
License          Research Only (Non-commercial)
Paper            arXiv:2409.06666
Base Model       Llama-3.1-8B-Instruct

What is Llama-3.1-8B-Omni?

Llama-3.1-8B-Omni is a speech-language model built on Llama-3.1-8B-Instruct. It takes spoken input and generates text and speech responses simultaneously. Its reported response latency is as low as 226 ms, which makes it responsive enough for real-time applications.

Implementation Details

The model combines speech processing components with the language understanding of Llama 3.1. It uses Whisper-large-v3 for speech encoding and a unit-based HiFi-GAN vocoder for speech synthesis. Training took less than 3 days on 4 GPUs.

  • Built on the Llama-3.1-8B-Instruct architecture
  • Integrates speech encoding and speech synthesis components
  • Uses the FP16 tensor type for efficient computation
  • Implements a streamlined inference pipeline
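The figures above allow a quick back-of-the-envelope estimate. The sketch below assumes that all 9.11B parameters are stored as FP16 (2 bytes each) and counts raw weights only, not activations, caches, or runtime buffers. The function names are illustrative and not part of any released code.

```python
# Rough arithmetic from the figures quoted in this card.
# Assumption: every parameter is stored as FP16 (2 bytes); only raw weights
# are counted, so real memory use during inference will be higher.

PARAMETER_COUNT = 9.11e9  # "Parameter Count" from the property table
BYTES_PER_FP16 = 2        # FP16 = 16 bits = 2 bytes


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_FP16) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def gpu_hours_upper_bound(days: float, num_gpus: int) -> float:
    """Upper bound on GPU-hours for a run lasting at most `days` on `num_gpus` GPUs."""
    return days * 24 * num_gpus


if __name__ == "__main__":
    print(f"FP16 weights: about {weight_memory_gb(PARAMETER_COUNT):.2f} GB")
    print(f"Training budget: under {gpu_hours_upper_bound(3, 4):.0f} GPU-hours")
```

Under these assumptions the weights alone take roughly 18 GB, and "under 3 days on 4 GPUs" corresponds to fewer than 288 GPU-hours.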

Core Capabilities

  • Low-latency speech interaction (as low as 226 ms)
  • Simultaneous text and speech response generation
  • High-quality speech synthesis with a unit-based HiFi-GAN vocoder
  • Efficient training (under 3 days on 4 GPUs)
  • Reuse of an existing speech model (Whisper-large-v3) for speech encoding
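To make the latency figure concrete, here is a minimal sketch of one way to time the first chunk from any streaming response. It is not the authors' measurement method, which may define latency differently. `fake_speech_stream` is a hypothetical stand-in for a real streaming speech generator, and the injectable `clock` argument exists only to make the helper easy to test.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def first_chunk_latency_ms(
    make_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, bytes, Iterator[bytes]]:
    """Time from request start until the first chunk arrives, in milliseconds.

    Returns the latency, the first chunk, and an iterator over the remaining chunks.
    """
    start = clock()
    stream = iter(make_stream())
    sentinel = object()
    first = next(stream, sentinel)  # blocks until the first chunk is produced
    if first is sentinel:
        raise ValueError("stream produced no chunks")
    latency_ms = (clock() - start) * 1000.0
    return latency_ms, first, stream


def fake_speech_stream(delay_s: float = 0.05, chunks: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in: waits `delay_s`, then yields dummy audio chunks."""
    time.sleep(delay_s)
    for _ in range(chunks):
        yield b"\x00" * 320


if __name__ == "__main__":
    latency, first, rest = first_chunk_latency_ms(fake_speech_stream)
    remaining = sum(1 for _ in rest)
    print(f"first chunk after {latency:.1f} ms, {remaining} chunks followed")
```

Timing the first chunk rather than the full response matches how a listener experiences a streaming system: what matters is the delay before playback can begin.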

Frequently Asked Questions

Q: What makes this model unique?

The model provides near-real-time speech interaction: it generates text and speech outputs simultaneously while maintaining quality in both modalities. Its response latency, as low as 226 ms, sets it apart from traditional speech-language models.

Q: What are the recommended use cases?

The model is suited to academic research on speech interaction systems, virtual assistants, and speech-based human-AI interaction. Because the license is research-only, commercial use requires explicit permission from the authors.
