llama-labahasa-11B

Maintained By
LABahasa

LABahasa 11B

| Property | Value |
| --- | --- |
| Parameter Count | 11.4B |
| Model Type | Multimodal LLM |
| Base Architecture | Llama-3.2-11B-Vision-Instruct + Whisper-large |
| Training Infrastructure | 8x H100 GPUs |
| Training Time | 25 hours |

What is llama-labahasa-11B?

LABahasa 11B is a multimodal language model developed by Meeting.AI and Lintasarta, designed to process text, audio, and image inputs simultaneously. Built on Meta's Llama 3.2 and OpenAI's Whisper architectures, it has been specifically optimized for Indonesian language processing while maintaining strong English capabilities. The model was trained on a high-quality bilingual dataset of 9 billion tokens.

Implementation Details

The model employs a feed-forward network to project audio embeddings from the Whisper-large encoder into Llama's input embedding space. This lets text, audio, and image modalities be combined seamlessly in a single input sequence. Training was conducted in BF16 mixed precision to optimize performance and efficiency; a minimal sketch of the projection step follows.
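
The sketch below is illustrative only, not the published implementation: the two-layer MLP design, the GELU activation, and the Llama embedding width (assumed 4096; Whisper-large's encoder emits 1280-dim states) are all assumptions.

```python
import torch
import torch.nn as nn

class AudioProjector(nn.Module):
    """Hypothetical feed-forward adapter mapping Whisper encoder states
    into Llama's input-embedding space (layer sizes are assumptions)."""

    def __init__(self, whisper_dim: int = 1280, llama_dim: int = 4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(whisper_dim, llama_dim),
            nn.GELU(),
            nn.Linear(llama_dim, llama_dim),
        )

    def forward(self, audio_states: torch.Tensor) -> torch.Tensor:
        # audio_states: (batch, num_frames, whisper_dim) from the Whisper encoder
        return self.net(audio_states)

# Project a batch of Whisper-large encoder outputs in BF16,
# matching the BF16 mixed precision the card describes for training.
projector = AudioProjector().to(torch.bfloat16)
audio_states = torch.randn(1, 1500, 1280, dtype=torch.bfloat16)  # one 30 s window
audio_embeds = projector(audio_states)  # (1, 1500, 4096), ready to splice into Llama
```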

  • Specialized audio processing via the placeholder token <|audio|> (see the splicing sketch after this list)
  • Integration with Llama 3.2's vision features
  • Enhanced Indonesian language understanding and generation
  • Multimodal input processing capabilities
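
One way to picture how the <|audio|> placeholder is consumed: the prompt is tokenized as usual, and at inference time the placeholder position is replaced with the span of projected audio embeddings before the sequence reaches the language model. The helper below is a hypothetical sketch; the placeholder's token ID, the function name, and the shapes are invented for illustration.

```python
import torch

AUDIO_TOKEN_ID = 128256  # hypothetical ID for the <|audio|> placeholder

def splice_audio_embeddings(
    input_ids: torch.Tensor,    # (1, seq_len) tokenized prompt containing <|audio|>
    text_embeds: torch.Tensor,  # (1, seq_len, hidden) from Llama's embedding table
    audio_embeds: torch.Tensor, # (1, num_frames, hidden) from the audio projector
) -> torch.Tensor:
    """Replace the single <|audio|> placeholder with the projected audio span,
    producing an inputs_embeds tensor the language model can consume."""
    pos = (input_ids[0] == AUDIO_TOKEN_ID).nonzero().item()
    return torch.cat(
        [text_embeds[:, :pos], audio_embeds, text_embeds[:, pos + 1 :]], dim=1
    )

# Toy shapes only; a real prompt would come from the model's tokenizer.
input_ids = torch.tensor([[1, 2, AUDIO_TOKEN_ID, 3]])
text_embeds = torch.randn(1, 4, 4096, dtype=torch.bfloat16)
audio_embeds = torch.randn(1, 1500, 4096, dtype=torch.bfloat16)
inputs_embeds = splice_audio_embeddings(input_ids, text_embeds, audio_embeds)
print(inputs_embeds.shape)  # torch.Size([1, 1503, 4096])
```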

Core Capabilities

  • Superior performance on MMLU (67.2) compared to Qwen2.5-14B
  • Exceptional Indonesian language understanding (72.2 on id-MMLU)
  • Multimodal processing of text, audio, and image inputs
  • Strong mathematical reasoning capabilities (64.5 on Multi-Mathematics)

Frequently Asked Questions

Q: What makes this model unique?

LABahasa 11B stands out for its specialized optimization for Indonesian language processing while maintaining strong English capabilities, combined with true multimodal abilities across text, audio, and image inputs. Its architecture uniquely combines Llama and Whisper models for comprehensive language understanding.

Q: What are the recommended use cases?

The model is ideal for applications requiring multilingual understanding (particularly Indonesian-English), multimodal processing, and complex NLP tasks. It is especially well-suited for audio transcription, image understanding, and cross-lingual communication.
