maya

maya-multimodal

Maya is an 8B-parameter multilingual vision-language model supporting 8 languages, built on the LLaVA framework with SigLIP vision encoding and a focus on cultural sensitivity.

Property     Value
Parameters   8 billion
License      Apache 2.0
Languages    English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi
Paper        Maya: An Instruction Finetuned Multilingual Multimodal Model

What is Maya?

Maya is a multilingual vision-language model developed by the Cohere For AI Community. It is built on the LLaVA framework with Aya-23 8B as the language model, and it supports eight languages with an emphasis on cultural awareness and sensitivity.

Implementation Details

The model uses SigLIP for vision encoding, chosen for its multilingual adaptability, and was trained on a curated dataset of 558,000 images with multilingual annotations. Training ran on 8 H100 GPUs with 80GB of memory each, using a batch size of 32 per device and a learning rate of 1e-3 with a cosine scheduler.

  • Context length of 8K tokens
  • Toxicity-filtered dataset for safer deployment
  • Built-in cultural sensitivity evaluations
  • Multilingual vision encoder adaptation
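
The training figures above can be turned into a small worked calculation. The sketch below is illustrative only and is not the project's training code: it assumes no gradient accumulation (so the global batch is simply devices times per-device batch), a single pass over the 558,000 images, and a plain cosine decay with no warmup. The names GLOBAL_BATCH, TOTAL_STEPS, and cosine_lr are hypothetical.

```python
import math

# Figures quoted in the text above.
NUM_IMAGES = 558_000
NUM_GPUS = 8
PER_DEVICE_BATCH = 32
PEAK_LR = 1e-3

# Assumption: no gradient accumulation, so global batch = devices x per-device batch.
GLOBAL_BATCH = NUM_GPUS * PER_DEVICE_BATCH

# Assumption: one pass over the dataset; the last partial batch counts as a step.
TOTAL_STEPS = math.ceil(NUM_IMAGES / GLOBAL_BATCH)


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 down to 0 at total_steps (no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"global batch size: {GLOBAL_BATCH}")
    print(f"optimizer steps per epoch: {TOTAL_STEPS}")
    for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {step:>4}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the global batch is 256 and one epoch takes 2,180 optimizer steps, with the learning rate falling from 1e-3 to roughly half that value at the midpoint. A real run may add warmup or gradient accumulation, which would change both numbers.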

Core Capabilities

  • Multilingual visual question answering
  • Cross-cultural image understanding
  • Image captioning in multiple languages
  • Visual reasoning tasks
  • Document understanding and analysis
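
An application that sends visual questions to the model may want to check the requested language against the eight listed ones first. The sketch below shows one way to do that; it is not part of any Maya API. VQARequest and make_vqa_request are hypothetical names, and the mapping to ISO 639-1 codes is a convenience chosen for this example.

```python
from dataclasses import dataclass

# The eight languages listed for Maya, keyed by ISO 639-1 code.
SUPPORTED_LANGUAGES = {
    "en": "English", "zh": "Chinese", "fr": "French", "es": "Spanish",
    "ru": "Russian", "ja": "Japanese", "ar": "Arabic", "hi": "Hindi",
}


@dataclass(frozen=True)
class VQARequest:
    """Hypothetical container for one visual question; not a Maya API type."""
    image_path: str
    question: str
    language: str


def make_vqa_request(image_path: str, question: str, language: str) -> VQARequest:
    """Validate the language code and question, then package the inputs."""
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {language!r}; expected one of: {supported}")
    if not question.strip():
        raise ValueError("question must not be empty")
    return VQARequest(image_path=image_path, question=question.strip(), language=code)


if __name__ == "__main__":
    request = make_vqa_request("photo.jpg", "¿Qué hay en la imagen?", "es")
    print(request)
```

Rejecting unsupported languages before inference keeps failures explicit instead of letting the model answer in a language it was not tuned for.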

Frequently Asked Questions

Q: What makes this model unique?

Maya stands out for supporting 8 languages and for its emphasis on cultural sensitivity. Its architecture combines SigLIP vision encoding with the LLaVA framework to handle cross-cultural visual understanding.

Q: What are the recommended use cases?

The model excels in multilingual visual question answering, image captioning, and document understanding tasks. It's particularly valuable for applications requiring cross-cultural image understanding and visual reasoning across different languages. However, it's not recommended for critical decision-making applications.
