# Maya
| Property | Value |
|---|---|
| Parameters | 8 billion |
| License | Apache 2.0 |
| Languages | English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi |
| Paper | Maya: An Instruction Finetuned Multilingual Multimodal Model |
## What is Maya?

Maya is a multilingual vision-language model developed by the Cohere For AI Community. Built on the LLaVA framework with the Aya-23 8B language model, it supports eight languages while maintaining strong cultural awareness and sensitivity in its visual understanding.
## Implementation Details

The model uses SigLIP for vision encoding, adapted for multilingual inputs, and was trained on a curated dataset of 558,000 images with multilingual annotations. Training ran on 8× H100 GPUs (80 GB memory each) with a per-device batch size of 32 and a learning rate of 1e-3 under a cosine scheduler.
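The hyperparameters above can be gathered into a minimal configuration sketch. The values come from this card; the dictionary keys are illustrative assumptions, not the actual field names of Maya's training scripts.

```python
# Illustrative training configuration for Maya's pretraining stage.
# Values are taken from the model card; the key names are assumptions.
train_config = {
    "vision_encoder": "SigLIP",
    "language_model": "Aya-23-8B",
    "dataset_size": 558_000,        # multilingually annotated images
    "per_device_batch_size": 32,
    "num_gpus": 8,                  # 8x H100, 80 GB each
    "learning_rate": 1e-3,
    "lr_scheduler": "cosine",
    "context_length": 8192,         # 8K tokens
}

# Effective global batch size across all devices (no gradient accumulation assumed)
global_batch_size = train_config["per_device_batch_size"] * train_config["num_gpus"]
```

With these numbers, the effective global batch size works out to 256 examples per optimizer step, assuming no gradient accumulation.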
- Context length of 8K tokens
- Toxicity-filtered dataset for safer deployment
- Built-in cultural sensitivity evaluations
- Multilingual vision encoder adaptation
## Core Capabilities
- Multilingual visual question answering
- Cross-cultural image understanding
- Image captioning in multiple languages
- Visual reasoning tasks
- Document understanding and analysis
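As a sketch of how a multilingual visual question might be assembled for a LLaVA-family model like Maya, the snippet below builds a single-turn prompt around an image placeholder token. The USER/ASSISTANT template and the `<image>` token are common LLaVA conventions but are assumptions here; the released checkpoint's processor defines the authoritative format.

```python
def build_vqa_prompt(question: str, image_token: str = "<image>") -> str:
    """Assemble a LLaVA-style single-turn VQA prompt.

    The template and the image placeholder token are illustrative
    assumptions, not Maya's confirmed chat format.
    """
    return f"USER: {image_token}\n{question}\nASSISTANT:"

# Questions can be posed in any of the eight supported languages.
prompt_en = build_vqa_prompt("What is shown in this picture?")
prompt_fr = build_vqa_prompt("Que montre cette image ?")
```

The same helper serves every supported language, since only the question text changes while the surrounding template stays fixed.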
## Frequently Asked Questions

### Q: What makes this model unique?
Maya stands out for its comprehensive multilingual support across 8 languages and its emphasis on cultural sensitivity. The model's architecture combines SigLIP vision encoding with the LLaVA framework, creating a powerful yet efficient system for cross-cultural visual understanding.
### Q: What are the recommended use cases?
The model excels in multilingual visual question answering, image captioning, and document understanding tasks. It's particularly valuable for applications requiring cross-cultural image understanding and visual reasoning across different languages. However, it's not recommended for critical decision-making applications.