Yi-VL-34B

Maintained By
01-ai

License: Apache 2.0
Architecture: LLaVA-based with CLIP ViT-H/14
Research Paper: Yi: Open Foundation Models
Base LLM: Yi-34B-Chat

What is Yi-VL-34B?

Yi-VL-34B is the world's first open-source 34B vision-language model, designed for advanced image understanding and bilingual (English and Chinese) image-text conversation. Built by 01-ai, it pairs a CLIP ViT-H/14 vision encoder with the Yi-34B-Chat language model to enable sophisticated image-text interactions.

Implementation Details

The model uses a three-component architecture: a CLIP ViT-H/14 vision encoder, a projection module that aligns image features with the language model's embedding space, and the Yi-34B-Chat LLM. It processes images at a high resolution of 448×448 and was trained through a three-stage process on over 100 million image-text pairs.

  • Multi-stage training on diverse datasets including LAION-400M, CLLaVA, and specialized visual datasets
  • Trained using 128 NVIDIA A800 GPUs over approximately 10 days
  • Implements advanced bilingual capabilities for both English and Chinese
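The PyTorch snippet below is a minimal sketch of the three-component pipeline described above: encode image patches, project them into the language model's embedding space, and prepend the resulting visual tokens to the text tokens. The modules and all dimensions are illustrative stand-ins, not the real Yi-VL-34B components or sizes.

```python
# Minimal sketch of a LLaVA-style forward pass, with stand-in modules for the
# CLIP ViT-H/14 encoder, the projection module, and the Yi-34B-Chat LLM.
# All dimensions are toy values; the real model's sizes differ.
import torch
import torch.nn as nn

class VisionLanguageSketch(nn.Module):
    def __init__(self, patch_dim=588, vision_dim=64, llm_dim=128, vocab_size=64000):
        super().__init__()
        # Stand-in for the CLIP ViT-H/14 image encoder (448x448 input split into patches).
        self.vision_encoder = nn.Linear(patch_dim, vision_dim)
        # Projection module that maps visual features into the LLM's embedding space.
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        # Stand-in for the Yi-34B-Chat decoder: an embedding table plus a single
        # transformer layer, just enough to make the sketch executable.
        self.token_embed = nn.Embedding(vocab_size, llm_dim)
        self.decoder_layer = nn.TransformerEncoderLayer(
            d_model=llm_dim, nhead=8, batch_first=True
        )

    def forward(self, pixel_patches, input_ids):
        # Encode image patches, project them to LLM space, and prepend the
        # resulting visual tokens to the text token embeddings.
        visual_tokens = self.projector(self.vision_encoder(pixel_patches))
        text_tokens = self.token_embed(input_ids)
        sequence = torch.cat([visual_tokens, text_tokens], dim=1)
        return self.decoder_layer(sequence)

model = VisionLanguageSketch()
patches = torch.randn(1, 16, 588)        # fake patch features from a 448x448 image
ids = torch.randint(0, 64000, (1, 12))   # fake tokenized prompt
print(model(patches, ids).shape)         # torch.Size([1, 28, 128])
```

Keeping the projection module as a small separate component lets visual features be injected into the LLM's token stream without changing the language model's interface.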

Core Capabilities

  • Multi-round text-image conversations with a single image input (see the usage sketch after this list)
  • High-resolution image understanding at 448×448
  • Top performance among open-source models on the MMMU and CMMMU benchmarks at release
  • Strong bilingual support for English and Chinese
  • Advanced visual information extraction and summarization
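As a usage sketch of the multi-round, single-image chat pattern, here is a hypothetical helper. `load_yi_vl` and `generate_reply` are placeholder callables standing in for whatever loading and generation utilities you actually use (for example, the LLaVA-based inference code in the official Yi repository); they are not a published API.

```python
# Hypothetical multi-round chat loop over a single image. `load_yi_vl` and
# `generate_reply` are placeholder callables, not a real, published API.
from PIL import Image

def chat_about_image(image_path, questions, load_yi_vl, generate_reply):
    """Ask several questions about one image, carrying the conversation history."""
    model = load_yi_vl("01-ai/Yi-VL-34B")              # placeholder loader
    image = Image.open(image_path).resize((448, 448))  # model works at 448x448 resolution
    history = []
    for question in questions:
        answer = generate_reply(model, image, history, question)  # placeholder call
        history.append((question, answer))
    return history
```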

Frequently Asked Questions

Q: What makes this model unique?

Yi-VL-34B is the first open-source 34B vision-language model, offering strong bilingual capabilities and top results among open-source models on major multimodal benchmarks at release. Its high-resolution 448×448 processing and multi-stage training make it particularly effective for detailed image analysis.

Q: What are the recommended use cases?

The model excels in visual question answering, image content analysis, bilingual image-based conversations, and detailed visual information extraction. It's particularly suitable for applications requiring sophisticated image understanding in both English and Chinese contexts.
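For example, bilingual questions passed to the hypothetical `chat_about_image` helper sketched above might look like the following; the file name and prompts are illustrative only.

```python
# Illustrative bilingual prompts for the hypothetical chat_about_image helper above.
questions = [
    "What objects are visible in this image, and how are they arranged?",
    "请用中文总结这张图片中的主要内容。",  # "Summarize the main content of this image in Chinese."
]
# history = chat_about_image("example.jpg", questions, load_yi_vl, generate_reply)
```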
