llava-llama-3-8b-v1_1

Developed by xtuner

A LLaVA model fine-tuned from Meta-Llama-3-8B-Instruct with a CLIP-ViT-Large visual encoder, optimized for image-text tasks. 8.03B parameters, with strong MMBench performance.

| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Model Type | Image-Text-to-Text |
| Architecture | LLaVA with CLIP-ViT-Large |
| Tensor Type | FP16 |

What is llava-llama-3-8b-v1_1?

llava-llama-3-8b-v1_1 is a multimodal model that pairs Meta's Llama-3-8B-Instruct language model with a CLIP-ViT-Large visual encoder. It is fine-tuned on the ShareGPT4V-PT and InternVL-SFT datasets and is designed to handle complex image-text interactions.

Implementation Details

The architecture combines a CLIP-ViT-Large visual encoder with an MLP projector, operating at an input resolution of 336×336. Training proceeds in two stages: during pretraining, both the LLM and the ViT are frozen and only the projector is trained; during fine-tuning, the full LLM is trained alongside a LoRA-adapted ViT.

  • Visual Encoder: CLIP-ViT-Large-patch14-336
  • Base Model: meta-llama/Meta-Llama-3-8B-Instruct
  • Training Strategy: Full LLM with LoRA ViT fine-tuning
  • Dataset Size: 1246K pretraining + 1268K fine-tuning samples
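The projector's role above can be made concrete with a minimal sketch. The dimensions are standard for these components (1024 for CLIP-ViT-Large, 4096 for Llama-3-8B, 14-pixel patches), but the weights and the two-layer GELU MLP here are illustrative stand-ins, not the released checkpoint:

```python
import numpy as np

CLIP_DIM = 1024                  # CLIP-ViT-Large hidden size
LLM_DIM = 4096                   # Llama-3-8B hidden size
NUM_PATCHES = (336 // 14) ** 2   # 576 patches at 336x336 with 14-px patches

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

class MLPProjector:
    """Two-layer MLP mapping visual patch features into the LLM embedding space
    (an illustrative sketch with random weights)."""
    def __init__(self, rng):
        self.w1 = rng.standard_normal((CLIP_DIM, LLM_DIM)) * 0.02
        self.w2 = rng.standard_normal((LLM_DIM, LLM_DIM)) * 0.02

    def __call__(self, feats):
        return gelu(feats @ self.w1) @ self.w2

rng = np.random.default_rng(0)
patch_feats = rng.standard_normal((NUM_PATCHES, CLIP_DIM))  # CLIP output
tokens = MLPProjector(rng)(patch_feats)                     # visual "tokens"
print(tokens.shape)  # (576, 4096)
```

Each of the 576 projected rows is then consumed by the LLM as if it were an ordinary token embedding, which is what lets the frozen language model attend to visual content.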

Core Capabilities

  • 72.3% accuracy on MMBench Test (EN)
  • 66.4% accuracy on MMBench Test (CN)
  • 70.0% accuracy on AI2D Test
  • Robust performance across multiple vision-language tasks
  • Enhanced multilingual capabilities

Frequently Asked Questions

Q: What makes this model unique?

This model improves on the earlier v1.0 release, particularly on the MMBench and AI2D benchmarks. It leverages a combination of the ShareGPT4V-PT and InternVL-SFT datasets, resulting in better cross-modal understanding.

Q: What are the recommended use cases?

The model excels in vision-language tasks including visual question answering, image understanding, and multilingual image-text interactions. It's particularly suitable for applications requiring detailed image analysis and natural language responses.
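For visual question answering, the prompt interleaves an image placeholder with the user's question. The sketch below assumes a Llama-3 instruct chat layout and an `<image>` placeholder token; in practice the processor and tokenizer shipped with the model define the canonical template, so treat this layout as illustrative:

```python
def build_prompt(question: str) -> str:
    """Assemble a Llama-3-style chat prompt with a LLaVA image placeholder.

    Assumed layout: the image token precedes the question inside the user
    turn, and the prompt ends with an open assistant header so the model
    generates the answer next.
    """
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"<image>\n{question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("What objects are visible in this image?")
print(prompt)
```

At inference time the placeholder is replaced by the projected visual tokens, so the question and the image are processed in a single sequence.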
