nanoLLaVA

qnguyen3

An efficient 1B-parameter vision-language model optimized for edge devices. It combines the Quyen-SE-v0.1 LLM with a SigLIP vision encoder and delivers strong performance on VQA tasks.

  • Parameter Count: 1.05B
  • Model Type: Vision-Language Model
  • License: Apache-2.0
  • Tensor Type: BF16

What is nanoLLaVA?

nanoLLaVA is a compact yet powerful vision-language model designed specifically for edge device deployment. Built on the foundation of Quyen-SE-v0.1 (Qwen1.5-0.5B) as its base LLM and utilizing google/siglip-so400m-patch14-384 as its vision encoder, this model achieves impressive performance despite its relatively small size of 1.05B parameters.

Implementation Details

The model follows the ChatML standard for prompt formatting and can be run with the Hugging Face transformers library. It supports both CPU and CUDA inference through PyTorch.

  • Base LLM: Quyen-SE-v0.1 (Qwen1.5-0.5B)
  • Vision Encoder: google/siglip-so400m-patch14-384
  • Tensor Format: BF16
  • Comprehensive multimodal understanding capabilities
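As a sketch of the ChatML convention the model follows, the helper below assembles a prompt string for a visual question. The system message, the `<image>` placeholder, and the helper name are illustrative assumptions, not taken from the official model card; consult the model's documentation for the authoritative template.

```python
# Minimal sketch of a ChatML-formatted prompt for a vision-language model
# such as nanoLLaVA. The <image> placeholder marks where the vision
# encoder's image tokens would be spliced in (an assumption here).

def build_chatml_prompt(question: str,
                        system: str = "Answer the question carefully.") -> str:
    """Wrap a user question and an image placeholder in ChatML markup."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n<image>\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("What objects are on the table?")
print(prompt)
```

The trailing `<|im_start|>assistant` turn is left open so that generation continues from the assistant's role, which is the usual ChatML inference pattern.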

Core Capabilities

  • VQA v2 Score: 70.84
  • TextVQA Performance: 46.71
  • ScienceQA Accuracy: 58.97
  • POPE Score: 84.1
  • MMMU Test Performance: 28.6
  • GQA Score: 54.79

Frequently Asked Questions

Q: What makes this model unique?

nanoLLaVA stands out for its efficient design that enables deployment on edge devices while maintaining strong performance across various vision-language tasks. Its compact size of 1.05B parameters makes it particularly suitable for resource-constrained environments.

Q: What are the recommended use cases?

The model is ideal for applications requiring visual question answering, image description, and general vision-language understanding tasks on edge devices. It's particularly effective for scenarios where computational resources are limited but reliable multimodal understanding is necessary.
