llava-rad

Maintained by: Microsoft

LLaVA-Rad

Property          Value
Parameter Count   7 billion
Model Type        Small Multimodal Transformer
Architecture      LLaVA v1.5 with BiomedCLIP-CXR encoder
Paper             arXiv:2412.10337
License           Research Use Only

What is LLaVA-Rad?

LLaVA-Rad is a specialized multimodal model for chest X-ray analysis, developed by Microsoft Research. Its goal is to make medical image analysis more accessible and efficient by keeping the model small. It combines a custom BiomedCLIP-CXR image encoder with the Vicuna-7B language model to generate detailed radiological findings from chest X-rays.

Implementation Details

The architecture builds on the LLaVA framework, pairing a specialized chest X-ray image encoder (BiomedCLIP-CXR) with a transformer-based language decoder. The model was trained on more than 697,000 image-text pairs from various international sources, and training took only one day on a cluster of eight A100 GPUs.

  • Custom BiomedCLIP-CXR image encoder specialized for radiological images
  • Integration with the Vicuna-7B-v1.5 language model
  • Efficient training approach using a modular architecture
  • Fast inference capability on a single V100 GPU
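
The modular design described above can be pictured as three stages: an image encoder that turns the X-ray into patch features, a small projector that maps those features into the language model's embedding space, and a language decoder that consumes the combined sequence. The following is a minimal pure-Python sketch of that data flow, not the released implementation. All dimensions are toy values, the weights are random, and the function names (`encode_image`, `project`, `build_decoder_input`) are hypothetical. The two-layer MLP projector follows the LLaVA v1.5 design; that LLaVA-Rad uses it unchanged is an assumption here.

```python
import math
import random

random.seed(0)

# Toy sizes chosen for readability; the real model's dimensions are much larger.
NUM_PATCHES = 4   # image patches produced by the vision encoder
VISION_DIM = 6    # width of each patch feature
TEXT_DIM = 8      # width of the language model's token embeddings


def random_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gelu(x):
    """Exact GELU activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def encode_image(num_patches=NUM_PATCHES, dim=VISION_DIM):
    """Hypothetical stand-in for the image encoder: one feature vector per patch."""
    return random_matrix(num_patches, dim)


def project(patch_features, w1, w2):
    """Two-layer MLP that maps vision features into the text embedding space."""
    hidden = [[gelu(v) for v in row] for row in matmul(patch_features, w1)]
    return matmul(hidden, w2)


def build_decoder_input(image_tokens, text_embeddings):
    """Place projected image tokens ahead of the prompt's text embeddings."""
    return image_tokens + text_embeddings


w1 = random_matrix(VISION_DIM, TEXT_DIM)
w2 = random_matrix(TEXT_DIM, TEXT_DIM)

patch_features = encode_image()
image_tokens = project(patch_features, w1, w2)
prompt_embeddings = random_matrix(3, TEXT_DIM)  # stand-in for 3 embedded prompt tokens
decoder_input = build_decoder_input(image_tokens, prompt_embeddings)

# Sequence length and embedding width seen by the decoder.
print(len(decoder_input), len(decoder_input[0]))  # prints: 7 8
```

The point of the sketch is the interface between the parts: the encoder and the decoder only meet through the projector's output shape, so in principle either side can be swapped or trained separately as long as that shape is preserved.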

Core Capabilities

  • Generation of detailed radiological findings from chest X-rays
  • State-of-the-art performance in report generation
  • Cross-modal retrieval capabilities
  • Competitive performance against larger models like GPT-4V and Med-PaLM M
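
Cross-modal retrieval is commonly done by embedding images and report texts into a shared vector space and ranking candidates by similarity. The sketch below illustrates that general idea with cosine similarity. It is an assumption about the technique, not a description of how LLaVA-Rad performs retrieval. The embeddings are hand-written toy vectors, and `rank_reports` and the report ids are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_reports(image_embedding, report_embeddings):
    """Return report ids sorted from most to least similar to the image."""
    scored = [
        (cosine_similarity(image_embedding, embedding), report_id)
        for report_id, embedding in report_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [report_id for _, report_id in scored]


# Toy embeddings; a real system would obtain these from trained encoders.
image_embedding = [0.9, 0.1, 0.0]
report_embeddings = {
    "report_a": [1.0, 0.0, 0.0],
    "report_b": [0.0, 1.0, 0.0],
    "report_c": [0.5, 0.5, 0.0],
}

ranking = rank_reports(image_embedding, report_embeddings)
print(ranking)  # prints: ['report_a', 'report_c', 'report_b']
```

The same function works in the other direction (text query, image candidates) because cosine similarity is symmetric.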

Frequently Asked Questions

Q: What makes this model unique?

LLaVA-Rad stands out for an efficient architecture that achieves state-of-the-art performance with a relatively small 7B parameter count, which makes it more accessible for research and deployment. Its specialized training on chest X-rays and its integration of BiomedCLIP-CXR make it particularly effective for radiological analysis.
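
To make the accessibility point concrete, the back-of-the-envelope calculation below estimates how much memory the weights alone would occupy at common numeric precisions. It is a rough sketch under stated assumptions: "7 billion" is treated as an exact count, and activations, the key-value cache, and any parameters not included in that figure are ignored. The helper `weight_memory_gb` is a hypothetical name.

```python
PARAMETER_COUNT = 7_000_000_000  # "7 billion" from the model card, treated as exact

# Storage size of one parameter at each precision.
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(parameter_count, bytes_per_parameter):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return parameter_count * bytes_per_parameter / 1e9


for dtype, size in BYTES_PER_PARAMETER.items():
    print(f"{dtype}: {weight_memory_gb(PARAMETER_COUNT, size):.0f} GB")
# prints:
# float32: 28 GB
# float16: 14 GB
# int8: 7 GB
```

Since V100 GPUs ship with 16 GB or 32 GB of memory, half-precision weights of this size can fit on a single card, although the real footprint during inference will be higher than the weights-only figure.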

Q: What are the recommended use cases?

The model is designed specifically for research on chest X-ray analysis and report generation. It is NOT intended for clinical decision-making or direct patient care. Use it in research settings only, with appropriate ethical safeguards for patient data privacy.
