rad-dino

Maintained by: microsoft

RAD-DINO

Property         Value
Parameter Count  86.6M
License          MSRLA
Paper            RAD-DINO: Exploring Scalable Medical Image Encoders
Training Data    882,775 chest X-rays

What is RAD-DINO?

RAD-DINO is a vision transformer developed by Microsoft Health Futures for encoding chest X-rays with self-supervised learning. It builds on the DINOv2 architecture and was trained on 882,775 chest X-rays drawn from five public datasets.

Implementation Details

The model is a vision transformer fine-tuned from dinov2-base. Training ran on 16 nodes with 4 A100 GPUs each for 35,000 iterations, with a batch size of 40 images per GPU.

  • Trained on a combination of MIMIC-CXR, CheXpert, NIH-CXR, PadChest, and BRAX
  • Uses fp16 mixed-precision training with PyTorch-FSDP
  • Preprocesses images using B-spline interpolation
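
The training figures above imply an overall compute budget that is easy to work out. The following is a back-of-the-envelope sketch, not part of the original training code: it assumes one optimizer step per iteration with no gradient accumulation, and the variable names are chosen here for illustration.

```python
# Figures quoted in the text above.
NODES = 16
GPUS_PER_NODE = 4
BATCH_PER_GPU = 40
ITERATIONS = 35_000
DATASET_SIZE = 882_775  # chest X-rays in the training set

# Assumption: every iteration is a single optimizer step over the full
# global batch (no gradient accumulation).
total_gpus = NODES * GPUS_PER_NODE
global_batch = total_gpus * BATCH_PER_GPU
images_seen = global_batch * ITERATIONS
approx_passes = images_seen / DATASET_SIZE

print(f"GPUs: {total_gpus}")
print(f"Global batch size: {global_batch}")
print(f"Images processed: {images_seen:,}")
print(f"Approximate passes over the data: {approx_passes:.1f}")
```

Under these assumptions the run uses 64 GPUs and a global batch of 2,560 images, which works out to roughly 100 passes over the training set.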

Core Capabilities

  • Image classification using CLS token
  • Image segmentation via patch tokens
  • Clustering and image retrieval
  • Report generation, with a language model decoding text from the image features
  • Zero-shot transfer learning
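
The first two capabilities rely on different parts of the transformer output: a single CLS token summarizes the whole image, while the patch tokens keep a spatial layout that a segmentation decoder can use. The sketch below illustrates that layout in plain Python. It assumes a 518x518 input, the 14-pixel patch size of DINOv2, and a CLS token in the first position with no extra register tokens; the helper names token_layout, split_tokens, and to_grid are hypothetical and not part of any library.

```python
def token_layout(image_size: int = 518, patch_size: int = 14) -> dict:
    """Return grid side, patch count, and sequence length (CLS + patches)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    num_patches = side * side
    return {
        "grid_side": side,
        "num_patches": num_patches,
        "sequence_length": num_patches + 1,  # +1 for the CLS token
    }


def split_tokens(tokens: list) -> tuple:
    """Split one image's token sequence into (cls_token, patch_tokens).

    Assumption: the CLS token comes first, followed by patches in
    row-major order.
    """
    return tokens[0], tokens[1:]


def to_grid(patch_tokens: list, side: int) -> list:
    """Arrange a flat, row-major list of patch tokens into a side x side grid."""
    if len(patch_tokens) != side * side:
        raise ValueError("patch count does not match the requested grid")
    return [patch_tokens[r * side:(r + 1) * side] for r in range(side)]


# Layout for the assumed 518x518 input.
layout = token_layout()
print(layout)

# Toy sequence standing in for real embeddings: 1 CLS token + 9 patches.
tokens = ["CLS"] + [f"patch_{i}" for i in range(9)]
cls_token, patch_tokens = split_tokens(tokens)
grid = to_grid(patch_tokens, side=3)
print(cls_token)
for row in grid:
    print(row)
```

With real model outputs, the CLS vector would feed a classifier and the patch grid would feed a segmentation decoder; here strings stand in for embedding vectors so the layout is easy to see.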

Frequently Asked Questions

Q: What makes this model unique?

RAD-DINO is trained specifically on chest X-rays, and its features can support a range of downstream tasks without fine-tuning the encoder itself. This makes it versatile for medical imaging research.

Q: What are the recommended use cases?

The model is intended for research use only and should not be used in clinical practice. Within a research context, it is suited to tasks such as image classification, segmentation, clustering, and retrieval.
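
As an illustration of the retrieval use case, the sketch below ranks a small gallery of image embeddings by cosine similarity to a query embedding. It is a minimal example under stated assumptions: the three-dimensional vectors and the study names are made up for the demonstration (real embeddings would come from the encoder and be much longer), and cosine similarity is one common choice of metric, not necessarily the one used by the model's authors.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def retrieve(query: list, gallery: dict, k: int = 2) -> list:
    """Return the k gallery entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; real ones would be produced by the encoder.
gallery = {
    "study_a": [0.9, 0.1, 0.0],
    "study_b": [0.0, 1.0, 0.2],
    "study_c": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

top = retrieve(query, gallery, k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

For larger galleries, the same ranking is usually done with a vectorized library or an approximate nearest-neighbor index rather than a Python loop.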
