RAD-DINO

Property	Value
Parameter Count	86.6M
License	MSRLA
Paper	RAD-DINO: Exploring Scalable Medical Image Encoders
Training Data	882,775 chest X-rays

What is RAD-DINO?

RAD-DINO is a vision transformer developed by Microsoft Health Futures for encoding chest X-rays. It is built on DINOv2 and trained with self-supervised learning on 882,775 chest X-rays drawn from five public datasets.

Implementation Details

The model is a vision transformer initialized from dinov2-base and trained further on chest X-rays. Training ran on 16 nodes with 4 A100 GPUs each (64 GPUs in total) for 35,000 iterations at a batch size of 40 images per GPU.
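
The training figures above imply a global batch size and a total image count. The short calculation below works them out. It is a back-of-the-envelope sketch that assumes one optimizer step per iteration with no gradient accumulation, which the text does not state.

```python
# Figures quoted in the property table and the paragraph above.
NODES = 16
GPUS_PER_NODE = 4
IMAGES_PER_GPU = 40
ITERATIONS = 35_000
DATASET_SIZE = 882_775  # chest X-rays in the training set

# Assumption: one optimizer step per iteration, no gradient accumulation.
total_gpus = NODES * GPUS_PER_NODE
global_batch = total_gpus * IMAGES_PER_GPU
images_seen = global_batch * ITERATIONS
approx_passes = images_seen / DATASET_SIZE

print(f"GPUs: {total_gpus}")
print(f"Global batch size: {global_batch}")
print(f"Images processed: {images_seen:,}")
print(f"Approximate passes over the data: {approx_passes:.1f}")
```

Under that assumption, each step processes 2,560 images across 64 GPUs, and the full run covers about 89.6 million images, or roughly 101 passes over the training set.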

Trained on five combined datasets: MIMIC-CXR, CheXpert, NIH-CXR, PadChest, and BRAX
Uses fp16 mixed-precision training with PyTorch-FSDP
Preprocesses images with B-spline interpolation
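
To illustrate the B-spline interpolation mentioned in the list above, here is a minimal sketch of resizing a grayscale image with SciPy's spline-based zoom. The helper name resize_shorter_side, the target size, and the choice of a cubic spline (order 3) are illustrative assumptions. This is not the authors' preprocessing pipeline, and the input is a synthetic array rather than a real X-ray.

```python
import numpy as np
from scipy import ndimage


def resize_shorter_side(image: np.ndarray, target: int, order: int = 3) -> np.ndarray:
    """Resize a 2-D grayscale image so that its shorter side equals `target`.

    `order` is the spline order used by scipy.ndimage.zoom (3 = cubic B-spline).
    This helper is hypothetical and exists only for illustration.
    """
    factor = target / min(image.shape)
    # A scalar zoom factor is applied to both axes, preserving the aspect ratio.
    return ndimage.zoom(image.astype(np.float32), zoom=factor, order=order)


# Synthetic stand-in for a chest X-ray (random values, not real data).
rng = np.random.default_rng(0)
xray = rng.random((100, 150))

resized = resize_shorter_side(xray, target=50)
print(resized.shape)
```

Spline interpolation can overshoot the original intensity range slightly, so a real pipeline would usually clip or rescale the values after resizing.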

Core Capabilities

Image classification using CLS token
Image segmentation via patch tokens
Clustering and image retrieval
Report generation (as the image encoder feeding a language model)
Zero-shot transfer learning
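
The first two capabilities above depend on how a vision transformer arranges its output tokens: one CLS token followed by one token per image patch. The sketch below separates the two and reshapes the patch tokens into a spatial grid. It uses a random array as a stand-in for the encoder output. The 518x518 input size, 14x14 patch size, and 768-dimensional hidden size are assumptions based on a ViT-B/14 such as dinov2-base, and split_tokens is a hypothetical helper.

```python
import numpy as np

# Assumptions (not stated in the text above): 518x518 inputs, 14x14 patches,
# and a 768-dimensional hidden size, as in a ViT-B/14 such as dinov2-base.
IMAGE_SIZE, PATCH_SIZE, HIDDEN_DIM = 518, 14, 768
GRID = IMAGE_SIZE // PATCH_SIZE  # patches per side


def split_tokens(hidden_states: np.ndarray, grid: int):
    """Split (batch, 1 + grid*grid, dim) tokens into CLS and a patch map."""
    batch, n_tokens, dim = hidden_states.shape
    if n_tokens != 1 + grid * grid:
        raise ValueError(f"expected {1 + grid * grid} tokens, got {n_tokens}")
    cls = hidden_states[:, 0]        # (batch, dim): one global embedding per image
    patches = hidden_states[:, 1:]   # (batch, grid*grid, dim): one token per patch
    # Patch tokens are assumed to be in row-major order over the image.
    patch_map = patches.reshape(batch, grid, grid, dim)
    return cls, patch_map


# Random stand-in for the encoder output on a batch of two images.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 1 + GRID * GRID, HIDDEN_DIM)).astype(np.float32)

cls_embeddings, patch_maps = split_tokens(tokens, GRID)
print(cls_embeddings.shape, patch_maps.shape)
```

A classifier would typically be trained on cls_embeddings, while a segmentation decoder would consume patch_maps as a low-resolution feature map.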

Frequently Asked Questions

Q: What makes this model unique?

RAD-DINO stands out for its specialized training on chest X-rays and because its features support a range of downstream tasks without fine-tuning the encoder, which makes it versatile for medical imaging research.
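
As one example of using frozen features without fine-tuning, image retrieval can be reduced to a nearest-neighbor search over embeddings. The sketch below ranks a small index by cosine similarity. The function cosine_top_k is a hypothetical helper, and the 4-dimensional vectors are toy values, not real RAD-DINO embeddings.

```python
import numpy as np


def cosine_top_k(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the indices of the k rows of `index` most similar to `query`."""
    q = query / np.linalg.norm(query)
    rows = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = rows @ q                 # cosine similarity of each row to the query
    return np.argsort(-scores)[:k]    # highest similarity first


# Toy embeddings standing in for image features from a frozen encoder.
index = np.array(
    [
        [1.0, 0.0, 0.0, 0.0],
        [0.9, 0.1, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
)
query = np.array([1.0, 0.02, 0.0, 0.0])

top = cosine_top_k(query, index, k=2)
print(top)
```

In practice, the index would hold one embedding per stored image, and the query would be the embedding of a new image computed by the same encoder.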

Q: What are the recommended use cases?

The model is intended for research use only and should not be used in clinical practice. Within a research context, it is suited to image classification, segmentation, clustering, and retrieval.
