Stanford De-identifier for Radiology Reports

Property	Value
License	MIT
Framework	PyTorch + Transformers
Base Architecture	PubMedBERT (uncased)
Primary Task	Token Classification

What is stanford-deidentifier-only-radiology-reports?

This is a model developed by Stanford AIMI for automatically de-identifying radiology reports. It combines a transformer-based token classifier, which detects protected health information (PHI), with a rule-based "hide in plain sight" step, which replaces each detected span with a realistic surrogate. The report stays readable, and any PHI the classifier misses is harder to tell apart from the surrogates around it.
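The replacement step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `SURROGATES` pools, the label names, and the `hide_in_plain_sight` helper are all hypothetical, and the spans are located by string search rather than by a model.

```python
import random

# Hypothetical surrogate pools; a real system would draw from much larger lists.
SURROGATES = {
    "PATIENT": ["Alex Morgan", "Jamie Lee", "Sam Rivera"],
    "DATE": ["03/14/2019", "11/02/2020", "07/21/2018"],
    "HOSPITAL": ["Lakeside Medical Center", "Northfield Clinic"],
}

def hide_in_plain_sight(text, spans, rng=None):
    """Replace each (start, end, label) span with a surrogate of the same label.

    Spans are character offsets into `text` and must not overlap.
    """
    rng = rng or random.Random()
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # keep the text before the span
        pool = SURROGATES.get(label)
        # Fall back to a bracketed tag when no pool exists for the label.
        pieces.append(rng.choice(pool) if pool else f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])  # keep the text after the last span
    return "".join(pieces)

report = "Patient John Smith was imaged on 01/02/2021 at Example Hospital."
phi = [("John Smith", "PATIENT"), ("01/02/2021", "DATE"), ("Example Hospital", "HOSPITAL")]
# Locate each PHI string to build character spans (a model would supply these).
spans = [(report.index(s), report.index(s) + len(s), label) for s, label in phi]

deidentified = hide_in_plain_sight(report, spans, rng=random.Random(0))
print(deidentified)
```

Spans are processed in offset order and the output is rebuilt piece by piece, so replacing one span with a longer or shorter surrogate never shifts the offsets of the spans that follow.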

Implementation Details

The model was trained on a diverse dataset of 6,193 documents, including chest X-ray and CT reports. Reported F1 scores are 97.9 on reports from a known institution and 99.6 on reports from a new institution, with similarly high results on the i2b2 benchmarks. It uses PubMedBERT as its base model and is fine-tuned for token classification.

Built on the PubMedBERT architecture and fine-tuned on medical text
Combines a learned transformer classifier with rule-based methods
Trained on multi-institutional data for robust performance
Uses synthetic PHI generation to augment the training data
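Since the model is a token classifier built with PyTorch and Transformers, it can presumably be run through the standard Transformers token-classification pipeline. The sketch below assumes the Hub ID `StanfordAIMI/stanford-deidentifier-only-radiology-reports`, which is inferred from the model name and not confirmed here. The `load_deidentifier` and `to_spans` helpers are hypothetical, and the entity labels and scores in `example_entities` are made up to show the output shape.

```python
# Assumed Hugging Face Hub ID, inferred from the model name on this page.
MODEL_ID = "StanfordAIMI/stanford-deidentifier-only-radiology-reports"

def load_deidentifier(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")

def to_spans(entities, min_score=0.0):
    """Convert pipeline output dicts into sorted (start, end, label) character spans."""
    return sorted(
        (e["start"], e["end"], e["entity_group"])
        for e in entities
        if e["score"] >= min_score
    )

# Shape of the dicts returned with aggregation_strategy="simple"; values are made up.
example_entities = [
    {"entity_group": "DATE", "score": 0.99, "word": "01 / 02 / 2021", "start": 33, "end": 43},
    {"entity_group": "PATIENT", "score": 0.98, "word": "john smith", "start": 8, "end": 18},
]
print(to_spans(example_entities))

# To run the real model (needs network access and the transformers package):
#   deid = load_deidentifier()
#   spans = to_spans(deid("Patient John Smith was imaged on 01/02/2021."))
```

With `aggregation_strategy="simple"`, the pipeline merges word pieces into entity groups and reports character offsets, which is the form a downstream replacement step needs.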

Core Capabilities

Accurate detection and replacement of PHI in medical documents
Cross-institutional compatibility
Superior performance compared to existing de-identification tools
Realistic surrogate replacement for removed PHI
99.1% recall in detecting core PHI spans
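For readers who want to check figures such as recall and F1 on their own annotated reports, a minimal sketch of span-level scoring follows. It assumes exact matching on `(start, end, label)` triples. The source does not say which matching criterion its numbers use, so results from this sketch are not directly comparable to the reported ones. The `span_metrics` helper and the example spans are hypothetical.

```python
def span_metrics(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)  # spans found with identical offsets and label
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up annotations: two of three gold spans are recovered, plus one false alarm.
gold = [(8, 18, "PATIENT"), (33, 43, "DATE"), (47, 63, "HOSPITAL")]
pred = [(8, 18, "PATIENT"), (33, 43, "DATE"), (70, 75, "ID")]

precision, recall, f1 = span_metrics(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

For de-identification, recall is usually the figure to watch most closely, because a missed span leaves PHI in the released text while a false alarm only removes harmless content.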

Frequently Asked Questions

Q: What makes this model unique?

The model uses a hybrid approach that pairs a transformer classifier with rule-based surrogate replacement. Its reported results are state of the art, exceeding both existing de-identification tools and human labelers on standard benchmarks.

Q: What are the recommended use cases?

The model is designed for de-identifying radiology reports, and may be applied to other medical documents, in clinical and research settings that must protect patient privacy while keeping the text useful for downstream work.
