Stanford De-identifier for Radiology Reports
| Property | Value |
|---|---|
| License | MIT |
| Framework | PyTorch + Transformers |
| Base Architecture | PubMedBERT (uncased) |
| Primary Task | Token Classification |
What is stanford-deidentifier-only-radiology-reports?
stanford-deidentifier-only-radiology-reports is a token classification model developed by Stanford AIMI for automatically de-identifying radiology reports. It combines a transformer-based detector with rule-based "hide in plain sight" methods: protected health information (PHI) is detected and then replaced with realistic surrogates, so the document stays readable.
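The detect-then-replace idea can be illustrated with a minimal sketch. It is not the authors' implementation: the surrogate pools, the category names (`PATIENT`, `DATE`, `HOSPITAL`), and the helper names `replace_phi` and `detect_phi` are hypothetical, and the Hugging Face model ID in `detect_phi` is an assumption based on the model name above. The span format (`entity_group`, `start`, `end`) follows what a Transformers token-classification pipeline returns when an aggregation strategy is set.

```python
import random

# Hypothetical surrogate pools; a real system would need far larger and more varied ones.
SURROGATES = {
    "PATIENT": ["Alex Morgan", "Jordan Lee", "Sam Rivera"],
    "DATE": ["March 3, 2015", "July 19, 2012", "November 8, 2017"],
    "HOSPITAL": ["Lakeside Medical Center", "Northfield Clinic"],
}


def replace_phi(text, spans, rng=None):
    """Replace each detected span with a surrogate from the same category.

    `spans` is a list of dicts with `entity_group`, `start`, and `end` keys.
    Spans are assumed not to overlap.
    """
    rng = rng or random.Random()
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        pieces.append(text[cursor:span["start"]])  # keep the non-PHI text as-is
        pool = SURROGATES.get(span["entity_group"])
        # Fall back to a bracketed tag when no surrogate pool exists for the category.
        pieces.append(rng.choice(pool) if pool else f"[{span['entity_group']}]")
        cursor = span["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


def detect_phi(text, model_id="StanfordAIMI/stanford-deidentifier-only-radiology-reports"):
    """Run the detector (assumed model ID; downloads weights on first use)."""
    from transformers import pipeline  # imported lazily so replace_phi works without it

    detector = pipeline("token-classification", model=model_id, aggregation_strategy="simple")
    return detector(text)
```

With both pieces in place, `replace_phi(report, detect_phi(report))` would produce a report in which any PHI the detector missed is harder to tell apart from the surrogates around it, which is the motivation behind "hide in plain sight".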
Implementation Details
The model was trained on a dataset of 6,193 documents, including chest X-ray and CT reports. It achieved F1 scores of 97.9 on reports from a known institution and 99.6 on reports from a new institution, and it also performed well on the i2b2 benchmarks. It uses PubMedBERT as its base model and is fine-tuned for token classification.
- Built on PubMedBERT architecture with specialized training for medical text
- Combines transformer learning with rule-based methods
- Trained on multi-institutional data for robust performance
- Implements synthetic PHI generation for enhanced training
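Because the model is a token classifier, its raw output is one label per token, and these labels have to be merged into contiguous PHI spans before anything can be replaced. The sketch below shows one simple way to do that, assuming each token arrives as a `(start, end, label)` tuple with `"O"` marking non-PHI tokens. The function name `merge_token_labels` and the label names are hypothetical, and the model's actual label scheme may differ.

```python
def merge_token_labels(tokens):
    """Merge consecutive tokens that share a PHI label into character spans.

    `tokens` is a list of (start, end, label) tuples in document order,
    where the label "O" means the token is not PHI.
    """
    spans = []
    prev_label = "O"
    for start, end, label in tokens:
        if label != "O":
            if label == prev_label:
                # Same category as the previous token: extend the open span.
                spans[-1]["end"] = end
            else:
                # New category, or first PHI token after non-PHI text: open a new span.
                spans.append({"entity_group": label, "start": start, "end": end})
        prev_label = label
    return spans


text = "Seen by Dr. Jane Doe on 5 May 2019"
tokens = [
    (0, 4, "O"), (5, 7, "O"), (8, 11, "O"),
    (12, 16, "PERSON"), (17, 20, "PERSON"),
    (21, 23, "O"),
    (24, 25, "DATE"), (26, 29, "DATE"), (30, 34, "DATE"),
]
spans = merge_token_labels(tokens)
```

In practice the Transformers pipeline can do this merging itself through its `aggregation_strategy` argument; the manual version is shown only to make the step explicit.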
Core Capabilities
- Accurate detection and replacement of PHI in medical documents
- Cross-institutional compatibility
- Superior performance compared to existing de-identification tools
- Realistic surrogate replacement for removed PHI
- 99.1% recall in detecting core PHI spans
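For readers unfamiliar with the recall and F1 figures quoted here, the following sketch shows how such scores can be computed from gold and predicted token labels. It treats detection as a binary PHI versus non-PHI decision per token, which is an assumption: the source does not state whether the reported numbers are token-level or span-level. The function name `phi_detection_scores` is hypothetical.

```python
def phi_detection_scores(gold, predicted):
    """Return (precision, recall, f1) for binary PHI detection.

    `gold` and `predicted` are equal-length lists of token labels,
    where "O" means non-PHI and any other label counts as PHI.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    true_pos = sum(g != "O" and p != "O" for g, p in pairs)   # PHI found
    false_pos = sum(g == "O" and p != "O" for g, p in pairs)  # non-PHI flagged
    false_neg = sum(g != "O" and p == "O" for g, p in pairs)  # PHI missed

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For de-identification, recall is usually the figure to watch most closely, since every false negative is a piece of PHI left in the released document.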
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its hybrid approach, which pairs a transformer with rule-based methods. Its reported results exceed those of both existing de-identification tools and human labelers on standard benchmarks.
Q: What are the recommended use cases?
The model is designed for de-identifying radiology reports and other medical documents in clinical and research settings, where patient privacy must be protected without sacrificing the usefulness of the text.