stanford-deidentifier-base

StanfordAIMI

A de-identification model for medical documents, specialized for radiology reports. It is built on a transformer architecture and reports F1 scores of 97.9 or higher on its test sets.

Property    Value
License     MIT
Framework   PyTorch, Transformers
Domain      Radiology, Biomedical
Paper       View Research Paper

What is stanford-deidentifier-base?

stanford-deidentifier-base is a machine learning model that automatically removes protected health information (PHI) from medical documents, particularly radiology reports. Developed by StanfordAIMI, it reports F1 scores of 97.9 or higher across its test sets, a level intended to make it suitable for production environments.

Implementation Details

The model is a transformer-based token classifier built on PubMedBERT (uncased), and its de-identification pipeline combines the transformer with rule-based methods. It was trained on 6,193 documents from multiple institutions, including chest X-ray reports, CT reports, and medical notes.

  • Built on PubMedBERT architecture
  • Trained on multi-institutional dataset
  • Implements token classification for PHI detection
  • Includes synthetic PHI generation capabilities
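As a rough illustration of the token-classification step listed above, the sketch below turns detected spans into a redacted string. It assumes the spans have the shape that the Hugging Face Transformers token-classification pipeline returns with an aggregation strategy (`entity_group`, `start`, `end`, `score`). The helper names `redact` and `detect_phi`, the sample report, and the label names `PATIENT` and `DATE` are hypothetical; check the model's own label set before relying on them.

```python
from typing import Dict, List


def redact(text: str, spans: List[Dict]) -> str:
    """Replace each detected PHI span with a bracketed label tag."""
    out = []
    cursor = 0
    # Walk the spans left to right so the original character offsets stay valid.
    for span in sorted(spans, key=lambda s: s["start"]):
        if span["start"] < cursor:
            continue  # skip a span that overlaps one already replaced
        out.append(text[cursor:span["start"]])
        out.append(f"[{span['entity_group']}]")
        cursor = span["end"]
    out.append(text[cursor:])
    return "".join(out)


def detect_phi(text: str) -> List[Dict]:
    """Hypothetical helper: run the model via the Transformers pipeline.

    Not called below, because it downloads the model weights.
    """
    from transformers import pipeline  # lazy import; requires transformers and torch

    tagger = pipeline(
        "token-classification",
        model="StanfordAIMI/stanford-deidentifier-base",
        aggregation_strategy="simple",  # merge word pieces into whole spans
    )
    return tagger(text)


# Hand-written spans in the pipeline's output shape; the label names are assumed.
report = "Exam for John Smith on 03/14/2021 shows no acute findings."
example_spans = [
    {"entity_group": "PATIENT", "start": 9, "end": 19, "score": 0.99},
    {"entity_group": "DATE", "start": 23, "end": 33, "score": 0.99},
]
redacted = redact(report, example_spans)
print(redacted)  # Exam for [PATIENT] on [DATE] shows no acute findings.
```

In practice `example_spans` would be replaced by the result of `detect_phi(report)`. Keeping detection and replacement separate makes it easy to swap the bracketed tags for another replacement policy.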

Core Capabilities

  • 97.9 F1 score on known-institution reports
  • 99.6 F1 score on new-institution reports
  • 99.5 F1 score on i2b2 2006 dataset
  • 98.9 F1 score on i2b2 2014 dataset
  • Automatic replacement of PHI with realistic surrogates
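For readers less familiar with the metric behind the scores above, F1 is the harmonic mean of precision and recall. The sketch below shows the standard definition on a 0 to 100 scale. The counts are illustrative only and are not taken from the paper, and this page does not say whether the published scores are computed per token or per entity.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall, on a 0-100 scale."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0  # no correct detections means an F1 of zero
    precision = true_positives / predicted  # share of flagged items that are PHI
    recall = true_positives / actual        # share of PHI items that were flagged
    return 100 * 2 * precision * recall / (precision + recall)


# Illustrative counts only: 979 PHI items found, 21 false alarms, 21 missed.
score = f1_score(true_positives=979, false_positives=21, false_negatives=21)
print(round(score, 1))  # 97.9
```

For de-identification, recall is usually the more sensitive number, since each false negative is a piece of PHI left in the document.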

Frequently Asked Questions

Q: What makes this model unique?

The model combines transformer-based learning with rule-based "hide in plain sight" methods. Its reported performance on i2b2 2014 data exceeds that of both existing tools and human labelers.
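The idea behind "hide in plain sight" is to replace detected PHI with realistic surrogates instead of visible tags, so that any PHI the detector misses blends in with the surrogates. The sketch below is a minimal illustration of that idea, not the authors' implementation. The surrogate pools, the function name `replace_with_surrogates`, the sample report, and the label names are all hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical surrogate pools; a real system would use far larger ones.
SURROGATES = {
    "PATIENT": ["Alex Morgan", "Jamie Lee", "Sam Carter"],
    "DATE": ["01/02/2020", "07/19/2019", "11/30/2021"],
}


def replace_with_surrogates(text: str, spans: List[Dict], rng: random.Random) -> str:
    """Swap each detected PHI span for a randomly chosen surrogate of the same type."""
    out = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        if span["start"] < cursor:
            continue  # skip a span that overlaps one already replaced
        pool = SURROGATES.get(span["entity_group"])
        # Fall back to a bracketed tag when no surrogate pool exists for the label.
        replacement = rng.choice(pool) if pool else f"[{span['entity_group']}]"
        out.append(text[cursor:span["start"]])
        out.append(replacement)
        cursor = span["end"]
    out.append(text[cursor:])
    return "".join(out)


report = "Exam for John Smith on 03/14/2021 shows no acute findings."
spans = [
    {"entity_group": "PATIENT", "start": 9, "end": 19},
    {"entity_group": "DATE", "start": 23, "end": 33},
]
# A seeded generator keeps the example reproducible.
surrogate_report = replace_with_surrogates(report, spans, random.Random(0))
print(surrogate_report)
```

A production version would likely also keep surrogates consistent within a document, for example by mapping the same patient name to the same surrogate each time; this sketch does not attempt that.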

Q: What are the recommended use cases?

The model is designed for de-identifying radiology reports and other medical documents in production environments where high accuracy is crucial. It is best suited to healthcare institutions that need to process large volumes of medical documents while maintaining patient privacy.
