en_spacy_pii_distilbert

Property	Value
Author	Benjamin Kilimnik
License	MIT
Framework	spaCy ≥3.4.1, ≤3.8.2
Performance	95.42% F-score

What is en_spacy_pii_distilbert?

en_spacy_pii_distilbert is a Named Entity Recognition (NER) model built on the DistilBERT architecture for detecting Personally Identifiable Information (PII) in English text. Developed by Benjamin Kilimnik, it reports a 95.42% F-score, which makes it a strong candidate for privacy-sensitive applications.

Implementation Details

The model is implemented as a spaCy pipeline with two components: a transformer and an NER component. It was trained on a custom dataset designed for structured PII detection and built with the Privy framework.

Default Pipeline: transformer, ner
Components: transformer, ner
Entity Labels: DATE_TIME, LOC, NRP, ORG, PER
Performance Metrics: Precision (95.30%), Recall (95.54%), F-score (95.42%)
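
The three reported metrics can be cross-checked against each other. The sketch below assumes that "F-score" here means the balanced F1 score, the harmonic mean of precision and recall; the card does not state this explicitly, but the arithmetic agrees with the reported figure. The function name f_score is illustrative only.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the card's "F-score" is F1. Inputs can be fractions or
    percentages, as long as both use the same scale.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on this card, in percent.
print(f"{f_score(95.30, 95.54):.2f}")  # prints 95.42
```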

Core Capabilities

Detection of temporal information (DATE_TIME)
Location identification (LOC)
Organization detection (ORG)
Personal name recognition (PER)
Nationality, religious, or political group detection (NRP)
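
As a rough illustration of how these labels surface in practice, the sketch below runs a spaCy pipeline over a text and groups the detected entities by label. It assumes the model has been installed as a Python package that spacy.load can find under the name en_spacy_pii_distilbert; the helper names group_by_label and find_pii are hypothetical and not part of the model.

```python
from collections import defaultdict

# Entity labels listed on this card.
CARD_LABELS = ("DATE_TIME", "LOC", "NRP", "ORG", "PER")


def group_by_label(entities):
    """Group (text, label) pairs by label, keeping order of appearance."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


def find_pii(text, model_name="en_spacy_pii_distilbert"):
    """Run the pipeline on `text` and return its entities grouped by label.

    Assumption: the model package is installed under `model_name`.
    """
    import spacy  # imported lazily so group_by_label works without spaCy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return group_by_label((ent.text, ent.label_) for ent in doc.ents)
```

Calling find_pii("...") on a sentence returns a dictionary keyed by whichever of the five labels the model assigned; the exact entities found depend on the model and are not shown here.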

Frequently Asked Questions

Q: What makes this model unique?

This model is specialized for PII detection. It combines a reported 95.42% F-score with the five entity labels listed above, which makes it useful for privacy and compliance applications. It was trained on a custom dataset designed for structured PII detection.

Q: What are the recommended use cases?

The model is suited to privacy compliance, data anonymization, personal information redaction, and automated PII detection in large text datasets. It is most useful where personal information must be identified with high accuracy.
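
The redaction use case can be sketched as follows. This is a minimal example, not the author's method: it replaces each detected character span with a placeholder built from its label. The function names redact and redact_with_model and the placeholder format are assumptions made for illustration; the only spaCy attributes used are ent.start_char, ent.end_char, and ent.label_.

```python
def redact(text, spans, placeholder="<{label}>"):
    """Replace each (start, end, label) character span with a placeholder.

    Spans are processed left to right; a span that overlaps one already
    redacted is skipped.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # overlaps the previous span
        pieces.append(text[cursor:start])
        pieces.append(placeholder.format(label=label))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def redact_with_model(text, nlp):
    """Redact every entity that a loaded spaCy pipeline `nlp` finds in `text`."""
    doc = nlp(text)
    spans = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
    return redact(text, spans)


# Hand-written spans for illustration only, not actual model output.
example = "Alice moved to Berlin in 2019."
print(redact(example, [(0, 5, "PER"), (15, 21, "LOC")]))
# prints: <PER> moved to <LOC> in 2019.
```

Keeping redact independent of spaCy makes it easy to test with hand-written spans and to reuse with other detectors that report character offsets.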