en_spacy_pii_distilbert

Maintained By
beki

Property     Value
Author       Benjamin Kilimnik
License      MIT
Framework    spaCy ≥3.4.1, ≤3.8.2
Performance  95.42% F-score

What is en_spacy_pii_distilbert?

en_spacy_pii_distilbert is a Named Entity Recognition (NER) model built on the DistilBERT architecture for detecting Personally Identifiable Information (PII) in English text. Developed by Benjamin Kilimnik, the model reports a 95.42% F-score, which makes it a good fit for privacy-sensitive applications.

Implementation Details

The model is implemented as a spaCy pipeline with two components: a transformer and an NER module. It was trained on a custom dataset designed for structured PII detection and generated with the Privy framework.

  • Default Pipeline: transformer, ner
  • Components: transformer, ner
  • Entity Labels: DATE_TIME, LOC, NRP, ORG, PER
  • Performance Metrics: Precision (95.30%), Recall (95.54%), F-score (95.42%)
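The three reported metrics are mutually consistent if the F-score is the usual F1, the harmonic mean of precision and recall. That reading is an assumption, since the card does not define the metric. A minimal check:

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Values as reported on this card, in percent.
precision, recall = 95.30, 95.54
print(round(f_score(precision, recall), 2))  # 95.42
```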

Core Capabilities

  • Detection of temporal information (DATE_TIME)
  • Location identification (LOC)
  • Organization detection (ORG)
  • Personal name recognition (PER)
  • Nationality, religious, or political group detection (NRP)
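As a rough sketch of how these labels might be consumed, the snippet below loads the pipeline through spaCy and groups the detected entities by label. It assumes the model package and spacy-transformers are installed so that `spacy.load("en_spacy_pii_distilbert")` resolves. The helper names `extract_entities` and `group_by_label` are hypothetical and not part of the model.

```python
from collections import defaultdict

# The five entity labels listed on this card.
PII_LABELS = ("DATE_TIME", "LOC", "NRP", "ORG", "PER")


def group_by_label(entities):
    """Group (text, label) pairs by label, keeping only the card's labels."""
    grouped = defaultdict(list)
    for text, label in entities:
        if label in PII_LABELS:
            grouped[label].append(text)
    return dict(grouped)


def extract_entities(text, model_name="en_spacy_pii_distilbert"):
    """Run the pipeline and return (entity text, label) pairs.

    Assumption: the model package is installed and loadable by this name.
    """
    import spacy  # imported lazily so group_by_label works without spaCy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```

With the model installed, `group_by_label(extract_entities(some_text))` returns a dictionary keyed by label. The actual entities found depend on the model's predictions.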

Frequently Asked Questions

Q: What makes this model unique?

The model is specialized for PII detection: it was trained on a custom dataset built for structured PII detection, covers five entity labels (DATE_TIME, LOC, NRP, ORG, PER), and reports a 95.42% F-score. This combination makes it well suited to privacy and compliance applications.

Q: What are the recommended use cases?

The model is ideal for privacy compliance, data anonymization, personal information redaction, and automated PII detection in large text datasets. It's particularly useful in applications requiring high-accuracy identification of personal information.
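For the redaction use case, one simple approach is to replace each detected span with a placeholder for its label. The sketch below works on plain `(start, end, label)` character spans, which in spaCy would come from `ent.start_char`, `ent.end_char`, and `ent.label_`. The `redact` helper and the sample spans are hypothetical: the spans are written by hand for illustration and are not real model output.

```python
def redact(text, spans):
    """Replace each (start, end, label) character span with <LABEL>."""
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # skip spans that overlap one already redacted
        out.append(text[cursor:start])
        out.append(f"<{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# Hand-written spans for illustration only.
sample = "Alice joined Acme on Monday."
spans = [(0, 5, "PER"), (13, 17, "ORG"), (21, 27, "DATE_TIME")]
print(redact(sample, spans))  # <PER> joined <ORG> on <DATE_TIME>.
```

Placeholders that keep the label preserve some context for downstream readers. A fixed mask such as `[REDACTED]` hides the entity type as well.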
