llama-ai4privacy-english-anonymiser-openpii

Maintained by: ai4privacy

Llama AI4Privacy English Anonymiser OpenPII

Property     Value
License      MIT
Author       ai4privacy
Model URL    HuggingFace

What is llama-ai4privacy-english-anonymiser-openpii?

llama-ai4privacy-english-anonymiser-openpii is a model for identifying and redacting Personally Identifiable Information (PII) in English text. Fine-tuned on the English subset of the open-pii-masking-500k-ai4privacy dataset, it covers 20 categories of PII and reports an overall accuracy of 99.17%, with 100% precision in most categories.

Implementation Details

The model identifies PII elements including names, identification numbers, contact information, and location data. It records 30,848 true positives against 368 false positives and 366 false negatives, which corresponds to overall precision and recall of roughly 98.8% each.

  • Overall F1 Score: 98.82%
  • Macro-Averaged Accuracy: 98.56%
  • Perfect precision (100%) across most PII categories
  • 100% accuracy on passport numbers and dates
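The overall F1 score can be checked against the true positive, false positive, and false negative counts quoted above. The following is a minimal sketch using the standard definitions of precision, recall, and F1; the function name `precision_recall_f1` is illustrative and not part of the model's tooling.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from aggregate confusion counts."""
    precision = tp / (tp + fp)  # share of predicted PII spans that are correct
    recall = tp / (tp + fn)     # share of actual PII spans that are found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts as quoted in this model card.
p, r, f1 = precision_recall_f1(tp=30848, fp=368, fn=366)
print(f"precision={p:.2%} recall={r:.2%} f1={f1:.2%}")
# prints: precision=98.82% recall=98.83% f1=98.82%
```

The F1 value agrees with the 98.82% figure listed above.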

Core Capabilities

  • Identifies and redacts 20 distinct PII categories
  • Particularly strong in detecting numerical identifiers (passports, credit cards)
  • High accuracy in detecting personal names (97.89% for given names, 99.31% for surnames)
  • Excellent performance in contact information (99.96% for email, 99.89% for telephone numbers)
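Redaction itself is a small post-processing step once PII spans have been predicted. The sketch below assumes predictions arrive as dictionaries with `entity_group`, `start`, and `end` keys, the shape produced by the Hugging Face `transformers` token-classification pipeline when an aggregation strategy is set. The label names `GIVENNAME` and `EMAIL` and the hand-written spans are hypothetical stand-ins for real model output, and the helper `redact` is not part of the model's own code.

```python
from typing import Dict, List


def redact(text: str, spans: List[Dict]) -> str:
    """Replace each predicted span with a [LABEL] placeholder.

    Assumes the spans do not overlap.
    """
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        placeholder = f"[{span['entity_group']}]"
        text = text[: span["start"]] + placeholder + text[span["end"] :]
    return text


# Hypothetical predictions; in practice these would come from the model.
text = "Contact Jane at jane@example.com."
spans = [
    {"entity_group": "GIVENNAME", "start": 8, "end": 12},
    {"entity_group": "EMAIL", "start": 16, "end": 32},
]

redacted = redact(text, spans)
print(redacted)  # prints: Contact [GIVENNAME] at [EMAIL].
```

Replacing from the end of the string backwards avoids recomputing offsets, since each substitution changes the length of the text only to its right.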

Frequently Asked Questions

Q: What makes this model unique?

The model combines coverage of 20 PII types with 100% reported precision in most categories. Its false-positive count is low (368 against 30,848 true positives), so it rarely redacts text that is not PII.

Q: What are the recommended use cases?

The model is specifically designed for text redaction in English-language documents. It's ideal for organizations needing to automatically anonymize documents, comply with privacy regulations, or process large volumes of text containing sensitive personal information. However, users should test it thoroughly on their specific use cases before deployment.
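One way to run such a test is to label a small sample of your own documents and compare the model's spans against them. The sketch below is an assumption about how that comparison might look, not an official evaluation script: spans are `(start, end, label)` tuples, a prediction counts only on exact match, and the sample data and label names are invented for illustration.

```python
from typing import Iterable, Set, Tuple

# A span is (start offset, end offset, label).
Span = Tuple[int, int, str]


def count_matches(gold: Set[Span], predicted: Set[Span]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one text."""
    tp = len(gold & predicted)   # spans found with the right offsets and label
    fp = len(predicted - gold)   # spans predicted but not in the labels
    fn = len(gold - predicted)   # labelled spans the model missed
    return tp, fp, fn


def evaluate(samples: Iterable[Tuple[Set[Span], Set[Span]]]) -> Tuple[int, int, int]:
    """Sum the counts over (gold, predicted) pairs, one pair per document."""
    tp = fp = fn = 0
    for gold, predicted in samples:
        a, b, c = count_matches(gold, predicted)
        tp, fp, fn = tp + a, fp + b, fn + c
    return tp, fp, fn


# Invented sample: one fully correct document, one miss, one spurious span.
samples = [
    ({(8, 12, "GIVENNAME"), (16, 32, "EMAIL")}, {(8, 12, "GIVENNAME"), (16, 32, "EMAIL")}),
    ({(0, 5, "SURNAME")}, set()),
    (set(), {(3, 9, "CITY")}),
]

totals = evaluate(samples)
print(totals)  # prints: (2, 1, 1)
```

For anonymisation, the false-negative count deserves the closest attention, since each missed span is PII left in the output.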
