gliner_multi_pii-v1
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Languages Supported | English, French, German, Spanish, Portuguese, Italian |
| Framework | PyTorch |
| Paper | arXiv:2311.08526 |
What is gliner_multi_pii-v1?
gliner_multi_pii-v1 is a specialized Named Entity Recognition (NER) model for identifying personally identifiable information (PII) in text across multiple languages. It is built on the GLiNER architecture and is a resource-efficient alternative to large language models. It still keeps entity recognition flexible, because the entity types to extract are supplied as plain-text labels at inference time rather than fixed during training.
Implementation Details
The model is fine-tuned from urchade/gliner_multi-v2.1 on the synthetic-pii-ner-mistral-v1 dataset. It uses a BERT-like bidirectional transformer encoder. Instead of tagging tokens with a fixed label set, GLiNER encodes the requested entity-type labels together with the text and scores candidate text spans against each label.
- Supports 6 major European languages
- Capable of identifying 40+ types of PII entities
- Runs a compact, encoder-only extraction pipeline rather than a generative model
- Relies on the GLiNER architecture, so entity types can be requested as labels without retraining
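The GLiNER paper (arXiv:2311.08526) frames NER as matching candidate spans against embeddings of the requested labels. A sigmoid of the span-label dot product gives a match score, and a greedy pass then keeps the best non-overlapping spans. Below is a toy, pure-Python sketch of just that scoring-and-decoding step. All vectors are hypothetical 2-D stand-ins for encoder outputs. The paper builds span and label representations with learned projections; here we simply average a span's start and end token vectors, which is a simplification and not the model's actual code.

```python
import math

# Hypothetical contextual token vectors (stand-ins for encoder outputs).
TOKENS = ["Call", "Anna", "at", "555-0199"]
TOKEN_EMBS = [[0.0, 0.0], [4.0, -2.0], [0.0, 0.0], [-2.0, 4.0]]
# Hypothetical label vectors; GLiNER would derive these from the label text itself.
LABEL_EMBS = {"person": [1.0, 0.0], "phone number": [0.0, 1.0]}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def span_repr(embs, start, end):
    # Simplification: average the start and end token vectors (end is inclusive).
    return [(a + b) / 2.0 for a, b in zip(embs[start], embs[end])]


def match_spans(embs, label_embs, threshold=0.5, max_width=2):
    """Score every span of up to max_width tokens against every label, then decode greedily."""
    candidates = []
    for start in range(len(embs)):
        for end in range(start, min(start + max_width, len(embs))):
            span = span_repr(embs, start, end)
            for label, vec in label_embs.items():
                score = sigmoid(sum(s * v for s, v in zip(span, vec)))
                if score > threshold:
                    candidates.append((start, end, label, score))
    # Flat NER: take candidates from highest score down, skipping any that overlap a kept span.
    kept = []
    for cand in sorted(candidates, key=lambda c: -c[3]):
        if all(cand[1] < k[0] or cand[0] > k[1] for k in kept):
            kept.append(cand)
    return sorted(kept)  # (start, end, label, score) tuples in text order


for start, end, label, score in match_spans(TOKEN_EMBS, LABEL_EMBS):
    print(" ".join(TOKENS[start:end + 1]), "->", label, round(score, 3))
# Anna -> person 0.982
# 555-0199 -> phone number 0.982
```

"Call Anna" and "at 555-0199" also clear the threshold in this toy example. Each overlaps a higher-scoring span, so the greedy step discards it.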
Core Capabilities
- Multi-language PII detection (EN, FR, DE, ES, PT, IT)
- Recognition of personal identifiers (SSN, passport numbers, etc.)
- Contact information detection (email, phone numbers)
- Financial information identification (credit card numbers, bank accounts)
- Medical data recognition (health insurance IDs, medical conditions)
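GLiNER takes entity types as free-text labels. Using the capability groups above therefore comes down to choosing which label strings to request and mapping predictions back to a group. The sketch below assumes hypothetical label strings taken from the examples in the list. They are not necessarily the exact strings the model was trained with. Predictions are assumed to be dicts with `label` and `text` keys.

```python
# Hypothetical label strings for the capability groups listed above.
PII_CATEGORIES = {
    "personal identifiers": ["social security number", "passport number"],
    "contact information": ["email", "phone number"],
    "financial information": ["credit card number", "bank account number"],
    "medical data": ["health insurance id number", "medical condition"],
}

# Reverse index: label string -> capability group.
LABEL_TO_CATEGORY = {
    label: category for category, labels in PII_CATEGORIES.items() for label in labels
}


def labels_for(categories):
    """Flatten the selected groups into an ordered, de-duplicated label list."""
    labels = []
    for category in categories:
        for label in PII_CATEGORIES[category]:
            if label not in labels:
                labels.append(label)
    return labels


def group_by_category(entities):
    """Bucket predicted entity dicts by capability group; unknown labels go to 'other'."""
    grouped = {}
    for ent in entities:
        category = LABEL_TO_CATEGORY.get(ent["label"], "other")
        grouped.setdefault(category, []).append(ent["text"])
    return grouped


print(labels_for(["contact information", "financial information"]))
# ['email', 'phone number', 'credit card number', 'bank account number']
```

Requesting only the groups an application needs keeps the label list short. GLiNER encodes the label list alongside the input text, so a shorter list leaves more room for the text itself.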
Frequently Asked Questions
Q: What makes this model unique?
It pairs GLiNER's label-driven flexibility with the resource efficiency of a compact encoder. As a result, it can identify a wide range of PII entity types without the computational cost of running a large language model.
Q: What are the recommended use cases?
The model suits data privacy compliance, document processing, and information security applications where personal information must be identified and protected across multiple languages.
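For a privacy-compliance workflow, a common last step is to redact whatever the model flags. The sketch below assumes each prediction is a dict with character offsets `start`/`end`, plus `text`, `label` and `score`. We assume this mirrors what the `gliner` package's `predict_entities(text, labels)` returns after `GLiNER.from_pretrained("urchade/gliner_multi_pii-v1")`. To stay self-contained, the predictions here are hypothetical and built by hand with an illustrative helper, `make_entity`.

```python
TEXT = "Contact Marie Dupont at marie.dupont@example.fr or +33 6 12 34 56 78."


def make_entity(label, span, score):
    """Hypothetical helper: build a prediction dict for the first occurrence of span."""
    start = TEXT.index(span)
    return {"start": start, "end": start + len(span), "text": span, "label": label, "score": score}


# Hypothetical predictions with illustrative scores (not real model output).
PREDICTIONS = [
    make_entity("person", "Marie Dupont", 0.95),
    make_entity("email", "marie.dupont@example.fr", 0.97),
    make_entity("phone number", "+33 6 12 34 56 78", 0.91),
    make_entity("organization", "Contact", 0.30),  # low confidence, filtered out below
]


def redact(text, entities, min_score=0.5):
    """Replace each confident, non-overlapping span with a [LABEL] placeholder."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Edit right-to-left so the offsets of spans not yet replaced stay valid.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = "[" + ent["label"].upper().replace(" ", "_") + "]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


print(redact(TEXT, PREDICTIONS))
# Contact [PERSON] at [EMAIL] or [PHONE_NUMBER].
```

Replacing spans from the end of the string backwards avoids recomputing offsets after each substitution. This relies on the spans not overlapping, which holds for flat NER output.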