Domain Classifier
| Property | Value |
|---|---|
| Parameter Count | 184M |
| Model Type | DeBERTa-V3 Base |
| License | Apache 2.0 |
| Context Length | 512 tokens |
| Performance | 0.9873 PR-AUC |
What is domain-classifier?
domain-classifier is a text classification model developed by NVIDIA that assigns a document to one of 26 domains, ranging from Arts and Entertainment to Travel and Transportation. It is built on the DeBERTa-V3 Base architecture and is intended for organizing and analyzing content by topic.
Implementation Details
The model was trained on 1 million Common Crawl samples and 500,000 Wikipedia articles, with labels obtained using Google Cloud's Natural Language API and Wikipedia-API. The DeBERTa-V3 Base backbone has a context length of 512 tokens, so inputs longer than that have to be truncated or split into chunks before classification.
- Transformer-based architecture with 184M parameters
- Supports 26 distinct domain categories
- Loads from the Hugging Face Hub through the PyTorch model hub mixin (`PyTorchModelHubMixin`)
- Weights are stored as F32 (32-bit floating point) tensors
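The 512-token context length means a long document cannot be classified in one pass. The source does not say how longer inputs are handled, so the following is only a minimal sketch of one common approach: split the token ids into overlapping windows, classify each window, and average the per-window probabilities. The names `split_into_windows` and `average_probabilities`, and the stride of 256, are assumptions made for illustration and are not part of the model or of any library.

```python
from typing import List, Sequence

MAX_TOKENS = 512  # context length stated for the model


def split_into_windows(
    token_ids: Sequence[int],
    max_len: int = MAX_TOKENS,
    stride: int = 256,  # assumed overlap step, not taken from the source
) -> List[List[int]]:
    """Split a token-id sequence into overlapping windows of at most max_len.

    Real tokenizers usually add special tokens around each window, so a
    caller may need to pass a max_len slightly below 512 to leave room.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if len(token_ids) <= max_len:
        return [list(token_ids)]

    windows: List[List[int]] = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + max_len]))
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the document
        start += stride
    return windows


def average_probabilities(per_window: Sequence[Sequence[float]]) -> List[float]:
    """Average per-window class probabilities into one document-level vector."""
    if not per_window:
        raise ValueError("need at least one window")
    n_classes = len(per_window[0])
    return [
        sum(window[i] for window in per_window) / len(per_window)
        for i in range(n_classes)
    ]
```

In use, each window would be passed to the classifier, and the averaged vector would then be reduced to a single domain by taking its largest entry. Simple truncation to the first 512 tokens is a cheaper alternative when the start of a document is representative of the whole.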
Core Capabilities
- Overall PR-AUC of 0.9873
- Per-domain PR-AUC between 0.918 and 0.999 across all 26 domains
- Covers content from technical to creative domains
- Can be used through NeMo Curator or the Transformers pipeline
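PR-AUC is the area under the precision-recall curve. The source does not state how the reported figures were computed, so the sketch below shows one standard estimator, average precision, applied to a single domain in a one-vs-rest setup. The function names and the one-vs-rest framing are assumptions for illustration, and the data in the usage note is made up.

```python
from typing import List, Sequence


def one_vs_rest(true_domains: Sequence[str], domain: str) -> List[int]:
    """Binary labels for one domain: 1 where the true domain matches, else 0."""
    return [1 if d == domain else 0 for d in true_domains]


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Ties between scores are broken by input order, which is good enough
    for a sketch but differs from threshold-grouping implementations.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank examples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives
```

For example, with labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, the positives sit at ranks 1 and 3, so the result is (1/1 + 2/3) / 2, roughly 0.833. A ranking that places every positive ahead of every negative scores 1.0.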
Frequently Asked Questions
Q: What makes this model unique?
The model covers 26 domains and keeps PR-AUC high on every one of them: 0.918 to 0.999 per domain and 0.9873 overall. It combines the DeBERTa-V3 Base architecture with a training set of 1.5 million documents drawn from Common Crawl and Wikipedia.
Q: What are the recommended use cases?
The model is suited to content categorization in large-scale document management systems, content recommendation engines, and automated content organization platforms. It is most useful where web content, articles, or documents must be sorted into specific domains automatically.
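As a rough illustration of the document-organization use case, the sketch below groups documents by predicted domain and sets aside low-confidence predictions for review. Everything here is hypothetical: `route_by_domain`, the 0.5 threshold, the `Unrouted` bucket, and `stub_classify`, which is a keyword-based stand-in for the real model so that the example runs without downloading anything.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Prediction = Tuple[str, float]  # (domain label, confidence score)


def route_by_domain(
    docs: Iterable[str],
    classify: Callable[[str], Prediction],
    min_confidence: float = 0.5,  # assumed threshold, tune per application
    fallback: str = "Unrouted",   # hypothetical bucket for uncertain cases
) -> Dict[str, List[str]]:
    """Group documents by predicted domain, diverting uncertain predictions."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        label, score = classify(doc)
        key = label if score >= min_confidence else fallback
        buckets[key].append(doc)
    return dict(buckets)


def stub_classify(doc: str) -> Prediction:
    """Hypothetical stand-in for the real model: keyword match, fixed scores."""
    text = doc.lower()
    if "flight" in text:
        return ("Travel and Transportation", 0.93)
    if "film" in text:
        return ("Arts and Entertainment", 0.88)
    return ("Arts and Entertainment", 0.30)  # deliberately low-confidence guess


if __name__ == "__main__":
    sample = ["Cheap flight deals", "A new film review", "Quarterly tax filing"]
    for domain, items in route_by_domain(sample, stub_classify).items():
        print(domain, items)
```

In a real deployment, `classify` would wrap the actual model and return its top label together with the corresponding probability. Keeping the classifier behind a plain callable makes the routing logic easy to test separately from model loading.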