Domain Classifier

Property	Value
Parameter Count	184M
Model Type	DeBERTa-V3 Base
License	Apache 2.0
Context Length	512 tokens
Performance	0.9873 PR-AUC

What is domain-classifier?

The domain-classifier is a text classification model developed by NVIDIA that categorizes documents into 26 domains. Built on the DeBERTa-V3 Base architecture, it covers categories ranging from Arts and Entertainment to Travel and Transportation, which makes it suited to content organization and analysis.

Implementation Details

The model was trained on 1 million Common Crawl samples and 500,000 Wikipedia articles. The Common Crawl samples were labeled with Google Cloud's Natural Language API, and the Wikipedia articles were gathered with Wikipedia-API. The model uses the DeBERTa-V3 Base architecture with a context length of 512 tokens.

Transformer-based architecture with 184M parameters
Supports 26 distinct domain categories
Implements PyTorch Model Hub integration
Uses F32 tensor type for computations
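
The 512-token context length means a longer document cannot be scored in a single pass. The source does not say how such inputs are handled. A minimal sketch of one common workaround, assuming a sliding window over token IDs followed by averaging of the per-window probabilities, might look like the following. The names split_into_windows and mean_probabilities, the 64-token overlap, and the 2 positions reserved for special tokens are illustrative assumptions, not part of the model's API.

```python
def split_into_windows(token_ids, max_len=510, overlap=64):
    """Split a token sequence into overlapping windows of at most max_len tokens.

    max_len=510 assumes the tokenizer adds 2 special tokens per window,
    keeping each window within a 512-token context (an assumption).
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
        start += step
    return windows


def mean_probabilities(per_window_probs):
    """Average per-window probability vectors into one document-level vector."""
    count = len(per_window_probs)
    return [sum(column) / count for column in zip(*per_window_probs)]


# Hypothetical input: 1000 token IDs standing in for a tokenized document.
windows = split_into_windows(list(range(1000)))
print([len(w) for w in windows])  # window lengths: [510, 510, 108]
```

Averaging is only one way to combine windows. Taking the maximum per domain, or scoring only the first window, are equally plausible choices depending on the documents.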

Core Capabilities

Domain classification with a PR-AUC of 0.9873
Per-domain PR-AUC between 0.918 and 0.999
Handles various content types from technical to creative domains
Easy integration with both NeMo Curator and Transformers pipeline
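
PR-AUC is the area under the precision-recall curve, computed per domain by treating that domain as the positive class. The source does not state which estimator produced the reported figures. The sketch below uses average precision, one common step-wise estimate, and assumes there are no tied scores. The function name and the toy labels and scores are illustrative only.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 if the document belongs to the domain, else 0.
    scores: predicted probability for that domain, one per document.
    Assumes no tied scores.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive example is required")
    # Rank documents from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at this rank, weighted by the recall gained
            # from one more positive (1 / total_positives).
            area += (true_positives / rank) / total_positives
    return area


# Toy one-vs-rest data for a single domain (made up for illustration).
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(round(average_precision(labels, scores), 4))  # 0.8056
```

A classifier that ranks every positive document above every negative one scores 1.0, so a per-domain value of 0.918 or higher indicates that few negatives outrank positives.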

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its domain coverage and accuracy: PR-AUC stays between 0.918 and 0.999 across all 26 domains. It was trained on 1.5 million documents and builds on the DeBERTa-V3 Base architecture.

Q: What are the recommended use cases?

The model is ideal for content categorization in large-scale document management systems, content recommendation engines, and automated content organization platforms. It's particularly valuable for organizations needing to automatically classify web content, articles, or documents into specific domains.
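
As a rough illustration of the document-organization use case, the sketch below groups documents by their most likely domain and sets aside low-confidence predictions. It assumes a classify callable that returns a mapping from domain name to probability. The keyword_classifier stand-in, the group_by_domain helper, the "unsorted" bucket, and the 0.6 threshold are all hypothetical; in practice classify would wrap the actual model.

```python
from collections import defaultdict


def group_by_domain(documents, classify, min_confidence=0.5):
    """Bucket documents by predicted domain.

    classify: callable returning {domain_name: probability} for one document.
    Documents whose top probability is below min_confidence go to "unsorted".
    """
    buckets = defaultdict(list)
    for doc in documents:
        probabilities = classify(doc)
        domain, confidence = max(probabilities.items(), key=lambda item: item[1])
        key = domain if confidence >= min_confidence else "unsorted"
        buckets[key].append(doc)
    return dict(buckets)


def keyword_classifier(text):
    """Hypothetical stand-in for the real model, using two of the 26 domains."""
    lowered = text.lower()
    if "flight" in lowered:
        return {"Travel and Transportation": 0.9, "Arts and Entertainment": 0.1}
    if "film" in lowered:
        return {"Travel and Transportation": 0.2, "Arts and Entertainment": 0.8}
    return {"Travel and Transportation": 0.5, "Arts and Entertainment": 0.5}


docs = ["Cheap flight deals", "A new film review", "Quarterly report"]
for domain, items in group_by_domain(docs, keyword_classifier, 0.6).items():
    print(domain, items)
```

The confidence threshold trades coverage for precision: raising it sends more documents to the "unsorted" bucket for manual review.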