multilingual-IPTC-news-topic-classifier

Maintained by: classla

Property               Value
Base Architecture      XLM-RoBERTa-large
Languages Supported    Croatian, Slovenian, Catalan, Greek (and others supported by XLM-RoBERTa)
Performance            Micro-F1: 0.734, Macro-F1: 0.746
Paper                  IEEE Access 2025

What is multilingual-IPTC-news-topic-classifier?

This is a specialized news classification model that categorizes news content into 17 IPTC Media Topic categories. It is built on XLM-RoBERTa-large and was fine-tuned on a dataset of 15,000 news articles in multiple languages. When only predictions with a confidence of 0.90 or higher are kept, it reaches an F1 score of 0.80.

Implementation Details

The model was trained with a teacher-student framework in which GPT-4 served as the teacher and produced the initial annotations. It processes texts of up to 512 tokens and needs at least 75 words of input for reliable classification.

  • Trained on the EMMediaTopic 1.0 dataset of 15,000 news articles
  • Fine-tuned with the simpletransformers library using optimized hyperparameters
  • Supports 17 distinct IPTC categories including sports, politics, arts, and more
  • Achieves best results with confidence threshold ≥ 0.90
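The two practical constraints above (a 75-word minimum and a 0.90 confidence threshold) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the whitespace-based word count, and the three-label list are assumptions made for the example, and the logits would in practice come from the fine-tuned model.

```python
import math

MIN_WORDS = 75          # minimum input length stated on this card
CONF_THRESHOLD = 0.90   # confidence cut-off stated on this card


def has_enough_words(text, min_words=MIN_WORDS):
    """Approximate the word minimum with a whitespace split (an assumption)."""
    return len(text.split()) >= min_words


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confident_label(logits, labels, threshold=CONF_THRESHOLD):
    """Return (label, confidence), or (None, confidence) below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return labels[best], probs[best]
    return None, probs[best]


# Hypothetical three-label subset; the real model has 17 categories.
example_labels = ["sport", "politics", "arts"]
label, confidence = confident_label([6.0, 1.0, 0.5], example_labels)
```

Texts that fail `has_enough_words` can be skipped before inference, and a `None` label marks a prediction that falls below the threshold and should not be used as-is.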

Core Capabilities

  • Multilingual news classification across 17 IPTC categories
  • Micro-F1 of 0.734 and macro-F1 of 0.746 across multiple languages
  • Confidence-based filtering for improved precision
  • Handles diverse news topics from sports to environmental issues
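The card reports both micro-F1 and macro-F1. Micro-F1 pools all articles before computing the score, while macro-F1 averages the per-category scores so that every category counts equally. The sketch below shows the difference on a tiny made-up example; the function name and the sample labels are illustrative assumptions, not the evaluation code behind the reported numbers.

```python
from collections import Counter


def f1_scores(gold, predicted):
    """Return (micro_f1, macro_f1) for single-label classification."""
    labels = sorted(set(gold) | set(predicted))
    if not labels:
        return 0.0, 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[g] += 1  # gold label gets a false negative

    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    # Micro: pool the counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per category, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Made-up labels for illustration only.
gold = ["sport", "sport", "sport", "politics"]
pred = ["sport", "sport", "politics", "politics"]
micro_f1, macro_f1 = f1_scores(gold, pred)
```

Here three of four predictions are correct, so micro-F1 is 0.75, while macro-F1 is the mean of the per-category scores 0.8 ("sport") and 2/3 ("politics").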

Frequently Asked Questions

Q: What makes this model unique?

The model combines multilingual coverage with specialized IPTC news topic classification. For specific languages, it outperforms GPT-4 used in a zero-shot setting.

Q: What are the recommended use cases?

The model is suited to automated news categorization, content organization, and media monitoring. It performs best on texts of at least 75 words and when only predictions with a confidence of 0.90 or higher are used.
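For content organization or media monitoring, one way to apply the threshold is to file confident predictions under their topic and send the rest to manual review. The sketch below assumes each prediction is a dict with hypothetical `id`, `label`, and `score` keys; the record layout, function name, and sample values are illustrative and not part of the model's API.

```python
from collections import defaultdict


def route_predictions(predictions, threshold=0.90):
    """Group confident predictions by topic; collect the rest for review."""
    by_topic = defaultdict(list)
    needs_review = []
    for item in predictions:
        if item["score"] >= threshold:
            by_topic[item["label"]].append(item["id"])
        else:
            needs_review.append(item["id"])
    return dict(by_topic), needs_review


# Made-up predictions for illustration only.
sample = [
    {"id": "a1", "label": "sport", "score": 0.97},
    {"id": "a2", "label": "politics", "score": 0.91},
    {"id": "a3", "label": "sport", "score": 0.62},
]
topics, review_queue = route_predictions(sample)
```

With this sample, `a1` and `a2` are filed under their topics and `a3` lands in the review queue because its score is below 0.90.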
