russian-sensitive-topics

Maintained By: apanc

License: CC BY-NC-SA 4.0
Paper: Research Paper
Language: Russian
Framework: PyTorch, Transformers

What is russian-sensitive-topics?

russian-sensitive-topics is a text classification model that detects 18 sensitive topics in Russian text. It was developed by researchers at Skoltech NLP to identify potentially inappropriate messages that could harm a company's reputation. The model was trained on a combination of manually and semi-automatically labeled data, a mix intended to make it robust in real-world use.

Implementation Details

The model is built on the BERT architecture and implemented with PyTorch and the Transformers library. Its per-topic F1 scores are highest for drugs (0.88), weapons (0.86), and religion (0.81), and its precision and recall are balanced for most categories.
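F1 is the harmonic mean of precision and recall, so a per-topic score such as 0.88 is only reachable when both are high. The sketch below shows the standard per-topic computation from true-positive, false-positive, and false-negative counts. The function and type names are hypothetical, and the counts in the example are made up to illustrate the arithmetic; they are not the model's evaluation data.

```python
from typing import NamedTuple


class TopicScores(NamedTuple):
    precision: float
    recall: float
    f1: float


def topic_scores(tp: int, fp: int, fn: int) -> TopicScores:
    """Per-topic precision, recall and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall: it stays high only
    # when both are high, and a lopsided pair pulls it toward the lower one.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return TopicScores(precision, recall, f1)


# Hypothetical counts, chosen only to illustrate the arithmetic.
balanced = topic_scores(tp=88, fp=12, fn=12)  # precision == recall == 0.88
lopsided = topic_scores(tp=88, fp=2, fn=40)   # high precision, low recall
print(balanced)
print(lopsided)
```

In the lopsided case precision is about 0.98 and recall about 0.69, yet F1 lands near 0.81, below their arithmetic mean; this is why balanced precision and recall matter for a high F1.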

  • Supports 18 distinct sensitive topics including offline/online crime, discrimination, and social issues
  • Trained on an extended dataset available on GitHub and Kaggle
  • Uses multi-label classification, so a single text can be assigned several topics at once
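In multi-label classification each topic is scored separately, so one message can be flagged for several topics, or for none. The sketch below shows a common decoding step under stated assumptions: the classifier emits one logit per topic, each logit is turned into an independent probability with a sigmoid, and 0.5 is the cut-off. The label list is a hypothetical subset built from topics named on this page, not the checkpoint's real label order, and the released model may encode its outputs differently, so its model card and config should be checked before relying on this.

```python
import math
from typing import List, Sequence

# Hypothetical label subset for illustration only. The released checkpoint's
# full list of 18 topics and its label order should be read from its config.
TOPICS = ["drugs", "weapons", "religion", "offline_crime", "online_crime"]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_topics(logits: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Return every topic whose independent probability reaches the threshold."""
    if len(logits) != len(TOPICS):
        raise ValueError(f"expected {len(TOPICS)} logits, got {len(logits)}")
    return [topic for topic, z in zip(TOPICS, logits) if sigmoid(z) >= threshold]


# Made-up logits: two topics fire at once, which a single argmax could not express.
print(decode_topics([2.1, -3.0, 0.4, -0.2, -5.0]))  # → ['drugs', 'religion']
```

Using independent sigmoids rather than a softmax is what lets the probabilities for different topics rise or fall without competing with one another.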

Core Capabilities

  • Multi-label classification of sensitive topics in Russian text
  • Detection of potentially harmful content across various domains
  • Balanced performance across different sensitive categories
  • Specialized handling of nuanced content related to social issues

Frequently Asked Questions

Q: What makes this model unique?

What sets the model apart is that it classifies inappropriate content by topic instead of only detecting toxicity. It targets messages that could harm a company's reputation because of the sensitive topic they touch on, even when their wording is not overtly toxic.

Q: What are the recommended use cases?

The model suits content moderation systems, corporate communication monitoring, and social media analysis, that is, any setting where potentially sensitive or inappropriate Russian text must be identified. It is particularly useful for protecting brand reputation and enforcing content guidelines.
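In a moderation system, per-topic scores usually feed a policy layer instead of being used raw. The following is a minimal sketch that assumes topic probabilities have already been computed for a message. The `moderate` function, the per-topic threshold values, and the publish/review outcomes are all hypothetical and would need tuning against real data.

```python
from typing import List, Mapping, Tuple

# Hypothetical moderation policy: per-topic thresholds that are stricter for
# some topics than others. Values are illustrative, not tuned for this model.
DEFAULT_THRESHOLD = 0.5
THRESHOLDS = {"drugs": 0.3, "weapons": 0.3, "religion": 0.7}


def moderate(topic_probs: Mapping[str, float]) -> Tuple[str, List[str]]:
    """Route a message to 'publish' or 'review' from per-topic probabilities."""
    flagged = sorted(
        topic
        for topic, p in topic_probs.items()
        if p >= THRESHOLDS.get(topic, DEFAULT_THRESHOLD)
    )
    return ("review" if flagged else "publish", flagged)


print(moderate({"drugs": 0.35, "religion": 0.6, "online_crime": 0.1}))
# → ('review', ['drugs'])
```

Sending flagged messages to review instead of deleting them automatically keeps a human in the loop for borderline cases, and per-topic thresholds let a team decide how cautious to be about each topic.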
