Voice Safety Classifier

  • Author: Roblox
  • Base Model: WavLM Base Plus
  • Training Data: 2,374 hours of voice chat
  • Model URL: HuggingFace

What is voice-safety-classifier?

The voice-safety-classifier is an audio classification model developed by Roblox for detecting and classifying toxic content in voice chat communications. Built on the WavLM Base Plus architecture, it performs multilabel audio content moderation, identifying several categories of policy violations in a single pass.

Implementation Details

The model was trained on 2,374 hours of voice chat audio clips, utilizing a synthetic data pipeline for multilabel classification. It outputs scores for six classes: Profanity, DatingAndSexting, Racist, Bullying, Other, and NoViolation, and achieves 94.48% binarized average precision across all toxicity classes on a human-annotated evaluation set.

  • Built on the WavLM Base Plus architecture
  • Outputs an n × 6 classification tensor (see the inference sketch after this list)
  • Evaluated on 9,795 human-annotated samples
  • Detects multiple violation categories in the same clip
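
A minimal loading-and-inference sketch is shown below. It assumes the checkpoint is published on HuggingFace as Roblox/voice-safety-classifier, that it loads through transformers' WavLMForSequenceClassification with a standard 16 kHz feature extractor, and that the six logits are independent multilabel scores; check the model card for the exact entry point and class order.

```python
import torch
import torchaudio
from transformers import AutoFeatureExtractor, WavLMForSequenceClassification

# Assumed repo id; confirm against the HuggingFace model card.
CHECKPOINT = "Roblox/voice-safety-classifier"

feature_extractor = AutoFeatureExtractor.from_pretrained(CHECKPOINT)
model = WavLMForSequenceClassification.from_pretrained(CHECKPOINT)
model.eval()

# Load a clip and resample to the 16 kHz rate WavLM-family models expect.
waveform, sample_rate = torchaudio.load("clip.wav")
mono = torchaudio.functional.resample(waveform, sample_rate, 16_000).mean(dim=0)

inputs = feature_extractor(mono.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (n, 6), one row per clip

# Multilabel head: sigmoid gives an independent probability per class.
scores = torch.sigmoid(logits)
print(scores)
```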

Core Capabilities

  • Profanity detection (49.95% of the evaluation dataset)
  • Dating and sexting identification (7.02% of the dataset)
  • Racist content detection (9.08% of the dataset)
  • Bullying recognition (12.82% of the dataset)
  • Clean-content verification via the NoViolation class (42.73% of the dataset)
  • Detection of other policy violations

Because classification is multilabel, a single clip can carry several labels at once, so these percentages sum to more than 100%. A sketch of mapping the model's scores onto these labels follows the list.
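
A minimal sketch of that mapping, assuming the label order listed above and a uniform 0.5 decision threshold (both should be verified against the model card and tuned for your precision/recall target):

```python
import torch

# Assumed label order; confirm against the model card before relying on it.
LABELS = ["Profanity", "DatingAndSexting", "Racist", "Bullying", "Other", "NoViolation"]

def flag_violations(scores: torch.Tensor, threshold: float = 0.5) -> list[list[str]]:
    """Return, for each clip, the violation labels whose score clears the threshold.

    `scores` is the (n, 6) output of the classifier after sigmoid; NoViolation is
    excluded from the flags because it is not a violation category.
    """
    flagged = []
    for row in scores:
        hits = [label for label, score in zip(LABELS, row.tolist())
                if label != "NoViolation" and score >= threshold]
        flagged.append(hits)
    return flagged

# Example: the first clip is profane and racist, the second is clean.
example = torch.tensor([[0.91, 0.05, 0.72, 0.10, 0.08, 0.03],
                        [0.02, 0.01, 0.03, 0.02, 0.04, 0.97]])
print(flag_violations(example))  # [['Profanity', 'Racist'], []]
```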

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its comprehensive approach to voice chat moderation: it is trained on a large-scale voice chat corpus, evaluated against thousands of human-annotated samples, and achieves high precision across multiple violation categories. Its ability to handle multilabel classification makes it particularly valuable for real-world applications.

Q: What are the recommended use cases?

The model is ideal for real-time voice chat moderation in gaming platforms, online communities, and educational environments where content safety is crucial. It can power automatic content filtering, user protection, and policy enforcement systems.
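
For live moderation, one common pattern is to window the incoming stream and score each window. The sketch below is hypothetical glue code rather than anything shipped with the model: score_clip stands in for the inference call sketched earlier, audio_windows is assumed to yield fixed-length clips, and the 0.5 threshold is a placeholder to tune.

```python
# Hypothetical stream-moderation loop; nothing here is part of the released model.
LABELS = ["Profanity", "DatingAndSexting", "Racist", "Bullying", "Other", "NoViolation"]
THRESHOLD = 0.5  # assumed uniform threshold; per-class thresholds are often preferable

def moderate_stream(audio_windows, score_clip):
    """Yield (window_index, flagged_labels) for every window that trips a violation class."""
    for i, window in enumerate(audio_windows):
        scores = score_clip(window)  # expected: six sigmoid scores in LABELS order
        flags = [label for label, s in zip(LABELS, scores)
                 if label != "NoViolation" and s >= THRESHOLD]
        if flags:
            yield i, flags  # hand off to muting, warnings, or human review
```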
