AST-VoxCelebSpoof-Synthetic-Voice-Detection

MattyB95

Audio transformer model for synthetic voice detection with 86.2M parameters. It reports 99.99% accuracy at detecting AI-generated voices and is fine-tuned from MIT's Audio Spectrogram Transformer (AST).

Property               Value
Parameter Count        86.2M
License                MIT
Base Model             MIT/ast-finetuned-audioset-10-10-0.4593
Performance Metrics    Accuracy: 99.99%, F1: 0.9999, Precision: 1.0

What is AST-VoxCelebSpoof-Synthetic-Voice-Detection?

This is an audio transformer model that detects synthetic (AI-generated) voices. It is built on MIT's Audio Spectrogram Transformer (AST) architecture and fine-tuned on the VoxCelebSpoof dataset, with a reported accuracy of 99.99%.

Implementation Details

The model uses a transformer-based architecture with 86.2M parameters. It was trained with the Adam optimizer (learning rate 5e-05, betas 0.9 and 0.999) for 3 epochs at a batch size of 8, using a linear learning rate scheduler.

  • F32 (32-bit floating point) tensor precision
  • Integrated TensorBoard support for monitoring
  • Weights stored in the Safetensors format for safe model loading
  • Inference endpoints available for production deployment
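For local experimentation, the model could be loaded through the Hugging Face `transformers` audio-classification pipeline. The sketch below assumes that `transformers`, a PyTorch backend, and ffmpeg are installed, and that the Hub repository ID is the author name joined with the model name; `classify_audio` is a hypothetical helper written for this example.

```python
# Assumption: the Hub repository ID is "<author>/<model name>".
MODEL_ID = "MattyB95/AST-VoxCelebSpoof-Synthetic-Voice-Detection"


def classify_audio(path, top_k=2):
    """Return the top_k label/score predictions for one audio file."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    # The pipeline decodes and resamples the file itself (this step needs ffmpeg).
    return classifier(path, top_k=top_k)
```

Calling `classify_audio("sample.wav")`, where `sample.wav` is a placeholder path, should return a list of dictionaries with `label` and `score` keys. The label names come from the checkpoint's configuration, so inspect them before mapping a label to "synthetic" or "genuine".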

Core Capabilities

  • Near-perfect synthetic voice detection (99.99% accuracy)
  • Precision of 1.0
  • Recall of 0.9998
  • Specialized for English-language audio
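The reported metrics can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The short sketch below recomputes F1 from the precision and recall listed above; `f1_score` is a plain helper defined here, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_precision = 1.0
reported_recall = 0.9998

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))  # prints 0.9999, matching the reported F1
```

A precision of 1.0 means that none of the clips assigned to the positive class in the evaluation were misclassified, while a recall of 0.9998 means roughly 2 in 10,000 positive clips were missed.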

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its reported results on synthetic voice detection: 99.99% accuracy, a precision of 1.0, and a recall of 0.9998. It builds on the established AST architecture and is fine-tuned specifically for this task.

Q: What are the recommended use cases?

The model is suited to content authenticity verification, digital forensics, media authentication systems, and automated synthetic voice detection in security applications. It is best matched to English-language audio.
