discogs-maest-30s-pw-129e

Maintained by: mtg-upf

License: CC BY-NC-SA 4.0
Framework: PyTorch / Transformers
Paper: Efficient Supervised Training of Audio Transformers
Training Data: Discogs20 Dataset (3.3M tracks)

What is discogs-maest-30s-pw-129e?

MAEST (Music Audio Encoder Spectrogram Transformer) is a Transformer model designed for music analysis. It builds on the PaSST architecture and was trained on 3.3M music tracks from Discogs, which lets it classify music into 400 styles. The model takes mel-spectrograms computed from the audio as its input.

Implementation Details

The model uses the Audio Spectrogram Transformer architecture and is implemented in PyTorch. It processes 30-second audio clips sampled at 16 kHz and can be loaded through the Transformers library's audio-classification pipeline. Training ran on 4 Nvidia RTX 2080 Ti GPUs and took approximately 32 hours.

  • Converts audio into mel-spectrograms with Essentia-compatible preprocessing
  • Expects audio input sampled at 16 kHz
  • Integrates with 🤗 Transformers using custom code (requires trust_remote_code=True)
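
The integration described above can be sketched roughly as follows. This is a minimal sketch, assuming that `numpy`, `torch`, and `transformers` are installed and that the model is available on the Hugging Face Hub as `mtg-upf/discogs-maest-30s-pw-129e` (an ID inferred from the maintainer and model name on this page). The helper `fit_to_clip` is a hypothetical name, not part of any library; it only crops or zero-pads a mono signal to 30 seconds at 16 kHz.

```python
import numpy as np

SAMPLE_RATE = 16_000                       # Hz, the input rate stated above
CLIP_SECONDS = 30                          # clip length the model processes
CLIP_SAMPLES = SAMPLE_RATE * CLIP_SECONDS  # 480,000 samples per clip


def fit_to_clip(audio, n_samples=CLIP_SAMPLES):
    """Hypothetical helper: crop or zero-pad a mono signal to n_samples."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim != 1:
        raise ValueError("expected a mono (1-D) signal")
    if len(audio) >= n_samples:
        return audio[:n_samples]            # keep the first 30 seconds
    return np.pad(audio, (0, n_samples - len(audio)))  # zero-pad the tail


def classify(audio, top_k=5):
    """Score one clip with the audio-classification pipeline.

    The first call downloads the model weights and custom code.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline(
        "audio-classification",
        model="mtg-upf/discogs-maest-30s-pw-129e",  # assumed Hub ID
        trust_remote_code=True,  # required because the model ships custom code
    )
    # A raw NumPy array is assumed to already be at the model's sampling rate.
    return pipe(fit_to_clip(audio), top_k=top_k)
```

Calling `classify(signal)` on a 16 kHz mono signal should return a list of dictionaries with `label` and `score` keys, one per predicted style. Audio recorded at another rate would need resampling first, which this sketch does not cover.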

Core Capabilities

  • Music style classification across 400 categories
  • Feature extraction for downstream music analysis tasks such as music genre recognition, music emotion recognition, and instrument detection
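
Because the model scores one 30-second clip at a time, a longer track has to be split into windows and the per-window scores combined in some way. The sketch below shows one simple option, averaging scores per label. This is an illustrative assumption, not a procedure taken from the MAEST authors. The names `window_bounds` and `average_scores` are hypothetical, and the input format mirrors the list of `label`/`score` dictionaries that the Transformers audio-classification pipeline returns.

```python
from collections import defaultdict

SAMPLE_RATE = 16_000              # Hz
CLIP_SAMPLES = 30 * SAMPLE_RATE   # samples in one 30-second clip


def window_bounds(n_samples, clip=CLIP_SAMPLES, hop=CLIP_SAMPLES):
    """Return (start, end) sample indices of the full-length windows in a track.

    A trailing remainder shorter than one clip is ignored in this sketch.
    """
    if n_samples < clip:
        return []
    return [(s, s + clip) for s in range(0, n_samples - clip + 1, hop)]


def average_scores(per_window, top_k=3):
    """Average label scores over windows and return the top_k (label, score) pairs.

    A label missing from a window's predictions counts as a score of 0 there.
    """
    if not per_window:
        return []
    totals = defaultdict(float)
    for preds in per_window:
        for pred in preds:
            totals[pred["label"]] += pred["score"]
    n_windows = len(per_window)
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(label, total / n_windows) for label, total in ranked[:top_k]]


# Placeholder labels and scores, for illustration only.
example = [
    [{"label": "style_a", "score": 0.5}, {"label": "style_b", "score": 0.25}],
    [{"label": "style_a", "score": 0.25}],
]
print(average_scores(example, top_k=2))
```

Non-overlapping windows keep the example short; passing a smaller `hop` gives overlapping windows at the cost of more model calls.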

Frequently Asked Questions

Q: What makes this model unique?

MAEST focuses on music understanding and was trained on Discogs20, a 3.3M-track dataset built from Discogs music metadata and one of the largest of its kind. Unlike general-purpose audio models, it is optimized specifically for music analysis tasks.

Q: What are the recommended use cases?

The model is best suited to music analysis tasks, particularly style classification, genre recognition, and feature extraction for downstream music understanding. It is not designed for general audio classification such as AudioSet-style recognition.
