ast-finetuned-speech-commands-v2

MIT

AST model fine-tuned on the Speech Commands v2 dataset, reaching 98.12% accuracy. It adapts the Vision Transformer architecture to audio classification and has 85.4M parameters.

Property         Value
Parameter Count  85.4M
License          BSD-3-Clause
Paper            AST: Audio Spectrogram Transformer
Accuracy         98.12%

What is ast-finetuned-speech-commands-v2?

The Audio Spectrogram Transformer (AST) adapts the Vision Transformer (ViT) architecture to audio classification. This version has been fine-tuned on the Speech Commands v2 dataset, where it reaches 98.12% accuracy. Its central idea is to treat a spectrogram as an image: the time-frequency representation of a clip is handled the way ViT handles a picture.
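To make the "spectrogram as image" idea concrete, the sketch below counts how many patches a ViT-style model would cut from a spectrogram. The values used here (128 mel bins, 128 time frames for a clip of roughly one second, 16x16 patches, stride 10) are assumptions based on the defaults described in the AST paper, not figures stated on this page, and `patch_grid` is a hypothetical helper name.

```python
def patch_grid(n_mels=128, n_frames=128, patch=16, stride=10):
    """Return (freq_patches, time_patches, total) for a spectrogram 'image'.

    All defaults are assumptions for illustration: a square patch of side
    `patch` slides over the spectrogram with step `stride` on both axes,
    so neighbouring patches overlap by `patch - stride` bins or frames.
    """
    if n_mels < patch or n_frames < patch:
        raise ValueError("spectrogram is smaller than a single patch")
    freq_patches = (n_mels - patch) // stride + 1
    time_patches = (n_frames - patch) // stride + 1
    return freq_patches, time_patches, freq_patches * time_patches


if __name__ == "__main__":
    f, t, total = patch_grid()
    print(f"{f} x {t} patches -> {total} tokens fed to the transformer")
```

Under these assumptions a one-second clip becomes a 12 x 12 grid, i.e. 144 patch tokens, which the transformer then processes much as ViT processes image patches.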

Implementation Details

AST converts audio inputs into spectrograms, which are then processed by a transformer-based architecture similar to ViT. The weights are stored as F32 tensors; at 4 bytes per parameter, the 85.4M parameters amount to roughly 342 MB, which is substantial but manageable for audio classification tasks.

  • Transforms audio into spectrogram representations
  • Employs Vision Transformer architecture for processing
  • Implements state-of-the-art audio classification techniques
  • Uses PyTorch framework with Safetensors support
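The pipeline above can be sketched with the Hugging Face `transformers` library. This is a minimal sketch, not a reference implementation: it assumes the checkpoint is published on the Hub as `MIT/ast-finetuned-speech-commands-v2` (the page gives only the short name), that `torch` and `transformers` are installed (some versions also need `torchaudio` for feature extraction), and that the input is a mono waveform sampled at 16 kHz. `top_prediction` and `classify` are hypothetical helper names.

```python
import math

# Assumed Hugging Face Hub ID; the page itself only gives the short model name.
MODEL_ID = "MIT/ast-finetuned-speech-commands-v2"


def top_prediction(logits, id2label):
    """Softmax a flat list of logits and return (label, probability) of the best class."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    best = max(range(len(exps)), key=lambda i: exps[i])
    return id2label[best], exps[best] / sum(exps)


def classify(waveform, sampling_rate=16000):
    """Classify one mono clip given as a 1-D sequence of float samples."""
    # Imported lazily so top_prediction works without these packages installed.
    import torch
    from transformers import ASTFeatureExtractor, ASTForAudioClassification

    extractor = ASTFeatureExtractor.from_pretrained(MODEL_ID)
    model = ASTForAudioClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Waveform -> spectrogram features -> transformer -> one logit per command.
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_prediction(logits, model.config.id2label)
```

Calling `classify(samples)` with a one-second list or array of samples returns the predicted command and its softmax probability. The first call downloads the weights, so it needs network access.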

Core Capabilities

  • High-accuracy speech command classification
  • Robust audio feature extraction
  • Efficient spectrogram processing
  • State-of-the-art performance on audio classification benchmarks

Frequently Asked Questions

Q: What makes this model unique?

What sets this model apart is that it treats audio classification as a vision task: spectrograms are processed by a transformer architecture in the same way ViT processes images. On Speech Commands v2 this yields 98.12% accuracy while keeping processing efficient.

Q: What are the recommended use cases?

The model is specifically designed for speech command recognition and audio classification tasks. It's particularly well-suited for applications requiring precise identification of spoken commands in controlled environments.
