videomae-base-finetuned-kinetics

MCG-NJU

VideoMAE base model fine-tuned on Kinetics-400 for video classification. It has 86.5M parameters, reaches 80.9% top-1 accuracy, and builds on the Masked Autoencoder (MAE) architecture.

Property          Value
Parameter Count   86.5M
License           CC-BY-NC-4.0
Paper             VideoMAE Paper
Accuracy          80.9% Top-1, 94.7% Top-5
Framework         PyTorch

What is videomae-base-finetuned-kinetics?

VideoMAE extends the Masked Autoencoder (MAE) approach from images to video. This checkpoint was pre-trained for 1600 epochs with self-supervised learning and then fine-tuned on the labeled Kinetics-400 dataset, which makes it well suited to video classification tasks.

Implementation Details

The model treats a video as a sequence of fixed-size 16x16 patches, each of which is linearly embedded. It uses a Vision Transformer (ViT) architecture; during pre-training, a decoder predicts the pixel values of the masked patches. A [CLS] token is prepended to the sequence for classification tasks, and positional embeddings are added.
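To make the patch arithmetic concrete, here is a minimal sketch of how many patch tokens one clip produces. Only the 16x16 patch size comes from this page; the 16-frame clip length, the 224x224 resolution, and the grouping of 2 consecutive frames per patch (the `tubelet_size` parameter) are assumptions based on the defaults commonly used with VideoMAE base checkpoints, and `videomae_token_count` is a hypothetical helper name.

```python
def videomae_token_count(num_frames=16, image_size=224, patch_size=16, tubelet_size=2):
    """Count the patch tokens produced for one clip.

    patch_size=16 is stated on this page; the other defaults are assumptions.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    if num_frames % tubelet_size != 0:
        raise ValueError("num_frames must be divisible by tubelet_size")

    # Number of 16x16 patches covering one frame (14 x 14 for a 224x224 frame).
    patches_per_frame = (image_size // patch_size) ** 2
    # Each token spans tubelet_size consecutive frames.
    temporal_slots = num_frames // tubelet_size
    return temporal_slots * patches_per_frame


print(videomae_token_count())  # 8 temporal slots * 196 patches = 1568
```

Under these assumptions a single clip becomes a sequence of 1568 patch tokens, which is why the length of the input clip has a direct effect on memory and compute.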

  • Transformer-based architecture with specialized video processing capabilities
  • Pre-trained using masked autoencoding technique
  • Fine-tuned on Kinetics-400 dataset
  • Supports 400 different video classification labels
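The masked autoencoding step mentioned in the list above can be illustrated with a small sketch. This is not the authors' training code: the "tube" shape of the mask (the same spatial positions hidden in every temporal slot) and the 90% masking ratio are assumptions drawn from the VideoMAE paper rather than from this page, and `tube_mask` is a hypothetical helper.

```python
import random


def tube_mask(num_spatial_patches=196, temporal_slots=8, mask_ratio=0.9, seed=0):
    """Build a boolean mask where True marks a patch hidden from the encoder.

    The same spatial positions are masked in every temporal slot (a "tube").
    All default values are illustrative assumptions.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be between 0 and 1")

    rng = random.Random(seed)
    num_masked = round(num_spatial_patches * mask_ratio)
    masked_positions = set(rng.sample(range(num_spatial_patches), num_masked))

    # One spatial mask, repeated unchanged for every temporal slot.
    frame_mask = [pos in masked_positions for pos in range(num_spatial_patches)]
    return [list(frame_mask) for _ in range(temporal_slots)]


mask = tube_mask()
visible = sum(not hidden for hidden in mask[0])
print(f"{visible} of {len(mask[0])} patches per temporal slot stay visible")
```

During pre-training only the visible patches would be encoded, and the decoder would be asked to reconstruct the pixels of the hidden ones. The fine-tuned classifier described on this page receives the full, unmasked sequence.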

Core Capabilities

  • High-accuracy video classification (80.9% top-1 accuracy)
  • Efficient processing of video sequences
  • Feature extraction for downstream tasks
  • Robust representation learning through masked prediction
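For readers unfamiliar with the top-1 and top-5 figures quoted for this model, the sketch below shows how such a metric is typically computed: a prediction counts as correct when the true label is among the k highest-scoring classes. The scores are made-up toy values, not model outputs, and `top_k_accuracy` is a hypothetical helper.

```python
def top_k_accuracy(score_rows, true_labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    if not true_labels or len(score_rows) != len(true_labels):
        raise ValueError("need one non-empty score row per label")

    hits = 0
    for scores, label in zip(score_rows, true_labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy example with 3 classes and 3 clips (illustrative numbers only).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]

print(top_k_accuracy(toy_scores, toy_labels, k=1))
print(top_k_accuracy(toy_scores, toy_labels, k=2))
```

With 400 Kinetics classes, top-5 accuracy is by construction at least as high as top-1, which is consistent with the 94.7% and 80.9% figures listed for this model.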

Frequently Asked Questions

Q: What makes this model unique?

This model pairs masked-autoencoder pre-training with video input, an approach its authors describe as data-efficient, and reaches 80.9% top-1 accuracy on Kinetics-400 after fine-tuning. Its architecture is designed to handle the temporal dimension of video data.

Q: What are the recommended use cases?

The model is ideal for video classification tasks, particularly those involving action recognition within the Kinetics-400 categories. It can also be used for feature extraction in custom video analysis pipelines.
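A minimal sketch of the classification use case, assuming the Hugging Face `transformers` and `torch` packages are installed and that the checkpoint is published under the hub ID `MCG-NJU/videomae-base-finetuned-kinetics` (inferred from the model and organization names on this page). The helpers `sample_frame_indices` and `classify_clip` are hypothetical, and the 16-frame clip length is an assumption. Decoding the video file into frames is left to the library of your choice.

```python
def sample_frame_indices(total_frames, num_samples=16):
    """Pick num_samples evenly spaced frame indices from a clip.

    If the clip has fewer frames than num_samples, some indices repeat.
    """
    if total_frames < 1 or num_samples < 1:
        raise ValueError("total_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def classify_clip(frames, model_id="MCG-NJU/videomae-base-finetuned-kinetics"):
    """Return the predicted Kinetics-400 label for a list of RGB frames.

    frames: a list of image arrays, one per sampled frame.
    Heavy dependencies are imported here so the sampling helper above
    can be used without them.
    """
    import torch
    from transformers import VideoMAEForVideoClassification, VideoMAEImageProcessor

    processor = VideoMAEImageProcessor.from_pretrained(model_id)
    model = VideoMAEForVideoClassification.from_pretrained(model_id)
    model.eval()

    # The processor resizes and normalizes the frames and stacks them into a batch.
    inputs = processor(list(frames), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    predicted_index = logits.argmax(-1).item()
    return model.config.id2label[predicted_index]


# Example: choose 16 frames from a 300-frame video before decoding them.
print(sample_frame_indices(300))
```

Calling `classify_clip` downloads the checkpoint on first use, so it is kept separate from the lightweight sampling step. For feature extraction, the same pattern would apply with a model class that returns hidden states instead of class logits.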
