videomae-base

MCG-NJU

VideoMAE base model (94.2M parameters) for self-supervised video pre-training. Uses masked autoencoding on the Kinetics-400 dataset with a Vision Transformer (ViT) architecture.

  • Parameter Count: 94.2M
  • License: CC-BY-NC-4.0
  • Paper: VideoMAE Paper
  • Framework: PyTorch
  • Tensor Type: F32

What is videomae-base?

VideoMAE-base is a self-supervised video pre-training model that extends the Masked Autoencoder (MAE) approach to video processing. Developed by MCG-NJU, this model has been pre-trained on the Kinetics-400 dataset for 1600 epochs, utilizing a Vision Transformer (ViT) architecture to process video data effectively.

Implementation Details

The model processes videos as sequences of fixed-size 16x16 patches, which are linearly embedded, and masks a large fraction of them (tube masking at a ~90% ratio in the VideoMAE paper) during pre-training. It incorporates a [CLS] token for classification tasks and uses fixed sine/cosine position embeddings. The architecture pairs a Transformer encoder with a lightweight decoder that predicts the pixel values of the masked patches.

  • Self-supervised pre-training approach
  • Vision Transformer-based architecture
  • 16x16 patch-based video processing
  • Integrated [CLS] token for classification
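The patch-based setup above implies a concrete token budget. A minimal sketch of the geometry, assuming the values stated on this card (224x224 input, 16x16 patches, 16 frames) plus the tubelet size of 2 and ~90% masking ratio used in the VideoMAE paper:

```python
# Token geometry for VideoMAE-base pre-training.
image_size = 224    # input resolution (from the model card)
patch_size = 16     # 16x16 spatial patches
num_frames = 16     # frames sampled per clip
tubelet_size = 2    # each token spans 2 consecutive frames (paper default)
mask_ratio = 0.9    # ~90% of tokens hidden during pre-training (paper default)

patches_per_frame = (image_size // patch_size) ** 2             # 14 * 14 = 196
seq_length = (num_frames // tubelet_size) * patches_per_frame   # 8 * 196 = 1568
num_masked = int(mask_ratio * seq_length)                       # 1411 tokens masked

print(patches_per_frame, seq_length, num_masked)  # 196 1568 1411
```

The high masking ratio is what makes pre-training cheap: the encoder only sees the ~10% of tokens that remain visible.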

Core Capabilities

  • Video feature extraction and representation learning
  • Masked patch prediction for self-supervised learning
  • Fine-tunable for downstream video classification tasks
  • Efficient processing of video temporal information
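Feature extraction can be sketched with the Hugging Face `transformers` VideoMAE classes. The config below is deliberately scaled down (smaller resolution and hidden size, hypothetical values chosen only so the sketch runs quickly); in practice you would load the real checkpoint with `VideoMAEModel.from_pretrained("MCG-NJU/videomae-base")`:

```python
import torch
from transformers import VideoMAEConfig, VideoMAEModel

# Scaled-down, randomly initialized config for illustration only.
config = VideoMAEConfig(
    image_size=112, patch_size=16, num_frames=8, tubelet_size=2,
    hidden_size=192, num_hidden_layers=2, num_attention_heads=3,
    intermediate_size=768,
)
model = VideoMAEModel(config).eval()

# Dummy clip: (batch, frames, channels, height, width).
video = torch.randn(1, 8, 3, 112, 112)
with torch.no_grad():
    features = model(pixel_values=video).last_hidden_state

# (8 frames / tubelet 2) * (112/16)^2 patches = 4 * 49 = 196 tokens.
print(features.shape)
```

The resulting per-token features can be pooled (e.g. mean-pooled) into a clip-level representation for downstream tasks.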

Frequently Asked Questions

Q: What makes this model unique?

VideoMAE stands out for its efficient self-supervised approach to video understanding: it requires no manual annotations during pre-training, yet achieves strong downstream performance through masked autoencoding.

Q: What are the recommended use cases?

The model is primarily designed for video understanding tasks, particularly after fine-tuning. It's suitable for video classification, feature extraction, and can be adapted for various video analysis tasks through transfer learning.
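The pre-training objective itself (masked patch reconstruction) can be exercised with `VideoMAEForPreTraining`, which takes a boolean mask over the token sequence. Again a scaled-down, randomly initialized config is used here so the sketch is self-contained; a real run would load `"MCG-NJU/videomae-base"` via `from_pretrained`:

```python
import torch
from transformers import VideoMAEConfig, VideoMAEForPreTraining

# Scaled-down config for illustration; the real checkpoint uses 224x224 / 16 frames.
config = VideoMAEConfig(
    image_size=112, patch_size=16, num_frames=8, tubelet_size=2,
    hidden_size=192, num_hidden_layers=2, num_attention_heads=3,
    intermediate_size=768,
)
model = VideoMAEForPreTraining(config).eval()

video = torch.randn(1, 8, 3, 112, 112)  # dummy clip
seq_length = (config.num_frames // config.tubelet_size) * \
             (config.image_size // config.patch_size) ** 2  # 4 * 49 = 196 tokens

# Randomly hide ~90% of tokens, the ratio used in the VideoMAE paper.
num_masked = int(0.9 * seq_length)
mask = torch.zeros(1, seq_length, dtype=torch.bool)
mask[:, torch.randperm(seq_length)[:num_masked]] = True

with torch.no_grad():
    outputs = model(pixel_values=video, bool_masked_pos=mask)
print(outputs.loss)  # scalar reconstruction loss over the masked patches
```

For classification fine-tuning, the same encoder is typically wrapped with a classification head (`VideoMAEForVideoClassification` in `transformers`) and trained on labeled clips.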
