videomae-large

MCG-NJU

VideoMAE-large: a 343M-parameter video transformer pre-trained on Kinetics-400 with self-supervised masked autoencoding.

Property         Value
Parameter Count  343M
License          CC-BY-NC-4.0
Paper            VideoMAE Paper
Framework        PyTorch

What is videomae-large?

VideoMAE-large is a self-supervised video transformer for video understanding tasks. It extends the Masked Autoencoder (MAE) approach from images to video: during pre-training, a very high fraction of space-time patches (around 90% in the VideoMAE paper) is hidden, and the model learns to reconstruct the missing pixels from the visible ones. This large variant has 343M parameters and was pre-trained on Kinetics-400 for 1600 epochs without using labels.
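The tube masking used by VideoMAE can be sketched as follows. This is a minimal illustration in plain Python, not the reference implementation; the defaults (16 frames, 224x224 input, 16x16 patches, tubelets of 2 frames, 90% masking) are assumptions chosen to match commonly used VideoMAE settings, and `tube_mask` is a hypothetical helper.

```python
import random


def tube_mask(num_frames=16, tubelet_size=2, image_size=224, patch_size=16,
              mask_ratio=0.9, seed=0):
    """Return a flat list of booleans over all space-time tokens (True = masked).

    Tube masking: one random set of spatial positions is drawn and the same
    positions are masked in every temporal slice, so a hidden patch cannot be
    trivially copied from the same location in a neighbouring frame.
    """
    t_slices = num_frames // tubelet_size        # tokens along the time axis
    per_slice = (image_size // patch_size) ** 2  # tokens per temporal slice
    num_masked = int(mask_ratio * per_slice)
    rng = random.Random(seed)
    masked = set(rng.sample(range(per_slice), num_masked))
    spatial_pattern = [i in masked for i in range(per_slice)]
    # Repeat the same spatial pattern for every temporal slice ("tubes").
    return spatial_pattern * t_slices


mask = tube_mask()
print(len(mask), sum(mask))  # 1568 1408
```

Drawing one spatial pattern and repeating it over time is what distinguishes tube masking from masking each frame independently.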

Implementation Details

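The model splits each video into a sequence of fixed-size patches with a 16x16 spatial resolution, which are linearly embedded. The encoder is a Vision Transformer (ViT); during pre-training it is paired with a lightweight Transformer decoder that reconstructs the masked patches. A [CLS] token is used for classification tasks, and fixed sine-cosine position embeddings are added before the sequence enters the Transformer encoder.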

  • Large-scale architecture with 343M parameters
  • Self-supervised pre-training on Kinetics-400
  • 16x16 patch-based video processing
  • Transformer encoder with a lightweight decoder used during pre-training
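To make the patch layout and the fixed sine-cosine position embeddings concrete, here is a sketch in plain Python. It assumes 16-frame 224x224 clips, 16x16 patches grouped into tubelets of 2 frames, and a hidden size of 1024 (the usual ViT-Large width); these values and the helper names `num_tokens` and `sinusoid_table` are illustrative assumptions rather than values read from this checkpoint's config.

```python
import math


def num_tokens(num_frames=16, tubelet_size=2, image_size=224, patch_size=16):
    """Number of space-time patch tokens for one clip (assumed settings)."""
    return (num_frames // tubelet_size) * (image_size // patch_size) ** 2


def sinusoid_table(n_position, d_hid):
    """Fixed 1D sine-cosine position embeddings, one row per token.

    Dimension pairs (2i, 2i + 1) share the frequency 1 / 10000**(2i / d_hid);
    even dimensions use sin and odd dimensions use cos.
    """
    table = []
    for pos in range(n_position):
        row = []
        for j in range(d_hid):
            angle = pos / (10000 ** (2 * (j // 2) / d_hid))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


n = num_tokens()
print(n)  # 1568
pos_embed = sinusoid_table(n, 1024)  # assumed hidden size of 1024 (ViT-L)
print(len(pos_embed), len(pos_embed[0]))  # 1568 1024
```

Because the table is computed rather than learned, it adds no parameters; each row is simply added to the embedding of the token at that position.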

Core Capabilities

  • Masked video patch prediction
  • Feature extraction for downstream tasks
  • Video representation learning
  • Transfer learning potential for various video tasks
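Masked video patch prediction is typically trained with a mean squared error computed only on the hidden patches. The sketch below assumes, as in MAE-style setups, that each target patch is optionally normalized by its own mean and standard deviation before the comparison; exact epsilon and variance conventions vary between implementations, and `masked_mse` and `normalize_patch` are hypothetical helpers.

```python
import math


def normalize_patch(pixels, eps=1e-6):
    """Normalize one flattened patch to zero mean and unit variance."""
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [(p - mean) / math.sqrt(var + eps) for p in pixels]


def masked_mse(pred, target, mask, normalize_target=True):
    """Mean squared error computed only over masked patches.

    pred, target: lists of patches, each a flat list of pixel values.
    mask: list of bools, True where the patch was hidden from the encoder.
    """
    total, count = 0.0, 0
    for p, t, m in zip(pred, target, mask):
        if not m:
            continue  # visible patches do not contribute to the loss
        if normalize_target:
            t = normalize_patch(t)
        total += sum((a - b) ** 2 for a, b in zip(p, t))
        count += len(t)
    return total / count if count else 0.0


pred = [[0.0, 0.0], [1.0, -1.0]]
target = [[5.0, 5.0], [3.0, 7.0]]
mask = [False, True]  # only the second patch was hidden
print(masked_mse(pred, target, mask, normalize_target=False))  # 34.0
```

Restricting the loss to masked patches keeps the task non-trivial: the model gets no credit for reproducing pixels it could already see.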

Frequently Asked Questions

Q: What makes this model unique?

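VideoMAE-large is pre-trained with self-supervision, so it does not need labeled data: the training signal comes from reconstructing masked video patches. Because the encoder only processes the small set of visible patches, pre-training is cheaper than running the full token sequence through the encoder, and the VideoMAE paper reports that the approach is data-efficient. Combined with its 343M-parameter capacity, this lets the model learn general-purpose spatiotemporal features from unlabeled video.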
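A rough way to see why masking makes pre-training cheaper: the encoder only runs on visible tokens. Under assumed settings typical of VideoMAE (16 frames, 224x224 input, 16x16 patches, tubelets of 2 frames, 90% masking per temporal slice), the arithmetic below compares token counts. It ignores the decoder, which processes all tokens but is much smaller, so treat the ratios as a back-of-the-envelope estimate; `encoder_token_savings` is a hypothetical helper.

```python
def encoder_token_savings(num_frames=16, tubelet_size=2, image_size=224,
                          patch_size=16, mask_ratio=0.9):
    """Rough encoder-cost comparison: visible tokens only vs. the full clip."""
    per_slice = (image_size // patch_size) ** 2
    slices = num_frames // tubelet_size
    total = slices * per_slice
    visible = slices * (per_slice - int(mask_ratio * per_slice))
    token_ratio = visible / total       # scales per-token layers (MLPs, projections)
    attention_ratio = token_ratio ** 2  # scales pairwise attention scores
    return total, visible, token_ratio, attention_ratio


total, visible, token_ratio, attention_ratio = encoder_token_savings()
print(total, visible)                                    # 1568 160
print(round(token_ratio, 3), round(attention_ratio, 4))  # 0.102 0.0104
```

Under these assumptions the encoder sees about 10% of the tokens, and the quadratic attention term shrinks to roughly 1% of its full-sequence cost.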

Q: What are the recommended use cases?

This checkpoint is pre-trained only, so it is mainly intended as a starting point for fine-tuning on applications such as action recognition and video classification, or as a backbone for extracting video features. It is particularly useful for large video datasets, where learning strong spatiotemporal features from scratch would be costly.
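When the model is used as a feature extractor, the encoder returns one feature vector per space-time token, and these must be reduced to a single clip-level vector before a classifier. One simple option is mean pooling, which, to our understanding, is also the default for the Hugging Face `VideoMAEForVideoClassification` head (`use_mean_pooling=True` in `VideoMAEConfig`). The sketch below uses plain Python lists in place of real encoder outputs; `mean_pool` and `linear_head` are hypothetical helpers, not library functions.

```python
def mean_pool(token_features):
    """Average per-token features into one clip-level vector."""
    n = len(token_features)
    dim = len(token_features[0])
    return [sum(tok[d] for tok in token_features) / n for d in range(dim)]


def linear_head(features, weights, bias):
    """Hypothetical classification head: one weight row and bias per class."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


# Toy stand-in for encoder output: 3 tokens with 2-dim features.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
clip = mean_pool(tokens)
print(clip)  # [3.0, 4.0]
logits = linear_head(clip, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.5])
print(logits)  # [3.0, 4.5]
```

In a real fine-tuning run the head's weights would be learned, and the encoder may be updated as well or kept frozen for a cheaper linear probe.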
