convnextv2_base.fcmae_ft_in22k_in1k_384

timm

ConvNeXt V2 base model pretrained with FCMAE, then fine-tuned on ImageNet-22k and ImageNet-1k. 88.7M params, 384x384 input, 87.6% top-1 accuracy.

Property | Value
Parameter Count | 88.7M
Model Type | Image Classification / Feature Backbone
Input Resolution | 384 x 384
Top-1 Accuracy | 87.646%
GMACs | 45.21
Paper | ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

What is convnextv2_base.fcmae_ft_in22k_in1k_384?

This is the base variant of the ConvNeXt V2 convolutional neural network architecture. It was pretrained with a fully convolutional masked autoencoder (FCMAE) framework, then fine-tuned on ImageNet-22k and subsequently on ImageNet-1k. The model takes 384x384 pixel images as input and reaches 87.646% top-1 accuracy on ImageNet-1k.

Implementation Details

The model has 88.7M parameters and requires 45.21 GMACs (billion multiply-accumulate operations) per 384x384 forward pass, producing 84.5M activations. Its reported throughput is 209.51 samples per second at a batch size of 256; the benchmark hardware is not stated here, so the figure is best read as a relative indicator rather than an absolute one.

  • Self-supervised FCMAE pretraining
  • Hierarchical (multi-stage) feature extraction
  • Fine-tuned and evaluated at 384x384 input resolution
  • Two-stage fine-tuning: ImageNet-22k, then ImageNet-1k
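
To make the hierarchical feature extraction concrete, the sketch below computes the feature-map shape each stage would produce for a 384x384 input. The stage strides (4, 8, 16, 32) and channel widths (128, 256, 512, 1024) are assumptions taken from the usual ConvNeXt "base" configuration; they are not stated on this page, so verify them against the loaded model before relying on them.

```python
# Expected per-stage feature-map shapes for a 384x384 input.
# Assumption: strides and channel widths below are the usual ConvNeXt "base"
# values; they are not given in this model card.
INPUT_SIZE = 384
STAGE_STRIDES = (4, 8, 16, 32)
STAGE_CHANNELS = (128, 256, 512, 1024)


def stage_shapes(input_size=INPUT_SIZE):
    """Return a (channels, height, width) tuple for each stage output."""
    shapes = []
    for channels, stride in zip(STAGE_CHANNELS, STAGE_STRIDES):
        side = input_size // stride  # spatial size shrinks by the total stride
        shapes.append((channels, side, side))
    return shapes


for index, shape in enumerate(stage_shapes()):
    print(f"stage {index}: channels={shape[0]}, height={shape[1]}, width={shape[2]}")
```

Under these assumptions the last stage yields a 12x12 grid, which is the map that gets pooled into a single embedding vector.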

Core Capabilities

  • High-accuracy image classification
  • Feature map extraction at multiple scales
  • Image embedding generation
  • Transfer learning applications
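
The capabilities listed above map onto a few `timm` calls. The following is a minimal sketch rather than an official recipe: it assumes `timm`, `torch`, and `Pillow` are installed, and the helper names (`build_classifier`, `build_feature_extractor`, `build_embedder`, `build_preprocess`, `top_k`) are hypothetical wrappers introduced only for illustration. Imports are deferred into the functions so the module can be loaded without the heavy dependencies.

```python
# Sketch of the three usage modes via timm. Helper names are illustrative only.
MODEL_NAME = "convnextv2_base.fcmae_ft_in22k_in1k_384"

# timm names follow "<architecture>.<pretrained_tag>".
ARCHITECTURE, PRETRAINED_TAG = MODEL_NAME.split(".", 1)


def build_classifier(pretrained=True):
    """Full model with the ImageNet-1k classification head."""
    import timm  # third-party: pip install timm
    return timm.create_model(MODEL_NAME, pretrained=pretrained).eval()


def build_feature_extractor(pretrained=True):
    """Model that returns a list of feature maps, one per stage."""
    import timm
    return timm.create_model(
        MODEL_NAME, pretrained=pretrained, features_only=True
    ).eval()


def build_embedder(pretrained=True):
    """Model with the classifier removed; outputs pooled embeddings."""
    import timm
    return timm.create_model(
        MODEL_NAME, pretrained=pretrained, num_classes=0
    ).eval()


def build_preprocess(model):
    """Inference transform matching the model's pretrained data config."""
    import timm
    config = timm.data.resolve_model_data_config(model)
    return timm.data.create_transform(**config, is_training=False)


def top_k(model, preprocess, image, k=5):
    """Return (probabilities, class indices) of the k most likely classes."""
    import torch
    with torch.no_grad():
        logits = model(preprocess(image).unsqueeze(0))  # add batch dimension
    return torch.topk(logits.softmax(dim=1), k=k)


# Usage sketch (downloads weights on first call; "cat.jpg" is a placeholder):
#   from PIL import Image
#   model = build_classifier()
#   probs, indices = top_k(model, build_preprocess(model), Image.open("cat.jpg").convert("RGB"))
```

For transfer learning, passing a different `num_classes` to `timm.create_model` replaces the classification head with a freshly initialized one while keeping the pretrained backbone weights.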

Frequently Asked Questions

Q: What makes this model unique?

This model pairs the ConvNeXt V2 architecture with FCMAE pretraining. At 384x384 input resolution it reaches 87.646% top-1 accuracy, with a reported throughput of 209.51 samples/sec at a batch size of 256.
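
As a rough aid for reading the throughput figure, the sketch below converts the numbers quoted on this page (209.51 samples/sec, batch size 256, 45.21 GMACs per image) into time per batch, amortized time per image, and sustained multiply-accumulate rate. The function names are hypothetical, and the results are only as meaningful as the unstated benchmark setup behind the original figure.

```python
# Back-of-the-envelope conversions from the figures quoted in this model card.
SAMPLES_PER_SEC = 209.51   # reported throughput
BATCH_SIZE = 256           # batch size used for that measurement
GMACS_PER_IMAGE = 45.21    # multiply-accumulates per 384x384 forward pass


def seconds_per_batch(batch_size=BATCH_SIZE, rate=SAMPLES_PER_SEC):
    """Wall-clock time to process one batch at the reported rate."""
    return batch_size / rate


def amortized_ms_per_image(rate=SAMPLES_PER_SEC):
    """Average time per image inside a batch (not single-image latency)."""
    return 1000.0 / rate


def sustained_tmacs_per_sec(gmacs=GMACS_PER_IMAGE, rate=SAMPLES_PER_SEC):
    """Implied multiply-accumulate rate in tera-MACs per second."""
    return gmacs * rate / 1000.0


print(f"seconds per batch:      {seconds_per_batch():.3f}")
print(f"amortized ms per image: {amortized_ms_per_image():.3f}")
print(f"sustained TMACs/sec:    {sustained_tmacs_per_sec():.2f}")
```

The per-image value is an average over a full batch; running a single image at batch size 1 will generally take longer per image than this figure suggests.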

Q: What are the recommended use cases?

The model is suited to image classification, feature extraction, and use as a backbone for transfer learning, particularly where inputs are high-resolution and both accuracy and throughput matter.
