CLIP-convnext_base_w-laion_aesthetic-s13B-b82K

Maintained By: laion


Property                        Value
Training Data                   LAION-Aesthetic (900M samples)
Architecture                    ConvNeXt-Base with wide embed dim
Resolution                      256x256
ImageNet Zero-Shot Accuracy     71.0%
Model Source                    Hugging Face

What is CLIP-convnext_base_w-laion_aesthetic-s13B-b82K?

This model replaces the ViT or ResNet image tower used in most CLIP releases with a ConvNeXt-Base backbone that has a widened embedding dimension. It was trained on LAION-Aesthetic, an aesthetically filtered subset of LAION-2B (roughly 900M samples), and reaches 71.0% top-1 zero-shot accuracy on ImageNet classification.
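The checkpoint is published on the Hugging Face Hub and can be loaded through OpenCLIP. The snippet below is a minimal loading sketch; the `hf-hub:` model reference follows OpenCLIP's convention for Hub-hosted checkpoints.

```python
import open_clip

# Load the pretrained model, its matching image preprocessing, and tokenizer
# directly from the Hugging Face Hub via OpenCLIP's hf-hub: reference.
model, _, preprocess = open_clip.create_model_and_transforms(
    'hf-hub:laion/CLIP-convnext_base_w-laion_aesthetic-s13B-b82K'
)
tokenizer = open_clip.get_tokenizer(
    'hf-hub:laion/CLIP-convnext_base_w-laion_aesthetic-s13B-b82K'
)
model.eval()  # inference mode for zero-shot use
```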

Implementation Details

The model uses a ConvNeXt-Base image tower with a widened embedding dimension, trained for 13B samples seen (the s13B in the name) at a global batch size of 81920 (b82K). Augmentation uses Random Resize Crop with a scale range of (0.9, 1.0), and the model achieves state-of-the-art ImageNet zero-shot accuracy for its size and compute class at release.

  • Trained on the LAION-Aesthetic dataset (900M samples)
  • Augmented with Random Resize Crop, scale (0.9, 1.0)
  • Optimized for 256x256 resolution
  • Implements gradient checkpointing for efficient training
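The exact training pipeline is not bundled with the checkpoint, but the last two points above, the (0.9, 1.0) Random Resize Crop range and gradient checkpointing, map onto standard torchvision and OpenCLIP calls. The following is a rough sketch under that assumption; normalization constants and other details would need to match the released preprocessing in practice.

```python
import open_clip
from torchvision import transforms

# Approximate the reported augmentation: Random Resize Crop with scale (0.9, 1.0)
# at the model's native 256x256 resolution (a sketch, not the exact training transform).
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(256, scale=(0.9, 1.0)),
    transforms.ToTensor(),
])

model, _, _ = open_clip.create_model_and_transforms(
    'hf-hub:laion/CLIP-convnext_base_w-laion_aesthetic-s13B-b82K'
)

# Trade compute for memory during fine-tuning, if your OpenCLIP version exposes it.
model.set_grad_checkpointing()
```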

Core Capabilities

  • Zero-shot image classification with 71.0% top-1 accuracy on ImageNet
  • Image and text retrieval tasks
  • Support for cross-modal understanding
  • Efficient scaling with model size and resolution
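The first capability above, zero-shot classification, follows the standard CLIP recipe: encode the image and a set of class prompts, normalize both, and take a softmax over the scaled cosine similarities. A minimal sketch (the image path and label prompts are placeholders):

```python
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    'hf-hub:laion/CLIP-convnext_base_w-laion_aesthetic-s13B-b82K'
)
tokenizer = open_clip.get_tokenizer(
    'hf-hub:laion/CLIP-convnext_base_w-laion_aesthetic-s13B-b82K'
)
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)
labels = ["a photo of a cat", "a photo of a dog", "a diagram"]
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Softmax over scaled cosine similarities gives per-label probabilities.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs.squeeze(0).tolist())))
```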

Frequently Asked Questions

Q: What makes this model unique?

This is one of the first ConvNeXt CLIP models trained at scale, offering an alternative to ViT and ResNet architectures while achieving superior sample efficiency compared to ViT-B/16.

Q: What are the recommended use cases?

The model is best suited for research purposes, including zero-shot classification, image-text retrieval, and as a foundation for fine-tuning on specific downstream tasks. However, it is not recommended for deployment in production environments without thorough testing.
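As an illustration of the retrieval use case, the sketch below ranks a handful of candidate images against a text query by cosine similarity; the file paths and query string are placeholders, and a real deployment would batch and cache the image embeddings.

```python
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    'hf-hub:laion/CLIP-convnext_base_w-laion_aesthetic-s13B-b82K'
)
tokenizer = open_clip.get_tokenizer(
    'hf-hub:laion/CLIP-convnext_base_w-laion_aesthetic-s13B-b82K'
)
model.eval()

# Placeholder image collection; in practice these come from your own data.
paths = ["img_0.jpg", "img_1.jpg", "img_2.jpg"]
images = torch.stack([preprocess(Image.open(p)) for p in paths])
query = tokenizer(["a watercolor painting of mountains"])

with torch.no_grad():
    img_feats = model.encode_image(images)
    txt_feats = model.encode_text(query)
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
    txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)
    scores = (img_feats @ txt_feats.T).squeeze(-1)  # cosine similarity per image

# Rank images by similarity to the text query (highest first).
for path, score in sorted(zip(paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")
```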
