CLIP-ViT-L-14-DataComp.XL-s13B-b90K

Maintained by laion

Advanced CLIP model trained on the DataComp-1B dataset, achieving 79.2% zero-shot accuracy on ImageNet-1k. Optimized for research and zero-shot image classification.

| Property | Value |
| --- | --- |
| License | MIT |
| Research Paper | DataComp Paper |
| Training Data | DataComp-1B (1.4B samples) |
| ImageNet-1k Accuracy | 79.2% (zero-shot) |

What is CLIP-ViT-L-14-DataComp.XL-s13B-b90K?

This is an advanced implementation of CLIP (Contrastive Language-Image Pre-training) using a ViT-L/14 (Vision Transformer Large, patch size 14) architecture. Trained on the massive DataComp-1B dataset, it represents a significant advance in zero-shot image classification and multi-modal learning. The model was trained on stability.ai's infrastructure and delivers state-of-the-art performance across a range of image understanding tasks.

Implementation Details

The model utilizes the OpenCLIP framework and incorporates a ViT-L/14 architecture trained on carefully curated data from the DataComp project. It's designed for research applications and demonstrates exceptional zero-shot classification capabilities.

  • Trained on 1.4 billion samples from DataComp-1B dataset
  • Implements Vision Transformer Large/14 architecture
  • Achieves 79.2% zero-shot accuracy on ImageNet-1k
  • Extensively evaluated on 38 different datasets
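Since the model is distributed through the OpenCLIP framework, it can be loaded directly from the Hugging Face Hub. The sketch below shows one way to run zero-shot classification with this checkpoint; it assumes `open_clip_torch`, `torch`, and `Pillow` are installed, and the `classify` helper name and the `"a photo of a ..."` prompt template are illustrative choices, not part of the model card. Model loading happens inside the function, so nothing heavy runs at import time.

```python
# Sketch: zero-shot classification with this checkpoint via OpenCLIP.
# Assumes open_clip_torch, torch, and Pillow are installed; the helper
# name and prompt template are illustrative, not prescribed by the card.
def classify(image_path, labels,
             model_id="hf-hub:laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K"):
    import torch
    import open_clip
    from PIL import Image

    # Downloads the checkpoint on first call (~1.7 GB of weights).
    model, _, preprocess = open_clip.create_model_and_transforms(model_id)
    tokenizer = open_clip.get_tokenizer(model_id)
    model.eval()

    image = preprocess(Image.open(image_path)).unsqueeze(0)
    text = tokenizer([f"a photo of a {label}" for label in labels])

    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(text)
        # L2-normalize, then softmax over scaled cosine similarities.
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

    return dict(zip(labels, probs.squeeze(0).tolist()))
```

A call such as `classify("cat.jpg", ["cat", "dog", "car"])` would return a probability per label, with the highest mass on the best-matching caption.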

Core Capabilities

  • Zero-shot image classification
  • Image and text retrieval
  • Foundation for downstream task fine-tuning
  • Image generation guidance and conditioning

Frequently Asked Questions

Q: What makes this model unique?

This model stands out due to its training on the carefully curated DataComp-1B dataset and its impressive zero-shot classification performance. The combination of the ViT-L/14 architecture with advanced training methodologies makes it particularly effective for research applications.

Q: What are the recommended use cases?

The model is primarily intended for research purposes, particularly in zero-shot image classification and multi-modal learning research. It's not recommended for deployment in production environments without thorough testing and evaluation. Specific use cases include research in image classification, retrieval systems, and foundation model studies.
