CLIP-ViT-bigG-14-laion2B-39B-b160k

by laion

A CLIP ViT-bigG/14 model trained on the LAION-2B dataset, achieving 80.1% zero-shot accuracy on ImageNet. Specialized in zero-shot image classification and image-text retrieval tasks.

| Property | Value |
|----------|-------|
| License | MIT |
| Training Data | LAION-2B English subset |
| ImageNet Accuracy | 80.1% (zero-shot) |
| Framework | OpenCLIP, PyTorch |
| Primary Tasks | Zero-shot image classification |

What is CLIP-ViT-bigG-14-laion2B-39B-b160k?

This is an advanced vision-language model based on the CLIP architecture, specifically utilizing a Vision Transformer (ViT) bigG/14 backbone. Trained on the massive LAION-2B English dataset, it represents a significant advancement in zero-shot image classification capabilities. The model was trained by Mitchell Wortsman on the stability.ai cluster, demonstrating impressive performance with 80.1% zero-shot accuracy on ImageNet-1k.

Implementation Details

The model is built with the OpenCLIP framework and implemented in PyTorch, with weights also distributed in the Safetensors format. It pairs the ViT-bigG/14 vision tower with a text encoder in the standard CLIP setup, training both jointly to align visual and textual representations.
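
For orientation, here is a minimal loading sketch using OpenCLIP. The "ViT-bigG-14" and "laion2b_s39b_b160k" identifiers follow OpenCLIP's naming for this checkpoint, but verify them against the model card; the checkpoint is very large, so expect a long download and high memory use.

```python
# Minimal loading sketch (pip install open_clip_torch).
# Model/pretrained tags follow OpenCLIP's naming for this checkpoint.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k"
)
tokenizer = open_clip.get_tokenizer("ViT-bigG-14")
model.eval()  # inference only; no gradients needed
```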

  • Trained on LAION-2B, the 2-billion-sample English subset of LAION-5B
  • Fine-tuned in part on LAION-A (a 900M subset with aesthetic filtering)
  • Supports zero-shot classification and image-text retrieval (a scoring sketch follows this list)
  • Implements a state-of-the-art vision transformer architecture
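
To make the zero-shot and retrieval bullets concrete, the sketch below scores one image against a few candidate captions. It assumes the `model`, `preprocess`, and `tokenizer` objects from the loading sketch above; `cat.jpg` and the label prompts are placeholders.

```python
# Zero-shot classification sketch: rank text prompts against one image.
from PIL import Image
import torch

labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # placeholder image file
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product is cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```

Image-text retrieval uses the same machinery: embed a gallery of images once, then rank them by cosine similarity against a query caption (or rank captions against a query image).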

Core Capabilities

  • Zero-shot image classification without additional training
  • Image and text retrieval tasks
  • Support for image classification fine-tuning
  • Linear probe image classification (see the sketch after this list)
  • Image generation guidance and conditioning
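
For the linear-probe capability, a common recipe (from the original CLIP paper, not something this model card prescribes) is to freeze the image encoder, extract features for a labeled dataset, and fit a logistic-regression classifier on top. The dummy images and labels below are stand-ins for a real dataset; `model` and `preprocess` come from the loading sketch above.

```python
# Linear-probe sketch: logistic regression on frozen CLIP image features.
import numpy as np
import torch
from PIL import Image
from sklearn.linear_model import LogisticRegression

# Dummy placeholder data; substitute your own labeled images.
train_images = [Image.new("RGB", (224, 224)) for _ in range(8)]
train_labels = [0, 1] * 4

def extract_features(pil_images, batch_size=4):
    feats = []
    for i in range(0, len(pil_images), batch_size):
        batch = torch.stack([preprocess(img) for img in pil_images[i:i + batch_size]])
        with torch.no_grad():
            f = model.encode_image(batch)
        feats.append((f / f.norm(dim=-1, keepdim=True)).cpu().numpy())
    return np.concatenate(feats)

X_train = extract_features(train_images)  # frozen features, shape [N, D]
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
```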

Frequently Asked Questions

Q: What makes this model unique?

The model's impressive 80.1% zero-shot accuracy on ImageNet-1k sets it apart, along with its training on the carefully curated LAION-2B dataset. The combination of the ViT-bigG architecture with extensive pretraining makes it particularly effective for zero-shot tasks.

Q: What are the recommended use cases?

The model is primarily intended for research purposes, particularly in zero-shot image classification and image-text retrieval tasks. However, it's important to note that deployment in production systems is currently considered out of scope, and the model should be used with appropriate safety considerations.
