siglip-large-patch16-384

google

A large vision-language model (652M parameters) trained with a sigmoid loss for image-text understanding. It is best suited to zero-shot classification at 384x384 resolution.

Property          Value
Parameter Count   652M
License           Apache 2.0
Training Data     WebLI Dataset
Resolution        384x384
Paper             Sigmoid Loss for Language Image Pre-Training

What is siglip-large-patch16-384?

SigLIP is a vision-language model that builds on CLIP's architecture but replaces CLIP's softmax-based contrastive loss with a pairwise sigmoid loss. This large variant, trained at 384x384 image resolution, is aimed primarily at zero-shot image classification.

Implementation Details

The model was trained on the WebLI dataset using 16 TPU-v4 chips over three days. Images are resized to 384x384 and normalized across the RGB channels (mean 0.5, standard deviation 0.5 per channel). Text inputs are tokenized and padded to a fixed length of 64 tokens.
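The preprocessing described above can be sketched in plain Python. This is a minimal illustration, not the model's actual processor: it assumes 8-bit pixel values are first rescaled to [0, 1] before normalization, that over-long text is truncated, and it uses a hypothetical padding id `PAD_ID = 0` (the real tokenizer defines its own). The function names are made up for this sketch.

```python
from typing import List

MEAN = 0.5        # per-channel mean stated in the text
STD = 0.5         # per-channel standard deviation stated in the text
MAX_TOKENS = 64   # fixed text length stated in the text
PAD_ID = 0        # hypothetical padding id, for illustration only


def normalize_pixel(value: int) -> float:
    """Rescale an 8-bit channel value to [0, 1] (assumed), then apply (x - mean) / std."""
    return (value / 255.0 - MEAN) / STD


def pad_tokens(token_ids: List[int],
               length: int = MAX_TOKENS,
               pad_id: int = PAD_ID) -> List[int]:
    """Truncate (assumed) or right-pad a list of token ids to a fixed length."""
    clipped = token_ids[:length]
    return clipped + [pad_id] * (length - len(clipped))
```

With a mean and standard deviation of 0.5, pixel values end up in the range [-1, 1], and every text input becomes a sequence of exactly 64 ids regardless of its original length.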

  • Improved loss function that doesn't require global similarity normalization
  • Supports larger batch sizes while maintaining performance
  • Processes both image and text inputs for multimodal understanding
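To make the first point concrete, here is a small sketch of a pairwise sigmoid loss over a batch of image-text similarities. It is an illustration of the idea under stated assumptions, not the training code: `sigmoid_loss` and `log_sigmoid` are names invented here, and the default `temperature` and `bias` are placeholder values. Each pair is treated as its own binary classification problem, so no softmax over the whole batch is needed.

```python
import math
from typing import List


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def sigmoid_loss(sim: List[List[float]],
                 temperature: float = 10.0,   # placeholder value
                 bias: float = -10.0) -> float:  # placeholder value
    """Pairwise sigmoid loss over an n x n image-text similarity matrix.

    sim[i][j] is the similarity between image i and text j. Matching pairs
    sit on the diagonal (label +1); every other pair is a negative (label -1).
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0
            logit = temperature * sim[i][j] + bias
            # Each pair contributes independently: no batch-wide normalization.
            total += log_sigmoid(label * logit)
    return -total / n
```

A similarity matrix whose diagonal entries are high and off-diagonal entries are low yields a smaller loss than one with the pattern reversed, which is the behavior the loss is meant to reward.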

Core Capabilities

  • Zero-shot image classification
  • Image-text retrieval
  • Multimodal understanding with high accuracy
  • Efficient processing of high-resolution images
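As a rough sketch of how zero-shot classification works with a sigmoid-based model, the snippet below turns per-label image-text logits into independent probabilities. The logits and label prompts are made-up values for illustration, and `score_labels` and `best_label` are hypothetical helper names; in practice the logits would come from the model's image and text encoders.

```python
import math
from typing import Dict


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def score_labels(logits: Dict[str, float]) -> Dict[str, float]:
    """Map each candidate label's logit to an independent probability."""
    return {label: sigmoid(z) for label, z in logits.items()}


def best_label(logits: Dict[str, float]) -> str:
    """Return the candidate label with the highest logit."""
    return max(logits, key=logits.get)


# Hypothetical logits for one image against two text prompts.
example_logits = {"a photo of a cat": 2.0, "a photo of a dog": -3.0}
probabilities = score_labels(example_logits)
```

Because each label is scored on its own, the probabilities do not have to sum to 1, unlike a softmax over the candidate labels. An image can therefore score low against every prompt when none of them fits.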

Frequently Asked Questions

Q: What makes this model unique?

SigLIP's key innovation is its sigmoid loss function, which operates directly on individual image-text pairs and does not require normalizing similarities across the whole batch. Because each pair is scored independently, the batch size can be scaled up further, and the model also performs better than traditional CLIP models at smaller batch sizes.

Q: What are the recommended use cases?

The model is best suited to zero-shot image classification and image-text retrieval. It is particularly useful for applications that need higher-resolution image inputs (384x384) and for setups where batch size is a constraint.
