CLIP-GmP-ViT-L-14

Maintained By
zer0int


Property         Value
Parameter Count  428M
License          MIT
Base Model       openai/clip-vit-large-patch14
Tensor Type      F32

What is CLIP-GmP-ViT-L-14?

CLIP-GmP-ViT-L-14 is a fine-tuned version of OpenAI's CLIP ViT-L/14 model that uses Geometric Parametrization (GmP) to improve image classification. It reaches ~0.91 accuracy on ImageNet/ObjectNet, compared with ~0.84 for the original model.

Implementation Details

Geometric Parametrization decomposes each weight vector into a radial component (its magnitude) and an angular component (its direction), so that both are preserved as separate quantities. The model is available in several versions, including text encoder-only safetensors files and full-model checkpoints.
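The radial/angular split can be illustrated with a minimal sketch in plain Python. It assumes the decomposition is applied per row of a weight matrix (one row per output unit) and that no row is all zeros; the function names `to_gmp` and `from_gmp` are hypothetical and are not taken from the model's training code.

```python
import math

def to_gmp(weight):
    """Split each row into a magnitude r and a unit-length direction theta.

    Assumes every row has a non-zero norm.
    """
    radii, directions = [], []
    for row in weight:
        r = math.sqrt(sum(w * w for w in row))
        radii.append(r)
        directions.append([w / r for w in row])
    return radii, directions

def from_gmp(radii, directions):
    """Rebuild the weights as w = r * theta / ||theta||."""
    weight = []
    for r, theta in zip(radii, directions):
        norm = math.sqrt(sum(t * t for t in theta))
        weight.append([r * t / norm for t in theta])
    return weight

# Toy 2x2 weight matrix, used only for illustration.
weight = [[3.0, 4.0], [0.0, 2.0]]
radii, directions = to_gmp(weight)
rebuilt = from_gmp(radii, directions)
```

Re-normalizing `theta` inside `from_gmp` means that, if `r` and `theta` were trained as separate parameters, an update to `theta` would change only the direction of the weight vector, while `r` alone would control its magnitude.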

  • Implements Geometric Parametrization for improved performance
  • Features custom loss function with label smoothing
  • Maintains a modality gap of 0.80 (compared with 0.82 for the OpenAI pre-trained model)
  • Available in multiple formats including text encoder-only and full model versions
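As a rough illustration of what a contrastive loss with label smoothing can look like, here is a sketch in plain Python. It is not the model's actual loss function: the symmetric image/text formulation, the smoothing value of 0.1, and the function names are all assumptions made for the example.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def smoothed_cross_entropy(logits, smoothing):
    """Mean cross-entropy where row i's correct class is column i.

    Targets are (1 - smoothing) on the diagonal plus smoothing / n spread
    uniformly over all n classes.
    """
    n = len(logits)
    total = 0.0
    for i, row in enumerate(logits):
        for j, log_prob in enumerate(log_softmax(row)):
            target = smoothing / n + (1.0 - smoothing if i == j else 0.0)
            total -= target * log_prob
    return total / n

def contrastive_loss(logits_per_image, smoothing=0.1):
    """Average the image-to-text and text-to-image losses.

    smoothing=0.1 is an assumed value, not one stated for this model.
    """
    logits_per_text = [list(col) for col in zip(*logits_per_image)]
    return 0.5 * (
        smoothed_cross_entropy(logits_per_image, smoothing)
        + smoothed_cross_entropy(logits_per_text, smoothing)
    )

# Toy similarity matrix for two matched image/text pairs.
confident = [[10.0, 0.0], [0.0, 10.0]]
loss_plain = contrastive_loss(confident, smoothing=0.0)
loss_smooth = contrastive_loss(confident, smoothing=0.1)
```

With smoothing enabled, very confident logits are penalized slightly, so `loss_smooth` is larger than `loss_plain` on this toy input. That is the usual motivation for label smoothing: it discourages overconfident predictions.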

Core Capabilities

  • Improved prompt following and detail when used as the text encoder in text-to-image generation
  • Enhanced image classification accuracy
  • Integrates with Hugging Face Transformers and Diffusers pipelines
  • Compatible with various text-to-image models including Flux.1, SD3, SDXL
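A minimal sketch of zero-shot classification through Hugging Face Transformers might look like the following. The repository ID `zer0int/CLIP-GmP-ViT-L-14` is an assumption based on the maintainer and model name above, as is the presence of processor files in that repository; if they are missing, the processor could be loaded from the base model `openai/clip-vit-large-patch14` instead. The helpers `build_prompts` and `classify` and the prompt template are illustrative choices.

```python
# Assumed Hugging Face repository ID; verify it before use.
MODEL_ID = "zer0int/CLIP-GmP-ViT-L-14"

def build_prompts(labels):
    """Wrap each class label in a conventional CLIP-style prompt."""
    return [f"a photo of a {label}" for label in labels]

def classify(image_path, labels, model_id=MODEL_ID):
    """Return a {label: probability} dict for one image.

    Heavy dependencies are imported here so the module can be loaded
    without torch, Pillow, or transformers installed.
    """
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_id)
    processor = CLIPProcessor.from_pretrained(model_id)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompts(labels),
        images=image,
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        outputs = model(**inputs)

    # logits_per_image has shape (num_images, num_prompts).
    probs = outputs.logits_per_image.softmax(dim=-1)[0]
    return dict(zip(labels, probs.tolist()))
```

Calling `classify("cat.jpg", ["cat", "dog"])` (with a hypothetical local image file) would download the weights on first use and return one probability per label.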

Frequently Asked Questions

Q: What makes this model unique?

The combination of Geometric Parametrization and a custom loss function with label smoothing raises image classification accuracy (~0.91 versus ~0.84 for the original model on ImageNet/ObjectNet) while maintaining strong text-following capabilities.

Q: What are the recommended use cases?

The model is well suited to text-to-image generation, zero-shot image classification, and use as a replacement text encoder in Stable Diffusion models. Different versions target specific use cases: the "TEXT" model excels in text-heavy scenarios, while the "SMOOTH" model is potentially better for text-free applications.
