# CLIP-GmP-ViT-L-14
| Property | Value |
|---|---|
| Parameter Count | 428M |
| License | MIT |
| Base Model | openai/clip-vit-large-patch14 |
| Tensor Type | F32 |
## What is CLIP-GmP-ViT-L-14?
CLIP-GmP-ViT-L-14 is a fine-tuned version of OpenAI's CLIP ViT-L/14 that uses Geometric Parametrization (GmP) to improve image classification performance. Notably, it reaches ~0.91 accuracy on ImageNet/ObjectNet, compared to ~0.84 for the original model.
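Because the architecture is unchanged, the model loads through the standard Transformers CLIP classes. The zero-shot classification sketch below assumes the checkpoint is published under the repo id `zer0int/CLIP-GmP-ViT-L-14` and uses a placeholder image path; both are illustrative.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed repo id; substitute the checkpoint you actually use.
model_id = "zer0int/CLIP-GmP-ViT-L-14"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # placeholder input image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # zero-shot class probabilities
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```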
## Implementation Details
The model is trained with Geometric Parametrization (GmP), which decomposes each weight vector into a radial component (its magnitude) and an angular component (its direction), so both can be learned while the vector's geometric structure is preserved; a minimal sketch of the idea follows the list below. The release includes multiple versions, including text encoder-only safetensors and full-model checkpoints.
- Implements Geometric Parametrization for improved performance
- Uses a custom loss function with label smoothing
- Maintains a modality gap of 0.80 (vs. 0.82 for the OpenAI pre-trained model)
- Available in multiple formats, including text encoder-only and full-model versions
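The following PyTorch sketch illustrates the decomposition described above: a linear layer whose weight rows are stored as a magnitude r and a direction theta, recombined as w = r * theta/||theta|| on each forward pass. The class and parameter names are illustrative, not taken from the author's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GmPLinear(nn.Module):
    """Linear layer with geometrically parametrized weights (illustrative).

    Each output neuron's weight vector w is stored as an unnormalized
    direction theta and a scalar magnitude r, and reconstructed on the
    fly as w = r * theta / ||theta||, so direction and length are
    separate learnable quantities.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w, a=5 ** 0.5)
        self.theta = nn.Parameter(w.clone())                # angular component
        self.r = nn.Parameter(w.norm(dim=1, keepdim=True))  # radial component
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.r * F.normalize(self.theta, dim=1)  # w = r * theta/||theta||
        return F.linear(x, weight, self.bias)

# Sanity check: output shape matches a standard nn.Linear.
layer = GmPLinear(768, 768)
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```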
## Core Capabilities
- Superior text prompt following and detail generation
- Enhanced image classification accuracy
- Seamless integration with Hugging Face Transformers and Diffusers pipelines
- Works as a drop-in text encoder for text-to-image models such as Flux.1, SD3, and SDXL (see the sketch below)
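As one concrete example, the following sketch swaps the model in as the CLIP ViT-L/14 text encoder of a Diffusers pipeline. The SDXL base checkpoint and the repo id are assumptions for illustration; any pipeline whose text encoder is a CLIP ViT-L/14 text model can be replaced the same way.

```python
import torch
from diffusers import StableDiffusionXLPipeline
from transformers import CLIPTextModel

# Assumed repo ids; substitute the checkpoints you actually use.
clip_id = "zer0int/CLIP-GmP-ViT-L-14"
sdxl_id = "stabilityai/stable-diffusion-xl-base-1.0"

pipe = StableDiffusionXLPipeline.from_pretrained(sdxl_id, torch_dtype=torch.float16)

# SDXL's first text encoder is a CLIP ViT-L/14 text model, so the
# fine-tuned text encoder can stand in for it directly.
pipe.text_encoder = CLIPTextModel.from_pretrained(clip_id, torch_dtype=torch.float16)
pipe.to("cuda")

image = pipe("a detailed photo of a red fox in the snow").images[0]
image.save("fox.png")
```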
## Frequently Asked Questions
Q: What makes this model unique?
A: The model's unique Geometric Parametrization approach and custom loss function with label smoothing enable significantly improved accuracy in image classification tasks while maintaining strong text-following capabilities.
Q: What are the recommended use cases?
A: The model is particularly well-suited for text-to-image generation, zero-shot image classification, and use as a replacement text encoder in various Stable Diffusion models. Different versions are optimized for specific use cases: the "TEXT" model excels in text-heavy scenarios, while the "SMOOTH" model may perform better in text-free applications.