CLIP-GmP-ViT-L-14

zer0int

A fine-tuned CLIP model with 428M parameters, featuring Geometric Parametrization for improved ImageNet/ObjectNet accuracy (~0.91 vs original 0.84)

Property         Value
Parameter Count  428M
License          MIT
Base Model       openai/clip-vit-large-patch14
Tensor Type      F32

What is CLIP-GmP-ViT-L-14?

CLIP-GmP-ViT-L-14 is a fine-tuned version of OpenAI's CLIP ViT-L/14 model that uses Geometric Parametrization (GmP) to improve image classification accuracy. It reaches ~0.91 accuracy on ImageNet/ObjectNet, compared to ~0.84 for the original model.

Implementation Details

The model uses a Geometric Parametrization approach that decomposes weights into a radial component (magnitude) and an angular component (direction), preserving each weight vector's directionality and magnitude. It is available in several versions, including text encoder-only safetensors files and full model implementations.
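The decomposition described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `to_gmp` and `from_gmp` are hypothetical, the example works on a single weight vector, and how the actual fine-tuning code applies the split to each layer is not stated in the source.

```python
import math

def to_gmp(w):
    """Split a weight vector into a radial part r (its norm)
    and an angular part theta (its unit direction)."""
    r = math.sqrt(sum(x * x for x in w))
    if r == 0.0:
        raise ValueError("a zero vector has no direction")
    theta = [x / r for x in w]
    return r, theta

def from_gmp(r, theta):
    """Rebuild the weight vector as w = r * theta / ||theta||.
    Dividing by ||theta|| means only the direction of theta matters."""
    norm = math.sqrt(sum(x * x for x in theta))
    return [r * x / norm for x in theta]

# Toy weight vector (illustrative values, not taken from the model)
w = [3.0, 4.0]
r, theta = to_gmp(w)
w_rebuilt = from_gmp(r, theta)
```

Because `from_gmp` renormalizes `theta`, rescaling the angular part leaves the rebuilt vector unchanged; only `r` controls the magnitude. That separation of magnitude from direction is the property the paragraph above describes.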

  • Implements Geometric Parametrization for improved performance
  • Features custom loss function with label smoothing
  • Maintains a modality gap of 0.80 (compared to 0.82 for the OpenAI pre-trained model)
  • Available in multiple formats including text encoder-only and full model versions
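To make the label smoothing item in the list above concrete, here is a minimal sketch of a generic label-smoothed cross-entropy. It is not the model's actual custom loss, which the source does not spell out. The function name `smoothed_cross_entropy` and the smoothing value of 0.1 are assumptions chosen for illustration.

```python
import math

def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against a softened target distribution:
    the true class gets (1 - smoothing) plus its uniform share,
    and every class gets smoothing / n."""
    n = len(logits)
    # Numerically stable log-softmax
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = smoothing / n + ((1.0 - smoothing) if i == target else 0.0)
        loss -= q * lp
    return loss

# Toy logits (illustrative): the model strongly favors class 0
logits = [10.0, 0.0, 0.0]
plain = smoothed_cross_entropy(logits, target=0, smoothing=0.0)
smoothed = smoothed_cross_entropy(logits, target=0, smoothing=0.1)
```

With smoothing enabled, a very confident correct prediction still incurs some loss, which discourages the logits from growing without bound.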

Core Capabilities

  • Superior text prompt following and detail generation
  • Enhanced image classification accuracy
  • Seamless integration with Hugging Face Transformers/Diffusers pipeline
  • Compatible with various text-to-image models, including Flux.1, SD3, and SDXL

Frequently Asked Questions

Q: What makes this model unique?

The model's unique Geometric Parametrization approach and custom loss function with label smoothing enable significantly improved accuracy in image classification tasks while maintaining strong text-following capabilities.

Q: What are the recommended use cases?

The model is particularly well-suited for text-to-image generation, zero-shot image classification, and use as a replacement text encoder in Stable Diffusion models. Different versions are optimized for specific use cases: the "TEXT" model excels in text-heavy scenarios, while the "SMOOTH" model is potentially better for text-free applications.
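For readers unfamiliar with zero-shot image classification, the sketch below shows the scoring step that a CLIP-style model performs once image and text embeddings are available. The embeddings are made-up toy vectors rather than real model outputs, the function names are hypothetical, and the logit scale of 100 is an assumption (it is roughly the value used by OpenAI's released CLIP models).

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image
    embedding and one text embedding per candidate label."""
    img = normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    logits = [logit_scale * s for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy data (illustrative only): two candidate prompts and one image
labels = ["a photo of a cat", "a photo of a dog"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_emb = [0.9, 0.1, 0.0]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
```

In practice the embeddings would come from the model's image and text encoders; the label whose prompt embedding is closest in direction to the image embedding receives the highest probability.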
