LongCLIP-GmP-ViT-L-14

zer0int

Enhanced CLIP model with 428M parameters that accepts text inputs of up to 248 tokens (vs. the standard 77) and uses geometric parametrization (GmP) to improve accuracy.

Property          Value
Parameter Count   428M
Model Type        CLIP
Architecture      Vision Transformer Large/14
License           MIT
Tensor Type       F32

What is LongCLIP-GmP-ViT-L-14?

LongCLIP-GmP-ViT-L-14 is a fine-tuned version of the original Long-CLIP model that incorporates Geometric Parametrization (GmP). Like Long-CLIP, it accepts text sequences of up to 248 tokens instead of CLIP's standard 77, and it reaches an improved ImageNet/ObjectNet accuracy of 0.89.
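The 248-token limit is inherited from the base Long-CLIP model. As we understand the Long-CLIP paper's "knowledge-preserved stretching", CLIP's first 20 positional embeddings are kept unchanged and the remaining 57 are linearly interpolated at a ratio of 4, giving 20 + 57 × 4 = 248 positions. The stdlib-only sketch below illustrates that arithmetic; `stretch_positions`, `keep_len`, and `ratio` are hypothetical names, embeddings are plain lists of floats, and this is not the Long-CLIP or zer0int code.

```python
def stretch_positions(pe, keep_len=20, ratio=4):
    """Stretch a list of positional-embedding vectors (hypothetical helper).

    Keeps the first `keep_len` vectors as-is and replaces every later
    position with `ratio` linearly interpolated vectors.
    """
    n = len(pe)
    out = [list(v) for v in pe[:keep_len]]  # early positions are kept unchanged
    for i in range(keep_len, n):
        cur = pe[i]
        if i + 1 < n:
            nxt = pe[i + 1]
        else:
            # Past the final position: extrapolate using the last step (cur - prev)
            nxt = [2 * c - p for c, p in zip(cur, pe[i - 1])]
        for k in range(ratio):
            t = k / ratio
            out.append([(1 - t) * c + t * x for c, x in zip(cur, nxt)])
    return out


# 77 toy one-dimensional "embeddings" standing in for CLIP's position table
print(len(stretch_positions([[float(i)] for i in range(77)])))  # 248
```

Keeping the early positions intact is the point of the design: short prompts, which dominate CLIP's training data, see the same positional embeddings as before.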

Implementation Details

The model uses geometric parametrization to decompose each weight vector into a radial component (its magnitude) and an angular component (its direction), so that magnitude and direction are parametrized separately. According to the model description, this decomposition is particularly effective at keeping the model stable during fine-tuning.
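A minimal sketch of the decomposition, assuming each output row of a linear layer's weight matrix is stored as a scalar magnitude `r` and a direction vector `theta`, with the effective weight recomposed as r · theta / ‖theta‖ on every forward pass. The class name `GeometricLinear` and its methods are hypothetical, and plain Python lists stand in for the framework tensors a real implementation would train.

```python
import math


class GeometricLinear:
    """Hypothetical GmP-style linear layer: row i of the weight is r[i] * unit(theta[i])."""

    def __init__(self, r, theta, bias):
        self.r = r          # radial component: one magnitude per output row
        self.theta = theta  # angular component: one (unnormalized) direction per row
        self.bias = bias

    @classmethod
    def from_weight(cls, weight, bias):
        # Decompose an existing weight matrix row by row: w = ||w|| * (w / ||w||).
        # Assumes every row is nonzero, otherwise the direction is undefined.
        r = [math.sqrt(sum(x * x for x in row)) for row in weight]
        theta = [[x / n for x in row] for row, n in zip(weight, r)]
        return cls(r, theta, list(bias))

    def weight(self):
        # Recompose the effective weight: r_i * theta_i / ||theta_i||
        rows = []
        for r_i, th in zip(self.r, self.theta):
            norm = math.sqrt(sum(x * x for x in th))
            rows.append([r_i * x / norm for x in th])
        return rows

    def __call__(self, x):
        # Ordinary affine map y = W x + b using the recomposed weight
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight(), self.bias)]
```

Because only the direction of `theta` enters the effective weight, rescaling `theta` leaves the weight's magnitude untouched; magnitude can change only through `r`.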

  • Supports text sequences of up to 248 tokens
  • Uses geometric linear (GmP) layers
  • Works as a text encoder for Flux.1, SDXL, and Stable Diffusion
  • Fine-tuned with a custom loss that includes label smoothing
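The source does not specify the custom loss, so the following is only a generic illustration: a symmetric CLIP-style contrastive loss in which each image's own caption (and each caption's own image) is the target class, with label smoothing applied to the one-hot targets. The function names, the temperature `scale=100.0`, and the smoothing weight `eps=0.1` are assumptions, not values taken from this model.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def smoothed_cross_entropy(logits, target, eps):
    """Cross-entropy against a label-smoothed one-hot target."""
    m = max(logits)  # subtract the max for a numerically stable log-softmax
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    n = len(logits)
    loss = 0.0
    for j, v in enumerate(logits):
        # (1 - eps) extra mass on the true class, eps spread uniformly over all n classes
        q = eps / n + (1.0 - eps if j == target else 0.0)
        loss -= q * (v - log_z)
    return loss


def contrastive_loss(image_embs, text_embs, scale=100.0, eps=0.1):
    """Symmetric CLIP-style loss; pair k is (image_embs[k], text_embs[k])."""
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    # logits[i][t] = scaled cosine similarity between image i and text t
    logits = [[scale * sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]
    n = len(logits)
    img_to_txt = sum(smoothed_cross_entropy(logits[k], k, eps) for k in range(n)) / n
    columns = [list(col) for col in zip(*logits)]
    txt_to_img = sum(smoothed_cross_entropy(columns[k], k, eps) for k in range(n)) / n
    return (img_to_txt + txt_to_img) / 2
```

With smoothing, even perfectly separated pairs keep a small nonzero loss, which discourages the model from pushing its logits toward over-confident extremes.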

Core Capabilities

  • Enhanced zero-shot image classification
  • Improved text-image matching accuracy
  • Better handling of long text inputs (beyond the standard 77 tokens)
  • Improved cosine similarity scores for matching image-text pairs
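Zero-shot classification and image-text matching both reduce to comparing embeddings by cosine similarity. This sketch assumes you already have one image embedding and one text embedding per candidate label (for example, from prompts like "a photo of a cat"); the short vectors below are made-up stand-ins for real encoder outputs, and the temperature of 100 is an assumed value.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_classify(image_emb, label_embs, scale=100.0):
    """Return (best_index, probabilities) for one image over candidate labels."""
    logits = [scale * cosine_similarity(image_emb, t) for t in label_embs]
    m = max(logits)  # subtract the max so the softmax cannot overflow
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Toy embeddings standing in for encoder outputs (hypothetical values)
prompts = ["a photo of a cat", "a photo of a dog"]
best, probs = zero_shot_classify([0.9, 0.1, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(prompts[best])  # a photo of a cat
```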

Frequently Asked Questions

Q: What makes this model unique?

Its distinguishing feature is the combination of extended token length (248 tokens) with geometric parametrization, which improves accuracy (0.89 on ImageNet/ObjectNet) while keeping fine-tuning stable.

Q: What are the recommended use cases?

The model is well suited to image-text matching tasks that involve longer text descriptions, to zero-shot image classification, and to serving as a text encoder for diffusion models such as Stable Diffusion, SDXL, and Flux.1.
