CogVideoX1.5-5B-I2V

THUDM

Image-to-video generation model that produces 16 fps videos up to 10 seconds long at resolutions up to 1360x768, with BF16 as the recommended precision.

Property     Value
Author       THUDM
License      Custom CogVideoX License
Paper        arXiv:2408.06072
Framework    Diffusers

What is CogVideoX1.5-5B-I2V?

CogVideoX1.5-5B-I2V is an image-to-video generation model: given a still image and a text prompt, it generates a video clip. Output resolution goes up to 1360x768 at 16 frames per second, with clip durations of 5 or 10 seconds.
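The frame rate and durations above translate directly into frame counts. The sketch below does that arithmetic in Python. The helper name `frame_count` is hypothetical, and the extra leading frame (16N + 1 frames in total) is an assumption based on the upstream model card, not something stated on this page.

```python
FPS = 16  # frame rate stated for this model


def frame_count(seconds: int, fps: int = FPS, include_first_frame: bool = True) -> int:
    """Return the number of frames for a clip of the given duration.

    include_first_frame adds one frame on top of seconds * fps. That
    16N + 1 convention is an assumption, not a figure from this page.
    """
    if seconds not in (5, 10):
        raise ValueError("this page lists durations of 5 or 10 seconds only")
    frames = seconds * fps
    return frames + 1 if include_first_frame else frames


if __name__ == "__main__":
    for duration in (5, 10):
        print(duration, "s ->", frame_count(duration), "frames")
```

Under that assumption, a 5 second clip corresponds to 81 frames and a 10 second clip to 161.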

Implementation Details

BF16 is the recommended precision; FP16, FP32, FP8, and INT8 are also supported, which lets the model run on a range of hardware configurations. Single-GPU inference requires at least 9 GB of VRAM.

  • Supports English language prompts up to 224 tokens
  • Flexible resolution support with minimum dimension of 768 pixels
  • Optimized for NVIDIA Ampere architecture and newer GPUs
  • Compatible with various quantization techniques for reduced memory usage
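The resolution limits can be checked before an input image is passed to the model. The following is a minimal sketch, assuming the shorter side must be exactly 768 pixels and the longer side must lie between 768 and 1360. That is one reading of the limits listed here. The multiple-of-16 rule for the longer side is an additional assumption taken from the upstream model card, and `check_resolution` is a hypothetical helper name.

```python
SHORT_SIDE = 768          # "minimum dimension of 768 pixels"
MAX_LONG_SIDE = 1360      # "up to 1360x768"
LONG_SIDE_MULTIPLE = 16   # assumption: not stated on this page


def check_resolution(width: int, height: int) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    short_side, long_side = sorted((width, height))
    if short_side != SHORT_SIDE:
        problems.append(f"shorter side is {short_side}, expected {SHORT_SIDE}")
    if not SHORT_SIDE <= long_side <= MAX_LONG_SIDE:
        problems.append(
            f"longer side is {long_side}, expected {SHORT_SIDE}..{MAX_LONG_SIDE}"
        )
    if long_side % LONG_SIDE_MULTIPLE != 0:
        problems.append(
            f"longer side {long_side} is not a multiple of {LONG_SIDE_MULTIPLE}"
        )
    return problems


if __name__ == "__main__":
    for size in [(1360, 768), (768, 768), (1920, 1080)]:
        print(size, check_resolution(*size) or "ok")
```

Returning a list of problems rather than a single boolean makes it easier to report why an image was rejected, for example before resizing or cropping it.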

Core Capabilities

  • High-resolution video generation (up to 1360x768)
  • Flexible input image handling
  • Video clips of 5 or 10 seconds at 16 frames per second
  • Advanced prompt-based control
  • Memory-efficient operation with various optimization options
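Since the model is distributed for the Diffusers framework, a single-GPU run with the memory-saving options enabled might look like the sketch below. It is an illustration under stated assumptions, not the authors' reference script. The repository id `THUDM/CogVideoX1.5-5B-I2V`, the step count, the guidance scale, and the default of 81 frames are assumptions, and `generate_video` is a hypothetical wrapper. The heavy imports sit inside the function so the module can be loaded without a GPU.

```python
MODEL_ID = "THUDM/CogVideoX1.5-5B-I2V"  # assumed Hugging Face repository id


def generate_video(image_path: str, prompt: str, output_path: str = "output.mp4",
                   num_frames: int = 81, seed: int = 42) -> str:
    """Generate a video from one image and a text prompt; return the output path."""
    # Imported lazily: torch and diffusers are only needed when actually generating.
    import torch
    from diffusers import CogVideoXImageToVideoPipeline
    from diffusers.utils import export_to_video, load_image

    # BF16 is the precision this page recommends.
    pipe = CogVideoXImageToVideoPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    )

    # Memory-saving options: stream weights from CPU and decode the VAE in pieces.
    pipe.enable_sequential_cpu_offload()
    pipe.vae.enable_tiling()
    pipe.vae.enable_slicing()

    result = pipe(
        image=load_image(image_path),
        prompt=prompt,                 # English, up to 224 tokens
        num_frames=num_frames,         # assumed default, about 5 s at 16 fps
        num_inference_steps=50,        # assumed value
        guidance_scale=6.0,            # assumed value
        generator=torch.Generator(device="cpu").manual_seed(seed),
    )
    export_to_video(result.frames[0], output_path, fps=16)
    return output_path
```

A call such as `generate_video("input.jpg", "A sailboat drifting across a calm bay")` would write `output.mp4`. The file names and the prompt are placeholders. Sequential CPU offload trades speed for lower VRAM use, so it can be dropped on GPUs with enough memory.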

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its ability to generate high-resolution videos from still images while maintaining quality and temporal consistency. It offers flexible deployment options and supports various optimization techniques for different hardware configurations.

Q: What are the recommended use cases?

The model is ideal for converting still images into dynamic videos, content creation, visual effects generation, and artistic applications requiring high-quality video output from static images. It's particularly suitable for scenarios requiring detailed control over video generation through text prompts.
