# CogVideoX-Fun-V1.1-Reward-LoRAs

| Property | Value |
|---|---|
| Paper | Aligning Text-to-Image Diffusion Models with Reward Backpropagation |
| Author | alibaba-pai |
| Model Architecture | LoRA-based optimization for CogVideoX |

## What is CogVideoX-Fun-V1.1-Reward-LoRAs?

CogVideoX-Fun-V1.1-Reward-LoRAs is a collection of pre-trained LoRA weights that enhance video generation through reward backpropagation. Applied on top of the CogVideoX-Fun-V1.1 base models, the adapters steer generated videos toward better alignment with human preferences.
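Because the LoRAs ship as standard adapter weights, they can in principle be loaded with generic LoRA tooling. Below is a minimal sketch using the diffusers LoRA API; the base-model repo ID, the LoRA file name, and the fuse strength are illustrative assumptions, not the project's official loading path, so consult the upstream CogVideoX-Fun documentation for its recommended workflow.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load a CogVideoX base pipeline. The repo ID here is illustrative;
# substitute the CogVideoX-Fun-V1.1 checkpoint you are actually using.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")

# Apply the reward LoRA on top of the base weights. The weight_name is a
# placeholder; point it at the HPSv2.1 or MPS LoRA file you downloaded.
pipe.load_lora_weights(
    "alibaba-pai/CogVideoX-Fun-V1.1-Reward-LoRAs",
    weight_name="reward_lora_hps_v2.1.safetensors",  # placeholder file name
)
pipe.fuse_lora(lora_scale=0.7)  # blend strength is a tunable assumption

video = pipe(
    prompt="A panda strumming a guitar in a bamboo forest",
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```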

## Implementation Details

The release provides multiple LoRA variants covering both the 2B and 5B parameter base models, each trained with either the HPS v2.1 or the MPS reward model. Every LoRA uses rank=128 and network_alpha=64, with training step counts tuned to model size (a configuration sketch follows the list below):
- 5B model LoRAs trained with batch size 8 for 1,500-5,500 steps
- 2B model LoRAs trained with batch size 8 for 3,000-16,000 steps
- Supports both HPSv2.1 and MPS reward models for optimization
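For reference, the rank and alpha above map directly onto a standard PEFT configuration. The sketch below is illustrative only: the target_modules list is an assumption (attention projections are the usual choice), not a statement of how these checkpoints were actually trained.

```python
from peft import LoraConfig

# LoRA hyperparameters as described above: rank=128, network_alpha=64,
# giving an effective scale of lora_alpha / r = 0.5.
lora_config = LoraConfig(
    r=128,
    lora_alpha=64,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # assumed; typical attention projections
    lora_dropout=0.0,
)
```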

## Core Capabilities

- Enhanced video generation quality through reward-based optimization
- Plug-and-play compatibility with CogVideoX-Fun base models (see the adapter-swapping sketch after this list)
- Improved alignment with human preferences in video generation
- Support for various prompt types including dynamic scenes and complex animations
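One way to illustrate the plug-and-play behavior is diffusers' named-adapter API, which lets the HPSv2.1 and MPS LoRAs be swapped without reloading the base model. As above, the repo ID and file names are placeholders, and the adapter weights are tunable assumptions.

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16  # illustrative base repo
).to("cuda")

repo = "alibaba-pai/CogVideoX-Fun-V1.1-Reward-LoRAs"
# Register each reward LoRA under its own adapter name (file names are placeholders).
pipe.load_lora_weights(repo, weight_name="reward_lora_hps_v2.1.safetensors", adapter_name="hps")
pipe.load_lora_weights(repo, weight_name="reward_lora_mps.safetensors", adapter_name="mps")

# Activate one adapter at a time; the weight trades reward alignment
# against fidelity to the base model.
pipe.set_adapters(["hps"], adapter_weights=[0.7])
# ... generate with HPSv2.1 alignment, then swap:
pipe.set_adapters(["mps"], adapter_weights=[0.7])
```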

## Frequently Asked Questions

### Q: What makes this model unique?
The model uniquely applies reward backpropagation to video generation, offering a specialized solution for enhancing video quality through human preference alignment. It provides ready-to-use LoRA weights that can be easily integrated with existing CogVideoX models.

### Q: What are the recommended use cases?
The model is ideal for generating high-quality videos from text prompts, particularly when human preference alignment is crucial. It excels in creating dynamic scenes, animated characters, and complex visual narratives with improved coherence and quality.