Ruyi-Mini-7B

Property	Value
Parameter Count	7.1 Billion
Model Type	Image-to-Video Generation
License	Apache 2.0
Author	IamCreateAI
Model URL	https://huggingface.co/IamCreateAI/Ruyi-Mini-7B

What is Ruyi-Mini-7B?

Ruyi-Mini-7B is an open-source image-to-video generation model: given a static input image, it generates the video frames that follow. It has approximately 7.1 billion parameters and generates video at resolutions from 360p to 720p, in various aspect ratios, with durations of up to 5 seconds. It also provides motion and camera control, so creators can steer how the generated video moves.

Implementation Details

The model architecture consists of three primary components: a Causal VAE module for video compression, a Diffusion Transformer module with 3D full attention, and a CLIP model for semantic feature extraction. Training proceeded in four phases, including pre-training on 200M video clips, multi-scale resolution fine-tuning, and dedicated image-to-video training.

Causal VAE module reduces spatial resolution to 1/8 and temporal resolution to 1/4
2D Normalized-RoPE for spatial dimensions
Sin-cos position embedding for temporal dimensions
DDPM for model training
CLIP-guided video generation through cross-attention
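
To make the compression factors above concrete, the following minimal sketch estimates the latent grid that the Diffusion Transformer would attend over. It is not taken from the Ruyi codebase: the function name `latent_shape`, the 24 fps frame rate, and the rounding behaviour (ceiling on the temporal axis, floor on the spatial axes) are assumptions made for illustration only.

```python
import math


def latent_shape(frames, height, width, spatial_factor=8, temporal_factor=4):
    """Estimate the latent grid size after VAE compression.

    Hypothetical helper: the factors 8 and 4 come from the description
    above; the rounding rules are an assumption, not the model's
    documented behaviour.
    """
    latent_frames = math.ceil(frames / temporal_factor)
    latent_height = height // spatial_factor
    latent_width = width // spatial_factor
    return latent_frames, latent_height, latent_width


# A 5-second clip at an assumed 24 fps, 480 pixels tall and 360 pixels wide.
frames = 5 * 24
t, h, w = latent_shape(frames, height=480, width=360)
print("latent grid:", (t, h, w))
print("positions seen by 3D full attention:", t * h * w)
```

Because 3D full attention relates every latent position to every other one, the product of the three dimensions gives a rough sense of why memory use grows quickly with resolution and duration.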

Core Capabilities

Supports resolutions from 360p to 720p
Maximum video duration of 5 seconds
Multiple aspect ratio support
Motion and camera control features
VRAM requirements from 21.5GB to 54.8GB, depending on video size and resolution

Frequently Asked Questions

Q: What makes this model unique?

Ruyi-Mini-7B stands out for its four-phase training process and its ability to generate video at resolutions from 360p to 720p. Its motion and camera control features give creators direct control over how the generated video moves.

Q: What are the recommended use cases?

The model is suited to creating short-form video from static images, which makes it useful for creative applications, content creation, and prototyping. Users should note its limitations with text rendering, hands, and human faces in crowded scenes.

Q: What are the hardware requirements?

Requirements vary by video size and resolution. For example, 360x480 videos need 21.5GB of VRAM, while 720x1280 videos require 54.8GB. A low-memory mode is available for cards with 24GB of VRAM, such as the RTX 4090.
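
As a rough illustration of how these figures might be used when planning a run, the sketch below checks a card's memory against the two configurations quoted above. The table `QUOTED_VRAM_GB` and the function `fits_in_vram` are hypothetical names, not part of any Ruyi tooling, and only the two data points stated in this answer are included; other resolutions are deliberately not interpolated.

```python
# VRAM figures quoted above, keyed by video size as written (e.g. "360x480").
QUOTED_VRAM_GB = {
    (360, 480): 21.5,
    (720, 1280): 54.8,
}


def fits_in_vram(resolution, available_gb):
    """Return True if the quoted requirement fits in the available VRAM.

    Hypothetical helper: raises KeyError for sizes with no quoted figure
    rather than guessing a value.
    """
    if resolution not in QUOTED_VRAM_GB:
        raise KeyError(f"no quoted VRAM figure for {resolution}")
    return available_gb >= QUOTED_VRAM_GB[resolution]


# A 24GB card covers the smaller configuration but not the larger one,
# which is where the low-memory mode mentioned above becomes relevant.
for size in QUOTED_VRAM_GB:
    print(size, "fits in 24GB:", fits_in_vram(size, 24.0))
```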
