Open-Sora-Plan v1.3.0

Property	Value
License	MIT
Architecture	3D Attention-based Video Diffusion
Paper	Reference Paper
Tags	Diffusers, Safetensors

What is Open-Sora-Plan v1.3.0?

Open-Sora-Plan v1.3.0 is an open-source project that aims to reproduce the video generation capabilities of OpenAI's Sora. This version adds WFVAE (Wavelet Flow VAE), a prompt refiner, and new data-filtering strategies. It can generate high-quality 93-frame videos at 480p (93x480p) on a GPU with 24GB of VRAM.

Implementation Details

The model uses a 3D attention architecture instead of the traditional 2+1D approach, which splits attention into separate spatial (within-frame) and temporal (across-frame) passes. Its causal video VAE downsamples videos by 4×8×8 (time × height × width), i.e. 256 times fewer spatiotemporal positions, while preserving reconstruction quality. Sparse attention keeps the processing of the resulting spatiotemporal tokens efficient.
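To see why sparse attention matters here: 3D attention flattens all time × height × width tokens into one sequence, so full attention computes N² query-key scores. The sketch below is a generic strided-sparse variant in NumPy, assuming each token attends only to tokens whose index shares its remainder modulo a hypothetical `stride`; it uses the tokens directly as queries, keys and values (no learned projections) and is not the project's actual sparse-attention scheme.

```python
import numpy as np


def softmax_rows(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def dense_attention(x: np.ndarray) -> np.ndarray:
    """Full self-attention over all N tokens (x serves as query, key and value)."""
    d = x.shape[-1]
    return softmax_rows(x @ x.T / np.sqrt(d)) @ x


def strided_sparse_attention(x: np.ndarray, stride: int) -> np.ndarray:
    """Each token attends only to tokens with the same index remainder mod `stride`."""
    n, d = x.shape
    out = np.empty_like(x)
    for r in range(stride):
        idx = np.arange(r, n, stride)  # one strided group of tokens
        group = x[idx]
        out[idx] = softmax_rows(group @ group.T / np.sqrt(d)) @ group
    return out


def score_entries(n: int, stride: int) -> int:
    """Number of query-key scores computed; stride=1 is full attention."""
    return sum(len(range(r, n, stride)) ** 2 for r in range(stride))


x = np.random.default_rng(0).standard_normal((16, 8))
assert np.allclose(strided_sparse_attention(x, 1), dense_attention(x))
print(score_entries(1024, 1), score_entries(1024, 4))
```

With `stride=4`, the number of scores drops by a factor of four (1,048,576 to 262,144 for 1,024 tokens), at the cost of each token seeing only a quarter of the sequence in that layer; practical schemes typically alternate different groupings across layers so information still mixes globally.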

3D full attention architecture for better spatiotemporal feature capture
Advanced prompt refining system for improved text control
Bucket training, which batches videos of matching resolution and duration together
Support for arbitrary-size video generation within specified constraints
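Bucket training can be sketched as follows. This is a generic illustration under stated assumptions, not Open-Sora-Plan's data loader: `Clip` and `make_bucketed_batches` are hypothetical names, and clips are bucketed by their exact (frames, height, width) shape so that every batch stacks into a single tensor without padding or cropping.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Clip(NamedTuple):
    path: str  # hypothetical: location of the video file
    frames: int
    height: int
    width: int


def make_bucketed_batches(clips: list[Clip], batch_size: int, seed: int = 0) -> list[list[Clip]]:
    """Group clips by (frames, height, width) so every batch has one tensor shape."""
    buckets: dict[tuple[int, int, int], list[Clip]] = defaultdict(list)
    for clip in clips:
        buckets[(clip.frames, clip.height, clip.width)].append(clip)

    rng = random.Random(seed)
    batches = []
    for key in sorted(buckets):  # deterministic bucket order before shuffling
        items = buckets[key]
        rng.shuffle(items)  # shuffle clips within a bucket
        for i in range(0, len(items), batch_size):
            batches.append(items[i:i + batch_size])  # last batch may be smaller
    rng.shuffle(batches)  # interleave buckets across training steps
    return batches


clips = [Clip(f"a{i}.mp4", 93, 480, 640) for i in range(5)]
clips += [Clip(f"b{i}.mp4", 29, 720, 1280) for i in range(3)]
for batch in make_bucketed_batches(clips, batch_size=2):
    print(len(batch), batch[0].frames, batch[0].height, batch[0].width)
```

Shuffling whole batches rather than individual clips keeps each batch homogeneous while still mixing shapes across steps; real pipelines often bucket by aspect ratio or duration range instead of exact shape.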

Core Capabilities

Text-to-Video generation with high fidelity
Image-to-Video conversion
Efficient video compression and reconstruction
Output of 93-frame videos at 480p or 720p (93x480p, 93x720p)
Memory-efficient inference that fits within 24GB of VRAM at 93x480p
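The 4×8×8 compression factor determines the latent size the diffusion model works on. The sketch below assumes a causal video VAE that keeps the first frame and downsamples the remaining frames by 4 (which is why frame counts such as 93 have the form 4k + 1), and assumes frame sizes of 640×480 for 480p and 1280×720 for 720p; latent channels are not modeled.

```python
def latent_shape(frames: int, height: int, width: int,
                 t_down: int = 4, s_down: int = 8) -> tuple[int, int, int]:
    """Latent (T, H, W), assuming the first frame is kept and the remaining
    frames are downsampled by t_down; height and width by s_down."""
    if (frames - 1) % t_down != 0:
        raise ValueError(f"frames must be 1 + {t_down}k, got {frames}")
    if height % s_down or width % s_down:
        raise ValueError(f"height and width must be multiples of {s_down}")
    return 1 + (frames - 1) // t_down, height // s_down, width // s_down


def compression_ratio(frames: int, height: int, width: int) -> float:
    """Pixel positions per latent position (channels not counted)."""
    t, h, w = latent_shape(frames, height, width)
    return frames * height * width / (t * h * w)


# Assumed frame sizes: 640x480 for 480p, 1280x720 for 720p.
for label, (h, w) in {"480p": (480, 640), "720p": (720, 1280)}.items():
    print(label, latent_shape(93, h, w), compression_ratio(93, h, w))
# 480p (24, 60, 80) 248.0
# 720p (24, 90, 160) 248.0
```

Under this causal layout, a 93-frame clip maps to 24 latent frames, so the effective ratio is 248 rather than exactly 256; the gap comes from the first frame being encoded on its own.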

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its true 3D video diffusion approach, efficient memory usage, and ability to generate high-quality videos within comparatively modest VRAM limits (24GB at 93x480p). The combination of WFVAE and the prompt refiner further distinguishes it from other video generation models.

Q: What are the recommended use cases?

The model suits video generation from text descriptions or images, creative content production, and research that requires high-quality video synthesis. Because 93x480p generation fits within 24GB of VRAM, it is particularly practical for users who have a single GPU with that much memory rather than a multi-GPU cluster.