Wan2.1-T2V-1.3B-Diffusers

Maintained by: Wan-AI

  • Parameter Count: 1.3B
  • Model Type: Text-to-Video Diffusion
  • License: Apache 2.0
  • GPU Memory Required: 8.19GB VRAM
  • Supported Resolution: 480P (Optimal)

What is Wan2.1-T2V-1.3B-Diffusers?

Wan2.1-T2V-1.3B-Diffusers is a 1.3B-parameter text-to-video generation model built for efficiency. It is designed to run on consumer-grade GPUs while producing video quality reported to be comparable to some closed-source solutions. The model can generate a 5-second 480P video in approximately 4 minutes on an RTX 4090.

Implementation Details

The model uses a Flow Matching framework within the Diffusion Transformer paradigm, with a model dimension of 1536, 30 layers, and 12 attention heads. It employs a T5 Encoder for multilingual text processing and a spatio-temporal variational autoencoder (Wan-VAE) for efficient video processing.

  • Dimension: 1536
  • Input/Output Dimension: 16
  • Feedforward Dimension: 8960
  • Number of Layers: 30
  • Number of Heads: 12

Core Capabilities

  • Text-to-Video generation with optimal 480P resolution
  • Generation of on-screen text in Chinese and English
  • Efficient video processing with a low VRAM requirement (8.19GB)
  • Support for prompt extension through Dashscope API or local models
  • Compatible with Diffusers pipeline and various inference methods
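
A minimal sketch of running the model through the Diffusers pipeline might look like the following. It assumes a recent `diffusers` release that ships `WanPipeline` and `AutoencoderKLWan`, a CUDA GPU, and the Hub ID `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`. The 832x480 frame size, 16 fps rate, and guidance scale of 5.0 are assumed values rather than settings stated on this page, and the helper names `num_frames_for` and `generate_video` are hypothetical.

```python
def num_frames_for(seconds: float, fps: int = 16) -> int:
    """Frame count for a clip, snapped down to the 4k + 1 form.

    Assumption: the pipeline expects (num_frames - 1) to be divisible by 4.
    """
    raw = round(seconds * fps)
    return (raw // 4) * 4 + 1


def generate_video(prompt: str, out_path: str = "output.mp4",
                   seconds: float = 5.0, fps: int = 16) -> str:
    # Heavy imports stay inside the function so the helper above can be
    # used without torch or diffusers installed.
    import torch
    from diffusers import AutoencoderKLWan, WanPipeline
    from diffusers.utils import export_to_video

    model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
    # The VAE is loaded in float32; the rest of the pipeline in bfloat16.
    vae = AutoencoderKLWan.from_pretrained(
        model_id, subfolder="vae", torch_dtype=torch.float32
    )
    pipe = WanPipeline.from_pretrained(
        model_id, vae=vae, torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")

    frames = pipe(
        prompt=prompt,
        height=480,          # assumed 480P frame size
        width=832,
        num_frames=num_frames_for(seconds, fps),
        guidance_scale=5.0,  # assumed value; tune as needed
    ).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return out_path


print(num_frames_for(5.0))  # 81
```

Calling `generate_video("A cat walks on the grass, realistic")` would download the weights on first use and write `output.mp4`. On cards with less memory, `pipe.enable_model_cpu_offload()` can be used in place of `pipe.to("cuda")`, at some cost in speed.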

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to run on consumer GPUs while maintaining high-quality output sets it apart. It requires only 8.19GB of VRAM, which fits within the memory of many consumer graphics cards, while delivering performance reported to be comparable to larger models.

Q: What are the recommended use cases?

The model excels in generating short-form videos from text descriptions, particularly at 480P resolution. It's ideal for creative teams needing quick video generation capabilities without extensive computational resources.
