Pusa-V0.5

  • Author: RaphaelLiu
  • Model Hub: Hugging Face
  • Training Cost: $0.1k
  • Base Model: Mochi1-Preview

What is Pusa-V0.5?

Pusa-V0.5 introduces frame-level noise control through vectorized timesteps: instead of applying a single scalar timestep to an entire clip, each frame is assigned its own diffusion timestep. Built upon the Mochi1-Preview foundation, this early release demonstrates strong efficiency in video generation tasks while maintaining high motion fidelity and prompt adherence.
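Conceptually, a vectorized timestep replaces the usual single noise level with one value per frame. Below is a minimal sketch of the idea, assuming a standard DDPM-style noising schedule; all names and shapes are illustrative, not Pusa's actual API:

```python
import torch

# Illustrative only: per-frame timesteps instead of one scalar for the clip.
num_frames, channels, height, width = 8, 3, 60, 104
frames = torch.randn(num_frames, channels, height, width)  # clean latents

# One independent timestep per frame (the "vectorized" timestep).
t = torch.randint(0, 1000, (num_frames,))

# Assumed DDPM-style cumulative schedule; the real schedule may differ.
alphas_cumprod = torch.linspace(0.9999, 0.0001, 1000)
a = alphas_cumprod[t].view(num_frames, 1, 1, 1)

# Forward noising applied per frame: each frame sits at its own noise level.
noise = torch.randn_like(frames)
noisy_frames = a.sqrt() * frames + (1.0 - a).sqrt() * noise
```

Because the noise level varies per frame, some frames can be held clean as conditioning while others are denoised, which is what enables the task flexibility described below.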

Implementation Details

The model was trained using 16 H800 GPUs with a batch size of 32 over 500 training iterations at a 1e-5 learning rate. It implements a novel non-destructive modification approach that preserves the base model's capabilities while enabling new functionalities through minimal fine-tuning.
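For reference, the reported hyperparameters can be summarized as a configuration sketch; the field names here are hypothetical, not Pusa's actual config schema:

```python
# Hypothetical summary of the training setup reported above.
train_config = {
    "gpus": 16,                      # NVIDIA H800
    "global_batch_size": 32,
    "training_iterations": 500,
    "learning_rate": 1e-5,
    "base_model": "Mochi1-Preview",  # fine-tuned non-destructively
    "output_resolution": "480p",
}
```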

  • Trained in just 0.1k GPU hours
  • Utilizes vectorized timesteps for enhanced control
  • Supports multiple video generation tasks
  • 480p video resolution output

Core Capabilities

  • Text-to-Video generation
  • Image-to-Video transformation
  • Frame interpolation
  • Video transitions
  • Seamless looping
  • Extended video generation
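All of these tasks can, in principle, be expressed through the same per-frame timestep mechanism: conditioning frames are held at zero noise while generated frames are fully noised. The sketch below shows one plausible mapping; the task names, the function, and the looping strategy are assumptions for illustration, not Pusa's actual interface:

```python
import torch

T = 999  # assumed maximum noise level

def task_timesteps(task: str, num_frames: int = 8) -> torch.Tensor:
    """Build a per-frame timestep vector for a given task (illustrative)."""
    t = torch.full((num_frames,), T)  # default: noise every frame (text-to-video)
    if task == "image-to-video":
        t[0] = 0                      # keep the input image clean as conditioning
    elif task == "frame-interpolation":
        t[0] = t[-1] = 0              # keep both endpoint frames clean
    elif task == "seamless-loop":
        t[0] = t[-1] = 0              # assumption: condition both ends on the same frame
    return t

print(task_timesteps("image-to-video"))  # first frame clean, remaining frames fully noised
```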

Frequently Asked Questions

Q: What makes this model unique?

Pusa-V0.5's uniqueness lies in its frame-level noise control approach and incredible training efficiency, requiring only 0.1k GPU hours and $0.1k in training costs. It also introduces a novel diffusion paradigm that can be applied to other leading video models.

Q: What are the recommended use cases?

The model excels in various video generation tasks, from text-to-video and image-to-video conversion to frame interpolation and seamless loop creation. It's particularly suitable for applications requiring efficient video content generation while maintaining quality.
