Mochi-1 Preview

Property	Value
Parameter Count	10 Billion
Model Type	Text-to-Video Generation
License	Apache 2.0
Architecture	Asymmetric Diffusion Transformer (AsymmDiT)
VRAM Requirements	60GB (Single GPU)

What is mochi-1-preview?

Mochi-1 Preview is an open-source video generation model developed by Genmo. At release, Genmo described it as the largest openly released video generation model, built on a novel Asymmetric Diffusion Transformer (AsymmDiT) architecture. The model produces high-fidelity motion and adheres closely to input prompts, narrowing the gap between closed and open video generation systems.

Implementation Details

The AsymmDiT has 48 layers and 24 attention heads and processes visual tokens (3072-dim) and text tokens (1536-dim) together. Prompts are encoded with a single T5-XXL language model, and a companion AsymmVAE compresses video to a 128x smaller size before the transformer sees it.

Visual Processing: 44,520 tokens with 3072-dimensional representation
Text Processing: 256 tokens with 1536-dimensional representation
Video Compression: 8x8 spatial and 6x temporal reduction
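
The visual token count can be reproduced from the compression factors listed above. The sketch below is illustrative arithmetic only, not code from the model. It assumes a default clip of 848x480 pixels and 163 frames, a causal temporal compression that keeps the first frame and folds each following group of 6 frames into one latent frame, and a 2x2 patch size in the transformer. None of these three values is stated on this page, and the function name is hypothetical.

```python
def mochi_token_count(width=848, height=480, num_frames=163,
                      spatial_factor=8, temporal_factor=6, patch_size=2):
    """Estimate the latent grid and visual token count for one clip.

    The 8x8 spatial and 6x temporal factors come from the text above.
    The clip size, frame count, and patch size are assumptions.
    """
    # 8x8 spatial reduction applied by the VAE.
    latent_h = height // spatial_factor
    latent_w = width // spatial_factor
    # Assumption: the first frame is kept, then every 6 frames become 1.
    latent_t = (num_frames - 1) // temporal_factor + 1
    # Assumption: the transformer groups latents into 2x2 patches.
    tokens = latent_t * (latent_h // patch_size) * (latent_w // patch_size)
    return latent_t, latent_h, latent_w, tokens


latent_t, latent_h, latent_w, tokens = mochi_token_count()
print(latent_t, latent_h, latent_w, tokens)  # 28 60 106 44520
```

Under these assumptions the result matches the 44,520 visual tokens listed above: 28 latent frames, each split into a 30x53 grid of patches.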

Core Capabilities

High-quality video generation at 480p resolution
Strong prompt adherence and realistic motion synthesis
Efficient context-parallel implementation
Support for both multi-GPU and single-GPU operations
Integration with popular frameworks like Diffusers and ComfyUI
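
As an illustration of the Diffusers integration, a minimal sketch follows. It assumes a recent Diffusers release that ships `MochiPipeline`, the `genmo/mochi-1-preview` repository id on the Hugging Face Hub with a `bf16` weight variant, and a CUDA GPU. The wrapper function `generate_video` and its default values are hypothetical and chosen only for this example.

```python
def generate_video(prompt, output_path="mochi.mp4", num_frames=84, fps=30):
    """Generate a clip with Mochi-1 Preview through Diffusers and save it as MP4."""
    # Imported lazily so that defining this function does not need the GPU stack.
    import torch
    from diffusers import MochiPipeline
    from diffusers.utils import export_to_video

    # Assumption: Hub repo id and bf16 variant as described in the lead-in.
    pipe = MochiPipeline.from_pretrained(
        "genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16
    )
    # Memory-saving options: move idle submodules to CPU and decode the VAE in tiles.
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_tiling()

    # The pipeline returns a batch of videos; take the frames of the first one.
    frames = pipe(prompt, num_frames=num_frames).frames[0]
    export_to_video(frames, output_path, fps=fps)
    return output_path
```

Calling `generate_video("A close-up of a chameleon on a branch")` downloads the weights on first use and writes `mochi.mp4`. The two memory-saving calls trade speed for a lower peak VRAM footprint, so they can be dropped on hardware that fits the full model.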

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its asymmetric architecture and its scale (10B parameters), along with its ability to maintain high-fidelity motion while closely following text prompts. It is also released under the permissive Apache 2.0 license, which at release made it the largest openly available video generation model.

Q: What are the recommended use cases?

The model is best suited to generating photorealistic videos from text descriptions, particularly content with complex motion. It is not optimized for animated or cartoon-style content. Users should be aware that output is limited to 480p and that minor warping can appear in scenes with extreme motion.
