WHAM (World and Human Action Model)
| Property | Value |
|---|---|
| Developer | Microsoft Research |
| Parameters | 200M and 1.6B versions |
| Training Data | 500,000 Bleeding Edge games |
| License | Microsoft Research License |
| Architecture | Decoder-only transformer with VQ-GAN |
What is WHAM?
WHAM is a generative model developed by Microsoft Research's Game Intelligence group in collaboration with TaiX and Ninja Theory. It generates gameplay sequences for the game Bleeding Edge, producing both the game visuals and the controller actions. The generated sequences stay consistent over time and reflect the 3D structure of the environment and the temporal structure of gameplay.
Implementation Details
The architecture has two main components: an encoder-decoder VQ-GAN that converts game frames to and from discrete tokens, and a decoder-only transformer backbone that performs next-token prediction over those tokens and the controller actions. The model was trained on approximately 500,000 Bleeding Edge games, equivalent to over 7 years of continuous human gameplay, using 98 H100 GPUs over 5 days.
- Context length: 10 observation-action pairs (5560 tokens)
- Image resolution: 300px x 180px
- Training data: 1 billion observation-action pairs at 10Hz
- Available versions: 200M parameters (3.7GB) and 1.6B parameters (18.9GB)
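The context figures above imply a fixed token budget per step. As a quick sanity check that uses only the numbers in this list, the sketch below derives the tokens spent on one observation-action pair and the gameplay time covered by one context window. How those tokens divide between image and action tokens is not stated here, so the code does not assume a split; the function names are illustrative only.

```python
# Figures copied from the list above.
CONTEXT_PAIRS = 10      # observation-action pairs per context window
CONTEXT_TOKENS = 5560   # total tokens per context window
FRAME_RATE_HZ = 10      # observation-action pairs per second of gameplay


def tokens_per_pair(context_tokens: int, context_pairs: int) -> int:
    """Return the number of tokens used by one (observation, action) pair."""
    per_pair, remainder = divmod(context_tokens, context_pairs)
    if remainder:
        raise ValueError("context does not divide evenly into pairs")
    return per_pair


def context_seconds(context_pairs: int, frame_rate_hz: int) -> float:
    """Return the gameplay time, in seconds, covered by one context window."""
    return context_pairs / frame_rate_hz


print(tokens_per_pair(CONTEXT_TOKENS, CONTEXT_PAIRS))  # 556
print(context_seconds(CONTEXT_PAIRS, FRAME_RATE_HZ))   # 1.0
```

In other words, the model conditions on about one second of gameplay at a time, at 556 tokens per step.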
Core Capabilities
- World Modeling: Predicts visuals based on starting state and action sequence
- Behavior Policy: Generates controller actions based on visual input
- Full Generation: Creates both visuals and controller actions together
- Consistency and Persistence: Keeps generated game sequences consistent and persistent over time
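The first three capabilities differ only in which half of each observation-action pair is supplied and which half the model fills in. The sketch below illustrates that idea with a toy rollout loop. It is not the WHAM API: `rollout`, `next_obs`, `next_act`, and the dummy predictors are hypothetical names invented for this example, and observations and actions are stand-in token lists.

```python
from typing import Callable, List, Optional, Tuple

Obs = List[int]          # stand-in for the image tokens of one frame
Act = List[int]          # stand-in for the controller-action tokens of one step
Step = Tuple[Obs, Act]


def rollout(
    history: List[Step],
    steps: int,
    next_obs: Callable[[List[Step]], Obs],
    next_act: Callable[[List[Step], Obs], Act],
    given_obs: Optional[List[Obs]] = None,
    given_acts: Optional[List[Act]] = None,
) -> List[Step]:
    """Extend `history` by `steps` pairs and return only the new pairs.

    given_acts only -> world modeling (predict observations)
    given_obs only  -> behavior policy (predict actions)
    neither         -> full generation (predict both)
    """
    for name, given in (("given_obs", given_obs), ("given_acts", given_acts)):
        if given is not None and len(given) < steps:
            raise ValueError(f"{name} must supply at least {steps} entries")
    seq = list(history)
    for t in range(steps):
        obs = given_obs[t] if given_obs is not None else next_obs(seq)
        act = given_acts[t] if given_acts is not None else next_act(seq, obs)
        seq.append((obs, act))
    return seq[len(history):]


# Dummy predictors so the sketch runs; a real model would sample tokens here.
def dummy_next_obs(seq: List[Step]) -> Obs:
    return [len(seq)]


def dummy_next_act(seq: List[Step], obs: Obs) -> Act:
    return [obs[0] * 10]


start = [([0], [0])]  # the starting state: one known pair
world = rollout(start, 2, dummy_next_obs, dummy_next_act, given_acts=[[7], [8]])
policy = rollout(start, 2, dummy_next_obs, dummy_next_act, given_obs=[[5], [6]])
full = rollout(start, 2, dummy_next_obs, dummy_next_act)

print(world)   # [([1], [7]), ([2], [8])]
print(policy)  # [([5], [50]), ([6], [60])]
print(full)    # [([1], [10]), ([2], [20])]
```

Framing all three modes as one loop over an interleaved sequence matches the next-token-prediction setup described above, where a single transformer predicts whichever tokens are not supplied.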
Frequently Asked Questions
Q: What makes this model unique?
WHAM generates both the visuals and the controller actions of a gameplay sequence and keeps them consistent with the game environment. It is one of the first models to demonstrate effective world modeling for a complex 3D game environment.
Q: What are the recommended use cases?
The model is intended for academic research and for exploration in game development. Within the context of Bleeding Edge, it is particularly useful for studying gameplay patterns, testing game scenarios, and supporting creative iteration.