granite-timeseries-patchtsmixer

Maintained by: ibm-granite

PatchTSMixer

Author: IBM Granite
Paper: TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting
Training Hardware: NVIDIA A100 GPU
Framework: PyTorch

What is granite-timeseries-patchtsmixer?

PatchTSMixer is a lightweight neural architecture for multivariate time series forecasting. Pre-trained on the ETTh1 dataset, it reaches a mean squared error (MSE) of 0.37 when forecasting 96 hours ahead from 512 hours of historical data. The model operates on the seven ETTh1 channels: HUFL, HULL, MUFL, MULL, LUFL, LULL, and OT.
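As a minimal inference sketch, the model can be loaded through the Hugging Face transformers library. The hub id "ibm-granite/granite-timeseries-patchtsmixer" is inferred from this card's title and should be verified on the Hub before use; the random tensor stands in for real ETTh1 data.

```python
# Minimal inference sketch with Hugging Face transformers.
# Assumption: the checkpoint id matches this card's title; verify on the Hub.
import torch
from transformers import PatchTSMixerForPrediction

model = PatchTSMixerForPrediction.from_pretrained(
    "ibm-granite/granite-timeseries-patchtsmixer"
)

# 512 hours of history across the 7 ETTh1 channels
# (HUFL, HULL, MUFL, MULL, LUFL, LULL, OT); random data stands in for real series.
past_values = torch.randn(1, 512, 7)  # (batch, context_length, num_channels)

with torch.no_grad():
    outputs = model(past_values=past_values)

print(outputs.prediction_outputs.shape)  # torch.Size([1, 96, 7]): 96-hour forecast
```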

Implementation Details

The model adapts the MLP-Mixer architecture to time series data. Its key components are online reconciliation heads, hybrid channel modeling, and a gated attention mechanism, which together improve robustness to noisy channel interactions and generalization across diverse datasets. These components are summarized below; a configuration sketch follows the list.

  • Lightweight MLP-based architecture optimized for time series
  • Novel online reconciliation heads for time-series properties
  • Hybrid channel modeling approach
  • Gated attention mechanism for feature prioritization
  • Compatible with both supervised and masked self-supervised learning
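To show where these components surface in the transformers API, the sketch below builds a model from scratch with gated attention and the channel-mixing mode enabled. The hyperparameter values are illustrative assumptions, not the pre-trained checkpoint's actual settings.

```python
# Hypothetical from-scratch configuration; values are illustrative only.
from transformers import PatchTSMixerConfig, PatchTSMixerForPrediction

config = PatchTSMixerConfig(
    context_length=512,     # 512-hour history window
    prediction_length=96,   # 96-hour forecast horizon
    num_input_channels=7,   # the seven ETTh1 channels
    patch_length=16,        # series are split into patches before mixing
    patch_stride=16,
    d_model=128,
    num_layers=8,
    mode="mix_channel",     # channel-mixing variant of hybrid channel modeling
    gated_attn=True,        # enable the gated attention mechanism
)
model = PatchTSMixerForPrediction(config)
```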

Core Capabilities

  • Multivariate time series forecasting with state-of-the-art accuracy
  • Efficient processing of multiple channels with reduced computational requirements
  • 96-hour forecasting from 512-hour historical windows (see the evaluation sketch after this list)
  • Reported 8-60% accuracy improvement over state-of-the-art Transformer models (per the TSMixer paper)
  • 2-3X lower memory use and runtime than Patch-Transformer models
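An evaluation loop of the following shape could reproduce a figure like the 0.37 test MSE. The DataLoader is assumed (not part of this card) and should yield (past, future) windows of shapes (batch, 512, 7) and (batch, 96, 7) from ETTh1.

```python
# Sketch of a test-set MSE computation under the 512-in / 96-out setup.
# Assumption: `dataloader` yields (past_values, future_values) tensor pairs.
import torch

def evaluate(model, dataloader):
    model.eval()
    squared_error, count = 0.0, 0
    with torch.no_grad():
        for past_values, future_values in dataloader:
            preds = model(past_values=past_values).prediction_outputs
            squared_error += torch.nn.functional.mse_loss(
                preds, future_values, reduction="sum"
            ).item()
            count += future_values.numel()
    return squared_error / count  # MSE over all horizons and channels
```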

Frequently Asked Questions

Q: What makes this model unique?

PatchTSMixer combines a lightweight MLP-based backbone with gated attention and hybrid channel modeling, matching or exceeding Transformer-based forecasters while using substantially less memory and compute.

Q: What are the recommended use cases?

The model is best suited to electrical transformer datasets with a channel structure similar to ETTh1's. It fits applications that need medium-term forecasts (96 hours) from historical data, especially where computational efficiency is crucial.
