RDT-1B

Maintained by: robotics-diffusion-transformer

Property       Value
License        MIT
Paper          arXiv:2410.07864
Developer      TSAIL group at Tsinghua University
Architecture   Diffusion Policy with Transformers

What is RDT-1B?

RDT-1B is a 1B-parameter imitation-learning Diffusion Transformer for robotic control. Pre-trained on over 1 million multi-robot episodes, it is a vision-language-action model that combines visual input from up to three camera views with natural language instructions to predict robot actions.

Implementation Details

The model pairs a siglip-so400m-patch14-384 vision encoder with a t5-v1_1-xxl language encoder. Given the encoded observations and instruction, it predicts a chunk of 64 consecutive robot actions and supports single-arm, dual-arm, joint-based, and end-effector-based control; a minimal inference sketch follows the list below.

  • Unified action space supporting multiple robot configurations
  • Multi-view visual processing capability
  • Flexible control frequency adaptation
  • Support for both position and velocity-based control

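The sketch below follows the usage example published with the official RDT repository and model card (a `create_model` helper in `scripts/agilex_model.py` and a `step` method that returns an action chunk). Argument names, the language-embedding workflow, and the expected input shapes may differ between repo versions, so treat this as an illustration under those assumptions rather than a drop-in script.

```python
import torch
from PIL import Image

# Assumes the RoboticsDiffusionTransformer repo is cloned, its dependencies installed,
# and that you run from the repo root so `scripts.agilex_model` is importable.
# Helper and argument names follow the repo's published example; verify against your checkout.
from scripts.agilex_model import create_model

# Camera order expected by the pre-trained model: exterior, right wrist, left wrist
CAMERA_NAMES = ['cam_high', 'cam_right_wrist', 'cam_left_wrist']

config = {
    'episode_len': 1000,            # maximum episode length
    'state_dim': 14,                # proprioception dimension (dual-arm joint positions here)
    'chunk_size': 64,               # RDT predicts 64 consecutive actions per call
    'camera_names': CAMERA_NAMES,
}

policy = create_model(
    args=config,
    dtype=torch.bfloat16,
    pretrained='robotics-diffusion-transformer/rdt-1b',
    pretrained_vision_encoder_name_or_path='google/siglip-so400m-patch14-384',
    control_frequency=25,           # Hz of the target robot controller
)

# Instructions are pre-encoded with t5-v1_1-xxl (see the repo's language-encoding script);
# the path below is a placeholder for wherever you saved the embedding.
text_embeds = torch.load('path/to/instruction_embedding.pt')['embeddings']

# Placeholders: the two most recent frames from each camera (6 PIL images in total)
# and the current robot state; replace with real observations on hardware.
images = [Image.new('RGB', (384, 384)) for _ in range(2 * len(CAMERA_NAMES))]
proprio = torch.zeros(config['state_dim'])

# Returns a chunk of 64 actions to execute at the configured control frequency
actions = policy.step(proprio=proprio, images=images, text_embeds=text_embeds)
```
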
Core Capabilities

  • Multi-robot episode processing
  • Natural language instruction interpretation
  • Real-time action prediction
  • Wheeled locomotion support
  • Cross-platform compatibility

Frequently Asked Questions

Q: What makes this model unique?

RDT-1B stands out for its ability to handle multiple robot configurations and control paradigms within a single model, combined with its sophisticated vision-language processing capabilities and extensive pre-training on diverse robotics datasets.

Q: What are the recommended use cases?

The model is ideal for robotic manipulation tasks where visual feedback and natural language instructions guide the robot's actions. It's particularly well-suited for scenarios involving mobile manipulators, whether single-arm or dual-arm configurations, and can handle both position and velocity-based control schemes.
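To show how the 64-step action chunks map onto a position- or velocity-controlled robot, here is a rough closed-loop sketch that drains one predicted chunk at a fixed control frequency before querying the policy again. The `policy`, `get_observation`, and `send_joint_command` names are placeholders for whatever interfaces your robot stack provides; they are not part of the RDT codebase.

```python
import time

CONTROL_HZ = 25          # should match the control_frequency the policy was created with
CHUNK_SIZE = 64          # actions returned per policy call

def run_episode(policy, get_observation, send_joint_command, max_steps=1000):
    """Execute predicted action chunks in a closed loop.

    `policy.step(...)` is assumed to return an iterable of CHUNK_SIZE actions;
    `get_observation()` and `send_joint_command(action)` are placeholders for
    your own camera/proprioception readers and low-level controller.
    """
    period = 1.0 / CONTROL_HZ
    steps = 0
    while steps < max_steps:
        proprio, images, text_embeds = get_observation()
        actions = policy.step(proprio=proprio, images=images, text_embeds=text_embeds)
        for action in actions:                 # drain the chunk before re-planning
            t0 = time.monotonic()
            send_joint_command(action)         # position or velocity targets, per your setup
            steps += 1
            if steps >= max_steps:
                break
            # keep the loop close to the target control frequency
            time.sleep(max(0.0, period - (time.monotonic() - t0)))
```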
