# SpatialLM-Qwen-0.5B
| Property | Value |
| --- | --- |
| Model Type | 3D Language Model |
| Parameters | 0.5B |
| License | Apache 2.0 |
| Author | manycore-research |
## What is SpatialLM-Qwen-0.5B?
SpatialLM-Qwen-0.5B is a 3D large language model that processes point cloud data and produces structured 3D scene understanding. Built on the Qwen-2.5 architecture, the model specializes in recognizing architectural elements and detecting objects within 3D spaces.
## Implementation Details
The model processes point clouds reconstructed from a range of sources, including monocular video sequences, RGBD images, and LiDAR sensors. It requires Python 3.11, PyTorch 2.4.1, and CUDA 12.4.
- Axis-aligned input point clouds with the z-axis as the up direction (see the preprocessing sketch after this list)
- Integration with MASt3R-SLAM for reconstructing point clouds from monocular RGB video
- Point cloud encoder based on SceneScript
- Support for multiple input sources and formats
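
Because the model expects axis-aligned, z-up point clouds, scans produced by y-up pipelines need a small preprocessing step. The sketch below shows one way to do this with Open3D; it is not the official SpatialLM tooling, and the file name, the `y_up` flag, and the choice to shift the scene so its bounding box starts at the origin are illustrative assumptions.

```python
# Minimal preprocessing sketch, NOT the official SpatialLM pipeline.
# Assumptions: the scan is stored as a PLY file and may arrive in a y-up frame;
# the file name "scene.ply" is illustrative.
import numpy as np
import open3d as o3d


def to_z_up_axis_aligned(ply_path: str, y_up: bool = False) -> o3d.geometry.PointCloud:
    """Load a point cloud, rotate it into the z-up convention, and shift it
    so its axis-aligned bounding box starts at the origin."""
    pcd = o3d.io.read_point_cloud(ply_path)

    if y_up:
        # Proper rotation (det = +1) taking a y-up frame to z-up:
        # (x, y, z) -> (x, -z, y), so the old up axis lands on +z.
        rot = np.array([[1.0, 0.0, 0.0],
                        [0.0, 0.0, -1.0],
                        [0.0, 1.0, 0.0]])
        pcd.rotate(rot, center=np.zeros(3))

    # Translate so the minimum corner of the bounding box sits at the origin.
    pcd.translate(-pcd.get_min_bound())
    return pcd


if __name__ == "__main__":
    cloud = to_z_up_axis_aligned("scene.ply", y_up=False)
    print(f"{len(cloud.points)} points, extent: {cloud.get_max_bound()}")
```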
## Core Capabilities
- 3D Layout Recognition: 74.81% IoU for wall detection
- Object Detection: High accuracy for furniture (93.75% F1 for beds, 66.15% F1 for sofas)
- Thin Object Recognition: Effective detection of paintings (53.81% F1) and windows (45.9% F1)
- Spatial Understanding: Comprehensive scene analysis and object relationship mapping
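
For context on how figures like these are computed, the following is a generic sketch of IoU-based detection F1 for axis-aligned 3D boxes. The 0.5 matching threshold and greedy one-to-one matching are assumptions about the typical protocol, not a statement of SpatialLM's documented evaluation setup.

```python
# Generic sketch of IoU-threshold detection F1 for axis-aligned 3D boxes.
# The 0.5 threshold and greedy one-to-one matching are assumed conventions,
# not SpatialLM's exact benchmark definition.
import numpy as np


def iou_3d(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = float(np.prod(np.clip(hi - lo, 0.0, None)))
    vol_a = float(np.prod(a[3:] - a[:3]))
    vol_b = float(np.prod(b[3:] - b[:3]))
    return inter / (vol_a + vol_b - inter + 1e-9)


def detection_f1(pred: list, gt: list, iou_thr: float = 0.5) -> float:
    """Greedily match each predicted box to an unmatched ground-truth box
    with IoU >= iou_thr, then compute F1 from the resulting matches."""
    matched, tp = set(), 0
    for p in pred:
        candidates = [(iou_3d(p, g), i) for i, g in enumerate(gt) if i not in matched]
        if candidates:
            best_iou, best_i = max(candidates)
            if best_iou >= iou_thr:
                matched.add(best_i)
                tp += 1
    precision = tp / max(len(pred), 1)
    recall = tp / max(len(gt), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)


if __name__ == "__main__":
    gt = [np.array([0.0, 0.0, 0.0, 2.0, 1.0, 1.0])]
    pred = [np.array([0.1, 0.0, 0.0, 2.0, 1.0, 1.0])]
    print(f"F1: {detection_f1(pred, gt):.2f}")  # -> 1.00, the boxes overlap at IoU 0.95
```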
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's ability to process multiple input types and generate structured 3D understanding without specialized equipment sets it apart. It bridges the gap between unstructured 3D geometric data and semantic understanding.
**Q: What are the recommended use cases?**
The model is ideal for embodied robotics, autonomous navigation, architectural analysis, and other complex 3D scene understanding tasks, particularly when point clouds come from heterogeneous capture sources.