# SpatialLM-Qwen-0.5B
| Property | Value |
| --- | --- |
| Model Type | 3D Language Model |
| Parameters | 0.5B |
| License | Apache 2.0 |
| Author | manycore-research |
## What is SpatialLM-Qwen-0.5B?
SpatialLM-Qwen-0.5B is a 3D large language model that processes point cloud data and produces structured 3D scene understanding. Built on the Qwen-2.5 architecture, the model specializes in recognizing architectural elements and detecting objects within 3D spaces.
## Implementation Details
The model processes point clouds reconstructed from a range of sources, including monocular video sequences, RGBD images, and LiDAR sensors. It requires Python 3.11, PyTorch 2.4.1, and CUDA 12.4.
- Axis-aligned input point clouds with the z-axis as the up direction (see the preprocessing sketch after this list)
- Integration with MASt3R-SLAM for reconstructing point clouds from monocular RGB video
- Point cloud encoder based on SceneScript
- Support for multiple input sources and formats
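
Because the model expects axis-aligned, z-up point clouds, scans produced by y-up pipelines need a small preprocessing step. The sketch below shows one way to do this with Open3D; it is not the official SpatialLM tooling, and the file name, the `y_up` flag, and the choice to shift the scene so its bounding box starts at the origin are illustrative assumptions.

```python
# Minimal preprocessing sketch, NOT the official SpatialLM pipeline.
# Assumptions: the scan is stored as a PLY file and may arrive in a y-up frame;
# the file name "scene.ply" is illustrative.
import numpy as np
import open3d as o3d


def to_z_up_axis_aligned(ply_path: str, y_up: bool = False) -> o3d.geometry.PointCloud:
    """Load a point cloud, rotate it into the z-up convention, and shift it
    so its axis-aligned bounding box starts at the origin."""
    pcd = o3d.io.read_point_cloud(ply_path)

    if y_up:
        # Proper rotation (det = +1) taking a y-up frame to z-up:
        # (x, y, z) -> (x, -z, y), so the old up axis lands on +z.
        rot = np.array([[1.0, 0.0, 0.0],
                        [0.0, 0.0, -1.0],
                        [0.0, 1.0, 0.0]])
        pcd.rotate(rot, center=np.zeros(3))

    # Translate so the minimum corner of the bounding box sits at the origin.
    pcd.translate(-pcd.get_min_bound())
    return pcd


if __name__ == "__main__":
    cloud = to_z_up_axis_aligned("scene.ply", y_up=False)
    print(f"{len(cloud.points)} points, extent: {cloud.get_max_bound()}")
```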
## Core Capabilities
- 3D Layout Recognition: 74.81% IoU for wall detection
- Object Detection: High accuracy for furniture (93.75% F1 for beds, 66.15% F1 for sofas)
- Thin Object Recognition: Effective detection of paintings (53.81% F1) and windows (45.9% F1)
- Spatial Understanding: Comprehensive scene analysis and object relationship mapping
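
For context on how figures like these are computed, the following is a generic sketch of IoU-based detection F1 for axis-aligned 3D boxes. The 0.5 matching threshold and greedy one-to-one matching are assumptions about the typical protocol, not a statement of SpatialLM's documented evaluation setup.

```python
# Generic sketch of IoU-threshold detection F1 for axis-aligned 3D boxes.
# The 0.5 threshold and greedy one-to-one matching are assumed conventions,
# not SpatialLM's exact benchmark definition.
import numpy as np


def iou_3d(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = float(np.prod(np.clip(hi - lo, 0.0, None)))
    vol_a = float(np.prod(a[3:] - a[:3]))
    vol_b = float(np.prod(b[3:] - b[:3]))
    return inter / (vol_a + vol_b - inter + 1e-9)


def detection_f1(pred: list, gt: list, iou_thr: float = 0.5) -> float:
    """Greedily match each predicted box to an unmatched ground-truth box
    with IoU >= iou_thr, then compute F1 from the resulting matches."""
    matched, tp = set(), 0
    for p in pred:
        candidates = [(iou_3d(p, g), i) for i, g in enumerate(gt) if i not in matched]
        if candidates:
            best_iou, best_i = max(candidates)
            if best_iou >= iou_thr:
                matched.add(best_i)
                tp += 1
    precision = tp / max(len(pred), 1)
    recall = tp / max(len(gt), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)


if __name__ == "__main__":
    gt = [np.array([0.0, 0.0, 0.0, 2.0, 1.0, 1.0])]
    pred = [np.array([0.1, 0.0, 0.0, 2.0, 1.0, 1.0])]
    print(f"F1: {detection_f1(pred, gt):.2f}")  # -> 1.00, the boxes overlap at IoU 0.95
```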
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's ability to process multiple input types and generate structured 3D understanding without specialized equipment sets it apart. It bridges the gap between unstructured 3D geometric data and semantic understanding.
**Q: What are the recommended use cases?**
The model is ideal for embodied robotics, autonomous navigation, architectural analysis, and other complex 3D scene understanding tasks, particularly when point clouds come from heterogeneous capture sources.