SpatialLM-Llama-1B

Maintained by manycore-research

  • Model Type: 3D Language Model
  • Base Architecture: Llama-3.2-1B-Instruct
  • License: Llama 3.2 License
  • Author: ManyCore Research
  • Framework: PyTorch

What is SpatialLM-Llama-1B?

SpatialLM-Llama-1B is a 3D large language model designed to bridge the gap between unstructured 3D geometric data and structured scene understanding. Built on the Llama-3.2 architecture, the model processes point cloud data from a range of sources, including monocular video sequences, RGBD images, and LiDAR sensors.

Implementation Details

The model operates on axis-aligned point clouds with the z-axis as the up axis. The point cloud is encoded and fed to the language backbone, which generates structured scene-understanding outputs as text. The implementation requires Python 3.11, PyTorch 2.4.1, and CUDA 12.4, and uses the TorchSparse framework for efficient sparse point cloud processing. A rough sketch of what an inference call might look like follows the list below.

  • Processes point clouds from multiple input sources
  • Generates structured 3D layout predictions
  • Achieves 78.62% mean IoU for wall detection
  • Supports real-time visualization through Rerun framework
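As a rough illustration, the snippet below sketches loading the checkpoint and generating a scene description, assuming the model is published on the Hugging Face Hub as `manycore-research/SpatialLM-Llama-1B` and loads through the standard `transformers` causal-LM interface with `trust_remote_code=True`. The point-cloud conditioning itself is handled by the project's own preprocessing and inference scripts and is only stubbed out in comments here; this is not the repository's actual API.

```python
# Hypothetical inference sketch -- the official repository ships its own
# inference script, and the point-cloud conditioning below is only stubbed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "manycore-research/SpatialLM-Llama-1B"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # the 1B backbone fits on a single consumer GPU
    trust_remote_code=True,       # custom point-cloud handling lives in the repo code
).to("cuda")

# In the real pipeline, the axis-aligned (z-up) point cloud from video, RGBD,
# or LiDAR is encoded with TorchSparse and injected alongside the text prompt;
# that step is omitted here because it is model-specific.
prompt = "Detect walls, doors, windows, and objects with oriented bounding boxes."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

# The model emits a structured, text-based description of the scene layout.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For actual usage, follow the preprocessing and inference steps documented in the SpatialLM repository.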

Core Capabilities

  • Architectural element recognition (walls, doors, windows)
  • Object detection and classification with oriented bounding boxes
  • High performance on challenging scenarios (95.24% F1 score for bed detection)
  • Support for both 3D and 2D thin object detection
  • Integration with popular 3D reconstruction tools like MASt3R-SLAM
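To make the structured output concrete, the sketch below shows one way the predicted layout elements and oriented bounding boxes could be represented and streamed to the Rerun viewer mentioned above. The dataclasses, field names, and example values are illustrative assumptions, not SpatialLM's actual output schema; the `rerun` calls (`rr.init`, `rr.log`, `rr.Boxes3D`) are standard SDK entry points.

```python
# Illustrative output representation and Rerun visualization -- the schema and
# values below are assumptions, not SpatialLM's actual output format.
from dataclasses import dataclass

import rerun as rr


@dataclass
class Wall:
    start: tuple[float, float, float]   # wall endpoints on the floor plane (z up)
    end: tuple[float, float, float]
    height: float
    thickness: float


@dataclass
class ObjectBox:
    label: str                           # e.g. "bed", "sofa", "table"
    center: tuple[float, float, float]   # box center in scene coordinates
    size: tuple[float, float, float]     # full extents along the box axes
    yaw: float                           # rotation about the z (up) axis, radians


# Hypothetical predictions, as if parsed from the model's text output.
boxes = [
    ObjectBox("bed", center=(1.2, 2.0, 0.3), size=(2.0, 1.6, 0.6), yaw=0.0),
    ObjectBox("nightstand", center=(2.4, 2.8, 0.25), size=(0.5, 0.4, 0.5), yaw=0.0),
]

rr.init("spatiallm_layout", spawn=True)  # open the Rerun viewer
rr.log(
    "scene/objects",
    rr.Boxes3D(
        centers=[b.center for b in boxes],
        half_sizes=[tuple(s / 2 for s in b.size) for b in boxes],
        labels=[b.label for b in boxes],
        # Yaw orientation can also be passed via Rerun's rotation arguments;
        # it is omitted here to keep the sketch minimal. Walls would be logged
        # similarly, e.g. as thin boxes or line strips along their footprint.
    ),
)
```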

Frequently Asked Questions

Q: What makes this model unique?

SpatialLM-Llama-1B stands out for its ability to process various types of 3D input data without requiring specialized equipment, making it more accessible and versatile than traditional 3D understanding systems. Its multimodal architecture effectively handles both geometric and semantic understanding tasks.

Q: What are the recommended use cases?

The model is ideal for applications in embodied robotics, autonomous navigation, architectural analysis, and complex 3D scene understanding. It's particularly effective for processing indoor environments where accurate object and structural element detection is crucial.
