Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model

Recent advancements in multimodal large language models (LLMs) have shown their potential in various domains, especially concept reasoning. Despite these developments, applications in understanding 3D environments remain limited. This paper introduces Reason3D, a novel LLM designed for comprehensive 3D understanding. Reason3D takes point cloud data and text prompts as input and produces textual responses and segmentation masks, supporting advanced tasks such as 3D reasoning segmentation, hierarchical searching, express referring, and question answering with detailed mask outputs. Specifically, we propose a hierarchical mask decoder to locate small objects within expansive scenes. This decoder first generates a coarse location estimate covering the object’s general area. This initial estimate enables a coarse-to-fine segmentation strategy that significantly improves the precision of object identification and segmentation. Experiments validate that Reason3D achieves remarkable results on the large-scale ScanNet and Matterport3D datasets for 3D express referring, 3D question answering, and 3D reasoning segmentation tasks. Code and models are linked from the arXiv page listed below.
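The coarse-to-fine idea in the abstract can be illustrated with a small sketch. The abstract does not say how the hierarchical mask decoder scores points, so the dot-product scoring, the thresholds, and every name below (`coarse_to_fine_mask`, `loc_query`, `seg_query`) are assumptions made for illustration, not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def coarse_to_fine_mask(point_feats, loc_query, seg_query,
                        coarse_thresh=0.5, fine_thresh=0.5):
    """Hypothetical two-stage mask: one boolean per point for each stage."""
    # Stage 1 (assumed): a location query marks the general area of the object.
    coarse = [sigmoid(dot(f, loc_query)) > coarse_thresh for f in point_feats]
    # Stage 2 (assumed): a segmentation query refines the mask, but only
    # for points that already fall inside the coarse region.
    fine = [in_region and sigmoid(dot(f, seg_query)) > fine_thresh
            for f, in_region in zip(point_feats, coarse)]
    return coarse, fine


# Toy 2-D features for four points; values are made up for the example.
points = [[2.0, 2.0], [2.0, -2.0], [-2.0, 2.0], [-2.0, -2.0]]
coarse, fine = coarse_to_fine_mask(points, loc_query=[1.0, 0.0],
                                   seg_query=[0.0, 1.0])
print(coarse)
print(fine)
```

In this toy setup the third point matches the segmentation query but lies outside the coarse region, so it is excluded from the fine mask. That is the benefit the abstract attributes to estimating a coarse location first when searching for small objects in large scenes.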

https://arxiv.org/abs/2405.17427

https://arxiv.org/pdf/2405.17427.pdf
