PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations

Learning a generalizable object manipulation policy is vital for an embodied agent to work in complex real-world scenes. Parts, as the shared components across different object categories, have the potential to increase the generalization ability of the manipulation policy and enable cross-category object manipulation. In this work, we build the first large-scale, part-based, cross-category object manipulation benchmark, PartManip, which is composed of 11 object categories, 494 objects, and 1432 tasks in 6 task classes. Compared to previous work, our benchmark is also more diverse and realistic: it has more objects and uses sparse-view point clouds as input, without oracle information such as part segmentation. To tackle the difficulties of vision-based policy learning, we first train a state-based expert with our proposed part-based canonicalization and part-aware rewards, and then distill its knowledge into a vision-based student. We also find that an expressive backbone is essential to overcome the large diversity of different objects. For cross-category generalization, we introduce domain adversarial learning for domain-invariant feature extraction. Extensive experiments in simulation show that our learned policy outperforms other methods by a large margin, especially on unseen object categories. We also demonstrate that our method can successfully manipulate novel objects in the real world.
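
The abstract does not spell out how part-based canonicalization is computed. One plausible reading, offered here only as an assumption, is that state quantities such as the gripper position are re-expressed in the coordinate frame of the target part, so that the expert sees similar inputs for, say, a drawer handle and a door handle. A minimal sketch under that assumption follows; the function name `to_part_frame` and its arguments are hypothetical and not taken from the paper.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def to_part_frame(point_world: Vec3, part_rotation: Mat3, part_position: Vec3) -> List[float]:
    """Express a world-frame point in a part's local frame.

    Assumption (not from the paper): the part pose is given as a 3x3
    rotation matrix R whose columns are the part axes in world
    coordinates, plus a translation t. Then p_local = R^T (p_world - t).
    """
    # Offset of the point from the part origin, still in world axes.
    d = [point_world[i] - part_position[i] for i in range(3)]
    # R^T d: component j is the dot product of column j of R with d.
    return [sum(part_rotation[i][j] * d[i] for i in range(3)) for j in range(3)]


# Hypothetical part: origin at (1, 0, 0), rotated 90 degrees about the z-axis.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]
gripper_world = [1.0, 2.0, 0.0]
gripper_local = to_part_frame(gripper_world, R, t)
```

In this example the gripper sits two units along the part's own x-axis, so the canonicalized coordinates no longer depend on where the object stands or how it is oriented in the scene.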

