EgoTV: Egocentric Task Verification from Natural Language Task Descriptions

To enable progress towards egocentric agents capable of understanding everyday tasks specified in natural language, we propose a benchmark and a synthetic dataset called Egocentric Task Verification (EgoTV). EgoTV contains multi-step tasks with multiple sub-task decompositions, state changes, object interactions, and sub-task ordering constraints, in addition to abstracted task descriptions that contain only partial details about ways to accomplish a task. We also propose a novel Neuro-Symbolic Grounding (NSG) approach to enable the causal, temporal, and compositional reasoning of such tasks. We demonstrate NSG’s capability towards task tracking and verification on our EgoTV dataset and a real-world dataset derived from CrossTask (CTV). Our contributions include the release of the EgoTV and CTV datasets, and the NSG model for future research on egocentric assistive agents.

https://arxiv.org/abs/2303.16975

https://arxiv.org/pdf/2303.16975
