Entity Alignment with Noisy Annotations from Large Language Models

Entity alignment (EA) aims to merge two knowledge graphs (KGs) by identifying equivalent entity pairs. While existing methods heavily rely on human-generated labels, it is prohibitively expensive to incorporate cross-domain experts for annotation in real-world scenarios. The advent of Large Language Models (LLMs) presents new avenues for automating EA with annotations, inspired by their comprehensive capability to process semantic information. However, it is nontrivial to directly apply LLMs for EA since the annotation space in real-world KGs is large. LLMs could also generate noisy labels that may mislead the alignment. To this end, we propose a unified framework, LLM4EA, to effectively leverage LLMs for EA. Specifically, we design a novel active learning policy to significantly reduce the annotation space by prioritizing the most valuable entities based on the entire inter-KG and intra-KG structure. Moreover, we introduce an unsupervised label refiner to continuously enhance label accuracy through in-depth probabilistic reasoning. We iteratively optimize the policy based on the feedback from a base EA model. Extensive experiments demonstrate the advantages of LLM4EA on four benchmark datasets in terms of effectiveness, robustness, and efficiency.
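The abstract describes an iterative loop: actively select high-value entities, query the LLM for (possibly noisy) alignment labels, refine those labels without supervision, and feed results back to update the selection policy. The toy sketch below illustrates that control flow only; all helpers (`annotate_with_llm`, `select_entities`, `refine_labels`, `llm4ea_loop`) are hypothetical stand-ins, and simple majority voting substitutes for the paper's probabilistic label refiner.

```python
import random

def annotate_with_llm(entity, rng):
    """Stand-in for an LLM annotation call: returns a counterpart in KG2.

    With probability 0.8 the true counterpart is returned; otherwise a
    random entity, simulating a noisy LLM label.
    """
    true_match = f"kg2_{entity.split('_')[1]}"
    if rng.random() < 0.8:
        return true_match
    return f"kg2_{rng.randrange(5)}"

def select_entities(candidates, scores, budget):
    # Active-learning step: pick the `budget` highest-value entities.
    return sorted(candidates, key=lambda e: scores[e], reverse=True)[:budget]

def refine_labels(votes):
    # Unsupervised refinement (toy): keep the majority label per entity,
    # standing in for the paper's probabilistic reasoning.
    return {e: max(set(labels), key=labels.count) for e, labels in votes.items()}

def llm4ea_loop(entities, scores, rounds=3, budget=2, seed=0):
    rng = random.Random(seed)
    votes = {}
    for _ in range(rounds):
        for entity in select_entities(entities, scores, budget):
            votes.setdefault(entity, []).append(annotate_with_llm(entity, rng))
        # Feedback step (toy): raise the priority of entities whose labels
        # disagree across rounds, mimicking policy updates from EA feedback.
        for entity, labels in votes.items():
            scores[entity] += len(set(labels))
    return refine_labels(votes)

entities = [f"kg1_{i}" for i in range(5)]
scores = {e: float(i) for i, e in enumerate(entities)}
aligned = llm4ea_loop(entities, scores)
print(aligned)
```

This is a control-flow sketch under stated assumptions, not the paper's implementation: the real framework scores entities by inter-KG and intra-KG structure and refines labels by probabilistic reasoning over a base EA model's output.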

https://arxiv.org/abs/2405.16806

https://arxiv.org/pdf/2405.16806.pdf
