Spectral Greedy Coresets for Graph Neural Networks

The ubiquity of large-scale graphs in node-classification tasks significantly hinders the real-world applications of Graph Neural Networks (GNNs). Node sampling, graph coarsening, and dataset condensation are effective strategies for enhancing data efficiency. However, owing to the interdependence of graph nodes, coreset selection, which selects subsets of the data examples, has not been successfully applied to speed up GNN training on large graphs, warranting special treatment. This paper studies graph coresets for GNNs and avoids the interdependence issue by selecting ego-graphs (i.e., neighborhood subgraphs around a node) based on their spectral embeddings. We decompose the coreset selection problem for GNNs into two phases: a coarse selection of widely spread ego graphs and a refined selection to diversify their topologies. We design a greedy algorithm that approximately optimizes both objectives. Our spectral greedy graph coreset (SGGC) scales to graphs with millions of nodes, obviates the need for model pre-training, and applies to low-homophily graphs. Extensive experiments on ten datasets demonstrate that SGGC outperforms other coreset methods by a wide margin, generalizes well across GNN architectures, and is much faster than graph condensation.
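The abstract outlines a two-phase greedy procedure: a coarse phase that spreads the selected ego-graphs widely in spectral-embedding space, followed by a refinement phase that diversifies their topologies. The sketch below only illustrates that two-phase greedy structure; it is not the authors' SGGC implementation. The precomputed embedding and topology-signature inputs, the function names, and the use of farthest-point selection as the greedy rule are all assumptions made for the example.

```python
import numpy as np

def greedy_kcenter(embeddings, m, rng=None):
    """Coarse phase (sketch): pick m widely spread ego-graphs via greedy
    farthest-point (k-center) selection on their spectral embeddings."""
    rng = np.random.default_rng(rng)
    n = embeddings.shape[0]
    selected = [int(rng.integers(n))]  # arbitrary seed point
    dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    for _ in range(m - 1):
        nxt = int(dist.argmax())       # node farthest from the current picks
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return selected

def refine_by_topology(selected, topo_signatures, budget):
    """Refined phase (sketch): among the coarse picks, greedily keep a subset
    whose topology signatures (e.g., degree histograms) are maximally diverse."""
    chosen = [selected[0]]
    remaining = list(selected[1:])
    while len(chosen) < budget and remaining:
        # keep the candidate farthest (in topology space) from anything chosen
        d = [min(np.linalg.norm(topo_signatures[c] - topo_signatures[j])
                 for j in chosen) for c in remaining]
        chosen.append(remaining.pop(int(np.argmax(d))))
    return chosen

# Toy usage with random stand-ins for the real embeddings / signatures.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(1000, 32))   # hypothetical ego-graph spectral embeddings
    topo = rng.random(size=(1000, 8))   # hypothetical per-node topology signatures
    coarse = greedy_kcenter(emb, m=100, rng=0)
    coreset = refine_by_topology(coarse, topo, budget=50)
    print(len(coreset), "ego-graphs selected")
```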

https://arxiv.org/abs/2405.17404

https://arxiv.org/pdf/2405.17404.pdf
