Unsupervised Generative Feature Transformation via Graph Contrastive Pre-training and Multi-objective Fine-tuning


Feature transformation derives a new feature set from original features to augment the AI power of data. In many science domains such as material performance screening, feature transformation can model material formula interactions and compositions and discover performance drivers, but supervised labels must be collected from expensive and lengthy experiments. This issue motivates the Unsupervised Feature Transformation Learning (UFTL) problem. Prior literature, such as manual transformation, supervised feedback guided search, and PCA, either relies on domain knowledge or expensive supervised feedback, suffers from a large search space, or overlooks non-linear feature-feature interactions. UFTL poses a major challenge to existing methods: how can we design a new unsupervised paradigm that captures complex feature interactions and avoids a large search space? To fill this gap, we connect graph, contrastive, and generative learning to develop a measurement-pretrain-finetune paradigm for UFTL. For unsupervised feature set utility measurement, we propose a feature value consistency preservation perspective and develop a mean-discounted-cumulative-gain-like unsupervised metric to evaluate feature set utility. For unsupervised feature set representation pretraining, we regard a feature set as a feature-feature interaction graph and develop an unsupervised graph contrastive learning encoder to embed feature sets into vectors. For generative transformation finetuning, we regard a feature set as a feature-cross sequence and feature transformation as sequential generation. We develop a deep generative feature transformation model that coordinates the pretrained feature set encoder and the gradient information extracted from a feature set utility evaluator to optimize a transformed feature generator.
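To make the measurement step more concrete, here is a minimal sketch of a discounted-cumulative-gain-style unsupervised utility score. The per-feature "relevance" used below (mean absolute correlation with the rest of the set, as a crude proxy for value-consistency preservation) is an assumption for illustration, not the paper's actual definition:

```python
import numpy as np

def dcg_like_utility(X):
    """Hypothetical DCG-like unsupervised utility of a feature set.

    Assumed proxy (not from the paper): each feature's relevance is its
    mean absolute correlation with the other features; features are then
    ranked by relevance and aggregated with DCG-style log discounts.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Pairwise |correlation| matrix with the self-correlations zeroed out.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, 0.0)
    relevance = corr.sum(axis=1) / max(n_features - 1, 1)
    # Rank features by relevance (descending) and apply DCG discounts.
    ranked = np.sort(relevance)[::-1]
    discounts = 1.0 / np.log2(np.arange(2, n_features + 2))
    return float((ranked * discounts).sum() / n_features)
```

Under this proxy, a set of mutually redundant features scores higher than independent noise; the paper's actual metric targets feature value consistency rather than raw correlation, so this is only a structural illustration of the DCG-style aggregation.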

https://arxiv.org/abs/2405.16879

https://arxiv.org/pdf/2405.16879.pdf
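The abstract frames feature transformation as sequential generation over feature crosses. As a structural illustration of that framing, the sketch below decodes a token sequence into new feature columns; the postfix token grammar, the `SEP` delimiter, and the operator set are all assumptions made here, not details from the paper:

```python
import numpy as np

# Assumed operator vocabulary for the illustrative token grammar.
OPS = {"+": np.add, "-": np.subtract, "*": np.multiply}

def decode_sequence(tokens, X):
    """Decode a postfix feature-cross sequence into transformed features.

    tokens: e.g. ["f0", "f1", "*", "SEP", "f1", "f0", "-"], where "f<i>"
            refers to column i of X and "SEP" ends one feature cross.
    X: (n_samples, n_features) array of original feature values.
    Returns an (n_samples, n_new_features) array of generated features.
    """
    features, stack = [], []
    for tok in tokens + ["SEP"]:  # trailing SEP flushes the last cross
        if tok == "SEP":
            if stack:
                features.append(stack.pop())
            stack = []
        elif tok in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[tok](a, b))
        else:  # a feature token "f<i>"
            stack.append(X[:, int(tok[1:])])
    return np.column_stack(features)
```

In the paper's setup, such sequences would be emitted by the learned generator and scored by the utility evaluator; here the decoder alone shows how a generated sequence maps back to concrete feature columns.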
