Integrating knowledge and data: a cross-modal feature fusion model for few-shot problems

  • Abstract: In few-shot learning, purely data-driven learning is prone to overfitting and degraded generalization, while insufficient fusion of knowledge and data also limits model performance. To address the few-shot problem, this paper proposes a deep cross-modal feature fusion model (KDFM) that fuses domain-knowledge features with structured-data features to improve downstream-task performance. KDFM adopts a multi-feature interaction framework. First, the semantically expressed knowledge modality of the domain is modeled as a knowledge graph, and the TransE algorithm is used to extract embedding representations of the knowledge nodes. Second, the structured data modality is mapped to graph networks, and a multichannel graph convolutional network captures high-order correlations among features. Finally, an attention mechanism is designed to dynamically align the knowledge embeddings with the data features, achieving adaptive cross-modal fusion. The proposed model was validated on two few-shot datasets: a materials regression task and a medical classification task. Compared with other purely data-driven models, it achieved better results on all regression and classification tasks. Ablation results confirm the effectiveness of both the knowledge-modeling and cross-modal fusion components, showing that KDFM, through collaborative multi-feature modeling and an efficient fusion strategy, alleviates to some extent the weak generalization of models and the difficulty of fusing knowledge and data modalities under few-shot conditions.
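The knowledge modality above is embedded with TransE, which places entities and relations in the same vector space and trains so that head + relation ≈ tail for true triples. As a rough illustration only (not the authors' implementation), the toy triples, dimensions, and single-negative sampling below are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy knowledge graph: (head, relation, tail) index triples.
triples = [(0, 0, 1), (1, 1, 2), (0, 1, 2)]
n_ent, n_rel, dim = 3, 2, 8

# Randomly initialized, L2-normalized entity and relation embeddings.
E = rng.normal(size=(n_ent, dim)); E /= np.linalg.norm(E, axis=1, keepdims=True)
R = rng.normal(size=(n_rel, dim)); R /= np.linalg.norm(R, axis=1, keepdims=True)

def transe_margin_loss(E, R, triples, margin=1.0):
    """Margin ranking loss of TransE with one randomly corrupted tail per triple."""
    loss = 0.0
    for h, r, t in triples:
        t_neg = rng.integers(n_ent)              # corrupted tail (may equal t in this sketch)
        pos = np.linalg.norm(E[h] + R[r] - E[t])  # score of the true triple
        neg = np.linalg.norm(E[h] + R[r] - E[t_neg])
        loss += max(0.0, margin + pos - neg)      # push true triples below corrupted ones
    return loss / len(triples)

print(transe_margin_loss(E, R, triples))
```

In a full implementation this loss would be minimized by gradient descent, and the resulting entity vectors would serve as the knowledge-node embeddings fed into the fusion stage.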

     

    Abstract: Few-shot problems are common in machine learning, particularly in experimental science and medical research. Purely data-driven learning depends heavily on the quality and quantity of data: when data are scarce, models are prone to overfitting and their generalization ability degrades. Most fields, however, have accumulated extensive experience and knowledge, and a hybrid approach that combines domain knowledge with data can effectively improve model performance. In the few-shot setting, though, achieving effective cross-modal fusion of knowledge and data features is challenging. This study proposes a knowledge-and-data cross-modal fusion model (KDFM) to address the few-shot problem. First, numerical modal features are categorized into different feature types and modeled as graphs; for each feature type, the graph's edges are constructed based on K-means clustering. The different types of numerical features are then processed by multichannel graph convolution, which converts the numerical modal features into graph-level features and enhances their expressiveness. Next, domain-knowledge features from the semantic modality are represented by a knowledge graph: key entities and relationships are extracted from professional books and expert experience, and the knowledge graph consists of entity-relationship triples, transforming unstructured text features into graph-level features. Textual domain knowledge and experience are thereby organized and incorporated into the neural network model. A graph convolutional network and an attention mechanism perform the cross-modal fusion of knowledge and data. The input to the graph convolutional network comprises the graphs constructed from the numerical data, the feature vectors obtained from the knowledge graph, and the numerical vectors of the data.
Multichannel graph convolution, with one channel per feature type, achieves deep feature fusion of knowledge and data. The output, computed by the attention mechanism, is a fused multichannel feature vector that serves as the input for downstream tasks. The proposed model was validated on two few-shot datasets: a regression task in the materials field and a classification task in the medical field. Experimental results show that, compared with other data-driven models, KDFM performs well across the regression and classification tasks. In the regression task, it achieved the best mean squared error, mean absolute error, and R², with R² exceeding the second-best model, a multilayer perceptron, by more than 7%. In the classification task, it was optimal on five of seven indicators and suboptimal on the remaining two. Multiple ablation experiments, in which the knowledge-graph and graph-convolution modules were removed from the full model, confirmed the effectiveness of both the knowledge modeling and the cross-modal fusion mechanism. The proposed model thus alleviates, to some extent, the weak generalization and the difficulty of integrating knowledge and data modalities in few-shot problems.
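The data-modality step described above — building per-feature-type graphs whose edges come from K-means clustering, then propagating features with graph convolution — can be sketched as follows. This is a minimal illustration under assumed conventions (samples in the same cluster are connected; symmetric GCN normalization), not the authors' code; the toy data, cluster count, and helper names are hypothetical:

```python
import numpy as np

def cluster_adjacency(X, k, iters=20, seed=0):
    """Build an adjacency matrix connecting samples that share a K-means cluster
    (assumed edge-construction rule). Uses plain Lloyd's iterations as a
    stand-in for a library K-means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    A = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)  # self-loops are added in the GCN step instead
    return A

def gcn_propagate(A, H):
    """One symmetric-normalized GCN propagation: D^-1/2 (A + I) D^-1/2 H."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt @ H

# Toy single-feature-type channel: two well-separated groups of samples.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
A = cluster_adjacency(X, k=2)
H = gcn_propagate(A, X)
print(A[0, 1], A[0, 2])  # same-cluster edge vs. cross-cluster non-edge
```

In the full model, one such graph would be built per feature type (one channel each), and the channel outputs would be combined with the knowledge embeddings by the attention mechanism before the downstream task head.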
