Abstract:
The few-shot problem is common in machine learning, particularly in experimental science and medical research. Purely data-driven learning depends heavily on the quality and quantity of data; when data are scarce, models are prone to overfitting and generalize poorly. Most fields, however, have accumulated extensive experience and knowledge, and a hybrid approach that combines domain knowledge with data can effectively improve model performance. In the few-shot setting, though, achieving effective cross-modal fusion of knowledge and data features remains challenging. This study proposes a knowledge and data cross-modal fusion model (KDFM) to address the few-shot problem. First, numerical-modality features are divided into feature types and modeled as graphs; for each feature type, graph edges are constructed based on K-means clustering. The different types of numerical features are then processed by multichannel graph convolution, which converts numerical features into graph-level features and enhances their expressiveness. Next, domain-knowledge features from the semantic modality are represented by a knowledge graph: key entities and relationships are extracted from professional books and expert experience, and the knowledge graph consists of entity-relationship triples, transforming unstructured text features into graph-level features. In this way, textual domain knowledge and experience are organized and incorporated into the neural network model. A graph convolutional network and attention mechanisms then perform cross-modal fusion of knowledge and data features. The inputs to the graph convolutional network are the graphs constructed from the numerical data, the feature vectors obtained from the knowledge graph, and the numerical vectors from the data.
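The per-feature-type graph construction and multichannel convolution described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cluster count, feature grouping, weight matrix, and the tiny K-means routine are all illustrative assumptions.

```python
import numpy as np

def kmeans_labels(x, k=3, iters=10, seed=0):
    """Tiny K-means (illustrative stand-in for a library implementation)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def build_cluster_graph(features, k=3):
    """Adjacency for one feature type: samples in the same cluster are linked."""
    labels = kmeans_labels(features, k=k)
    adj = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(adj, 1.0)  # self-loops, standard for GCN inputs
    return adj

def gcn_layer(adj, x, weight):
    """One graph-convolution step with symmetric normalization:
    D^{-1/2} A D^{-1/2} X W."""
    d_inv_sqrt = np.diag(1.0 / np.sqrt(adj.sum(axis=1)))
    return d_inv_sqrt @ adj @ d_inv_sqrt @ x @ weight

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))           # 20 samples, 6 numerical features
groups = [X[:, :3], X[:, 3:]]          # two hypothetical feature types
W = rng.normal(size=(3, 4))            # illustrative projection, shared here
channels = [gcn_layer(build_cluster_graph(g), g, W) for g in groups]
print(len(channels), channels[0].shape)  # 2 (20, 4)
```

Each feature type yields its own graph and its own convolution channel; the resulting channel outputs are what the model later fuses into a single representation.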
Multichannel graph convolution, with one channel per feature type, achieves deep fusion of knowledge and data features. The output, a multichannel feature vector fused via the attention mechanism, serves as the input feature vector for downstream tasks. The proposed model was validated on two small-sample datasets: a regression task in the materials field and a classification task in the medical field. Simulation results show that, compared with other data-driven models, KDFM performs excellently across the regression and classification tasks. In the regression task, the model achieved the best mean squared error, mean absolute error, and R², with R² exceeding that of the second-best model, a multilayer perceptron, by over 7%. In the classification task, the model was best on five of seven indicators and second-best on the remaining two. Multiple ablation experiments further verified the model's effectiveness: removing the knowledge-graph and graph-convolutional-network modules from the full model confirmed the contributions of knowledge modeling and the cross-modal fusion mechanism. The proposed model thus mitigates, to some extent, the weak generalization and the knowledge-data integration challenges of few-shot problems.
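The attention-based fusion of the multichannel outputs mentioned above can be sketched as follows. The score vector, channel count, and dimensions are illustrative assumptions; in the actual model the attention parameters would be learned.

```python
import numpy as np

def attention_fuse(channels, w):
    """Fuse C channel matrices (each N x D) into one N x D matrix
    using per-channel softmax attention scores."""
    stacked = np.stack(channels)                   # (C, N, D)
    scores = np.einsum('cnd,d->cn', stacked, w)    # (C, N) channel scores
    alpha = np.exp(scores - scores.max(axis=0))
    alpha = alpha / alpha.sum(axis=0)              # softmax over channels
    return np.einsum('cn,cnd->nd', alpha, stacked)

rng = np.random.default_rng(1)
chans = [rng.normal(size=(20, 4)) for _ in range(3)]  # three hypothetical channels
w = rng.normal(size=4)                                # learnable query in practice
fused = attention_fuse(chans, w)
print(fused.shape)  # (20, 4)
```

The fused matrix plays the role of the input feature vector for the downstream regression or classification head.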