Collaborative feature screening pipeline with a large language model and a machine learning model for accurate corrosion inhibitor prediction

  • Abstract: Material corrosion affects every sector of the national economy, from industrial and agricultural production to national defense technology. It seriously threatens the service safety of facilities and equipment, causes huge economic losses, and poses great risks to human life and health. Metal corrosion inhibitors change the surface state of a metal, raise the activation energy barrier of the corrosion reaction, affect surface electrochemical behavior, and thereby slow the corrosion rate. Because they combine low dosage, low cost, and high efficiency, inhibitors are among the most widely used means of corrosion control. However, inhibitors are diverse, their mechanisms are complex, and their performance depends strongly on environmental factors. Traditional corrosion research methods, such as weight-loss testing and electrochemical testing, usually demand substantial manpower, material resources, and time, which greatly hinders the design and application of high-performance inhibitors. A more efficient technical approach is needed to advance inhibitor research. In recent years, the development of materials genome engineering has driven corrosion research from empirical trial and error toward digital and intelligent approaches: artificial intelligence techniques can analyze existing data to predict across a vast unknown space and to explore potential relationships between material composition/structure and performance. This study builds a feature screening pipeline in which a large language model (LLM) and a machine learning model collaborate. Drawing on data and knowledge about corrosion inhibitors reported in the literature, and using systematic corrosion knowledge injection, prompt design, and recursive screening, the pipeline selected from 209 candidate feature descriptors the 13 descriptors most relevant to corrosion inhibition performance in a saturated CO₂ environment. These descriptors cover molecular physicochemical properties, molecular structural properties, and environmental parameters. With screening, the mean squared error of the model predictions dropped from 121 to 11. Follow-up corrosion experiments confirmed the prediction accuracy and generalization ability of the model. The feature screening process and machine learning model developed in this study significantly improve the efficiency of developing high-performance corrosion inhibitors for CO₂ environments.
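
The recursive screening step and the mean squared error metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the LLM and the machine learning model are combined, so the equal-weight score, the `drop_fraction` parameter, and the function names `recursive_screen`, `llm_score`, and `ml_importance` are all assumptions made here for illustration only.

```python
from typing import Callable, Dict, List, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def recursive_screen(
    features: List[str],
    llm_score: Callable[[str], float],
    ml_importance: Callable[[List[str]], Dict[str, float]],
    n_keep: int,
    drop_fraction: float = 0.5,
) -> List[str]:
    """Repeatedly drop the lowest-scoring descriptors until n_keep remain.

    llm_score is a hypothetical callback returning a knowledge-based
    relevance score in [0, 1] for one descriptor (e.g. parsed from an LLM
    reply). ml_importance is a hypothetical callback returning a
    data-driven importance in [0, 1] for each descriptor in the current
    subset (e.g. from a model refit on that subset).
    """
    selected = list(features)
    while len(selected) > n_keep:
        importance = ml_importance(selected)
        # Assumption: equal weighting of the two sources of evidence.
        combined = {f: 0.5 * llm_score(f) + 0.5 * importance[f] for f in selected}
        # Drop a fraction of the remaining descriptors, at least one,
        # but never so many that fewer than n_keep are left.
        n_drop = max(1, int(len(selected) * drop_fraction))
        n_drop = min(n_drop, len(selected) - n_keep)
        ranked = sorted(selected, key=lambda f: combined[f], reverse=True)
        selected = ranked[: len(ranked) - n_drop]
    return selected
```

In use, one would compare the cross-validated `mean_squared_error` of a model trained on all candidate descriptors against one trained on the screened subset; recomputing the importances in every round is what makes the screening recursive rather than a single ranking pass.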

     
