Efficient prediction of the hardenability of 8620 steel based on feature engineering and machine learning

王斌斌; 朱德鑫; 武森; 李福勇; 黄胜永; 吴宏辉

doi:10.13374/j.issn2095-9389.2025.02.24.003

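Summary: 8620 steel is an important engineering steel, commonly used for gears that transmit high power and carry heavy loads, with applications in transportation, hoisting, locomotive traction, wind power generation, and other key sectors. Hardenability is an important measure of the uniformity of hardness distribution in steel during heat treatment; it directly affects mechanical properties and service life and is especially important in the production and application of gear steels. Traditional hardenability assessment relies mainly on the Jominy end-quench test, whose complex and time-consuming procedure makes it labor-intensive and costly. Based on production-line data for 8620 steel, this study combines machine learning with feature engineering, uses the SHAP method and the optimal subset method to screen key feature variables, builds hardenability prediction models with seven different machine learning algorithms, and systematically evaluates model performance with ten-fold cross-validation. The comparison shows that the XGBoost model performs best on the original feature set (coefficient of determination R²=0.894, root mean square error RMSE=0.820 HRC, hit rate within ±2 HRC of 94.19%). After feature screening, the RF model retains high accuracy (R²=0.866, RMSE=0.928 HRC, hit rate within ±2 HRC of 93.66%) while computational efficiency improves by 33%. This enables low-dimensional, high-precision prediction of the hardness at 7.9 mm from the quenched end of the Jominy end-quench specimen (the J7.9 value) and provides a scientific basis for predicting and optimizing the hardenability of 8620 steel.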

Abstract: 8620 steel is a critical alloy widely used in the manufacturing of gears designed to transmit substantial power and endure heavy loads. Its applications span vital sectors, including transportation, lifting equipment, locomotive traction, and wind power generation. Hardenability, a key measure of the uniformity of hardness distribution during heat treatment, significantly influences the mechanical properties and service life of gear steel components. Traditional hardenability assessment relies on the Jominy end-quench test, which is labor-intensive, time-consuming, and costly. Although conventional empirical models provide a practical approach to hardenability prediction, they struggle to capture complex multivariate relationships. This study started from 834 production-line records, each comprising the contents of 19 chemical elements and the J7.9 value (the hardness at 7.9 mm from the quenched end of the Jominy specimen), and obtained 772 valid samples through systematic data preprocessing (including missing-value handling and outlier removal based on two times the interquartile range). In the feature engineering stage, Pearson correlation analysis was used to reveal the correlations among the chemical components of 8620 steel. Average-linkage hierarchical clustering was then used to divide the 19 elements into six feature clusters. The SHAP method was used to reveal the key influence of elements such as chromium (Cr), carbon (C), molybdenum (Mo), manganese (Mn), cerium (Ce), tungsten (W), aluminum (Al), and silicon (Si) on hardenability. Among them, Cr, C, and Mo showed significant positive contributions, while nitrogen (N) showed a negative effect. Combined with materials science knowledge, six important features (Cr, C, Mo, Mn, Ce, and W) were preliminarily selected. The optimal subset method then narrowed these down to four features (C, Cr, Mo, and Ce), which were used to construct a low-dimensional, high-precision model.
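The outlier-removal step mentioned above can be illustrated with a minimal sketch. The abstract states only that the rule is based on two times the interquartile range; the fence definition used here (Q1 − 2·IQR and Q3 + 2·IQR), the quartile method, the function names, and the toy records are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import quantiles


def iqr_bounds(values, k=2.0):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k=2.0 mirrors the "two times the interquartile range" wording;
    the exact fence definition is an assumption.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, column, k=2.0):
    """Keep only rows whose value in `column` lies inside the fences."""
    lower, upper = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if lower <= row[column] <= upper]


# Hypothetical toy records (NOT production data from the study).
records = [
    {"C": 0.19, "J7.9": 36.0},
    {"C": 0.20, "J7.9": 37.0},
    {"C": 0.20, "J7.9": 38.0},
    {"C": 0.21, "J7.9": 39.0},
    {"C": 0.22, "J7.9": 40.0},
    {"C": 0.21, "J7.9": 60.0},  # implausibly hard: falls outside the fences
]

cleaned = drop_outliers(records, "J7.9")
```

In practice the same filter would presumably be applied to each composition column as well as to the J7.9 target, but the abstract does not say which columns were screened.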
Seven machine learning algorithms, namely linear regression (LR), K-nearest neighbor (KNN), decision tree (DT), random forest (RF), gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), were used to construct hardenability prediction models. Ten-fold cross-validation was used to assess the generalization ability of the models. The coefficient of determination (R²), the root mean square error (RMSE), and the proportion of predictions that deviate from the experimental value by no more than ±2 HRC (hit rate within ±2 HRC) were used to measure the performance of the model built with each algorithm. The results show that the XGBoost model performed best on the original feature set (R²=0.894, RMSE=0.820 HRC, 94.19% hit rate within ±2 HRC). The RF model after feature screening maintained high accuracy (R²=0.866, RMSE=0.928 HRC, 93.66% hit rate within ±2 HRC) while improving computational efficiency by 33%. Both models predict significantly better than the traditional empirical formula (R²=0.398, RMSE=1.934 HRC, 73.58% hit rate within ±2 HRC). Compared with traditional methods, the machine learning model not only revealed the complex nonlinear relationship between chemical composition and hardenability but also balanced model interpretability and computational efficiency through feature dimensionality reduction. The results confirm that the data-driven method can overcome the limitations of traditional empirical formulas in multivariate modeling, provide a scientific basis for narrow hardenability band control of 8620 steel, offer significant engineering application value, and support the development of steel manufacturing toward intelligence and precision.
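The three evaluation metrics named above can be written out in a few lines. The following is a minimal sketch using the standard definitions of R² and RMSE and treating the hit rate as the fraction of predictions within ±2 HRC of the measured value; the function names and the toy hardness values are hypothetical and do not come from the study.

```python
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the target (HRC)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


def hit_rate(y_true, y_pred, tol=2.0):
    """Fraction of predictions within +/- tol HRC of the measured value."""
    hits = sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical measured and predicted J7.9 values (HRC), for illustration only.
measured = [38.0, 40.0, 42.0, 44.0]
predicted = [38.5, 39.0, 45.0, 44.0]

r2 = r2_score(measured, predicted)
err = rmse(measured, predicted)
hits = hit_rate(measured, predicted)
```

Under ten-fold cross-validation, these metrics would typically be computed on each held-out fold and then averaged; the abstract does not state the exact aggregation used.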

Hardenability prediction model of 8620 steel based on machine learning