Abstract:
8620 steel is a critical alloy widely used to manufacture gears that transmit substantial power and endure heavy loads. Its applications span vital sectors, including transportation, lifting equipment, locomotive traction, and wind power generation. Hardenability, a key measure of the uniformity of the hardness distribution during heat treatment, significantly influences the mechanical properties and service life of gear steel components. Traditional hardenability assessment relies on the Jominy end-quench test, which is labor-intensive, time-consuming, and costly. Although conventional empirical models provide a practical approach to hardenability prediction, they struggle to capture complex multivariate relationships. This study used 834 production-line records, each covering the contents of 19 chemical elements and the J7.9 value, and obtained 772 valid samples through systematic data preprocessing (missing-value handling and outlier removal based on two times the interquartile range). In the feature engineering stage, Pearson correlation analysis was used to reveal the correlations among the chemical components of 8620 steel, and average-linkage hierarchical clustering was then used to divide the 19 elements into six feature clusters. The SHAP method revealed the key influence on hardenability of elements such as chromium (Cr), carbon (C), molybdenum (Mo), manganese (Mn), cerium (Ce), tungsten (W), aluminum (Al), and silicon (Si). Among them, Cr, C, and Mo showed significant positive contributions, while nitrogen (N) showed a negative effect. Combined with materials science knowledge, six important features (Cr, C, Mo, Mn, Ce, and W) were preliminarily selected. Four of these features (C, Cr, Mo, and Ce) were then selected by the optimal subset method to construct a low-dimensional, high-precision model. Seven machine learning algorithms were used to construct hardenability prediction models: linear regression (LR), K-nearest neighbor (KNN), decision tree (DT), random forest (RF), gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM). Ten-fold cross-validation was used to assess the generalization ability of the models. Three metrics were used to measure the performance of the model built with each algorithm: the coefficient of determination (R²), the root mean square error (RMSE), and the share of predictions that deviate from the experimental value by no more than the ±2 HRC hardenability bandwidth (hit rate within ±2 HRC). The results show that the XGBoost model performed best on the original feature set (R² = 0.894, RMSE = 0.820 HRC, 94.19% hit rate within ±2 HRC). The RF model after feature screening maintained high accuracy (R² = 0.866, RMSE = 0.928 HRC, 93.66% hit rate within ±2 HRC) while improving computational efficiency by 33%. The prediction performance was also significantly better than that of the traditional empirical formula (R² = 0.398, RMSE = 1.934 HRC, 73.58% hit rate within ±2 HRC). Compared with traditional methods, the machine learning model not only revealed the complex nonlinear relationship between chemical composition and hardenability but also balanced model interpretability against computational efficiency through feature dimensionality reduction. The results confirm that data-driven methods can overcome the limitations of traditional empirical formulas in multivariate modeling, provide a scientific basis for controlling the narrow hardenability band of 8620 steel, offer significant engineering application value, and promote the development of steel manufacturing toward intelligence and precision.
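
The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. The function names and the sample hardness values are hypothetical, and the sketch assumes that a prediction counts as a hit when its absolute deviation from the measured value is at most 2 HRC (boundary included); the study's exact convention is not stated here.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the inputs (HRC)."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(ss_res / len(y_true))

def hit_rate(y_true, y_pred, tol=2.0):
    """Percentage of predictions within +/- tol of the measured value.

    Assumption: the boundary |error| == tol counts as a hit.
    """
    hits = sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)

# Hypothetical measured and predicted J7.9 values (HRC), for illustration only.
measured = [40.0, 42.0, 38.0, 44.0, 41.0]
predicted = [40.5, 41.0, 40.5, 43.5, 41.0]

print(f"R2       = {r2_score(measured, predicted):.4f}")
print(f"RMSE     = {rmse(measured, predicted):.4f} HRC")
print(f"hit rate = {hit_rate(measured, predicted):.2f} %")
```

In this toy example one of the five predictions misses by 2.5 HRC, so the hit rate is 80%, while R² and RMSE summarize the overall fit.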
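
The outlier-removal step mentioned in the abstract (fences at two times the interquartile range) might look like the following sketch. It makes several assumptions that the abstract does not specify: the filter is applied to a single column, quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`, and the helper names and sample values are hypothetical.

```python
from statistics import quantiles

def iqr_bounds(values, k=2.0):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    # Assumption: "inclusive" quartile interpolation; other conventions exist.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(rows, column, k=2.0):
    """Keep only rows whose value in `column` lies inside the fences."""
    lower, upper = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if lower <= row[column] <= upper]

# Hypothetical J7.9 values (HRC); the last record is an obvious outlier.
samples = [{"J7.9": v} for v in (38.0, 39.5, 40.0, 40.5, 41.0, 41.5, 42.0, 60.0)]
cleaned = drop_outliers(samples, "J7.9")

print(len(samples), "->", len(cleaned), "records")
```

Using a multiplier of 2 rather than the more common 1.5 widens the fences, so fewer borderline production records are discarded.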