Predictive diagnosis of papillary thyroid carcinoma using interpretable machine learning

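  • Summary: Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer, and the insidious nature of its early symptoms often delays diagnosis. To improve this situation, this study aimed to develop and validate a machine learning-based predictive model for PTC diagnosis and thereby support clinical decision-making. Demographic, ultrasound imaging, and laboratory features were collected and integrated from a cohort of 2,907 patients with benign nodules or PTC. LASSO regression was used for feature selection, nine machine learning algorithms were used to build PTC diagnostic models, and the SHAP method was used to interpret the best model. LASSO regression identified 11 features associated with PTC. The XGBoost model performed best in assisting PTC diagnosis, with an area under the receiver operating characteristic curve of 0.9066 and an accuracy of 0.8744. Calibration curves and decision curve analysis also indicated that XGBoost had the best model performance. SHAP results showed that TI-RADS classification, nodule morphology, and ultrasound echogenicity were the three most important predictors of PTC diagnosis, while age, uric acid level, potassium concentration, thyroglobulin, anti-thyroid peroxidase antibodies, hypertension status, calcification, and alcohol consumption also showed varying degrees of influence. This study developed and validated a comprehensive machine learning model that combines multiple patient factors to identify PTC. The model shows considerable potential for PTC diagnosis, and SHAP analysis improves its transparency and credibility, helping clinicians better understand the roles of the key predictors.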

     

    Abstract: Papillary thyroid carcinoma (PTC) is the most prevalent type of thyroid cancer, and the insidious nature of its early symptoms often leads to delayed diagnosis. To address this challenge, this study aimed to develop and validate a machine learning (ML)-based predictive model for PTC diagnosis to support clinical decision-making. We enrolled a retrospective cohort of 2,907 patients, comprising 1,005 individuals with benign thyroid nodules and 1,902 with histologically confirmed PTC. Demographic, ultrasonographic, and laboratory data encompassing 70 initial features were collected. The dataset was partitioned into training and independent test sets in an 8:2 ratio for model validation. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression, which identified 11 features associated with PTC: age, TI-RADS classification, ultrasound echogenicity, nodule morphology, calcification status, hypertension, alcohol consumption, uric acid level, potassium concentration, thyroglobulin (Tg), and anti-thyroid peroxidase antibodies (anti-TPO). These features were used to construct diagnostic models with nine ML algorithms: random forest (RF), adaptive boosting, extreme gradient boosting (XGBoost), classification and regression tree, light gradient boosting machine, gradient boosting decision tree, support vector machine, multilayer perceptron, and logistic regression (LR). To improve the generalizability and stability of the models, training combined 10-fold cross-validation with grid-search hyperparameter tuning, and the parameter configuration that maximized the area under the receiver operating characteristic curve (AUC) was selected.
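The training workflow described above (8:2 split, LASSO-style feature selection, 10-fold cross-validated grid search scored by AUC) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the study's code: the data are synthetic, the L1-penalized logistic regression and its `C` value stand in for the unspecified LASSO settings, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and the parameter grid is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort: 70 candidate features, binary label,
# roughly the benign:PTC ratio reported in the abstract (assumption: toy data only).
X, y = make_classification(
    n_samples=600, n_features=70, n_informative=11, n_redundant=0,
    n_clusters_per_class=1, weights=[0.35, 0.65], random_state=0,
)

# 8:2 stratified split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Feature selection sits inside the pipeline so that it is refit on every
# cross-validation fold and never sees the held-out fold or the test set.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("lasso", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.1)  # assumed L1 strength
    )),
    ("clf", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
])

# Hypothetical grid; the abstract does not list the tuned parameters.
param_grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [2, 3]}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(X_train, y_train)

n_selected = int(search.best_estimator_.named_steps["lasso"].get_support().sum())
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])

print("best parameters:", search.best_params_)
print("features kept by the L1 step:", n_selected)
print("test AUC:", round(test_auc, 4))
```

Keeping the selection step inside the cross-validated pipeline is one design choice among several; the abstract does not state whether LASSO was applied before or within cross-validation.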
Model performance was evaluated on the test set using metrics derived from the confusion matrix (accuracy, recall, F1 score, positive predictive value, and negative predictive value) together with the AUC. Additionally, calibration analysis and decision curve analysis (DCA) were conducted to assess the reliability and clinical utility of the predicted probabilities across risk thresholds. Among the nine models, XGBoost performed best, with an accuracy of 0.8744, an F1 score of 0.9098, and an AUC of 0.9066. Calibration analysis showed that XGBoost and LR were closest to the ideal diagonal, indicating well-calibrated probability estimates. DCA further showed that XGBoost and RF provided higher net clinical benefit than the other models within the threshold probability range of 0.1–0.5, supporting their practical utility in guiding clinical interventions. To improve interpretability and foster clinician trust, Shapley additive explanations (SHAP) was applied to interpret the optimal XGBoost model. SHAP analysis identified TI-RADS classification, nodule morphology, and ultrasound echogenicity as the three most influential predictors of PTC. Secondary contributors included age, uric acid, potassium, Tg, anti-TPO, hypertension, calcification, and alcohol consumption, each with a different magnitude of impact. These findings align with established clinical knowledge and also suggest interactions between metabolic markers and imaging features in PTC pathogenesis. This study developed a multimodal ML framework that integrates diverse patient characteristics for PTC diagnosis. The XGBoost-based model showed the best diagnostic accuracy and clinical applicability among the models compared, while SHAP-based interpretation helped bridge the gap between algorithmic predictions and actionable clinical insights.
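To make the evaluation metrics above concrete, the sketch below computes the confusion-matrix metrics, a rank-based AUC, and the net benefit used in decision curve analysis, where net benefit at threshold probability p is TP/n − (FP/n) × p/(1 − p). The ten labels and probabilities are invented toy values, and the function names are illustrative, not taken from the study.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting positive for probability >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, recall, PPV, NPV, and F1 score at a single decision threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    recall = tp / (tp + fn)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": recall,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * recall / (ppv + recall),
    }


def auc_rank(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for label, p in zip(y_true, y_prob) if label == 1]
    neg = [p for label, p in zip(y_true, y_prob) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on the model at a given threshold probability."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy for DCA: treat every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (invented): 6 positives, 4 negatives.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.65, 0.4, 0.2, 0.1]

metrics = summary_metrics(y_true, y_prob)
auc = auc_rank(y_true, y_prob)
nb_model = net_benefit(y_true, y_prob, 0.2)
nb_all = net_benefit_treat_all(y_true, 0.2)
print(metrics, auc, nb_model, nb_all)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds (for example 0.1 to 0.5) and comparing it with the treat-all and treat-none (net benefit 0) strategies.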
By clarifying the roles of the key predictors, this study supports personalized risk stratification and helps clinicians make informed decisions regarding PTC management. Future studies should focus on external validation in multicenter cohorts and on real-world implementation to assess translational impact.
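The SHAP attributions referred to in the abstract are built on Shapley values. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values for a hypothetical three-feature risk score by enumerating all feature coalitions. The scoring function, feature names, and baseline are invented for this example; the study itself applied SHAP to a trained XGBoost model, which this sketch does not reproduce.

```python
from itertools import combinations
from math import factorial

FEATURES = ("tirads", "irregular_shape", "hypoechoic")  # hypothetical feature names


def toy_risk_score(values):
    """Invented scoring function with one interaction term (not the study's model)."""
    t = values["tirads"]
    s = values["irregular_shape"]
    h = values["hypoechoic"]
    return 0.5 * t + 0.3 * s + 0.2 * h + 0.1 * t * s


def coalition_value(coalition, instance, baseline):
    """Score when features in the coalition take the instance's values and the rest the baseline's."""
    mixed = {
        name: (instance[name] if name in coalition else baseline[name])
        for name in FEATURES
    }
    return toy_risk_score(mixed)


def shapley_values(instance, baseline):
    """Exact Shapley values: weighted average marginal contribution over all coalitions."""
    n = len(FEATURES)
    phi = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = coalition_value(set(subset) | {feature}, instance, baseline)
                without_feature = coalition_value(set(subset), instance, baseline)
                total += weight * (with_feature - without_feature)
        phi[feature] = total
    return phi


instance = {"tirads": 4, "irregular_shape": 1, "hypoechoic": 1}
baseline = {"tirads": 2, "irregular_shape": 0, "hypoechoic": 0}

phi = shapley_values(instance, baseline)
print(phi)
# The attributions sum to the difference between the instance's score and the baseline's score.
print(sum(phi.values()), toy_risk_score(instance) - toy_risk_score(baseline))
```

Exhaustive enumeration grows exponentially with the number of features, which is why practical SHAP implementations rely on model-specific algorithms or sampling rather than this brute-force approach.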

     
