LOU Pengwei, HUANG Yuting, ZENG Hui, XU Jiabo. Predictive diagnosis of papillary thyroid carcinoma using interpretable machine learning[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2024.12.30.001

Predictive diagnosis of papillary thyroid carcinoma using interpretable machine learning

Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer, and because its early symptoms are often subtle, diagnosis is frequently delayed. To address this problem, this study aimed to develop and validate a machine learning (ML)-based predictive model for PTC diagnosis to support clinical decision-making. We retrospectively enrolled 2,907 patients: 1,005 with benign thyroid nodules and 1,902 with histologically confirmed PTC. Demographic, ultrasonographic, and laboratory data comprising 70 candidate features were collected. The dataset was split into a training set and a held-out test set at an 8:2 ratio for model validation. Feature selection by least absolute shrinkage and selection operator (LASSO) regression retained 11 predictors associated with PTC: age, TI-RADS classification, ultrasound echogenicity, nodule morphology, calcification status, hypertension, alcohol consumption, uric acid level, potassium concentration, thyroglobulin (Tg), and anti-thyroid peroxidase antibodies (anti-TPO). These features were used to build diagnostic models with nine ML algorithms: random forest (RF), adaptive boosting, extreme gradient boosting (XGBoost), classification and regression tree, light gradient boosting machine, gradient boosting decision tree, support vector machine, multilayer perceptron, and logistic regression (LR). To improve generalizability and stability, each model was trained with 10-fold cross-validation combined with grid-search hyperparameter tuning, and the configuration that maximized the area under the receiver operating characteristic curve (AUC) was selected.
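The abstract does not state the LASSO loss function or how the penalty was chosen. As a hedged illustration of why LASSO works as a feature selector (the L1 penalty drives uninformative coefficients to exactly zero), the pure-Python sketch below fits a squared-error LASSO by cyclic coordinate descent on z-scored features. The function names, the fixed penalty `lam`, and the toy data are hypothetical; this is not the authors' code, and on real data one would normally use an established implementation and tune the penalty by cross-validation (a logistic, i.e. binomial, LASSO is also a common choice for a 0/1 outcome).

```python
# Minimal LASSO feature-selection sketch using cyclic coordinate descent.
# Assumptions (not stated in the abstract): squared-error loss on a 0/1 label,
# z-scored features, and one fixed penalty `lam` (normally tuned by CV).
from typing import List, Sequence


def standardize(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column with the population SD; constant columns become 0."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [float(row[j]) for row in X]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
        for i in range(n):
            Z[i][j] = (col[i] - mu) / sd if sd > 0 else 0.0
    return Z


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of lam * |b|: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(
    X: Sequence[Sequence[float]], y: Sequence[float], lam: float, n_iter: int = 200
) -> List[float]:
    """Minimise (1/2n) * ||y_c - Z b||^2 + lam * ||b||_1, where Z is the
    standardized X and y_c is y minus its mean (so no intercept is needed)."""
    Z = standardize(X)
    n, p = len(Z), len(Z[0])
    y_mean = sum(y) / n
    resid = [float(v) - y_mean for v in y]  # residual when all b = 0
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [Z[i][j] for i in range(n)]
            norm = sum(v * v for v in col) / n  # ~1.0 after z-scoring
            if norm == 0.0:
                continue  # constant feature: coefficient stays 0
            # Correlation of feature j with the partial residual (b_j added back).
            rho = sum(col[i] * resid[i] for i in range(n)) / n + norm * b[j]
            new_bj = soft_threshold(rho, lam) / norm
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta  # keep the residual in sync
                b[j] = new_bj
    return b


def selected_features(names: Sequence[str], coefs: Sequence[float]) -> List[str]:
    """Features whose LASSO coefficient was not shrunk to exactly zero."""
    return [name for name, c in zip(names, coefs) if c != 0.0]


if __name__ == "__main__":
    # Hypothetical toy data: x_signal tracks the label, x_noise is orthogonal
    # to both x_signal and the label, and x_const never varies.
    signs = [1, -1, -1, 1, 1, -1, -1, 1]
    X = [[float(i), float(s), 5.0] for i, s in zip(range(1, 9), signs)]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    coefs = lasso_coordinate_descent(X, y, lam=0.05)
    print(selected_features(["x_signal", "x_noise", "x_const"], coefs))  # ['x_signal']
```

Raising `lam` shrinks more coefficients to zero; past the largest absolute feature-residual correlation every feature is dropped, which is why the penalty has to be chosen with care.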
Model performance on the test set was evaluated with metrics derived from the confusion matrix (accuracy, recall, F1 score, positive predictive value, and negative predictive value) together with the threshold-independent AUC. Calibration analysis and decision curve analysis (DCA) were also conducted to assess, respectively, the reliability of the predicted probabilities and their clinical utility across threshold probabilities. Among the nine models, XGBoost performed best, with an accuracy of 0.8744, an F1 score of 0.9098, and an AUC of 0.9066. In the calibration analysis, the curves of XGBoost and LR lay closest to the ideal diagonal, indicating well-calibrated probability estimates. DCA further showed that XGBoost and RF yielded higher net benefit than the other models over threshold probabilities of 0.1–0.5, supporting their practical value in guiding clinical interventions. To improve interpretability and foster clinician trust, Shapley additive explanations (SHAP) were applied to the best-performing XGBoost model. SHAP analysis identified TI-RADS classification, nodule morphology, and ultrasound echogenicity as the three most influential predictors of PTC. Secondary contributors were age, uric acid, potassium, Tg, anti-TPO, hypertension, calcification, and alcohol consumption, with contributions of varying magnitude. These findings are consistent with established clinical knowledge and also highlight potential novel interactions between metabolic markers and imaging features in PTC pathogenesis. This study developed a multimodal ML framework that integrates demographic, ultrasonographic, and laboratory characteristics for PTC diagnosis. The XGBoost-based model showed the best diagnostic accuracy and clinical applicability, and SHAP-based interpretation bridged the gap between algorithmic predictions and actionable clinical insights.
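The evaluation quantities named in the preceding paragraph can be written out explicitly. The sketch below is a minimal, hypothetical illustration rather than the authors' pipeline. It assumes 1 = PTC and 0 = benign, a 0.5 cut-off for the confusion matrix (the abstract does not give one), equal-width bins for the calibration table, and the standard decision-curve net benefit TP/n - FP/n * pt/(1 - pt) at threshold probability pt. All function names and the toy data are illustrative assumptions.

```python
# Sketch of the test-set evaluation quantities named in the abstract.
# Assumptions (not from the paper): 1 = PTC, 0 = benign; the confusion-matrix
# cut-off defaults to 0.5; DCA "treats" a patient when p >= pt.
from typing import Dict, List, Sequence, Tuple


def confusion(
    y_true: Sequence[int], y_prob: Sequence[float], cutoff: float = 0.5
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), calling a case positive when p >= cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        if p >= cutoff:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(a: float, b: float) -> float:
    return a / b if b else 0.0


def classification_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], cutoff: float = 0.5
) -> Dict[str, float]:
    """Confusion-matrix metrics: accuracy, recall, PPV, NPV, and F1."""
    tp, fp, tn, fn = confusion(y_true, y_prob, cutoff)
    recall = _ratio(tp, tp + fn)  # sensitivity
    ppv = _ratio(tp, tp + fp)  # precision
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": recall,
        "ppv": ppv,
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * ppv * recall, ppv + recall),
    }


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC = P(random positive scores above random negative); ties count 1/2.
    Threshold-free, so it is not a confusion-matrix metric. O(P*N) for clarity."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def calibration_table(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float]]:
    """(mean predicted probability, observed PTC rate) per non-empty bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b))
        for b in bins
        if b
    ]


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """DCA net benefit of the model at threshold probability pt (0 < pt < 1)."""
    tp, fp, _, _ = confusion(y_true, y_prob, cutoff=pt)
    n = len(y_true)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy that treats everyone; 'treat none' has net benefit 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


if __name__ == "__main__":
    y = [1, 1, 1, 0, 0, 0]  # hypothetical toy labels
    p = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # hypothetical predicted probabilities
    print(classification_metrics(y, p))
    print(round(roc_auc(y, p), 4))  # 0.8889
    for pt in (0.1, 0.3, 0.5):
        print(pt, round(net_benefit(y, p, pt), 4), round(net_benefit_treat_all(y, pt), 4))
```

A decision curve is simply `net_benefit` plotted over a range of pt values (for example 0.1 to 0.5) alongside the treat-all curve and the treat-none line at 0; a model adds clinical value at the thresholds where its curve lies above both references. A calibration plot likewise compares the two columns of `calibration_table` against the diagonal.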
By clarifying how the key predictors contribute to the model's predictions, this study advances personalized risk stratification and helps clinicians make informed decisions about PTC management. Future studies should focus on external validation in multicenter cohorts and on real-world implementation to assess translational impact.
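SHAP attributions rest on Shapley values from cooperative game theory. For a handful of features they can be computed exactly by enumerating feature coalitions, which makes the definition concrete. The sketch below is an assumption-laden illustration, not the TreeSHAP algorithm that the `shap` library provides for tree ensembles such as XGBoost: features outside a coalition are replaced by a fixed baseline vector, and `baseline_shapley` and `toy_model` are hypothetical names.

```python
# Brute-force Shapley attribution for one prediction (pure Python).
# NOT TreeSHAP: this uses a simple "baseline" value function in which features
# outside a coalition take reference values. Cost grows as 2**M feature subsets.
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def baseline_shapley(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """phi_i = sum over S not containing i of |S|!(M-|S|-1)!/M! * [v(S+{i}) - v(S)],
    where v(S) = model(x with features outside S set to their baseline values)."""
    m = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return model(z)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical toy score with one main effect and one two-way interaction.
    def toy_model(z: Sequence[float]) -> float:
        return 2.0 * z[0] + 3.0 * z[1] * z[2]

    phi = baseline_shapley(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
    print(phi)  # approximately [2.0, 1.5, 1.5]: the interaction is split equally
```

The attributions satisfy the efficiency property: they sum to model(x) - model(baseline), here 5.0, which is what allows SHAP values to be read as additive contributions of each feature to a single prediction.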