Majd Gharamaleki A, Majd Gharamaleki A, Amanollahi A, Tabibzadeh S. Diagnostic Accuracy of Deep Learning for Predicting Lymph Node Metastasis Based on Computed Tomography in Gastric Cancer: A Systematic Review and Meta-analysis. Med J Islam Repub Iran 2025; 39 (1) :979-996
URL:
http://mjiri.iums.ac.ir/article-1-9777-en.html
School of Medicine, Islamic Azad University, Ardabil, Iran, sarvin.tabibzadeh@iau.ir
Abstract:
Background: Early detection of lymph node metastasis (LNM) in gastric cancer (GC) is essential for determining the treatment strategy. Conventional methods exhibit limited efficacy, highlighting the need for more reliable approaches. Deep learning (DL) models show promise for LNM detection on computed tomography (CT), but their performance requires comprehensive evaluation. This systematic review and meta-analysis evaluates the diagnostic performance of CT-based DL models for detecting LNM in GC patients.
Methods: A systematic review and meta-analysis was conducted according to PRISMA-DTA guidelines. PubMed, Embase, and Web of Science were searched up to May 5, 2025. Eligible studies used DL models to detect LNM on CT in patients with GC. Pooled estimates were calculated using a bivariate random-effects model, heterogeneity and publication bias were assessed, and clinical utility was evaluated via Fagan plots and likelihood ratio matrices. Subgroup analyses were stratified by validation type, input data type, CT phase, segmentation technique, and DL architecture. Study quality was assessed with QUADAS-2.
Results: Of the 14 included studies, 11 studies with 5296 patients were analyzed. In internal validation, DL feature-based models achieved a pooled area under the curve (AUC) of 0.91 (95% CI: 0.88-0.93), sensitivity of 0.86 (95% CI: 0.75-0.92), and specificity of 0.83 (95% CI: 0.67-0.92). Performance degraded in external validation, with specificity dropping to 0.59 (95% CI: 0.26-0.85). Models that integrated DL features with radiomics features showed similar overall performance but higher confirmatory power. In terms of clinical utility, although the models could significantly alter post-test probabilities, they lacked the certainty required to serve as standalone diagnostic tools.
Conclusion: CT-based DL models show high diagnostic accuracy but limited generalizability across external datasets, indicating overfitting. A key finding of this meta-analysis is that heterogeneity was pervasive and asymmetric, particularly in specificity, which suggests that technical standardization alone is insufficient. Integrating clinical variables reduces heterogeneity; however, prospective, multicenter studies are needed to further enhance reproducibility.
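The clinical-utility reasoning behind a Fagan plot can be illustrated with a short calculation. The sketch below is a minimal illustration, not the authors' analysis: it derives likelihood ratios directly from the pooled internal-validation sensitivity (0.86) and specificity (0.83) reported in the Results, which only approximates the ratios a bivariate model would pool. The pre-test probability of 0.50 and the function names are assumptions chosen for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from a sensitivity/specificity pair."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob, lr):
    """Apply Bayes' rule on the odds scale, as a Fagan plot does graphically."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Pooled internal-validation estimates quoted in the abstract.
SENS, SPEC = 0.86, 0.83
# Hypothetical pre-test probability of LNM; NOT taken from the study.
PRE_TEST = 0.50

lr_pos, lr_neg = likelihood_ratios(SENS, SPEC)
p_after_positive = post_test_probability(PRE_TEST, lr_pos)
p_after_negative = post_test_probability(PRE_TEST, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability after a positive result: {p_after_positive:.2f}")
print(f"Post-test probability after a negative result: {p_after_negative:.2f}")
```

Under these assumptions the sketch gives an LR+ of about 5.1 and an LR- of about 0.17, moving a 50% pre-test probability to roughly 83% after a positive result and 14% after a negative one. Against the commonly used thresholds for likelihood ratio matrices (LR+ above 10 for confirmation, LR- below 0.1 for exclusion), such values shift probabilities meaningfully without reaching either threshold, which is consistent with the conclusion that the models are not yet suitable as standalone diagnostic tools.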