Objective To predict the risk factors affecting postoperative recurrence of mass-stage granulomatous lobular mastitis (GLM) using machine learning algorithms, and to provide a reference for the early identification and prevention of postoperative recurrence of mass-stage GLM. Methods The electronic medical records and follow-up data of patients with GLM in the Department of Breast Disease Unit, the First Affiliated Hospital of Henan University of Traditional Chinese Medicine from October 2020 to January 2023 were collected. A total of 340 patients with mass-stage GLM who met the inclusion and exclusion criteria were enrolled as research subjects. Patients were divided into a recurrence group and a non-recurrence group according to whether they relapsed after surgery. The collected cases were randomly divided into a training set and a test set at a ratio of 7:3. In the training set, recurrence prediction models were constructed using traditional logistic regression and three machine learning algorithms: artificial neural network, random forest, and XGBoost (extreme gradient boosting). In the test set, model performance was evaluated by sensitivity, specificity, accuracy, positive predictive value, negative predictive value, F1 value, and area under the curve (AUC). The SHapley Additive exPlanations (SHAP) method was used to explore the important variables that influence the optimal model in identifying postoperative recurrence of mass-stage GLM. The optimal risk cutoff value of the prediction model was determined by the Youden index, and on this basis the postoperative patients with mass-stage GLM in the external test set were divided into high-risk and low-risk groups. 
Results A total of 392 patients with mass-stage GLM were screened, 52 were excluded according to the exclusion criteria, and 340 were finally included, comprising 60 in the recurrence group and 280 in the non-recurrence group. Based on the results of univariate analysis, correlation analysis, and clinically meaningful influencing factors, 12 characteristic variables with non-zero coefficients were selected for the construction of the prediction model: other disease history, number of miscarriages, breastfeeding duration of the affected breast, history of milk stasis, lesion location, nipple indentation, fluctuation sensation, low-density lipoprotein, testosterone, previous antibiotic therapy, previous oral hormone medication, and perioperative traditional Chinese medicine treatment duration. Logistic regression, artificial neural network, random forest, and XGBoost prediction models were constructed. The accuracy, positive predictive value, and negative predictive value of all four models were >75%; the XGBoost model performed best, with accuracy, specificity, sensitivity, AUC, positive predictive value, negative predictive value, and F1 value of 0.93, 0.99, 0.65, 0.87, 0.92, 0.93, and 0.76, respectively. SHAP analysis showed that perioperative traditional Chinese medicine treatment duration, breastfeeding duration of the affected breast, low-density lipoprotein, testosterone, and previous oral hormone medication were the top five factors influencing the XGBoost model's identification of postoperative recurrence of mass-stage GLM. Conclusions Compared with the traditional logistic regression prediction model, the machine learning models for identifying postoperative recurrence of mass-stage GLM performed better, and the XGBoost model performed best. 
Targeted preventive measures can be taken based on the above risk factors to improve the postoperative prognosis of patients with mass-stage GLM.
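The test-set metrics listed in this abstract all derive from the four cells of a confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical: they were chosen only so that a 102-patient test set (30% of 340) rounds to the reported XGBoost values, and they are not taken from the study.

```python
def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Return standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of true recurrences detected
    specificity = tn / (tn + fp)   # share of non-recurrences correctly ruled out
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts for illustration only (NOT the study's confusion matrix).
metrics = confusion_metrics(tp=11, fn=6, fp=1, tn=84)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With a sensitivity of 0.65, a high accuracy is driven mainly by the large non-recurrence group, which is why the F1 value (0.76) is a useful complement to accuracy (0.93) on imbalanced data like this.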
Objective The management of pulmonary nodules is a common clinical problem. This study constructed a nomogram based on FUT7 methylation combined with CT imaging features to predict the risk of adenocarcinoma in patients with pulmonary nodules. Methods The clinical data of 219 patients with pulmonary nodules diagnosed by histopathology at the First Affiliated Hospital of Zhengzhou University from 2021 to 2022 were retrospectively analyzed. The FUT7 methylation level in peripheral blood was measured, and the patients were randomly divided into a training set (n=154) and a validation set (n=65) at a ratio of 7:3. Patients were divided into a lung adenocarcinoma group and a benign nodule group according to the pathological results. Univariate analysis and multivariate logistic regression analysis were used to construct a prediction model in the training set, which was then verified in the validation set. The receiver operating characteristic (ROC) curve was used to evaluate the discrimination of the model, the calibration curve was used to evaluate the agreement between predicted and observed risk, and decision curve analysis (DCA) was used to evaluate the clinical application value of the model. The applicability of the model was further evaluated in subgroups with high-risk CT signs (upper-lobe location, vascular sign, and pleural sign). Results Multivariate logistic regression analysis showed that female sex, age, FUT7_CpG_4, FUT7_CpG_6, sub-solid nodules, lobulation sign, and spiculation sign were independent risk factors for lung adenocarcinoma (P<0.05). A nomogram prediction model was constructed based on the results of the multivariate analysis. The area under the ROC curve was 0.925 (95%CI 0.877 - 0.972), and the maximum Youden index corresponded to a cutoff value of 0.562, at which the sensitivity was 89.25%, the specificity was 86.89%, the positive predictive value was 91.21%, and the negative predictive value was 84.13%. 
The calibration plot showed that the predicted risk of adenocarcinoma in pulmonary nodules was highly consistent with the observed risk. The DCA curve showed a good clinical net benefit when the threshold probability of the model was 0.02 - 0.80. In the upper lobe, vascular sign, and pleural sign subgroups, the area under the ROC curve was 0.903 (95%CI 0.847 - 0.959), 0.897 (95%CI 0.848 - 0.945), and 0.894 (95%CI 0.831 - 0.956), respectively. Conclusions This study developed a nomogram to predict the risk of lung adenocarcinoma in patients with pulmonary nodules. The nomogram has high predictive performance and clinical application value, and can provide a theoretical basis for the diagnosis and subsequent clinical management of pulmonary nodules.
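The cutoff in this abstract is chosen where the Youden index, J = sensitivity + specificity - 1, is largest. The following is a minimal sketch of that selection rule, assuming a case is called positive when its predicted probability is at or above the cutoff. The labels and scores are made-up illustration data; only the sensitivity and specificity in the final line come from the abstract.

```python
def youden_cutoff(y_true, y_score):
    """Return (cutoff, J, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is classified positive when its score is >= cutoff.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best = None
    for cutoff in sorted(set(y_score)):  # every observed score is a candidate cutoff
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= cutoff)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < cutoff)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical labels (1 = adenocarcinoma) and predicted probabilities.
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
cutoff, j, sens, spec = youden_cutoff(labels, scores)
print(f"cutoff={cutoff:.2f}  J={j:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")

# Youden index implied by the sensitivity and specificity reported in the abstract.
reported_j = 0.8925 + 0.8689 - 1
print(f"reported J = {reported_j:.4f}")
```

The Youden index weights sensitivity and specificity equally; a clinical setting that penalizes missed cancers more heavily than unnecessary work-ups might justify a lower cutoff than the one it selects.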
Objective To analyze the prognostic factors of patients with sepsis caused by bacterial bloodstream infection and to identify independent risk factors for death, so as to develop a predictive model for clinical practice. Methods A non-interventional retrospective study was carried out. The data of adult sepsis patients with positive bacterial blood culture (including central venous catheter tip culture) within 48 hours after admission were collected from the electronic medical database of the First Affiliated Hospital of Dalian Medical University from January 1, 2018 to December 31, 2019, including demographic characteristics, vital signs, laboratory data, etc. The patients were divided into a survival group and a death group according to in-hospital outcome. Risk factors were analyzed and the prediction model was established by multivariate logistic regression. The discriminatory ability of the model was assessed by the area under the receiver operating characteristic curve (AUC). The predictive model was visualized as a nomogram and verified by internal validation methods in R. Results A total of 1189 patients were retrieved, and 563 eligible patients were included in the study, including 398 in the survival group and 165 in the death group. Except for gender and pathogen type, all indicators differed significantly between the survival group and the death group in univariate comparison. 
The variables included in the logistic regression prediction model were: age [P<0.001, 95% confidence interval (CI) 0.949 - 0.982], heart rate (P<0.001, 95%CI 0.966 - 0.987), platelet count (P=0.009, 95%CI 1.001 - 1.006), fibrinogen (P=0.036, 95%CI 1.010 - 1.325), serum potassium ion (P=0.005, 95%CI 0.426 - 0.861), serum chloride ion (P=0.054, 95%CI 0.939 - 1.001), aspartate aminotransferase (P=0.03, 95%CI 0.996 - 1.000), serum globulin (P=0.025, 95%CI 1.006 - 1.086), and mean arterial pressure (P=0.250, 95%CI 0.995 - 1.021). The AUC of the prediction model was 0.779 (95%CI 0.737 - 0.821). The nomogram predicted well for total scores in the 210 - 320 interval, with a mean absolute error of 0.011 and a mean squared error of 0.00018. Conclusions Basic vital signs within 48 h of admission, as well as indicators of homeostatic disturbance reflected by coagulation, liver, and renal dysfunction, are highly correlated with the prognosis of septic patients with bacterial bloodstream infection. Early warning should be established to enable early detection and save patients’ lives.
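The AUC used here to summarize discrimination can be read as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. A minimal sketch of that pairwise definition follows; the outcome labels and predicted risks are invented for illustration and do not come from the study.

```python
def auc_pairwise(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = in-hospital death, scores = predicted probability of death.
outcome = [1, 1, 1, 0, 0, 0, 0]
risk = [0.9, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
auc = auc_pairwise(outcome, risk)
print(f"AUC = {auc:.2f}")
```

This quadratic loop is meant for clarity; for cohorts of several hundred patients a rank-based implementation from a statistics library would normally be used instead.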
Objective To analyze the influencing factors for postoperative anastomotic leak (AL) in carcinoma of the esophagus and gastroesophageal junction and construct a nomogram predictive model. Methods The patients who underwent radical esophagectomy at Jinling Hospital Affiliated to Nanjing University School of Medicine from January 2018 to June 2020 were included in this study. Relevant variables were screened using univariate and multivariate logistic regression analyses. A nomogram was then developed to predict the risk of postoperative AL. The predictive performance of the nomogram was validated using the receiver operating characteristic (ROC) curve. Results A total of 468 patients with carcinoma of the esophagus and gastroesophageal junction were included in the study, comprising 354 males and 114 females, with a mean age of (62.8±7.2) years. The tumors were predominantly located in the middle or lower esophagus, and 51 (10.90%) patients experienced postoperative AL. Univariate logistic regression analysis indicated that age, body mass index (BMI), tumor location, preoperative albumin levels, diabetes mellitus, anastomosis technique, anastomosis site, and C-reactive protein (CRP) levels were potentially associated with AL (P<0.05). Multivariate logistic regression analysis identified age, BMI, tumor location, diabetes mellitus, anastomosis technique, and CRP levels as independent risk factors for AL (P<0.05). A nomogram was developed based on the findings from the multivariate logistic regression analysis. The area under the ROC curve was 0.803, indicating a strong concordance between the actual observations and the predicted outcomes. Furthermore, decision curve analysis demonstrated that the newly established nomogram holds significant value for clinical decision-making. 
Conclusion The predictive model for postoperative AL in patients with carcinoma of the esophagus and gastroesophageal junction demonstrates strong predictive validity and is essential for guiding clinical monitoring, early detection, and preventive strategies.
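Decision curve analysis, mentioned in this abstract, compares strategies by their net benefit at a chosen threshold probability: true positives per patient minus false positives per patient weighted by the odds of the threshold. The sketch below computes it for the "treat all" reference strategy using the cohort size and AL count given above (468 patients, 51 leaks). The model counts are hypothetical and serve only to show the comparison.

```python
def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return tp / n - fp / n * threshold / (1 - threshold)


N_PATIENTS = 468   # cohort size reported in the abstract
N_LEAKS = 51       # anastomotic leaks reported in the abstract

# "Treat all": every patient is flagged, so all leaks are TP and all others FP.
for pt in (0.05, 0.10, 0.20):
    nb_all = net_benefit(N_LEAKS, N_PATIENTS - N_LEAKS, N_PATIENTS, pt)
    print(f"threshold {pt:.2f}: treat-all net benefit = {nb_all:+.4f}")

# Hypothetical model result at a 10% threshold (illustrative counts, not study data).
nb_model = net_benefit(tp=38, fp=90, n=N_PATIENTS, threshold=0.10)
nb_all_10 = net_benefit(N_LEAKS, N_PATIENTS - N_LEAKS, N_PATIENTS, 0.10)
print(f"model {nb_model:+.4f} vs treat-all {nb_all_10:+.4f} vs treat-none +0.0000")
```

The treat-all net benefit falls to zero when the threshold equals the event rate (51/468, about 10.9%), so a model is clinically useful at a given threshold only if its curve lies above both the treat-all and treat-none lines there.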
Objective To explore the independent risk factors for hospital infections in tertiary hospitals in Gansu Province, and to establish and validate a prediction model. Methods A total of 690 patients hospitalized with hospital infections in Gansu Provincial Hospital between January and December 2021 were selected as the infection group; matched by admission department and age at a 1∶1 ratio, 690 patients hospitalized during the same period without hospital infections were selected as the control group. Information on underlying diseases, endoscopic operations, blood transfusion, and immunosuppressant use was compared between the two groups, the factors influencing hospital infections in hospitalized patients were analyzed by multiple logistic regression, and a logistic prediction model was established. Eighty percent of the data from Gansu Provincial Hospital were used as the training set of the model, and the remaining 20% were used as the test set for internal validation. Case data from three other hospitals in Gansu Province were used for external validation. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUC) were used to evaluate model performance. Results Multiple logistic regression analysis showed that endoscopic therapeutic manipulation [odds ratio (OR)=3.360, 95% confidence interval (CI) (2.496, 4.523)], indwelling catheter [OR=3.100, 95%CI (2.352, 4.085)], organ transplantation/artifact implantation [OR=3.133, 95%CI (1.780, 5.516)], blood or blood product transfusions [OR=3.412, 95%CI (2.626, 4.434)], glucocorticoids [OR=2.253, 95%CI (1.608, 3.157)], the number of underlying diseases [OR=1.197, 95%CI (1.068, 1.342)], and the number of surgical procedures performed during hospitalization [OR=1.221, 95%CI (1.096, 1.361)] were risk factors for hospital infections. 
The regression equation of the prediction model was: logit(P)=–2.208+1.212×endoscopic therapeutic operations+1.131×indwelling urinary catheters+1.142×organ transplantation/artifact implantation+1.227×transfusion of blood or blood products+0.812×glucocorticosteroids+0.180×number of underlying diseases+0.200×number of surgical procedures performed during the hospitalization. In internal validation, the model had a sensitivity of 72.857%, a specificity of 77.206%, an accuracy of 76.692%, and an AUC of 0.817. In external validation, the model had a sensitivity of 63.705%, a specificity of 70.934%, an accuracy of 68.669%, and an AUC of 0.726. Conclusions Endoscopic therapeutic manipulation, indwelling catheter, organ transplantation/artifact implantation, blood or blood product transfusion, glucocorticoid use, number of underlying diseases, and number of surgical procedures during hospitalization are influencing factors of hospital infections. The model can effectively predict the occurrence of hospital infections and guide clinicians to take preventive measures to reduce the occurrence of hospital infections.
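The regression equation in this abstract can be turned into a predicted probability with the inverse logit, P = 1 / (1 + exp(-logit)). The sketch below uses the published coefficients; it assumes that the five binary exposures are coded 0/1 and that the two counts are entered as plain integers, which the abstract does not state explicitly. The parameter names and the example patient are hypothetical.

```python
import math

# Coefficients as printed in the abstract's regression equation.
INTERCEPT = -2.208
COEFFICIENTS = {
    "endoscopic_therapy": 1.212,
    "indwelling_catheter": 1.131,
    "transplant_or_implant": 1.142,
    "transfusion": 1.227,
    "glucocorticoid": 0.812,
    "n_underlying_diseases": 0.180,
    "n_surgical_procedures": 0.200,
}


def infection_probability(**predictors: float) -> float:
    """Predicted probability of hospital infection; unspecified predictors default to 0.

    Assumes binary exposures are coded 0/1 (an assumption, not stated in the source).
    """
    unknown = set(predictors) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown predictors: {sorted(unknown)}")
    logit = INTERCEPT + sum(COEFFICIENTS[k] * v for k, v in predictors.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Baseline patient with no listed risk factors.
p_baseline = infection_probability()

# Hypothetical patient: catheter, transfusion, two underlying diseases, one surgery.
p_example = infection_probability(
    indwelling_catheter=1, transfusion=1,
    n_underlying_diseases=2, n_surgical_procedures=1,
)
print(f"baseline: {p_baseline:.3f}, example patient: {p_example:.3f}")

# Each coefficient is the natural log of the corresponding reported odds ratio.
print(f"exp(1.212) = {math.exp(1.212):.3f}")  # compare with the reported OR of 3.360
```

Because the model was fitted on a 1∶1 matched case-control sample, the intercept reflects that artificial 50% infection rate, so the absolute probabilities are best read as relative risk scores rather than population-level infection rates.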
Objective To identify sensitive predictors associated with subscapularis (SSC) tendon tear and develop a web-based dynamic nomogram to assist clinicians in the early identification of and intervention for SSC tendon tear. Methods Between July 2016 and December 2021, 528 consecutive patients who underwent shoulder arthroscopic surgery and had complete MRI and clinical data were retrospectively analyzed. Patients admitted between July 2016 and July 2019 were included in the training cohort, and patients admitted between August 2019 and December 2021 were included in the validation cohort. According to the arthroscopic diagnosis, the patients were divided into an SSC tear group and a non-SSC tear group. Univariate analysis, the least absolute shrinkage and selection operator (LASSO) method, and 10-fold cross-validation were used to screen for reliable predictors highly associated with SSC tendon tear in the training cohort, and R was used to build a nomogram model, which underwent internal and external validation. The prediction performance of the nomogram was evaluated by the concordance index (C-index) and a calibration curve with 1 000 bootstrap resamples. Receiver operating characteristic curves were drawn to evaluate the diagnostic performance (sensitivity, specificity, predictive value, likelihood ratio) of the predictive model and of MRI (based on direct signs). Decision curve analysis (DCA) was used to evaluate the clinical implications of the predictive model and MRI. Results The nomogram model showed good discrimination in predicting the risk of SSC tendon tear [C-index=0.878; 95%CI (0.839, 0.918)], and the calibration curve showed that the predicted results were basically consistent with the actual results. 
The study identified 6 predictors highly associated with SSC tendon tears: coracohumeral distance (oblique sagittal) reduction, effusion sign (Y-plane), subcoracoid effusion sign, biceps long head tendon displacement (dislocation/subluxation), multiple posterosuperior rotator cuff tears (≥2, supra/infraspinatus), and MRI suspected SSC tear (based on direct sign). Compared with MRI diagnosis based on direct signs of SSC tendon tear, the predictive model had superior sensitivity (80.2% vs. 57.0%), positive predictive value (53.9% vs. 53.3%), negative predictive value (92.7% vs. 86.3%), positive likelihood ratio (3.75 vs. 3.66), and negative likelihood ratio (0.25 vs. 0.51). DCA suggested that the predictive model could produce higher clinical benefit when the risk threshold probability was between 3% and 93%. Conclusion The nomogram model can reliably predict the risk of SSC tendon tear and can be used as an important tool for auxiliary diagnosis.
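The likelihood ratios quoted in this abstract follow from sensitivity and specificity: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The abstract does not report specificity, so the sketch below back-calculates it from the reported sensitivity and LR+, then checks that the resulting LR- agrees with the reported value. The derived specificities are approximations subject to rounding in the published figures.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LR+, LR-) for a diagnostic test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def implied_specificity(sensitivity: float, lr_pos: float) -> float:
    """Specificity implied by a reported sensitivity and positive likelihood ratio."""
    return 1 - sensitivity / lr_pos


# Reported in the abstract: sensitivity and LR+ for the nomogram and for MRI.
reported = {"nomogram": (0.802, 3.75), "MRI direct signs": (0.570, 3.66)}

results = {}
for name, (sens, lr_pos_reported) in reported.items():
    spec = implied_specificity(sens, lr_pos_reported)  # derived, not reported
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    results[name] = (spec, lr_pos, lr_neg)
    print(f"{name}: implied specificity={spec:.3f}, LR+={lr_pos:.2f}, LR-={lr_neg:.2f}")
```

Note that "superior" means opposite directions for the two ratios: a higher LR+ makes a positive result more convincing, while a lower LR- (0.25 vs. 0.51) makes a negative result better at ruling a tear out.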
As the volume of medical research using large language models (LLMs) surges, the need for standardized and transparent reporting becomes increasingly critical. In January 2025, Nature Medicine published the statement "TRIPOD-LLM reporting guideline for studies using large language models". This is the first comprehensive reporting framework specifically tailored to studies that develop prediction models based on LLMs. It comprises a checklist with 19 main items (encompassing 50 sub-items), a flowchart, and an abstract checklist (containing 12 items). This article interprets TRIPOD-LLM’s development methods, primary content, scope, and the specific details of its items. The goal is to help researchers, clinicians, editors, and healthcare decision-makers understand and correctly apply TRIPOD-LLM, thereby improving the quality and transparency of reporting in LLM medical research and promoting the standardized and ethical integration of LLMs into healthcare.
Objective To analyze the risk factors affecting postoperative relapse-free survival (RFS) in primary gastrointestinal stromal tumors (GIST) and to develop a nomogram predictive model of postoperative RFS for patients with GIST. Methods Patients diagnosed with GIST by postoperative pathology from January 2011 to December 2020 at the First Hospital of Lanzhou University and Gansu Provincial People’s Hospital were collected and randomly divided into a training set and a validation set at a ratio of 7∶3 using an R function. Univariate and multivariate Cox regression analyses were used to identify the risk factors affecting RFS after surgery, and on this basis a nomogram was constructed to predict the probability of RFS at 3 and 5 years after surgery. The effectiveness of the nomogram was evaluated using the area under the receiver operating characteristic curve (AUC), concordance index (C-index), and calibration curve, and the clinical utility of the nomogram and of the modified National Institutes of Health (M-NIH) classification standard was evaluated using decision curve analysis (DCA). Results A total of 454 patients were included, with 317 in the training set and 137 in the validation set. Multivariate Cox regression analysis showed that tumor location, tumor size, differentiation degree, American Joint Committee on Cancer TNM stage, mitotic rate, CD34 expression, treatment method, number of lymph nodes examined, and targeted drug treatment time were the influencing factors of postoperative RFS for patients with GIST (P<0.05). The nomogram predictive model was constructed based on these influencing factors. The C-index of the nomogram in the training set and validation set was 0.731 [95%CI (0.679, 0.783)] and 0.685 [95%CI (0.647, 0.722)], respectively. 
The AUC (95%CI) for distinguishing RFS at 3 and 5 years after surgery was 0.764 (0.681, 0.846) and 0.724 (0.661, 0.787) in the training set and 0.749 (0.625, 0.872) and 0.739 (0.647, 0.832) in the validation set, respectively. The calibration curves showed good agreement between the predicted and actual 3-year and 5-year RFS rates in the training set, while agreement was slightly poorer in the validation set. The net benefit for the 3-year RFS rate after GIST surgery was higher when the threshold probability ranged from 0.19 to 0.57, and the net benefit for the 5-year RFS rate was higher when the threshold probability ranged from 0.44 to 0.83. Within these threshold probability ranges, the net benefit of the nomogram exceeded that of the M-NIH classification system at the corresponding threshold probability. Conclusions The results of this study suggest that patients with GIST located in other sites (mainly including the esophagus, duodenum, and retroperitoneum), with tumor size greater than 5 cm, poorly differentiated or undifferentiated tumors, mitotic rate lower than 5/50 HPF, negative CD34 expression, ablation treatment, more than 4 lymph nodes examined, and targeted drug treatment time less than 3 months should be closely monitored for postoperative recurrence. The discrimination and clinical applicability of the nomogram predictive model are good.
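The C-index reported for this Cox-based nomogram extends the AUC idea to censored survival data: among pairs of patients where it is known who relapsed first, it is the fraction in which the earlier-relapsing patient was assigned the higher risk. Below is a minimal sketch of Harrell's version under simplifying assumptions (pairs with tied follow-up times are skipped). The follow-up times, event flags, and risk scores are invented for illustration.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i has an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    Tied risk scores count as half concordant; tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: months of follow-up, 1 = relapse observed, 0 = censored.
months = [12, 30, 45, 60, 24]
relapsed = [1, 1, 0, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1, 0.7]
c_index = harrell_c_index(months, relapsed, risk)
print(f"C-index = {c_index:.3f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported validation C-index of 0.685 in the moderate range.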
Objective To establish and internally validate a predictive model for poorly differentiated adenocarcinoma based on CT imaging and tumor marker results. Methods Patients with solid and partially solid lung nodules who underwent lung nodule surgery at the Department of Thoracic Surgery, the Affiliated Brain Hospital of Nanjing Medical University in 2023 were selected and randomly divided into a training set and a validation set at a ratio of 7:3. Patients' CT features, including average density value, maximum diameter, pleural indentation sign, and air bronchogram sign, as well as tumor marker results, were collected. Based on postoperative pathological results, patients were divided into a poorly differentiated adenocarcinoma group and a non-poorly differentiated adenocarcinoma group. Univariate analysis and logistic regression analysis were performed on the training set to establish the predictive model. The receiver operating characteristic (ROC) curve was used to evaluate the model's discrimination, the calibration curve to assess the agreement between predicted and observed risk, and the decision curve to evaluate the clinical value of the model, which was then validated in the validation set. Results A total of 299 patients were included, comprising 103 males and 196 females, with a median age of 57.00 (51.00, 67.25) years. There were 211 patients in the training set and 88 patients in the validation set. Multivariate analysis showed that carcinoembryonic antigen (CEA) value [OR=1.476, 95%CI (1.184, 1.983), P=0.002], cytokeratin 19 fragment antigen (CYFRA21-1) value [OR=1.388, 95%CI (1.084, 1.993), P=0.035], maximum tumor diameter [OR=6.233, 95%CI (1.069, 15.415), P=0.017], and average density [OR=1.083, 95%CI (1.020, 1.194), P=0.040] were independent risk factors for poorly differentiated adenocarcinoma among solid and partially solid lung nodules. 
Based on these factors, a predictive model was constructed with an area under the ROC curve of 0.896 [95%CI (0.810, 0.982)]; at the maximum Youden index, the cut-off value was 0.103, with a sensitivity of 0.750 and a specificity of 0.936. With 1000 bootstrap resamples, the calibration curve showed that the predicted probability was consistent with the actual risk. Decision curve analysis indicated positive benefits across all prediction probabilities, demonstrating good clinical value. Conclusion For patients with solid and partially solid lung nodules, preoperative CT measurement of average tumor density and maximum diameter, combined with the tumor markers CEA and CYFRA21-1, can effectively predict whether the nodule is poorly differentiated adenocarcinoma, allowing for early intervention.
Objective To clarify the clinical predictive efficacy of CT and serological indicators for the progression of connective tissue disease-associated interstitial lung disease (CTD-ILD) to progressive pulmonary fibrosis (PPF). Methods Patients who were diagnosed with CTD-ILD at the Chest Hospital of Zhengzhou University between January 2020 and December 2021 were recruited in the study. Clinical data and high-resolution CT results of the patients were collected. The patients were divided into a stable group and a progressive group (PPF group) based on whether PPF occurred during follow-up. Cox proportional hazards regression was used to identify risk factors affecting the progression of CTD-ILD to PPF, and a risk prediction model was established based on the results of the Cox regression model. The predictive efficacy of the model was evaluated through internal cross-validation. Results A total of 194 patients diagnosed with CTD-ILD were enrolled based on the inclusion and exclusion criteria. Among them, 34 patients progressed to PPF during treatment, and 160 patients did not progress. The variables selected at lambda.1se in LASSO regression were ANCA-associated vasculitis, lymphocytes, albumin, erythrocyte sedimentation rate, and serum ferritin. Multivariate Cox regression analysis showed that the extent of fibrosis, serum ferritin, albumin, and age were independent risk factors for the progression of CTD-ILD to PPF (all P<0.05). A prediction model was established based on the results of the multivariate Cox regression analysis. The area under the receiver operating characteristic curve at 6 months, 9 months, and 12 months was 0.989, 0.931, and 0.797, respectively, indicating that the model has good discrimination and predictive efficacy. The calibration curve showed good agreement between predicted and actual values. 
Conclusions The extent of fibrosis, serum ferritin, albumin, and age are independent risk factors for the progression of CTD-ILD to PPF. The model established on this basis and internally cross-validated shows good predictive efficacy.
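The LASSO step in this abstract reports variables at the one-standard-error lambda, presumably the lambda.1se value returned by cross-validation in R's glmnet. That rule picks the largest penalty whose cross-validated error is still within one standard error of the minimum, trading a small loss in fit for a sparser model. The sketch below illustrates the selection rule only; the lambda grid and cross-validation errors are invented, and no LASSO model is actually fitted.

```python
def lambda_min_and_1se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimizes the mean CV error; lambda_1se is the largest lambda
    whose mean CV error is <= (minimum error + its standard error).
    """
    if not (len(lambdas) == len(cv_mean) == len(cv_se)) or not lambdas:
        raise ValueError("inputs must be non-empty and of equal length")
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    limit = cv_mean[i_min] + cv_se[i_min]
    candidates = [lam for lam, err in zip(lambdas, cv_mean) if err <= limit]
    return lambdas[i_min], max(candidates)


# Hypothetical cross-validation results (e.g. partial-likelihood deviance per lambda).
grid = [0.200, 0.100, 0.050, 0.025, 0.010]
mean_error = [0.95, 0.82, 0.78, 0.76, 0.77]
std_error = [0.05, 0.04, 0.04, 0.03, 0.03]

lam_min, lam_1se = lambda_min_and_1se(grid, mean_error, std_error)
print(f"lambda.min = {lam_min}, lambda.1se = {lam_1se}")
```

Because lambda.1se shrinks more aggressively than lambda.min, it typically retains fewer variables, which suits a cohort with only 34 progression events.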