Introduction
Diabetes has become a major public health burden in China in the 21st century. The prevalence of diabetes in China had increased to 12.8% in 2017.1 Reportedly, China had the highest number of adults with diabetes (140.9 million) in 2021; this number has been projected to increase to 174 million by 2045.2 Since most patients have type 2 diabetes, which is preventable by early intervention, efficient identification of controllable risk factors is crucial for implementing prevention and intervention strategies.
Metabolic syndrome (MetS) is defined as a cluster of risk factors for type 2 diabetes and atherosclerotic cardiovascular disease. MetS has become increasingly prevalent worldwide.3,4 Asians are generally considered to have a lower prevalence of MetS, reported as 24% in China versus 33% in the USA.5,6 However, the MetS prevalence in China doubled from 2002 to 2012,7 as economic development changed lifestyles in both urban and rural areas and resulted in more people being overweight.8 The rapidly increasing prevalence of MetS is leading to more cases of diabetes and higher medical costs. Lifestyle intervention has proved effective in preventing the onset of diabetes in individuals with MetS,9,10 whereas unregulated MetS was the strongest risk factor for new-onset diabetes.11 More aggressive intervention should therefore be carried out in the MetS population.
Traditional risk models have been developed to identify people at high risk and have shown potential for detecting the onset of diabetes.12 Recently, the successful implementation of information technologies has enhanced the efficiency of the healthcare system. Machine-learning models have been used in the prediction of many common diseases.13 Numerous studies have applied machine-learning techniques to predict the onset of diabetes and improve diagnostic accuracy.14–18 Machine-learning techniques have become a vital tool in diabetes management for healthcare providers.
In previous studies that used the above-mentioned machine-learning methods, only “single time data” was used for the models, either for simultaneous diagnosis or for prediction of incident diabetes during follow-up. Only a few studies have used multiple years’ data or trends of variables to predict diabetes.19,20 The history of lifestyle changes or different health trajectories may contribute to the risk of future diabetes. By using machine-learning methods with multiple years’ data, we could construct a more accurate model that takes trajectories into account for a more personalized assessment.
This study focused on individuals with MetS who were at relatively high risk of developing diabetes. Using multiple years’ data from the annual health examination database, machine-learning models for diabetes prediction were constructed and the prediction performance was compared between multiple-year and single-year models.
Materials and Methods
Data Summary
This study was conducted in the Health Management Center of Peking Union Medical College Hospital. All physical examination data from subjects were retrospectively collected from 2008 to 2020 and securely stored in the Peking Union Medical College Hospital Health Management database (PUMCH-HM). The database comprised all participants’ annual examination records including demographic information, vital signs, laboratory tests, and medical history. The target population in this study was patients with MetS, defined based on the International Diabetes Federation (IDF) criteria (Table 1).21 Diabetes was diagnosed based on one or more of the following criteria from the American Diabetes Association (ADA):22 fasting plasma glucose (FPG) ≥7.0 mmol/L, glycated hemoglobin (HbA1c) ≥6.5%, or a self-reported diagnosis of diabetes made by a healthcare professional. The inclusion criteria were: (1) no diabetes was detected when subjects were diagnosed with MetS in the first year, and (2) the participant had at least 6 years’ records in the dataset since the first year of MetS diagnosis. A total of 4510 participants (follow-up: 7±1.4 years) were extracted from the database, and 332 patients developed incident diabetes during the follow-up period. The dataset comprised 15 variables in three categories: demographic information including age, sex, height, weight, body mass index (BMI), and waist circumference (WC); vital signs including systolic blood pressure (SBP) and diastolic blood pressure (DBP); and laboratory tests including FPG, HbA1c, high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), triglyceride (TG), thyroid-stimulating hormone (TSH), and uric acid (UA).
The study was conducted in accordance with the Declaration of Helsinki and was approved by the Peking Union Medical College Hospital Ethics Committee. Informed consent was obtained from all patients included in the study.
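The diagnostic rule described above can be written as a small predicate. The following is only a minimal sketch: the function name, argument names, and the handling of missing measurements are illustrative assumptions, not the study's code.

```python
from typing import Optional

# Thresholds quoted in the text (ADA criteria).
FPG_THRESHOLD_MMOL_L = 7.0
HBA1C_THRESHOLD_PERCENT = 6.5


def meets_diabetes_criteria(
    fpg_mmol_l: Optional[float],
    hba1c_percent: Optional[float],
    self_reported: bool = False,
) -> bool:
    """Return True if at least one of the three criteria is met.

    A measurement of None means "not recorded" and is simply skipped
    (an assumption made for this sketch).
    """
    if self_reported:
        return True
    if fpg_mmol_l is not None and fpg_mmol_l >= FPG_THRESHOLD_MMOL_L:
        return True
    if hba1c_percent is not None and hba1c_percent >= HBA1C_THRESHOLD_PERCENT:
        return True
    return False
```

For example, `meets_diabetes_criteria(6.4, 6.7)` is true because HbA1c alone crosses its threshold, even though FPG does not.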
Table 1 The Criteria of the International Diabetes Federation (IDF) for the Definition of Metabolic Syndrome (MetS)
The percentage of missing values for each variable is presented in Table 2. Three variables (WC, HbA1c, and TSH) had more than 30% missing data because they were not collected during the annual health examination until 2014. HDL-C, LDL-C, and UA had between 1% and 10% missing data, mainly because some participants declined these tests. The other variables were missing at random due to human error, and their missing percentages were below 1%.
Table 2 The Missing Percentages of Each Variable
Data Processing
First, missing data, which are common in clinical records, had to be imputed. Here, a random forest-based iterative imputation method was applied to the dataset.23 It starts by imputing the missing values of the targeted column, ie, the column with the smallest number of missing values. The other, non-targeted columns with missing values were initially imputed with the column mean for numerical variables and the column mode for categorical variables. Then, a random forest model was fitted in the imputer with the targeted column set as the outcome variable and the remaining columns set as predictors, using the rows in which the targeted column was observed. Subsequently, the missing rows of the targeted column were predicted by the fitted random forest model, using the corresponding rows of the non-targeted columns as input. After that, the imputer proceeded to the next targeted column, the one with the second smallest number of missing values in the dataset. The process was repeated for every column with missing values over multiple iterations until the stopping criterion was met. This stopping criterion was governed by the difference between the imputed arrays over consecutive iterations.
After the imputation of missing data, outliers were identified by the interquartile range (IQR) method. Q1 represents the 25th percentile and Q3 the 75th percentile, and IQR is the difference Q3 − Q1. Outliers were defined as values outside the range between (Q1 − 1.5×IQR) and (Q3 + 1.5×IQR). The data were then also manually examined according to the benchmarks specified by healthcare professionals. Failing to remove outliers before the next step, normalization, would introduce a large bias. As each variable has completely different units and scales, feeding these variables directly into the model would lead to biased predictions dominated by the variable with the largest variance. Therefore, z-score normalization (standard scaling) was applied to all the features, which removes the mean and scales to unit variance. To reflect the yearly fluctuation of the variables during the follow-up period, additional features named “delta_xx” were computed for each variable by taking the first-order difference over the longitudinal data, ie, the change between consecutive annual records. Moreover, categorical variables such as sex were encoded as 0 for female and 1 for male.
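The imputation procedure described above resembles what scikit-learn's `IterativeImputer` does when given a random forest estimator. The sketch below is an approximation under stated assumptions, not the authors' implementation: it uses `IterativeImputer` rather than the original MissForest code, handles numerical columns only, and the tree count, iteration limit, tolerance, and toy data are all hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor


def impute_with_random_forest(X, max_iter=5, seed=0):
    """Iteratively impute NaNs, column by column, with a random forest."""
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=20, random_state=seed),
        initial_strategy="mean",       # start from column means
        imputation_order="ascending",  # fewest missing values first
        max_iter=max_iter,
        tol=1e-3,                      # stop early when imputations stabilise
        random_state=seed,
    )
    return imputer.fit_transform(X)


# Toy data (not study data): 4 numeric columns, about 10% of cells removed.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(200, 4))
X_full[:, 3] = X_full[:, 0] + 0.1 * rng.normal(size=200)
mask = rng.random(X_full.shape) < 0.1
X_missing = X_full.copy()
X_missing[mask] = np.nan

X_imputed = impute_with_random_forest(X_missing)
```

Observed cells are left untouched; only the cells that were `NaN` receive model-based estimates.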
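The three preprocessing steps in this paragraph (IQR outlier bounds, z-score scaling, and yearly deltas) can be illustrated with NumPy. This is a sketch on made-up values; the helper names are hypothetical, and how the study aggregated several deltas per participant is not stated in the text.

```python
import numpy as np


def iqr_bounds(values):
    """Return the (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def zscore(values):
    """Remove the mean and scale to unit variance (population SD)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def yearly_delta(records):
    """First-order difference between consecutive annual records."""
    return np.diff(np.asarray(records, dtype=float))


# Illustrative FPG values in mmol/L; 12.0 is an obvious outlier.
fpg = np.array([5.1, 5.3, 5.2, 5.6, 5.4, 5.0, 12.0])
low, high = iqr_bounds(fpg)
kept = fpg[(fpg >= low) & (fpg <= high)]
scaled = zscore(kept)

# Weight in kg over three annual visits gives two yearly changes.
delta_weight = yearly_delta([70.0, 72.5, 71.0])
```

Removing the outlier before scaling matters because a single extreme value inflates the standard deviation and compresses every other scaled value toward zero.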
Model Development
A patient was labeled as 1 (positive) if they were diagnosed with diabetes in the last record; otherwise, the patient was labeled as 0 (negative). Except for “sex” and “height”, which remain constant for each participant, the predictor variables were derived from summary statistics of the other 13 numerical variables. The statistics computed here were the average, sum, variance, minimum, and maximum value over the data of years “1~n”, where n was defined as the number of annual health records counted from the year of MetS diagnosis.
Three popular classification algorithms were evaluated on the dataset: logistic regression, random forest, and Xgboost. With Python 3.8, all the classifiers were computed using a fixed random state value to ensure consistent results. For logistic regression, the parameter “C”, which in scikit-learn is the inverse of the regularization strength, was set to 1 and the regularization technique was “L2”. For the random forest and Xgboost algorithms, the maximum depth of all trees was set to 6 and the number of trees was set to 50. As the dataset was heavily skewed towards negative subjects, random down-sampling was applied to the majority class to balance the dataset. The new dataset was then randomly divided into training (80%) and testing (20%) data. Next, the least absolute shrinkage and selection operator (LASSO) method was applied to rank feature importance. The constant “alpha” that multiplies the L1 term was set to 1 in the LASSO model.
The model was developed to predict the risk of diabetes onset using the health records of the first 3 years. As shown in Figure 1, by using different sets of health records, we developed five models: the single-year models (year 1, year 2, and year 3) and the multiple-year models (year 1–2 and year 1–3). All the classification models were individually assessed using the area under the receiver operating characteristic curve (AUROC), recall (also known as sensitivity), precision, and F1 score. Recall, precision, and F1 score were computed from the confusion matrix, a commonly used tool in classification problems. The four basic quantities in the confusion matrix are true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Precision is defined as the ratio TP/(TP+FP) and measures the model’s ability to accurately predict patients developing diabetes, whereas recall is defined as the ratio TP/(TP+FN) and evaluates the model’s ability to correctly identify diabetes onset among patients who do develop diabetes. F1 score is the harmonic mean of precision and recall, which gives a better measure of the incorrectly classified cases than the “accuracy” metric. Five-fold stratified cross-validation was applied to all the classifiers for internal validation, which helps to detect and limit overfitting during the training process.
Figure 1 The definition of each longitudinal dataset in the timeline.
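One way to derive such per-participant summary features from long-format records is a pandas group-by. The sketch below rests on assumptions: the column names (`patient_id`, `year`), the output naming scheme, and the toy values are illustrative and are not taken from the PUMCH-HM database.

```python
import pandas as pd

# Toy long-format records (not study data): one row per participant per year.
records = pd.DataFrame(
    {
        "patient_id": [1, 1, 1, 2, 2, 2],
        "year": [1, 2, 3, 1, 2, 3],
        "FPG": [5.6, 5.9, 6.3, 5.2, 5.1, 5.3],
        "weight": [70.0, 72.0, 75.0, 80.0, 79.0, 79.5],
    }
)


def summarize(records, n_years, variables):
    """Aggregate years 1..n_years into one feature row per participant."""
    window = records[records["year"] <= n_years]
    stats = window.groupby("patient_id")[variables].agg(
        ["mean", "sum", "var", "min", "max"]
    )
    # Flatten the (variable, statistic) column index into names like "max_FPG".
    stats.columns = [f"{stat}_{name}" for name, stat in stats.columns]
    return stats


features = summarize(records, n_years=3, variables=["FPG", "weight"])
```

Note that pandas computes the sample variance by default, so a single-year window (`n_years=1`) yields `NaN` in the variance columns; a single-year model would need to drop or replace them.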
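The training setup described above might look roughly as follows in scikit-learn. This is a hedged sketch on synthetic data, not the authors' pipeline: the seed, the down-sampling helper, and the toy features are assumptions. Xgboost is left out to keep the example dependency-light; `xgboost.XGBClassifier(n_estimators=50, max_depth=6)` would be the analogous configuration. The LASSO `alpha` used here is a hypothetical smaller value chosen so that the toy example keeps non-zero coefficients, whereas the text reports alpha = 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import train_test_split

SEED = 0  # hypothetical fixed random state


def downsample_majority(X, y, seed=SEED):
    """Randomly drop majority-class rows until both classes are equal in size."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    sampled = rng.choice(majority, size=len(minority), replace=False)
    keep = np.sort(np.concatenate([minority, sampled]))
    return X[keep], y[keep]


# Synthetic, imbalanced data: only feature 0 carries signal.
rng = np.random.default_rng(SEED)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 1.5).astype(int)

X_bal, y_bal = downsample_majority(X, y)
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.2, random_state=SEED
)

models = {
    # C=1 with scikit-learn's default L2 penalty.
    "logistic_regression": LogisticRegression(C=1.0, random_state=SEED),
    # 50 trees, each limited to depth 6.
    "random_forest": RandomForestClassifier(
        n_estimators=50, max_depth=6, random_state=SEED
    ),
}
for model in models.values():
    model.fit(X_train, y_train)

# Rank features by the absolute value of their LASSO coefficients.
lasso = Lasso(alpha=0.01).fit(X_train, y_train)
ranking = np.argsort(-np.abs(lasso.coef_))
```

Down-sampling before the split, as the text describes, keeps both partitions balanced but discards most negative cases, which is worth bearing in mind when interpreting precision.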
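The metric definitions and the cross-validation scheme can be made concrete with a short example. It is a sketch only: the confusion-matrix counts and the synthetic dataset are invented for illustration, and the classifier settings simply mirror those quoted in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)

# Five-fold stratified cross-validation of AUROC on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auroc = cross_val_score(
    RandomForestClassifier(n_estimators=50, max_depth=6, random_state=0),
    X,
    y,
    cv=cv,
    scoring="roc_auc",
)
```

Stratification keeps the positive-to-negative ratio roughly constant in every fold, so each fold's AUROC is estimated on a comparable class mix, and the spread of the five scores gives the ± values of the kind reported alongside mean metrics.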
Statistical Analysis
The numerical variables at baseline are presented as mean ± standard deviation (SD) in Table 3. t-tests were carried out for each variable between sub-groups, and p < 0.05 was considered a statistically significant difference. All statistical analyses were performed using Python 3.8. The classification models came from two well-established Python packages: scikit-learn and Xgboost.
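A sub-group comparison of this kind can be reproduced from summary statistics alone with SciPy. The sketch below uses the baseline FPG means and SDs reported in the Results (332 participants who developed diabetes and the remaining 4178 who did not). The text does not say which t-test variant was used, so Welch's unequal-variance test is an assumption here.

```python
from scipy.stats import ttest_ind_from_stats

# Baseline FPG (mmol/L) as reported in the Results: mean, SD, group size.
result = ttest_ind_from_stats(
    mean1=6.43, std1=1.11, nobs1=332,         # developed diabetes
    mean2=5.37, std2=0.45, nobs2=4510 - 332,  # did not develop diabetes
    equal_var=False,  # Welch's t-test (an assumption, see lead-in)
)

significant = result.pvalue < 0.05
```

With raw per-participant values, `scipy.stats.ttest_ind` would be the equivalent call.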
Results
Baseline Characteristics of the Patient Cohort
A total of 4510 patients with MetS were included in the analysis. According to the IDF criteria, the rates of abnormal components at baseline were WC=48.8%, TG=43.5%, HDL-C=39.4%, SBP/DBP=40.7%, and FPG=28%. In all, 332 patients had developed diabetes by the end of follow-up. All the variables differed significantly between the two sub-groups (Table 3). Patients who developed diabetes presented higher baseline FPG (6.43±1.11 mmol/L) and HbA1c (6.00±0.65%) than those who did not (FPG: 5.37±0.45 mmol/L; HbA1c: 5.47±0.30%).
Table 3 Baseline Characteristics of Sub-Groups from Patient Cohorts
Model Performance
The performance results for the three single-year models and two multiple-year models are presented in Table 4 and Figure 2. Both the random forest and Xgboost models achieved relatively high performance on multiple-year data (mean AUROC>0.85 for both models), whereas the results from single-year data were slightly worse. Among the models applied to the multiple-year data, the best-performing model was Xgboost. The classification results for single-year data led to a different conclusion, with the random forest model achieving the best performance (mean AUROC: 0.835±0.029, mean recall: 0.753±0.001, mean precision: 0.756±0.020, mean F1-score: 0.751±0.014). The combination of the Xgboost model and the year 1–3 dataset showed the best performance overall (AUROC: 0.897, recall: 0.831, precision: 0.837, F1-score: 0.834). For both the random forest and Xgboost single-year models, AUROC increased from year 1 to year 2 to year 3, indicating that the most recent data provided the best predictive power.
Table 4 Performance Metrics of Machine-Learning Models Using Longitudinal Data
Figure 2 ROC curves of the three models for all the datasets. Abbreviation: ROC, receiver operating characteristic. Notes: (A) the ROC curve of logistic regression for single-year models and multiple-year models; (B) the ROC curve of random forest for single-year models and multiple-year models; (C) the ROC curve of Xgboost for single-year models and multiple-year models.
Longitudinal Data Comparison
Among all the datasets, the year 3 dataset presented the best average prediction results across the three models (mean AUROC: 0.841±0.018, mean recall: 0.760±0.031, mean precision: 0.784±0.013, mean F1-score: 0.768±0.021). The lowest recall (0.608) and precision (0.549) were both from the logistic regression model on the year 1–2 dataset. In general, model performance improved evidently as the longitudinal dataset grew, with the AUROC improving with the addition of more data for both the random forest (year 1–3: AUROC=0.893; year 3: AUROC=0.862; year 1–2: AUROC=0.847; year 2: AUROC=0.838) and Xgboost (year 1–3: AUROC=0.897; year 3: AUROC=0.833; year 1–2: AUROC=0.856; year 2: AUROC=0.823) models. The other evaluation metrics, including recall, precision, and F1-score, also showed a clear improvement with the accumulation of longitudinal data.
Feature Importance for Risk Prediction
The feature importance for each dataset using LASSO is shown in Figure 3. Regardless of the dataset used, the top two features that most influenced the prediction results were FPG and HbA1c or related statistical features, which makes sense as these variables were used to define diabetes. In the multiple-year models, the highest FPG had the greatest predictive value for the onset of diabetes, followed by HbA1c and BMI. For both multi-year datasets, some features reflecting yearly fluctuation appeared among the top 15 features. For the year 1–2 dataset, the delta of UA ranked sixth, which provides another useful feature for diabetes prediction. The delta of weight ranked fourth in the year 1–3 dataset.
Figure 3 Feature importance of each dataset using LASSO. Abbreviations: BMI, body mass index; WC, waist circumference; SBP, systolic blood pressure; DBP, diastolic blood pressure; FPG, fasting plasma glucose; HDLC, high-density lipoprotein cholesterol; LDLC, low-density lipoprotein cholesterol; TG, triglyceride; HbA1c, glycated hemoglobin; TSH, thyroid-stimulating hormone; UA, uric acid. Notes: Parameter “Del_xx” was abbreviated from “delta_xx”. “Var_xx” was abbreviated from “Variance_xx”.
Discussion
To our knowledge, this is the first study to use multiple years’ data to predict the risk of diabetes for patients with MetS. The average AUROC for both the random forest and Xgboost models exceeded 0.80, indicating adequate performance of both classifiers. This study demonstrated overall improved performance metrics with the accumulation of longitudinal data.
Of all the longitudinal datasets used, the Xgboost model performed best, with the highest AUROC. Its recall and precision were also very similar, so it can be considered a well-balanced model. Among the three models, Xgboost was the classifier most sensitive to the longitudinal dataset, as its AUROC increased with the addition of more years’ data and showed the largest variance across the three years’ datasets. The random forest model, which showed the least variance across the different longitudinal datasets, achieved the most stable classification results (AUROC=0.850±0.033). Unlike the tree-based models, the logistic regression model worsened when more longitudinal data were added, possibly because the multi-year datasets feed more features into the model, which may cause overfitting. The logistic regression model may therefore not be a good classifier for prediction from longitudinal datasets. The gradual improvement in performance metrics across the single-year models suggests that the closer the data are to the outcome year, the more accurate the model can be.
The average performance metrics from multiple years’ data using random forest and Xgboost were better than those from each single year; this result clearly shows the considerable benefit of using longitudinal data when predicting the onset of diabetes. Moreover, our results indicated that for individuals with similar clinical parameters, the trends in these parameters could change the risk of future diabetes. Models based on longitudinal multiple years’ data may provide more personalized assessment tools for risk evaluation. Our prediction models exhibited better results than some other longitudinal studies. For instance, Lai et al demonstrated that the Gradient Boosting Machine (GBM) was best, with an AUROC of 0.847 for diabetes prediction.24 In a recently published 13-year longitudinal study, the cumulative exposure over the 3 years before baseline was used to predict diabetes by Cox regression, and the AUC was 0.802.19
In both multiple-year models, we found that the highest FPG was the strongest predictor of diabetes, followed by the mean or lowest level of HbA1c and then BMI. Decreased thyroid function (as indicated by TSH) was also a risk factor in every single-year or multiple-year model except the year 2 model. This result is consistent with existing evidence suggesting an increased risk of type 2 diabetes in people with hypothyroidism.25,26 When focusing on the deltas that represent the trends of variables, we found that delta weight, delta TSH, delta UA, and delta TG were stronger predictors than delta FPG or delta HbA1c. In the year 1–3 model in particular, delta weight was the fourth-most important feature, suggesting that a history of weight gain is a major risk factor for MetS patients to develop diabetes. The importance of weight loss for diabetes prevention has been proved in several prospective large-scale clinical trials such as the Diabetes Prevention Program (DPP), the Finnish Diabetes Prevention Study, and the Da Qing Study.27–29 Our study provides a new perspective by incorporating the history of weight loss or weight gain into the individualized risk of diabetes.
Our study is limited by its retrospective design and its sample size, as we focused on MetS patients with multiple years’ health records. Furthermore, our results should be extrapolated cautiously to the general Chinese population, given that this is a single-center study and the participants were mostly company employees from North China with a relatively high socioeconomic status. It is necessary to validate our proposed model using an external dataset in the future. A good model with sufficient robustness should achieve similar results on various datasets.
Conclusion
To our knowledge, this is the first study to use machine-learning methods based on multiple years’ data to predict diabetes in MetS patients. This study demonstrated improved performance with the accumulation of longitudinal data. In the multiple-year models, fluctuations in weight and some biomarkers played a role. This shows that models based on longitudinal multiple years’ data may provide more personalized assessment tools for risk evaluation in MetS patients.
Acknowledgments
We acknowledge all the healthcare workers involved in the establishment of the PUMCH-HM database in Peking Union Medical College Hospital.
Disclosure
The authors report no conflicts of interest in this work.
References
1. Li Y, Teng D, Shi X, et al. Prevalence of diabetes recorded in mainland China using 2018 diagnostic criteria from the American Diabetes Association: national cross sectional study. BMJ. 2020;369. doi:10.1136/BMJ.M997
2. International Diabetes Federation. IDF Diabetes Atlas [Internet]. 10th ed. International Diabetes Federation; 2021. Available from: http://www.diabetesatlas.org. Accessed April 18, 2022.
3. Aguilar M, Bhuket T, Torres S, Liu B, Wong RJ. Prevalence of the metabolic syndrome in the United States, 2003–2012. JAMA. 2015;313(19):1973. doi:10.1001/jama.2015.4260
4. Ford ES, Giles WH, Dietz WH. Prevalence of the metabolic syndrome among US adults. JAMA. 2002;287(3):356. doi:10.1001/jama.287.3.356
5. Hirode G, Wong RJ. Trends in the prevalence of metabolic syndrome in the United States, 2011–2016. JAMA. 2020;323(24):2526. doi:10.1001/jama.2020.4501
6. Li R, Li W, Lun Z, et al. Prevalence of metabolic syndrome in mainland China: a meta-analysis of published studies. BMC Public Health. 2016;16(1):296. doi:10.1186/s12889-016-2870-y
7. He Y, Li Y, Bai G, et al. Prevalence of metabolic syndrome and individual metabolic abnormalities in China, 2002–2012. Asia Pac J Clin Nutr. 2019;28(3):621–633. doi:10.6133/apjcn.201909_28(3).0023
8. Wu Y. Overweight and obesity in China. BMJ. 2006;333(7564):362–363. doi:10.1136/bmj.333.7564.362
9. Lee MK, Han K, Kim MK, et al. Changes in metabolic syndrome and its components and the risk of type 2 diabetes: a nationwide cohort study. Sci Rep. 2020;10(1):2313. doi:10.1038/s41598-020-59203-z
10. Kim D, Yoon SJ, Lim DS, et al. The preventive effects of lifestyle intervention on the incidence of diabetes mellitus and acute myocardial infarction in metabolic syndrome. Public Health. 2016;139:178–182. doi:10.1016/J.PUHE.2016.06.012
11. Ohnishi H, Saitoh S, Akasaka H, Furukawa T, Mori M, Miura T. Impact of longitudinal status change in metabolic syndrome defined by two different criteria on new onset of type 2 diabetes in a general Japanese population: the Tanno-Sobetsu Study. Diabetol Metab Syndr. 2016;8(1). doi:10.1186/S13098-016-0182-0
12. Abbasi A, Peelen LM, Corpeleijn E, et al. Prediction models for risk of developing type 2 diabetes: systematic literature search and independent external validation study. BMJ. 2012;345(sep182):e5900. doi:10.1136/bmj.e5900
13. Dinh A, Miertschin S, Young A, Mohanty SD. A data-driven approach to predicting diabetes and cardiovascular disease with machine learning. BMC Med Inform Decis Mak. 2019;19(1). doi:10.1186/s12911-019-0918-5
14. Pei D, Gong Y, Kang H, Zhang C, Guo Q. Accurate and rapid screening model for potential diabetes mellitus. BMC Med Inform Decis Mak. 2019;19(1):41. doi:10.1186/s12911-019-0790-3
15. Kavakiotis I, Tsave O, Salifoglou A, Maglaveras N, Vlahavas I, Chouvarda I. Machine learning and data mining methods in diabetes research. Comput Struct Biotechnol J. 2017;15:104–116. doi:10.1016/j.csbj.2016.12.005
16. Talaei-Khoei A, Wilson JM. Identifying people at risk of developing type 2 diabetes: a comparison of predictive analytics techniques and predictor variables. Int J Med Inform. 2018;119:22–38. doi:10.1016/j.ijmedinf.2018.08.008
17. Upadhyaya SG, Murphree DH, Ngufor CG, et al. Automated diabetes case identification using electronic health record data at a tertiary care facility. Mayo Clin Proc. 2017;1(1):100–110. doi:10.1016/j.mayocpiqo.2017.04.005
18. Alghamdi M, Al-Mallah M, Keteyian S, Brawner C, Ehrman J, Sakr S. Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: the Henry Ford ExercIse Testing (FIT) project. PLoS One. 2017;12(7):e0179805. doi:10.1371/journal.pone.0179805
19. Simon GJ, Peterson KA, Castro MR, Steinbach MS, Kumar V, Caraballo PJ. Predicting diabetes clinical outcomes using longitudinal risk factor trajectories. BMC Med Inform Decis Mak. 2020;20(1):6. doi:10.1186/s12911-019-1009-3
20. Oh W, Kim E, Castro MR, et al. Type 2 diabetes mellitus trajectories and associated risks. Big Data. 2016;4(1):25–30. doi:10.1089/big.2015.0029
21. Alberti KGMM, Eckel RH, Grundy SM, et al. Harmonizing the metabolic syndrome: a joint interim statement of the International Diabetes Federation Task Force on Epidemiology and Prevention; National Heart, Lung, and Blood Institute; American Heart Association; World Heart Federation; International Atherosclerosis Society; and International Association for the Study of Obesity. Circulation. 2009;120(16):1640–1645. doi:10.1161/CIRCULATIONAHA.109.192644
22. American Diabetes Association. 2. Classification and diagnosis of diabetes: standards of medical care in diabetes-2022. Diabetes Care. 2022;45(Supplement_1):S17–S38. doi:10.2337/dc22-S002
23. Stekhoven DJ, Bühlmann P. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28(1):112–118. doi:10.1093/BIOINFORMATICS/BTR597
24. Lai H, Huang H, Keshavjee K, Guergachi A, Gao X. Predictive models for diabetes mellitus using machine learning techniques. BMC Endocr Disord. 2019;19(1). doi:10.1186/s12902-019-0436-6
25. Roa Dueñas OH, van der Burgh AC, Ittermann T, et al. Thyroid function and the risk of prediabetes and type 2 diabetes. J Clin Endocrinol Metab. 2022;107(6). doi:10.1210/CLINEM/DGAC006
26. Rong F, Dai H, Wu Y, et al. Association between thyroid dysfunction and type 2 diabetes: a meta-analysis of prospective observational studies. BMC Med. 2021;19(1). doi:10.1186/S12916-021-02121-2
27. Diabetes Prevention Program Research Group. 10-year follow-up of diabetes incidence and weight loss in the Diabetes Prevention Program Outcomes Study. Lancet. 2009;374(9702):1677–1686. doi:10.1016/S0140-6736(09)61457-4
28. Lindström J, Ilanne-Parikka P, Peltonen M, et al. Sustained reduction in the incidence of type 2 diabetes by lifestyle intervention: follow-up of the Finnish Diabetes Prevention Study. Lancet. 2006;368(9548):1673–1679. doi:10.1016/S0140-6736(06)69701-8
29. Li G, Zhang P, Wang J, et al. Cardiovascular mortality, all-cause mortality, and diabetes incidence after lifestyle intervention for people with impaired glucose tolerance in the Da Qing Diabetes Prevention Study: a 23-year follow-up study. Lancet Diabetes Endocrinol. 2014;2(6):474–480. doi:10.1016/S2213-8587(14)70057-9
https://www.dovepress.com/predicting-diabetes-in-patients-with-metabolic-syndrome-using-machine–peer-reviewed-fulltext-article-DMSO