In this section, a comprehensive evaluation of the results, together with the corresponding score plots, is presented. The datasets are generated using Mean, MICE, and Age Group-based imputation strategies to handle missing values. The evaluation also includes the score plots for the original dataset, in which missing values are handled through list-wise deletion. Furthermore, the results for the augmented dataset are presented in a similar manner. The results are organized into three subsections: the first and second focus on the performance of various classification models, with a particular emphasis on evaluating their effectiveness against a baseline model, and the third covers the results of the DSE model.

To gauge the effectiveness of the models, a diverse set of metrics is employed. These metrics include the confusion matrix, which comprises true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts for actual versus predicted data; the definitions of these confusion matrix parameters can be found in Table 3. Additional metrics, such as accuracy, precision, recall, F1 score, and AUC, are used to provide a comprehensive evaluation of classifier performance48,49. Table 4 explains these metrics briefly.

The findings highlight the effectiveness of the different imputation strategies in handling missing data and show their impact on model accuracy. The evaluation of the augmented dataset demonstrates the improvement achieved by incorporating important features. Overall, this section contributes to a comprehensive understanding of the various models' performance and offers insights for future analysis and model development.

Table 3 Definition of confusion matrix parameters.

Table 4 Definition of performance metrics.
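As a concrete reference, the short sketch below shows how all of these metrics can be derived from a model's predictions with scikit-learn; the arrays are illustrative placeholders, not values from the study.

```python
# Illustrative sketch: deriving the metrics of Tables 3 and 4 from a
# model's predictions. y_true/y_pred/y_score are toy placeholders for
# actual labels, predicted labels, and predicted stroke probabilities.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = [0, 0, 0, 1, 1, 1, 0, 1]                    # actual classes
y_pred = [0, 0, 1, 1, 1, 0, 0, 1]                    # predicted classes
y_score = [0.1, 0.3, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7]   # P(stroke)

# Confusion matrix layout: rows = actual, columns = predicted.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"accuracy  = {accuracy_score(y_true, y_pred):.4f}")   # (TP+TN)/total
print(f"precision = {precision_score(y_true, y_pred):.4f}")  # TP/(TP+FP)
print(f"recall    = {recall_score(y_true, y_pred):.4f}")     # TP/(TP+FN)
print(f"f1        = {f1_score(y_true, y_pred):.4f}")         # harmonic mean
print(f"auc       = {roc_auc_score(y_true, y_score):.4f}")
```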
Results of baseline model

The performance of the baseline model varied notably between the imbalanced and balanced datasets, as shown in Fig. 9. On the imbalanced dataset, the highest F1 score achieved was only 23.09%, considerably lower than the F1 score of 87.04% observed on the balanced datasets. This discrepancy is consistent across all other metrics, including accuracy, precision, recall, and AUC. In contrast, the baseline model exhibited notably higher performance on the balanced datasets, both in terms of the original data and the imputed datasets. For instance, on the original dataset, the baseline model attained the highest accuracy of 87.12%, precision of 87.65%, recall of 86.44%, F1 score of 87.04%, and AUC of 87.12%. This level of performance was closely mirrored in the imputed datasets, with the MICE-imputed dataset exhibiting an accuracy of 85.54%, precision of 85.95%, recall of 85.44%, F1 score of 85.46%, and AUC of 85.54%. These findings suggest that either dropping rows with missing values or applying imputation strategies can significantly improve the usability of the dataset, likely due to the increased availability of data for training the models.

Figure 9 Baseline model performance across various datasets. The performance on the imbalanced datasets is considerably lower than on the balanced datasets.

Results of advanced classification models

In this section, we present the results of our rigorous evaluation of advanced classification models. These models were extensively assessed to provide insights into their predictive performance for stroke occurrences.

Imbalanced and balanced imputed datasets results

MICE imputation yields slightly better results than Mean and Age Group-based imputation among the imbalanced imputed datasets. The top-performing model across all imputed datasets is LR-AGD, followed by NGBoost and Balanced Bagging. LR-AGD achieves a k-fold mean accuracy of 94.94%; its precision, recall, and F1 score are 95.20%, 94.94%, and 92.50%, respectively. Figure 10 shows the k-fold mean accuracy for all models on the imbalanced imputed datasets. A sketch of how the three imputed variants can be produced is given below.

Figure 10 Models' k-fold mean accuracy on imbalanced imputed datasets. The imputed imbalanced datasets are created using three different strategies, namely Mean, MICE, and Age Group-based imputation.
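The following is a minimal sketch of how such imputed variants can be produced with pandas and scikit-learn, assuming (as in the public stroke dataset) that bmi is the column with missing values; the column names, age brackets, and toy values are illustrative.

```python
# Hedged sketch of the three imputation strategies on a DataFrame `df`;
# assumes `bmi` holds the missing values and `age` is available.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

df = pd.DataFrame({"age": [25, 40, 67, 80, 33],
                   "avg_glucose_level": [85.0, 110.0, 170.0, 200.0, 95.0],
                   "bmi": [22.1, np.nan, 31.4, np.nan, 24.8]})

# 1) Mean imputation: replace missing bmi with the column mean.
df_mean = df.copy()
df_mean["bmi"] = df_mean["bmi"].fillna(df_mean["bmi"].mean())

# 2) MICE-style imputation: iterative chained equations over numeric columns.
num_cols = ["age", "avg_glucose_level", "bmi"]
df_mice = df.copy()
df_mice[num_cols] = IterativeImputer(random_state=0).fit_transform(df[num_cols])

# 3) Age Group-based: fill bmi with the mean of the subject's age bracket.
df_age = df.copy()
groups = pd.cut(df_age["age"], bins=[0, 18, 40, 60, 120])
df_age["bmi"] = df_age.groupby(groups)["bmi"].transform(lambda s: s.fillna(s.mean()))
```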
The Age Group-based balanced imputed dataset performs slightly better than the Mean- and MICE-imputed balanced datasets. XGBoost is the top-performing model, followed closely by LightGBM and Random Forest. The k-fold mean accuracy for the top-performing model is 93.48%, with precision, recall, and F1 score of 94.03%, 93.77%, and 93.76%, respectively. Figure 11 presents the k-fold mean accuracy of all models on the balanced imputed datasets.

Figure 11 Models' k-fold mean accuracy on balanced imputed datasets. The imputed balanced datasets are created using three different strategies, namely Mean, MICE, and Age Group-based imputation.

On the imbalanced MICE-imputed testing dataset, LR-AGD achieves an accuracy of 96.28%, precision of 100%, recall of 14.93%, and F1 score of 25.97%. Moreover, for XGBoost on the balanced Age Group-based imputed testing dataset, the testing accuracy, precision, recall, and F1 score are 96.37%, 96.62%, 96.09%, and 96.35%, respectively. The confusion matrices for the LR-AGD and XGBoost models on the respective testing datasets are displayed in Fig. 12. The matrices provide a visual representation of each model's performance, allowing a comprehensive assessment of its accuracy and of the distribution of correct and incorrect predictions across the classes.

Figure 12 Confusion matrices of LR-AGD and XGBoost on imputed datasets. Confusion matrices illustrating the performance of the LR-AGD (right) and XGBoost (left) models in stroke case classification on the imbalanced and balanced imputed datasets, respectively.

The LR-AGD model consistently displayed the highest precision for the imbalanced datasets, with a value of 95.19%, showcasing its ability to predict positive instances accurately. On the other hand, for the balanced datasets, the XGBoost, LightGBM, and Random Forest models emerged as the top performers in terms of precision. The XGBoost model demonstrated the highest precision, ranging from 95.56% to 95.68%, followed closely by the LightGBM and Random Forest models with 95.53% and 94.96%, respectively. The k-fold mean precision of the models on all imputed datasets is displayed in Fig. 13.

Figure 13 Models' k-fold mean precision on all imputed datasets. The imputed datasets are created using three different strategies, namely Mean, MICE, and Age Group-based imputation, and are divided into two categories: imbalanced and balanced.

The NGBoost, Balanced Bagging, and LR-AGD models consistently show high recall values, ranging from 94.91% to 94.94%, across the different imputation strategies for the imbalanced datasets. In the case of the balanced datasets, the XGBoost, LightGBM, and Random Forest models exhibit higher recall values of 95.90%, 95.53%, and 94.84%, respectively. These results indicate that the models are generally effective at capturing actual positive instances and correctly identifying them. The k-fold mean recall of the models on all imputed datasets is displayed in Fig. 14.

Figure 14 Models' k-fold mean recall on all imputed datasets. The imputed datasets are created using three different strategies, namely Mean, MICE, and Age Group-based imputation, and are divided into two categories: imbalanced and balanced.

Among the models evaluated, surprisingly, the Random Forest model achieved the highest F1 score of 94.72% on the imbalanced MICE-imputed dataset and performed consistently well across all imputation strategies; XGBoost and LightGBM also performed well. On the balanced datasets, the XGBoost, LightGBM, and Random Forest models exhibited the highest F1 scores, with values of 95.90%, 95.53%, and 94.82%, respectively. Overall, these results indicate that XGBoost, LightGBM, and Random Forest are promising models for stroke prediction, showcasing their ability to achieve accurate classifications across different dataset characteristics. The k-fold mean F1 score of the models on all imputed datasets is displayed in Fig. 15; a sketch of how such k-fold mean scores can be computed follows below.

Figure 15 Models' k-fold mean F1 score on all imputed datasets. The imputed datasets are created using three different strategies, namely Mean, MICE, and Age Group-based imputation, and are divided into two categories: imbalanced and balanced.
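For reference, the k-fold mean scores reported throughout this subsection can be computed with a cross-validation loop of the following shape; the fold count, weighted averaging, and synthetic data are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch of computing k-fold mean metrics for one model; the model and
# 5-fold stratified split are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic imbalanced stand-in for the stroke data (~5% positives).
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision_weighted", "recall_weighted", "f1_weighted"]
scores = cross_validate(RandomForestClassifier(random_state=0), X, y,
                        cv=cv, scoring=scoring)

for name in scoring:
    vals = scores[f"test_{name}"]
    print(f"k-fold mean {name}: {vals.mean():.4f} (+/- {vals.std():.4f})")
```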
Imbalanced and balanced original datasets results

Among the models evaluated on the original dataset, which is created by removing the rows with missing values, LR-AGD emerges as the top performer, closely followed by NGBoost and Balanced Bagging. LR-AGD achieves a k-fold mean accuracy of 95.46%, while NGBoost and Balanced Bagging achieve 95.46% and 95.43%, respectively. LR-AGD shows a precision, recall, and F1 score of 94.03%, 93.77%, and 93.76%, respectively. These results underscore the robustness and efficacy of LR-AGD in handling complex data scenarios. The k-fold mean accuracy for all models on the imbalanced and balanced datasets is displayed in Fig. 16. For the balanced dataset with the missing-value rows removed, XGBoost emerges as the top-performing model, reaching a k-fold mean accuracy of 96.14%; it also exhibits high precision, recall, and F1 score, all at 96.14%. LightGBM and Random Forest follow closely as the second- and third-best models.

Figure 16 Models' k-fold mean accuracy on original datasets. The dataset is categorized into two groups: imbalanced, reflecting its initial state, and balanced, achieved after employing an oversampling technique.

LR-AGD on the imbalanced original testing dataset achieves a testing accuracy of 96.81%, precision of 87.50%, recall of 13.21%, and F1 score of 22.95%. When tested on the balanced original dataset, XGBoost maintains its superior performance with a testing accuracy of 98.33%, F1 score of 98.86%, precision of 98.95%, and recall of 98.77%. The confusion matrices for the LR-AGD and XGBoost models on the respective testing datasets are shown in Fig. 17.

Figure 17 Confusion matrices of LR-AGD and XGBoost on original datasets. Confusion matrices illustrating the performance of the LR-AGD (right) and XGBoost (left) models in stroke case classification on the imbalanced and balanced original datasets, respectively.

On the imbalanced original dataset, LR-AGD consistently achieved the highest precision and recall scores, with values of 93.42% and 95.46%, respectively. Regarding precision, the second- and third-best models are the Neural Network and Balanced Bagging. The NGBoost and Balanced Bagging models also showed impressive recall, matching that of LR-AGD. When considering the F1 score, which provides a balanced measure of precision and recall, all models except the Neural Network achieved comparable scores of around 93.3%. On the balanced datasets, by contrast, XGBoost, LightGBM, and Random Forest consistently outperformed the other models in precision, recall, and F1 score, with values of 96.14%, 95.78%, and 95.44%, respectively.

The above results highlight the effectiveness of LR-AGD, XGBoost, LightGBM, and Random Forest in accurately classifying strokes, depending on the nature of the class balance in the dataset, and emphasize their potential for stroke prediction applications. Furthermore, the k-fold mean precision, recall, and F1 score of these models on both the imbalanced and balanced original datasets are visually depicted in Fig. 18. The balanced variants referenced throughout are obtained by oversampling the minority class, as sketched after this subsection.

Figure 18 Models' k-fold mean performance metrics on original datasets. The dataset is categorized into two groups: imbalanced, reflecting its initial state, and balanced, achieved after employing an oversampling technique.
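The balanced datasets are obtained with an oversampling technique; since the exact method is not restated here, the sketch below assumes simple random oversampling with imbalanced-learn.

```python
# Hedged sketch of balancing the training split by random oversampling
# (the study may use a different oversampling variant, e.g. SMOTE).
from collections import Counter
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)

# Split first, then oversample only the training data to avoid leaking
# duplicated minority samples into the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)

print("before:", Counter(y_tr))   # minority class is rare
print("after: ", Counter(y_bal))  # classes are now equal in size
```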
Imbalanced and balanced augmented datasets results

The Random Forest model demonstrates exceptional performance on the imbalanced augmented dataset, reaching a k-fold mean accuracy of 97.41%; it also exhibits high precision, recall, and F1 score, with values of 97.23%, 97.41%, and 97.14%, respectively. The remaining models show mean accuracies ranging from 95 to 96%. Figure 19 depicts the k-fold mean accuracy of all models on the augmented dataset, both imbalanced and balanced. The top-performing model on the balanced augmented dataset is Random Forest, reaching a k-fold mean accuracy of 99.45% with a precision of 99.46% and a recall and F1 score of 99.45%. It is closely followed by Balanced Bagging and XGBoost, with mean accuracies of 99.07% and 97.94%, respectively.

Figure 19 Models' k-fold mean accuracy on augmented datasets. The dataset is categorized into two groups: imbalanced, and balanced, achieved after employing an oversampling technique.

When evaluated on the imbalanced testing dataset, the Random Forest model maintains its strong performance with a testing accuracy of 97.57%, precision of 85.94%, recall of 65.48%, and F1 score of 74.32%. Random Forest also demonstrates impressive performance in testing on the balanced dataset, with an accuracy of 99.61%, precision of 99.23%, recall of 100%, and an F1 score of 99.61%. The confusion matrices of the Random Forest model on the imbalanced and balanced augmented testing datasets are depicted in Fig. 20.

Figure 20 Confusion matrices of Random Forest on augmented datasets. Confusion matrices illustrating the performance of the Random Forest model in stroke case classification on the imbalanced (left) and balanced (right) augmented datasets, respectively.

The Random Forest model consistently demonstrates strong performance across all metrics on the imbalanced and balanced augmented datasets. It achieves high precision scores of 97.23% on the imbalanced data and 99.46% on the balanced data, indicating its ability to identify true positive instances accurately. The Random Forest model also exhibits impressive recall scores of 97.41% on the imbalanced data and 99.45% on the balanced data, highlighting its capability to capture a high proportion of actual positive instances. In terms of F1 score, the Random Forest model achieves a balanced performance, with scores of 97.14% on the imbalanced data and 99.45% on the balanced data, indicating a harmonious balance between precision and recall and emphasizing its effectiveness in stroke prediction. The Balanced Bagging and XGBoost models also deliver competitive results across all metrics, showcasing their potential for accurate classification on both balanced and imbalanced datasets. The k-fold mean precision, recall, and F1 score of the models on all augmented datasets are shown in Fig. 21.

Figure 21 Models' k-fold mean performance metrics on augmented datasets. The dataset is categorized into two groups: imbalanced, and balanced, achieved after employing an oversampling technique.

Results of dense stacking ensemble model

In the analysis of the base model results, it becomes apparent that the MICE-imputed datasets produce marginally superior outcomes; notably, the Random Forest model stands out as a top performer. Within the DSE model, the Random Forest model therefore assumes the role of the meta-classifier, while the remaining models serve as base models, highlighting the synergy derived from their collective strengths. A simplified sketch of this stacked arrangement follows below.
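The sketch below illustrates the arrangement with scikit-learn's StackingClassifier, using Random Forest as the meta-classifier over out-of-fold base-model predictions. It is a simplification: the DSE model described here additionally combines Voting, Blending, and Fusion ensembles, and the base models listed are only a representative subset.

```python
# Minimal stacking sketch in the spirit of the DSE model: base models
# feed out-of-fold probability predictions to a Random Forest meta-classifier.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, random_state=0)

base_models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("xgb", XGBClassifier(eval_metric="logloss")),
    ("lgbm", LGBMClassifier()),
]
dse_like = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(random_state=0),  # meta-classifier
    cv=5,                        # out-of-fold predictions train the meta-classifier
    stack_method="predict_proba",
)
dse_like.fit(X, y)
print(f"training accuracy: {dse_like.score(X, y):.4f}")
```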
The DSE model showcased remarkable performance on the MICE-imputed datasets. For the imbalanced MICE-imputed dataset, the model yielded an accuracy of 96.13%, precision of 93.26%, recall of 96.18%, and an F1 score of 94.88%. Similarly, on the balanced MICE-imputed dataset, the DSE model achieved an accuracy of 96.59%, with a precision of 95.25%, recall of 96.27%, and F1 score of 95.79%. These results, also visualized in Fig. 22, highlight the robust performance of the DSE model when tested on the imputed datasets.

Figure 22 K-fold mean performance metrics of the proposed DSE model. The MICE-imputed dataset is categorized into two groups: imbalanced, and balanced, achieved after employing an oversampling technique.

The analysis of the AUC metric for the proposed DSE model reveals compelling insights into its predictive performance across the different datasets. On the imbalanced dataset with MICE imputation, the DSE model achieves an AUC of 83.94%, showcasing its ability to discriminate between positive and negative instances despite the data's skewed distribution. Conversely, on the balanced dataset, the DSE model excels even further, attaining an impressive AUC of 98.92%. This substantial increase in AUC on the balanced dataset underscores the model's enhanced discriminatory power and robustness when trained on a more representative and balanced data distribution. The significant performance improvement achieved by the DSE model on the balanced dataset compared to the imbalanced one is visually represented in Fig. 23.

Figure 23 AUC results of the proposed DSE model. (a) shows the AUC results on the imbalanced MICE-imputed dataset and (b) on the balanced one.

Furthermore, on the imbalanced testing dataset, the model exhibits a testing accuracy of 99.15%, a precision of 84.93%, a recall of 98.88%, and an F1 score of 90.51%. For the balanced testing dataset, the model delivers a testing accuracy of 97.19%, precision of 96.83%, recall of 97.38%, and an F1 score of 97.10%. The confusion matrices of the DSE model on the imbalanced and balanced MICE-imputed testing datasets are depicted in Fig. 24.

Figure 24 Confusion matrices of the proposed DSE model. The proposed DSE model's confusion matrices on the MICE-imputed balanced and imbalanced datasets.

A feature importance analysis is conducted to identify the influential factors in stroke prediction using the proposed DSE model. The analysis revealed distinct patterns between the imbalanced and balanced datasets, as visualized in Fig. 25. In both datasets, the top three features influencing stroke prediction were average glucose level, age, and BMI. However, notable differences were observed in their relative importance. In the imbalanced dataset, these top three features were relatively close in importance, with average glucose level slightly more influential than age and BMI. Conversely, in the balanced dataset, age emerged as the most important feature by a significant margin, followed by average glucose level and BMI. Additionally, the imbalanced dataset highlighted hypertension and heart disease as the fourth and fifth most important features, whereas the balanced dataset indicated that marital status (yes and no) played a more significant role in prediction. Interestingly, features such as work type (never worked and children) and gender (other) showed minimal contribution in both datasets, underscoring their limited influence on stroke prediction outcomes. A sketch of how such importances can be extracted appears after the figure caption below.

Figure 25 Feature importance comparison for the proposed DSE model. Feature importance graphs for the imbalanced and balanced MICE-imputed datasets are displayed in (a) and (b), respectively.
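A minimal sketch of extracting feature importances from a fitted tree-based model follows; the feature list and synthetic data are illustrative stand-ins for the MICE-imputed datasets.

```python
# Sketch of ranking feature importances, as visualized in Fig. 25;
# feature names are a hypothetical subset of the stroke dataset's columns.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = ["age", "avg_glucose_level", "bmi", "hypertension",
                 "heart_disease", "ever_married_yes"]
X, y = make_classification(n_samples=1000, n_features=len(feature_names),
                           random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X, y)
importances = (pd.Series(rf.feature_importances_, index=feature_names)
                 .sort_values(ascending=False))
print(importances)  # higher values = more influence on the prediction
```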
Discussion

While LR-AGD and XGBoost deliver accurate results with high accuracy, both exhibit limitations. LR-AGD performs well when the data is imbalanced, but its performance decreases significantly when the dataset is balanced, behaving differently and yielding lower accuracy. Conversely, XGBoost performs exceptionally well on balanced datasets but struggles with imbalanced ones. Linear models excel on simpler, easily separable small datasets, whereas non-linear models perform better on complex, equally represented variables with intricate relationships. This highlights the importance of selecting an appropriate model to handle such varying dataset characteristics and yield optimal performance in stroke prediction. It is crucial to consider the balance between precision and recall to make informed decisions regarding model selection. Additionally, further research and development are needed to address the limitations of LR-AGD and XGBoost and enhance their performance across diverse dataset scenarios.

However, when an augmented dataset is created, incorporating important features that contribute significantly to stroke prediction, Random Forest emerges as the superior model. It consistently outperforms the other models on both the imbalanced and balanced augmented datasets, with mean accuracies of 97.409% and 99.068%, respectively. Random Forest also provides consistent results of around 95% accuracy on the non-augmented datasets. Ultimately, Random Forest is therefore used as the meta-classifier in the DSE model. Tables 5 and 6 provide a comprehensive summary of the mean accuracy of the advanced classification models and the DSE model across all imbalanced and balanced datasets, highlighting the superior performance of the DSE model. The DSE model achieves even better results when the other models are incorporated within it as base models with Random Forest as the meta-classifier. The DSE model achieves the highest accuracy, above 96% across all kinds of datasets, making it the most feasible and robust model for stroke prediction on diverse datasets.

Table 5 Mean accuracies of models across all imbalanced datasets.

Table 6 Mean accuracies of models across all balanced datasets.

Additionally, Table 7 compares stroke prediction results from previous recent studies that utilized the same dataset. This comparative analysis provides valuable insights into the top-performing DSE machine learning model's performance on imbalanced and balanced datasets, showcasing the respective accuracies, and the table serves as a comprehensive reference for understanding the effectiveness of these models in stroke prediction. One study27 shows that the minimal genetic folding (mGF) model achieves an accuracy of 83.2% on the balanced dataset. Another study28 uses Logistic Regression and achieves an accuracy of 86.00%. Naive Bayes29 achieves an accuracy of 82.00%. Random Forest30 achieves an impressive accuracy of 94.46% on the imbalanced dataset, while Support Vector Machine31 reaches an accuracy of 95.49%. Additionally, Random Forest is studied32,33,34,36 with accuracies ranging from 95.50% to 96.00%. The proposed RXLM35 model achieves an accuracy of 96.34% on the balanced dataset. A K-nearest Neighbours37 model achieves an accuracy of 94.00% on the balanced dataset. In this study, the proposed DSE model achieves an impressive accuracy of 96.13% on the imbalanced imputed dataset and 96.59% on the balanced dataset. The notable difference in the DSE model's performance can be attributed to its unique ability to harness the strengths of multiple base models through ensemble techniques. By employing a strategic combination of Voting, Blending, and Fusion ensembles, the DSE model maximizes predictive accuracy by leveraging the diverse perspectives and capabilities of each individual model. This sophisticated integration of ensemble methods allows the DSE model to outperform the standalone models used in earlier studies. Overall, Table 7 provides a comprehensive overview of stroke prediction results, showcasing the performance of previously used models on imbalanced and balanced datasets alongside the performance of the proposed DSE model.

Table 7 Comparison of stroke prediction results with previous studies on the same dataset.

In the domain of practical application, the DSE model integrates seamlessly into a real-life scenario, as demonstrated in Fig. 26. Users, whether individuals concerned about their health or medical professionals, can enter vital signs and demographic information through a user-friendly mobile or web application.
This data is then securely transmitted to a cloud server where the pre-trained DSE model is deployed. The model processes the input swiftly, with a mean prediction time of 0.095 s per subject, showcasing its efficiency. Upon completion of the prediction, results are promptly relayed back to the user through the same cloud server, accessible via the mobile or web app. Crucially, if a subject is predicted to be at risk of stroke, the system offers the option of an immediate online consultation with a healthcare professional or assistance in locating a physical medical facility through a third-party service. This approach not only underscores the model's applicability in real-world scenarios but also highlights its potential to contribute significantly to proactive healthcare management. A hypothetical sketch of the cloud-side prediction service follows below.

Figure 26 Integration of the proposed DSE model in a real-life scenario. The DSE model quickly processes vital signs entered in a user-friendly app, offering timely stroke risk predictions via a cloud server.
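The sketch below is one way such a cloud-side service could look; the framework choice (FastAPI), endpoint name, input fields, and the dse_model.joblib artifact are all assumptions for illustration, not details described here.

```python
# Hypothetical cloud-side prediction service in the spirit of Fig. 26.
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("dse_model.joblib")  # assumed pre-trained DSE artifact

class Subject(BaseModel):
    age: float
    avg_glucose_level: float
    bmi: float
    hypertension: int
    heart_disease: int

@app.post("/predict")
def predict(subject: Subject):
    # Build a single-row frame from the submitted vital signs.
    features = pd.DataFrame([subject.model_dump()])
    risk = float(model.predict_proba(features)[0, 1])
    # The app can route high-risk users to an online consultation.
    return {"stroke_risk": risk, "at_risk": risk >= 0.5}
```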
https://www.nature.com/articles/s41598-024-61665-4