The goal of this study is to develop accurate and reliable machine learning models for the prediction of SW (water saturation) using three DL and three SL techniques: LSTM, GRU, RNN, SVM, KNN and DT. These models were trained on an extensive dataset comprising various types of log data. The findings of our investigation illustrate the efficacy of data-driven machine learning models in SW prediction, underscoring their potential for a wide range of practical applications.

When evaluating and comparing algorithms, researchers must pay attention to several essential factors. Accuracy and disparities in prediction are among the most important considerations. To evaluate these factors, researchers can use various criteria, including Eqs. 1–6. The Mean Percentage Error (MPE) calculates the average difference between predicted and actual values as a percentage, while the Absolute Mean Percentage Error (AMPE) measures the absolute difference between them. Additionally, the Standard Deviation (SD) quantifies the variability of data points around the mean. Moreover, the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) quantify the mean and root mean squared differences between predicted and actual values, respectively. Lastly, the R2 metric assesses the fraction of variance in the dependent variable that can be explained by the independent variable(s).

$$\text{MPE}=\frac{\sum_{i=1}^{n}\left(\frac{SWE_{(\text{Meas.})}-SWE_{(\text{Pred.})}}{SWE_{(\text{Meas.})}}\times 100\right)_{i}}{n}$$
(1)
$$\text{AMPE}=\frac{\sum_{i=1}^{n}\left|\left(\frac{SWE_{(\text{Meas.})}-SWE_{(\text{Pred.})}}{SWE_{(\text{Meas.})}}\times 100\right)_{i}\right|}{n}$$
(2)
$$\text{SD}=\sqrt{\frac{\sum_{i=1}^{n}\left(e_{i}-\bar{e}\right)^{2}}{n-1}},\qquad e_{i}={SWE_{\text{Meas.}}}_{i}-{SWE_{\text{Pred.}}}_{i},\qquad \bar{e}=\frac{1}{n}\sum_{i=1}^{n}e_{i}$$
(3)
$$\text{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left({SWE_{\text{Meas.}}}_{i}-{SWE_{\text{Pred.}}}_{i}\right)^{2}$$
(4)
$$\text{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left({SWE_{\text{Meas.}}}_{i}-{SWE_{\text{Pred.}}}_{i}\right)^{2}}$$
(5)
$$\text{R}^{2}=1-\frac{\sum_{i=1}^{n}\left({SWE_{\text{Pred.}}}_{i}-{SWE_{\text{Meas.}}}_{i}\right)^{2}}{\sum_{i=1}^{n}\left({SWE_{\text{Meas.}}}_{i}-\frac{\sum_{j=1}^{n}{SWE_{\text{Meas.}}}_{j}}{n}\right)^{2}}$$
(6)
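For concreteness, the six criteria of Eqs. 1–6 can be computed directly from paired measured and predicted SW arrays. The following is a minimal NumPy sketch; the function name and inputs are illustrative, not part of the original study.

```python
import numpy as np

def sw_metrics(sw_meas, sw_pred):
    """Error criteria of Eqs. 1-6 for paired measured/predicted SW arrays."""
    sw_meas = np.asarray(sw_meas, dtype=float)
    sw_pred = np.asarray(sw_pred, dtype=float)
    n = sw_meas.size

    pct_err = (sw_meas - sw_pred) / sw_meas * 100.0  # per-sample percentage error
    err = sw_meas - sw_pred                          # raw residuals

    mpe = pct_err.mean()                                             # Eq. 1
    ampe = np.abs(pct_err).mean()                                    # Eq. 2
    sd = np.sqrt(((err - err.mean()) ** 2).sum() / (n - 1))          # Eq. 3
    mse = (err ** 2).mean()                                          # Eq. 4
    rmse = np.sqrt(mse)                                              # Eq. 5
    r2 = 1.0 - (err ** 2).sum() / ((sw_meas - sw_meas.mean()) ** 2).sum()  # Eq. 6
    return {"MPE": mpe, "AMPE": ampe, "SD": sd, "MSE": mse, "RMSE": rmse, "R2": r2}
```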
In order to forecast SW, three DL and three SL techniques (LSTM, GRU, RNN, SVM, KNN and DT) were used in this study. Each algorithm underwent individual training and testing, followed by independent experiments. To ensure the accuracy of the predictions, the dataset was carefully divided: the training subset accounted for 70% of the data records, while 30% was allocated for independent testing.

Choosing the most appropriate algorithm for a given task is a crucial endeavor within the realm of data analysis and machine learning. Therefore, this study aimed to assess and compare the performance of the LSTM, GRU, and RNN algorithms in predicting SW. The results of these algorithms on both the training and test data were carefully documented and are presented in Table 2. By analyzing these results, researchers can gain insight into the effectiveness of each algorithm and make informed decisions about their implementation in practical applications.

Table 2. Error parameters for prediction of SW based on three DL and three SL techniques: LSTM, GRU, RNN, SVM, KNN and DT.

The results from the test data are presented in Table 2, highlighting the excellent performance of the GRU algorithm on the RMSE, MPE and AMPE metrics, with values of 0.0198, −0.1492 and 2.0320, respectively. Similarly, for the LSTM algorithm the corresponding values are 0.0284, −0.1388 and 3.1136, while for the RNN algorithm they are 0.0399, −0.0201 and 4.0613, respectively. For SVM, KNN and DT these metrics are 0.0599, −0.1664 and 6.1642; 0.7873, 0.0997 and 7.4575; and 0.7289, −0.1758 and 8.1936, respectively. These results show that the GRU model achieves higher accuracy than the other algorithms.

The R2 parameter is an important statistical measure for evaluating and comparing different models. It assesses the adequacy of a model by quantifying the amount of variation in the outcome variable that can be explained by the explanatory variables. In this study, Fig. 5 presents cross plots of predicted versus measured SW values for the training and test data, and confirms that the GRU model exhibits superior prediction accuracy compared to the LSTM and RNN models.

Figure 5. Cross plot for predicting SW using three DL algorithms (GRU, LSTM and RNN) for test data.

To assess the precision of the GRU model, the results presented in Table 2 and Fig. 5 were carefully analyzed for the training and test data. The analysis revealed that the GRU algorithm achieved low errors for SW, with an RMSE of 0.0198 and an R2 of 0.9973. The R2 values serve as quantitative metrics of the predictive power of the ML models: R2, the coefficient of determination, gauges the extent to which the variance in the dependent variable can be predicted from the independent variable(s), essentially showing how well the model fits the observed data points. The R2 values for the GRU, LSTM, RNN, SVM, KNN and DT models are 0.9973, 0.9725, 0.9701, 0.8050, 0.7873 and 0.7289, respectively, reflecting their relative accuracy and reliability in predicting SW.
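To make the experimental setup concrete, the sketch below shows a 70/30 train-test split and one representative DL model, a GRU regressor in Keras. The layer sizes, epoch count, and placeholder data are assumptions for illustration; the paper's exact architecture and hyperparameters are not reproduced here.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow import keras

# X: well-log features (e.g. CGR, DT, NPHI, POTA, THOR, PEF, URAN, DEPTH); y: SW
X = np.random.rand(1000, 8).astype("float32")  # placeholder log data
y = np.random.rand(1000, 1).astype("float32")  # placeholder SW values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)  # 70% training, 30% independent testing

# Recurrent layers expect 3-D input (samples, timesteps, features);
# here each depth sample is treated as a length-one sequence.
X_train_seq = X_train[:, np.newaxis, :]
X_test_seq = X_test[:, np.newaxis, :]

model = keras.Sequential([
    keras.Input(shape=(1, X.shape[1])),
    keras.layers.GRU(64),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
history = model.fit(X_train_seq, y_train, epochs=100, batch_size=32,
                    validation_data=(X_test_seq, y_test), verbose=0)
sw_pred = model.predict(X_test_seq)

# Error per iteration (epoch), in the spirit of the iteration plot of Fig. 8
plt.plot(history.history["loss"], label="train MSE")
plt.plot(history.history["val_loss"], label="test MSE")
plt.xlabel("iteration (epoch)")
plt.ylabel("MSE")
plt.legend()
plt.show()
```

Swapping `keras.layers.GRU` for `LSTM` or `SimpleRNN` yields the other two DL models, while the SL models (SVM, KNN, DT) can be fit on the 2-D feature arrays directly with scikit-learn.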
Figure 5 shows the cross plot for predicting SW using the three DL algorithms (GRU, LSTM, and RNN) on the test data. The GRU model's notably high R2 of 0.9973 underscores the exceptional correlation between its predicted and observed SW values, implying that nearly 99.73% of the variance in the SW data can be explained by its predictions and demonstrating its precision and reliability in SW prediction tasks. Comparatively, the LSTM and RNN models, with R2 values of 0.9725 and 0.9701 respectively, also exhibit strong predictive capability, albeit slightly lower than the GRU model. These findings underscore the GRU model's superiority in SW prediction, attributed to its ability to capture the intricate temporal dependencies within the SW data, thereby yielding more accurate predictions.

Figure 6 provides a visual representation of the prediction error for the test data, illustrating the error distribution for SW prediction with the three DL algorithms (GRU, LSTM, and RNN). The plotted points in the figure depict the error range for each algorithm. For the GRU algorithm, the error range is observed to be between −0.0103 and 0.0727, indicating that the GRU model's test predictions deviate relatively little from the actual SW values. In contrast, the LSTM algorithm shows a somewhat wider error range, from −0.146 to 0.215, suggesting that its predictions for the test data exhibit higher variability and may deviate further from the actual SW values within this broader range.

Figure 6. Error points for predicting SW using three DL and three SL algorithms (GRU, LSTM, RNN, SVM, KNN and DT) for test data.

Similarly, the RNN algorithm exhibits an error range between −0.222 and 0.283, indicating that its predictions for the test data show a larger spread and can deviate more significantly from the actual SW values. Comparing the error ranges of the three DL algorithms visually, it becomes apparent that the GRU algorithm achieves the narrowest range and thus demonstrates the best precision and accuracy in predicting SW for the test data, whereas the LSTM and RNN algorithms exhibit broader error ranges and hence greater variability on the same dataset. These findings further support the conclusion that the GRU algorithm outperforms the LSTM and RNN algorithms in terms of SW prediction accuracy, as it consistently produces predictions with smaller errors and tighter error bounds.

Figure 7 presents an error histogram plot depicting the prediction errors for SW using the three DL and three SL algorithms (GRU, LSTM, RNN, SVM, KNN and DT). Each histogram represents the distribution of prediction errors for one algorithm, showing an approximately normal distribution centered around zero with a relatively narrow spread and no noticeable positive or negative bias. This plot enables a comprehensive assessment of the algorithms' performance and helps identify the algorithm whose errors best follow a normal distribution. Upon careful inspection, it becomes evident that the GRU algorithm exhibits errors closer to a normal distribution than the other algorithms.
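The error ranges of Fig. 6 and the histograms of Fig. 7 both derive from the same residual vector for each model. A small sketch of how such residual statistics might be produced (function and variable names are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def residual_summary(sw_meas, sw_pred, label):
    """Print the residual range and add a histogram, as in Figs. 6 and 7."""
    res = np.asarray(sw_meas, dtype=float) - np.asarray(sw_pred, dtype=float)
    print(f"{label}: error range [{res.min():.4f}, {res.max():.4f}]")
    plt.hist(res, bins=40, alpha=0.5, label=label)

# e.g. residual_summary(y_test, sw_pred_gru, "GRU"); plt.legend(); plt.show()
```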
The GRU algorithm's performance is characterized by a smaller standard deviation and a narrower spread of prediction errors, indicating that it consistently produces more precise and reliable predictions for SW. By combining the results presented in Table 2 with the error histogram plot in Fig. 7, the performance accuracy of the algorithms can be ranked as follows: GRU > LSTM > RNN > SVM > KNN > DT.

Figure 7. Histogram plot for SW prediction using three DL and three SL algorithms (GRU, LSTM, RNN, SVM, KNN and DT).

Figure 8 illustrates the error rate of the three DL and three SL algorithms (GRU, LSTM, RNN, SVM, KNN and DT) as a function of iteration for SW prediction. The findings indicate that the GRU and LSTM algorithms initially exhibit higher error values that gradually decrease over time; this pattern is not observed for the RNN algorithm. Examination of the figure shows that the LSTM algorithm achieves higher accuracy than the other algorithms at the start of the iterations: around iteration 10, the LSTM algorithm surpasses the GRU algorithm with a lower error value. In the subsequent iterations, however, specifically from iteration 31 onward, the GRU algorithm outperforms the LSTM algorithm with superior accuracy. In contrast, the RNN algorithm shows a consistent decline in performance accuracy from the start to the end of the iterations, without significant fluctuations. The zoomed-in portion of the figure, covering iterations 85–100, makes the final performance trends more apparent: the GRU algorithm consistently outperforms the other algorithms in accuracy, followed by the LSTM algorithm, whose accuracy decreases over the iterations, while the RNN algorithm trails with declining accuracy and no notable fluctuations. These findings emphasize the superiority of the GRU algorithm, which maintains a higher level of accuracy throughout the iterations, whereas the LSTM and RNN algorithms experience fluctuations and decreasing accuracy over time.

Figure 8. Iteration plot for SW prediction using three DL and three SL algorithms (GRU, LSTM, RNN, SVM, KNN and DT).

Pearson's coefficient (R) is a widely used method for assessing the relative importance of the input (independent) variables with respect to the output (dependent) variable, here SW. The coefficient ranges between −1 and +1 and represents the strength and direction of the correlation: a value of +1 indicates a strong positive correlation, −1 indicates a strong negative correlation, and a value close to 0 indicates no correlation. Equation 7 gives the calculation of Pearson's correlation coefficient, a statistical measure of the linear relationship between two variables. It allows researchers to quantify the extent to which changes in one variable are associated with changes in another.
By applying Pearson's coefficient, researchers can determine the extent of influence that the input (independent) variables have on the output (dependent) variable, SW.

$$R=\frac{\sum_{i=1}^{n}({Z}_{i}-\overline{Z})({Q}_{i}-\overline{Q})}{\sqrt{\sum_{i=1}^{n}{({Z}_{i}-\overline{Z})}^{2}}\sqrt{\sum_{i=1}^{n}{({Q}_{i}-\overline{Q})}^{2}}}$$
(7)
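Equation 7 translates directly into code. A minimal sketch (the function and argument names are illustrative):

```python
import numpy as np

def pearson_r(z, q):
    """Pearson's correlation coefficient between two variables (Eq. 7)."""
    z = np.asarray(z, dtype=float)
    q = np.asarray(q, dtype=float)
    dz, dq = z - z.mean(), q - q.mean()  # deviations from the respective means
    return (dz * dq).sum() / (np.sqrt((dz ** 2).sum()) * np.sqrt((dq ** 2).sum()))
```

For a quick cross-check, `np.corrcoef(z, q)[0, 1]` returns the same value.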
A coefficient of +1 indicates a perfect positive correlation, meaning the input variable has the strongest positive influence on the output variable. Conversely, a coefficient of −1 represents a perfect negative correlation, indicating the strongest inverse influence. When the coefficient is close to 0, there is no significant correlation between the variables, and changes in the input variable do not have a substantial effect on the output variable. Pearson's correlation coefficient is thus a valuable tool for assessing the relationship between variables: it provides researchers with a quantitative measure of the relative importance of each input variable with respect to the output variable, SW.

The heat map in Fig. 9 compares the Pearson correlation coefficients to give insight into the relationship between the input variables and SW. The results reveal several significant correlations. Negative correlations are observed with URAN and DEPTH, indicating an inverse relationship with SW: higher values of URAN and DEPTH are associated with lower SW values. On the other hand, positive correlations are observed with CGR, DT, NPHI, POTA, THOR, and PEF: higher values of these variables are associated with higher SW values.

Figure 9. Heat map plot for SW prediction using three DL and three SL algorithms (GRU, LSTM, RNN, SVM, KNN and DT).

These findings can be used to develop predictive models of SW based on the input variables; incorporating the correlations into the models can improve their accuracy and reliability in predicting SW. Expressing the relationships between the input variables and SW in the form of Eq. 8 provides a mathematical representation of the correlations and enables a quantitative evaluation of the influence of the input variables on SW.

$$SWE\propto \left(\text{CGR},\ \text{DT},\ \text{NPHI},\ \text{POTA},\ \text{THOR},\ \text{PEF}\right)\quad \text{and}\quad SWE\propto \frac{1}{\left(\text{URAN},\ \text{DEPTH}\right)}$$
(8)
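A correlation heat map like Fig. 9 can be reproduced from pairwise Pearson coefficients. A hypothetical sketch, assuming the logs are held in a pandas DataFrame with the column names used in the text (the data here is a random placeholder):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

cols = ["DEPTH", "CGR", "DT", "NPHI", "POTA", "THOR", "PEF", "URAN", "SW"]
df = pd.DataFrame(np.random.rand(500, len(cols)), columns=cols)  # placeholder logs

# Pairwise Pearson coefficients between all logs and SW, as in Fig. 9
sns.heatmap(df.corr(method="pearson"), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Pearson correlation between input logs and SW")
plt.tight_layout()
plt.show()
```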
https://www.nature.com/articles/s41598-024-63168-8