Two phase feature-ranking for new soil dataset for Coxiella burnetii persistence and classification using machine learning models

This section presents the experimental results of the feature-ranking models and compares their performance across different machine learning classifiers. Four algorithms are used for classification: SVM, LDA, LR, and MLP. Ten-fold cross-validation is used to assess performance more reliably and to avoid overfitting.

Firstly, the features of the C. burnetii dataset are ranked using three feature-ranking models. Table 3 lists the rankings produced by the different feature-ranking algorithms: CR, ONR, and RLF. The column “Attribute Index” gives a unique index value for each attribute, where pH has index 1, moisture (MO) 2, soluble salts (SS) 3, and so on. The first row of columns rk(CR), rk(ONR), and rk(RLF) shows the top-ranked attributes: 8 (N), 17 (Cd), and 12 (Cr), respectively. The second row shows the next-highest-ranked attributes: 4 (OM), 21 (K), and 21 (K), respectively. Similarly, the last row shows the lowest-ranked attributes: 14 (Mn), 20 (Ca), and 14 (Mn), respectively. Moreover, if we assess the top 11 features from all of the feature-ranking methods in Table 3, the following conclusions can be drawn:


6 features, i.e. {OM, N, Ni, Cr, Cd, K}, are common to all ranking methods.

7 features, i.e. {Si, OM, N, Ni, Cr, Cd, K}, are common to ONR and RLF.

8 features, i.e. {pH, SS, OM, N, Ni, Cr, Cd, K}, are common to ONR and CR.

9 features, i.e. {OM, N, Ni, Na, Mg, Cr, Cd, Ca, K}, are common to CR and RLF.
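The overlaps listed above can be cross-checked with plain set operations. The following is a minimal sketch using only the pairwise overlap sets quoted in the text (K denotes potassium). The variable names are illustrative assumptions, and the full top-11 lists from Table 3 are not reproduced here. Any feature shared by all three methods must appear in every pairwise overlap, so intersecting the three pairwise sets should recover the six features reported as common to all methods.

```python
# Pairwise overlaps among the top-11 features, as quoted in the text.
# Variable names are illustrative; "K" denotes potassium.
onr_rlf = {"Si", "OM", "N", "Ni", "Cr", "Cd", "K"}
onr_cr = {"pH", "SS", "OM", "N", "Ni", "Cr", "Cd", "K"}
cr_rlf = {"OM", "N", "Ni", "Na", "Mg", "Cr", "Cd", "Ca", "K"}

# A feature shared by all three methods appears in every pairwise overlap,
# so the three-way overlap is the intersection of the pairwise sets.
common_all = onr_rlf & onr_cr & cr_rlf

print(sorted(common_all))
print(len(onr_rlf), len(onr_cr), len(cr_rlf), len(common_all))
```

The set sizes printed on the last line correspond to the counts 7, 8, 9, and 6 stated in the list.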

Similarly, Table 3 shows that, of the 9 least-significant features, 6 features, i.e. {Pb, MO, P, Cu, Mn, cy}, are common to all ranking methods.

Table 3 Attribute-ranking for C. burnetii soil attribute dataset by different feature-ranking methods.

Secondly, we perform a two-phase feature ranking to determine the contribution of each attribute to the persistence of C. burnetii in soil. First, attributes are ranked by the individual feature-ranking methods; then a combination of the methods is used to calculate the weighted score that determines the final soil attribute rank. The top-ranked and least-ranked attributes are shown separately in Tables 4 and 5. These tables present each feature-ranking method's score and the final aggregate score of every soil attribute for the C. burnetii dataset. The aggregate score is the sum of the scores of all of the attribute-ranking methods. The lower the aggregate score, the higher the rank of the attribute; the higher the score, the lower the rank.

The first row of Table 4 shows K ranked 2nd by RLF and ONR and 4th by CR; the last column gives its aggregate score of 8, the sum of the scores of all of the attribute-ranking methods, i.e. 2 + 2 + 4 = 8. The second row shows that Cr is ranked 1, 4, and 8 by RLF, ONR, and CR, respectively, with an aggregate score of 13. Similarly, the last row shows that Mg is ranked 8, 12, and 9 by RLF, ONR, and CR, respectively, with an aggregate score of 29. K is therefore the top-ranked attribute, as its aggregate score of 8 is the minimum, and Cr is the 2nd-ranked attribute with an aggregate score of 13.
Similarly, the results in Table 5 show that Mn is the least-ranked attribute with an aggregate score of 56, the sum of the scores of all of the attribute-ranking methods, i.e. 21 + 14 + 21 = 56. Next come cy, P, and Cu, with aggregate scores of 55 (20 + 20 + 15), 54 (18 + 16 + 20), and 53 (19 + 18 + 16), respectively, and so on. The stacked bar charts elaborate further by displaying the score from each feature-ranking method and the aggregate score for every attribute in different colors. The charts in Figs. 3 and 4 show the top-ranked and least-ranked features, where rk(RLF), rk(ONR), and rk(CR), in different shades of blue, represent the ranking scores for RLF, ONR, and CR. Similarly, the Ranking Score denotes the sum of the scores of all of the attribute-ranking methods for an attribute and is shown in light blue.

Table 4 List of Top Ranked Attributes for C. burnetii soil attribute dataset.
Table 5 List of Least Ranked Attributes for C. burnetii soil attribute dataset.
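The aggregate scoring described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses only the per-method ranks quoted in the text for K, Cr, and Mg, and the dictionary layout and function name are assumptions made for the example. The printed positions are relative to these three attributes only, not to the full 21-attribute table, and ties between equal aggregate scores are not handled.

```python
# Ranks (1 = most relevant) quoted in the text for three attributes.
# The dictionary layout and function name are illustrative assumptions.
ranks = {
    "K": {"RLF": 2, "ONR": 2, "CR": 4},
    "Cr": {"RLF": 1, "ONR": 4, "CR": 8},
    "Mg": {"RLF": 8, "ONR": 12, "CR": 9},
}


def aggregate_scores(ranks):
    """Sum each attribute's per-method ranks into one aggregate score."""
    return {attr: sum(per_method.values()) for attr, per_method in ranks.items()}


scores = aggregate_scores(ranks)

# A lower aggregate score means a higher final rank.
final_order = sorted(scores, key=scores.get)

for position, attr in enumerate(final_order, start=1):
    print(position, attr, scores[attr])
```

Summing ranks treats the three methods as equally important; a weighted variant would multiply each rank by a per-method weight before summing.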
The top-ranked features shown in Fig. 3 indicate that potassium (K) is the most significant attribute: K is ranked 2nd by RLF and ONR and 4th by CR, so its aggregate score is 8, the sum of the scores of all of the attribute-ranking methods, i.e. 2 + 2 + 4 = 8. Similarly, the least-ranked features are shown in Fig. 4, which indicates that Mn is the least significant attribute with a ranking score of 56, the sum of the individual feature scores of 21, 14, and 21 for RLF, ONR, and CR, respectively.

Figure 3 Individual and Aggregate Score of Top Ranked attributes of C. burnetii Soil dataset using feature-ranking methods.

Figure 4 Individual and Aggregate Score of Least Ranked attributes of C. burnetii Soil dataset using feature-ranking methods.

Table 6 A comparison of results from various feature-ranking methods against different machine learning classifiers using the C. burnetii dataset.

Thirdly, we evaluated the performance of these feature-ranking methods with different machine learning classifiers. The results of the experiments are shown in Table 6. For every feature-ranking method, the row “rk” gives the ranking sequence of attributes. The table then presents the results of the classifiers (MLP, LR, LDA, and SVM) according to the ranking sequence of each feature-ranking model. The accuracy ranges from 53.19% (SVM) to 82.98% (SVM) across the ranking models and classification methods. The most relevant feature for CR is N. Using this feature alone, SVM, LDA, and LR produce classification accuracies of 63.91%, 62.9%, and 62.89%, respectively. The most relevant feature for ONR is Cd, and for RLF it is Cr. Using Cd (ONR), LDA produces an accuracy of 59.6%; using Cr (RLF), SVM produces an accuracy of 57.45%.
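The evaluation protocol described above (classifiers scored on nested feature subsets in ranked order, with k-fold cross-validation) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: it uses only the Python standard library, swaps in a simple nearest-centroid classifier in place of SVM, LDA, LR, and MLP, and runs on synthetic data. All function names, the random seeds, and the toy ranking are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier: assign each test row to the closest class mean."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in test_X]


def cv_accuracy(X, y, k=10):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    fold_scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = nearest_centroid_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx], [X[i] for i in fold]
        )
        correct = sum(p == y[i] for p, i in zip(preds, fold))
        fold_scores.append(correct / len(fold))
    return sum(fold_scores) / len(fold_scores)


def accuracy_by_subset_size(X, y, ranked_columns, k=10):
    """Evaluate nested subsets: top-1, top-2, ... features in ranked order."""
    results = []
    for size in range(1, len(ranked_columns) + 1):
        cols = ranked_columns[:size]
        X_sub = [[row[c] for c in cols] for row in X]
        results.append((size, cv_accuracy(X_sub, y, k)))
    return results


# Synthetic two-class data: column 1 separates the classes, column 0 is noise.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[rng.random(), 10 * label + rng.random()] for label in y]

ranked_columns = [1, 0]  # hypothetical ranking: the informative column first
results = accuracy_by_subset_size(X, y, ranked_columns, k=10)
for size, acc in results:
    print(f"top-{size} features: accuracy = {acc:.2f}")
```

Plotting accuracy against subset size for each classifier and each ranking method gives curves of the kind described for the accuracy figures; the point at which accuracy peaks identifies the number of top-ranked features to keep.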
We can draw several conclusions from the analysis of Table 6: (a) The three attribute-ranking models produce distinct rankings, which lead to different classification results. (b) The pair CR+SVM gives the best classification accuracy of 82.98% using only 14 soil features. (c) In principle, the order of best classification performance is arbitrary: (CR+SVM, 82.98%), (RLF+SVM, 81.91%), (ONR+SVM, 75.53%), (CR+LDA, 78.7%), (RLF+LDA, 79.8%), (ONR+LDA, 73.4%), (CR+LR, 75.53%), (RLF+LR, 79.79%), (ONR+LR, 72.34%), (CR+MLP, 79.79%), (RLF+MLP, 80.85%), (ONR+MLP, 76.6%). (d) LDA, LR, and MLP achieved better accuracy with the RLF feature ranking than with the other feature-ranking approaches. (e) CR stands next to RLF and produces better classification results for SVM than the other ranking methods. (f) MLP performs better than the other machine learning classifiers for the ONR feature ranking. (g) The 14 soil attributes for which CR+SVM achieves the best classification accuracy are {N, OM, SS, K, Na, pH, Cd, Cr, Mg, Ni, Ca, MO, Fe, Si}. In contrast, other models, such as RLF+SVM and RLF+MLP, use 12 soil features to reach their best classification accuracies of 81.91% and 80.85%, respectively.

Figures 5, 6 and 7 show how the accuracy of the machine learning classifiers changes as the number of soil features is varied under each feature-ranking approach. Figure 5 shows the accuracy of the machine learning models using CR as the attribute-ranking method. Although the feature subsets are the same for every classifier, LDA performs better than the other classifiers for the initial features. However, SVM shows excellent results for mid-level features. All the classifiers show a considerable decrease in accuracy for the last set of features.
The results show that SVM reaches a classification accuracy of 82.98%, higher than any other model, so the overall performance of SVM is better than that of the other machine learning classifiers.

Figure 5 Accuracy of various classifiers depending upon CR.

Figure 6 presents accuracy curves for the classification algorithms using the RLF feature-ranking method. Although all the classifiers show a similar trend, SVM and MLP achieve classification accuracies of 81.91% and 80.85%, higher than any other classification method. All the classifiers show a similar trend for the initial set of features, although LDA and MLP appear to perform better than the other classifiers. For mid-level features, LDA and MLP stand close to SVM. Nevertheless, the overall performance of SVM is better than that of the other classifiers.

Figure 6 Accuracy of various classifiers depending upon RLF.

Figure 7 illustrates the accuracy of the classification models using ONR as the attribute-ranking method. All the classifiers show a similar trend for the nested subsets of soil features except MLP, which shows a sharp increase for mid-level features. Although LR and LDA show better results for the initial features, SVM outperforms the other classifiers for the last subset of features.

Figure 7 Accuracy of various classifiers depending upon ONR.

In summary, the results imply that: (a) The 6 features that contribute significantly to the persistence of the pathogen in the environment are {K, Cr, Cd, N, OM, SS}. (b) The 5 least contributing features for Coxiella are {Mn, cy, P, Cu, Pb}. (c) Feature ranking using RLF gives better results for most machine learning algorithms (LDA, LR, and MLP) than the other feature-ranking models. (d) The classification results of SVM surpass those of all other machine learning classifiers.
(e) CR+SVM produces the best accuracy of 82.98% for the first 14 soil features {N, OM, SS, K, Na, pH, Cd, Cr, Mg, Ni, Ca, MO, Fe, Si}. (f) In multi-dimensional classification, the various machine learning models show a similar trend. Therefore, selecting the right classifier is essential to obtaining good classification results on unseen examples.

Comparison with previous machine learning approaches

Although some researchers have applied machine learning to classify soil-borne pathogens such as F. tularensis and the environments that support their persistence in soil, more data are needed specifically for C. burnetii in soil-related environments, as shown in Table 7. Furthermore, unlike previous works, our model applies a two-phase feature ranking to a novel C. burnetii dataset.

Table 7 Comparison with previous machine learning approaches.

Discussions

Machine learning models are used as a standard tool in various disciplines, for example soil classification [46], medical science [47], bioinformatics [48], and agriculture [49, 50, 51]. Our research shows that machine learning models, rather than contemporary statistical models, provide good results for classifying C. burnetii and understanding the pathogen's behavior in soil-related environments.

The results imply that potassium, chromium, cadmium, nitrogen, organic matter, and soluble salts are the 6 most significant features for the persistence of Coxiella, as shown in Table 4. Previous works also suggest that abiotic characteristics such as pH, organic matter, and soil nutrients are not only the driving force for the soil bacterial community [52, 53, 54, 55] but are also positively linked with the persistence of soil pathogens [56, 57, 58].
Some recent studies [9, 22, 23] also highlight the significance of the soil's physicochemical characteristics, such as organic matter, soluble salts, nitrogen, clay, potassium, cobalt, chromium, and cadmium, for the sustenance of B. anthracis, C. burnetii, and F. tularensis.

Our analysis further shows that potassium is the most noteworthy feature for the presence of Coxiella in soil. Several works [9, 22, 23, 59] demonstrate that the prevalence of various pathogens such as B. anthracis, C. burnetii, and F. tularensis correlates positively with potassium in the soil. The next most significant features that increase the likelihood of persistence of the pathogenic bacteria are chromium, cadmium, nitrogen, organic matter, and soluble salts. Recent studies [23, 56, 57, 58] indicate that the presence of organic matter and chromium is beneficial for the persistence of pathogens in soil. Another study [60] shows that nitrogen is essential for the sustenance of pathogens within their plant and animal hosts. Two studies [22, 61] suggest that cadmium, nitrogen, soluble salts, and organic matter correlate positively with the prevalence of F. tularensis in soil. The prevalence of B. anthracis is also associated with the presence of organic matter, chromium, and potassium in soil [23]. Recent works [9, 59] highlight that soluble salts are positively correlated with the presence of C. burnetii and F. tularensis. Similarly, one work [61] provides evidence that nitrogen and organic matter are beneficial to the persistence of C. burnetii, and another study shows that nitrogen and organic matter are also positively related to the sustenance of the nitrogen-fixing bacterium A. brasilense [62].

The remaining contributing features from Table 4 are sodium, pH, nickel, and magnesium. Previous research [53, 54, 55] shows that soil texture, pH, and nutrients are essential for bacterial communities.
Our results agree with a recent study [23] showing that features such as magnesium, potassium, and sodium are positively correlated with C. burnetii in soil-related environments. Another work [9] also shows a substantial difference between Coxiella-negative and Coxiella-positive sites with respect to magnesium and sodium. A study [30] also shows that F. tularensis has a positive affinity with soluble salts, nickel, and pH for its existence in soil. Another study [59] shows that soluble salts and nickel contribute positively to the presence of F. tularensis. Magnesium plays a substantial part in the persistence of microbes during starvation and cold shocks [63]. One work [25] illustrates that magnesium, sodium, potassium, and sulfate are conducive to F. tularensis growth in soil and water.

Our study also shows that silt, moisture, and cobalt fall in the middle of the ranking. Previous research shows that silt holds substantial organic matter because of its larger surface area compared with the sandy fraction, which may raise the potential for the prevalence of pathogens [64]. Another study [23] shows that the persistence of C. burnetii is associated with a higher concentration of cobalt in the environment. A study [22] shows that the persistence of F. tularensis is positively correlated with the presence of silt in soil. Another work [65] proposes that F. tularensis has a strong affinity for moisture and low temperature.

Our machine learning analysis shows that the seven least contributing features are manganese, clay, phosphorus, copper, lead, sand, and calcium, as shown in Table 5. A recent study [9] also supports this view by showing no significant difference between Coxiella-negative and Coxiella-positive sites with respect to manganese, phosphorus, clay, lead, copper, and sand in the soil. A study [22] also shows that manganese, phosphorus, calcium, copper, and sand do not show any positive affinity with F. tularensis in soil. Similar research [59] also shows that clay, phosphorus, copper, lead, sand, and calcium are not positively correlated with F. tularensis. Some suggest [66] that in hot and dry weather, high manganese content is seen at B. pseudomallei-positive sites as opposed to negative sites. However, others [67] believe that the aerobic heterotrophic population of microbes is very susceptible to different minerals, such as cadmium, nickel, manganese, mercury, chromium, copper, and zinc. One analysis also shows that manganese and zinc are essential for biological processes and that they exist as protein components in many species [67, 68]. Some works suggest that zinc supports several cellular functions, such as pH regulation, metabolism, bacterial gene expression, DNA replication, glycolysis, and synthesis of amino acids, and acts as a cofactor of microbial virulence [69]. However, excess zinc can cause toxicity; thus, these microbes possess a delicate mechanism to maintain zinc equilibrium for carrying out essential cellular functions and to avoid the damage it can cause [70].

Classification results for C. burnetii in soil using different machine learning methods show that SVM surpasses all other machine learning models, producing an accuracy of 82.98% using the first 14 top-ranked features.
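The headline comparison can be cross-checked from the best accuracies quoted for each ranking method and classifier pair in the analysis of Table 6. The sketch below is illustrative only: the accuracy values are copied from the text, while the dictionary layout and variable names are assumptions made for the example.

```python
# Best accuracy (%) per (ranking method, classifier) pair, as quoted in the text.
best_accuracy = {
    ("CR", "SVM"): 82.98, ("RLF", "SVM"): 81.91, ("ONR", "SVM"): 75.53,
    ("CR", "LDA"): 78.7, ("RLF", "LDA"): 79.8, ("ONR", "LDA"): 73.4,
    ("CR", "LR"): 75.53, ("RLF", "LR"): 79.79, ("ONR", "LR"): 72.34,
    ("CR", "MLP"): 79.79, ("RLF", "MLP"): 80.85, ("ONR", "MLP"): 76.6,
}

rankers = ["CR", "RLF", "ONR"]
classifiers = ["SVM", "LDA", "LR", "MLP"]

# Best ranking method for each classifier.
best_ranker = {
    clf: max(rankers, key=lambda r: best_accuracy[(r, clf)]) for clf in classifiers
}

# Best classifier for each ranking method.
best_classifier = {
    r: max(classifiers, key=lambda c: best_accuracy[(r, c)]) for r in rankers
}

# Best pair overall.
best_pair = max(best_accuracy, key=best_accuracy.get)

print(best_ranker)
print(best_classifier)
print(best_pair, best_accuracy[best_pair])
```

On these figures, RLF is the best ranking method for LDA, LR, and MLP, CR is the best for SVM, MLP is the best classifier under ONR, and CR+SVM is the best pair overall.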
