<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
<article xml:lang="en" article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink">
<?release-delay 0|0?>
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">OL</journal-id>
<journal-title-group>
<journal-title>Oncology Letters</journal-title>
</journal-title-group>
<issn pub-type="ppub">1792-1074</issn>
<issn pub-type="epub">1792-1082</issn>
<publisher>
<publisher-name>D.A. Spandidos</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3892/ol.2019.10068</article-id>
<article-id pub-id-type="publisher-id">OL-0-0-10068</article-id>
<article-categories>
<subj-group>
<subject>Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Development of QSAR machine learning-based models to forecast the effect of substances on malignant melanoma cells</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Ancuceanu</surname><given-names>Robert</given-names></name>
<xref rid="af1-ol-0-0-10068" ref-type="aff">1</xref></contrib>
<contrib contrib-type="author"><name><surname>Dinu</surname><given-names>Mihaela</given-names></name>
<xref rid="af1-ol-0-0-10068" ref-type="aff">1</xref>
<xref rid="c1-ol-0-0-10068" ref-type="corresp"/></contrib>
<contrib contrib-type="author"><name><surname>Neaga</surname><given-names>Iana</given-names></name>
<xref rid="af2-ol-0-0-10068" ref-type="aff">2</xref></contrib>
<contrib contrib-type="author"><name><surname>Laszlo</surname><given-names>Fekete Gyula</given-names></name>
<xref rid="af3-ol-0-0-10068" ref-type="aff">3</xref></contrib>
<contrib contrib-type="author"><name><surname>Boda</surname><given-names>Daniel</given-names></name>
<xref rid="af4-ol-0-0-10068" ref-type="aff">4</xref></contrib>
</contrib-group>
<aff id="af1-ol-0-0-10068"><label>1</label>Department of Pharmaceutical Botany and Cell Biology, Faculty of Pharmacy, &#x2018;Carol Davila&#x2019; University of Medicine and Pharmacy, 020956 Bucharest, Romania</aff>
<aff id="af2-ol-0-0-10068"><label>2</label>Department of Public Health and Management, Faculty of Medicine, &#x2018;Carol Davila&#x2019; University of Medicine and Pharmacy, 050463 Bucharest, Romania</aff>
<aff id="af3-ol-0-0-10068"><label>3</label>Department of Dermatology, University of Medicine and Pharmacy of T&#x00E2;rgu Mure&#x015F;, 540142 T&#x00E2;rgu Mure&#x015F;, Romania</aff>
<aff id="af4-ol-0-0-10068"><label>4</label>Dermatology Research Laboratory, &#x2018;Carol Davila&#x2019; University of Medicine and Pharmacy, 050474 Bucharest, Romania</aff>
<author-notes>
<corresp id="c1-ol-0-0-10068"><italic>Correspondence to</italic>: Professor Mihaela Dinu, Department of Pharmaceutical Botany and Cell Biology, Faculty of Pharmacy, &#x2018;Carol Davila&#x2019; University of Medicine and Pharmacy, 6 Traian Vuia Road, 020956 Bucharest, Romania, E-mail: <email>mihaela.dinu@umfcd.ro</email></corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>05</month>
<year>2019</year></pub-date>
<pub-date pub-type="epub">
<day>25</day>
<month>02</month>
<year>2019</year></pub-date>
<volume>17</volume>
<issue>5</issue>
<fpage>4188</fpage>
<lpage>4196</lpage>
<history>
<date date-type="received"><day>21</day><month>09</month><year>2018</year></date>
<date date-type="accepted"><day>15</day><month>11</month><year>2018</year></date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; Ancuceanu et al.</copyright-statement>
<copyright-year>2019</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by-nc-nd/4.0/">Creative Commons Attribution-NonCommercial-NoDerivs License</ext-link>, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.</license-p></license>
</permissions>
<abstract>
<p>SK-MEL-5 is a human melanoma cell line that has been used in various studies to explore new therapies against melanoma in different <italic>in vitro</italic> experiments. In this study, we report on the development of quantitative structure-activity relationship (QSAR) models able to predict the cytotoxic effect of diverse chemical compounds on this cancer cell line. The dataset of cytotoxic and inactive compounds was downloaded from the PubChem database. It contains the data for all chemical compounds for which cytotoxicity results expressed as GI<sub>50</sub> were recorded. In total, 13 blocks of molecular descriptors were computed and used, after appropriate pre-processing, in building QSAR models with four machine learning classifiers: Random forest (RF), gradient boosting, support vector machine and random k-nearest neighbors. Among the 186 models reported, none had a positive predictive value (PPV) higher than 0.90 in both nested cross-validation and external dataset testing, but seven models had a PPV higher than 0.85 in both evaluations, all seven using the RF algorithm as a classifier, and topological descriptors, information indices, 2D-autocorrelation descriptors, P-VSA-like descriptors, and edge-adjacency descriptors as sets of features used for classification. The y-scrambling test was associated with considerably worse performance (confirming the non-random character of the models) and the applicability domain was assessed through three different methods.</p>
</abstract>
<kwd-group>
<kwd>QSAR</kwd>
<kwd>melanoma</kwd>
<kwd>SK-MEL-5</kwd>
<kwd>gradient boosting</kwd>
<kwd>k-nearest neighbors</kwd>
<kwd>random forests</kwd>
<kwd>support vector machines</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Quantitative structure-activity relationship (QSAR) models are mathematical tools used to predict the physical, chemical or biological characteristics of chemical substances from their chemical structure, as expressed through a variety of &#x2018;chemical descriptors&#x2019; (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>). In the famous statistical aphorism of George Box, &#x2018;all models are wrong but some are useful&#x2019; (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>); QSAR models might be imperfect, but they have proven useful in a plethora of applications (<xref rid="b3-ol-0-0-10068" ref-type="bibr">3</xref>), from drug design (being frequently used for virtual screening, as well as lead optimization) (<xref rid="b4-ol-0-0-10068" ref-type="bibr">4</xref>) to toxicological predictions (being used to predict toxicity for a large number of substances for which wet lab experiments have not yet been performed and may be unlikely to be performed in the near- or mid-term future) (<xref rid="b5-ol-0-0-10068" ref-type="bibr">5</xref>), or from protein binding (<xref rid="b6-ol-0-0-10068" ref-type="bibr">6</xref>) to cytochrome P450 interaction forecasts (<xref rid="b7-ol-0-0-10068" ref-type="bibr">7</xref>).</p>
<p>Melanoma is considered the most threatening form of skin neoplasm, with fast progression and metastasis, as well as a high burden of death, particularly if detected late (<xref rid="b8-ol-0-0-10068" ref-type="bibr">8</xref>). Although a considerable number of therapies have recently been approved for advanced stage melanoma, the disease is far from being vanquished; resistance development through mutations or alternative signaling pathways, cancer heterogeneity and serious adverse events limit the efficacy and potential benefits of the newer treatments, at least in a proportion of the patients (<xref rid="b9-ol-0-0-10068" ref-type="bibr">9</xref>,<xref rid="b10-ol-0-0-10068" ref-type="bibr">10</xref>). Therefore, although therapeutic options are now better for patients with advanced melanoma than they were a decade ago, there is still a need for developing new drugs targeting melanoma, and a variety of approaches are still being explored, from evaluating new targets (<xref rid="b11-ol-0-0-10068" ref-type="bibr">11</xref>) to exploring new delivery systems for old compounds (<xref rid="b12-ol-0-0-10068" ref-type="bibr">12</xref>). SK-MEL-5 is a human melanoma cell line derived from a metastatic axillary node of a young female patient, and is characterized by a high level of expression of the V600E mutation of B-Raf, of the wild-type N-Ras (<xref rid="b13-ol-0-0-10068" ref-type="bibr">13</xref>), as well as by relatively high levels of the ABCB1 transcript (<xref rid="b14-ol-0-0-10068" ref-type="bibr">14</xref>). This is unlike the SK-MEL-2 melanoma cell line, which has wild-type B-Raf, but mutated N-Ras (<xref rid="b11-ol-0-0-10068" ref-type="bibr">11</xref>). SK-MEL-5 has been used in various studies to explore new therapies against melanoma in different <italic>in vitro</italic> experiments (<xref rid="b15-ol-0-0-10068" ref-type="bibr">15</xref>&#x2013;<xref rid="b17-ol-0-0-10068" ref-type="bibr">17</xref>).</p>
<p>In the present study, we report on our attempts to develop QSAR models able to forecast the cytotoxic effects of different chemical compounds on the SK-MEL-5 melanoma cell line, using the data available on PubChem. Such data are derived from different laboratories and have been generated at different times, most likely with different reagents and laboratory equipment; moreover, whereas most QSAR studies are focused on a well-defined biological target, the cytotoxicity data are inherently more heterogeneous, as different molecules may induce cytotoxicity through a variety of biochemical pathways. Thus, it is to be expected that QSAR modelling of such data is more challenging than for compounds targeting specific proteins or other unambiguous cell targets. Kalliokoski <italic>et al</italic> (<xref rid="b18-ol-0-0-10068" ref-type="bibr">18</xref>), based on a data set filtered using certain validity criteria, have shown that the standard deviation for IC<sub>50</sub> is only approximately 25&#x0025; higher than that of K<sub>i</sub>; we have used GI<sub>50</sub>, which is similar to IC<sub>50</sub>, in our models, as K<sub>i</sub> data are not available for cytotoxicity measurements on cultured cell lines (K<sub>i</sub> is applicable to distinct protein targets). Because of these considerations, as well as the relatively large structural diversity of the dataset, we used a binary classification approach (not regression models) (<xref rid="b19-ol-0-0-10068" ref-type="bibr">19</xref>) and have focused on four machine learning techniques widely used for prediction: Random forest (RF), gradient boosting (BST), support vector machine (SVM) and k-nearest neighbor (KNN).</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and methods</title>
<sec>
<title/>
<sec>
<title>Dataset</title>
<p>The dataset of cytotoxic and inactive compounds on the SK-MEL-5 cell line was downloaded from the PubChem database (<uri xlink:href="https://pubchem.ncbi.nlm.nih.gov">https://pubchem.ncbi.nlm.nih.gov</uri>) in June 2017. We retained the data for all chemical compounds for which cytotoxicity results expressed as GI<sub>50</sub> were recorded. Other assessment criteria for the same cell line (e.g., LC<sub>50</sub> or ED<sub>50</sub>) were not selected because the number of records was much lower for these measures (35 observations for the former, 138 for the latter). We downloaded the PubChem canonical SMILES and used ChemAxon Standardizer v. 18.8.0 (ChemAxon, Budapest, Hungary) for the standardization of the molecules. Duplicates were removed in two steps: First, we detected duplicates in R, based on the canonical SMILES, and replaced the GI<sub>50</sub> with the mean value of the duplicates. This procedure identified most of the duplicates. In a second step, we used the ISIDA/Duplicates (<uri xlink:href="http://infochim.u-strasbg.fr">http://infochim.u-strasbg.fr</uri>; University of Strasbourg, France) software following the structure standardization, and this detected an additional duplicate. Standardized SMILES were converted to 2D chemical structures using Discovery Studio Visualizer v16.1.0.15350 (Dassault Syst&#x00E8;mes BIOVIA, San Diego, CA, USA). We defined a compound as &#x2018;active&#x2019; if the GI<sub>50</sub> was less than 1 &#x00B5;M and &#x2018;inactive&#x2019; if the GI<sub>50</sub> was higher than the 1 &#x00B5;M threshold. We started with 445 observations and, following removal of duplicates, ended up with 422 observations, of which 174 were labelled as &#x2018;active&#x2019; and 248 as &#x2018;inactive&#x2019;; the ratio of inactive:active compounds was ~1.42. 
Having a balanced data set is important for good performance of machine learning algorithms, especially when the target class is underrepresented (<xref rid="b20-ol-0-0-10068" ref-type="bibr">20</xref>). We therefore also assessed the effect of balancing the data through over-sampling, under-sampling, and a combination of the two, but in most cases the benefit was limited or absent. We randomly divided the data set into a training (learning) set (316 compounds) and a testing set (106 compounds), using the rminer package of the R statistical tool (<xref rid="b21-ol-0-0-10068" ref-type="bibr">21</xref>).</p>
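<p>The duplicate handling and activity labelling described above can be illustrated with a minimal Python sketch. It is not the code used in this study (that work was carried out in R); the function names and the example records are hypothetical, and assigning a GI<sub>50</sub> of exactly 1 &#x00B5;M to the &#x2018;inactive&#x2019; class is an assumption, as this boundary case is not specified.</p>
<preformat><![CDATA[
```python
from collections import defaultdict
from statistics import mean

ACTIVITY_THRESHOLD_UM = 1.0  # GI50 cut-off in micromolar, as stated in the text


def merge_duplicates(records):
    """Average the GI50 of records sharing the same canonical SMILES."""
    grouped = defaultdict(list)
    for smiles, gi50_um in records:
        grouped[smiles].append(gi50_um)
    return {smiles: mean(values) for smiles, values in grouped.items()}


def label_activity(gi50_um, threshold=ACTIVITY_THRESHOLD_UM):
    """Return 'active' below the threshold; 'inactive' otherwise.

    Treating a value equal to the threshold as inactive is an assumption.
    """
    return "active" if gi50_um < threshold else "inactive"


# Made-up records (SMILES, GI50 in micromolar), for illustration only.
records = [("CCO", 0.5), ("CCO", 1.0), ("c1ccccc1", 12.0)]
merged = merge_duplicates(records)
labels = {smiles: label_activity(value) for smiles, value in merged.items()}
```
]]></preformat>
<p>In this toy example the two records sharing a SMILES string are merged into a single observation with a mean GI<sub>50</sub> of 0.75 &#x00B5;M, which falls in the &#x2018;active&#x2019; class.</p>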
</sec>
<sec>
<title>Descriptors</title>
<p>Thirteen blocks of molecular descriptors were computed with the Dragon 7 program (version 7.0, <uri xlink:href="https://chm.kode-solutions.net">https://chm.kode-solutions.net</uri>; Kode SRL, Milano, Italy): Constitutional descriptors (n=47), ring descriptors (n=32), topological indices (n=75), walk and path counts (n=46), information indices (n=50), 2D matrix-based descriptors (n=607), 2D-autocorrelations (n=213), Burden eigenvalues (n=96), P-VSA-like descriptors (n=55), ETA indices (n=23), Edge adjacency indices (n=324), and molecular properties (n=20). We have also used the whole set of 1D and 2D descriptors (264 descriptors after the removal of constant, quasi-constant and highly correlated variables), in order to assess whether models based on a larger pool of descriptors have better performance with the chosen classifiers than models based on a narrow and well-defined family of descriptors. Thus, the total number of descriptor blocks used for building classification models was 13. Because the models based on the molecular properties had poor performance we did not include the results of those models here.</p>
</sec>
<sec>
<title>Pre-processing and feature selection</title>
<p>We generated distinct QSAR models with each of the 13 blocks of descriptors and pre-processed the data using R, v. 3.4.4 (<xref rid="b22-ol-0-0-10068" ref-type="bibr">22</xref>), and the &#x2018;mlr&#x2019; package, v. 2.12.1 (<xref rid="b23-ol-0-0-10068" ref-type="bibr">23</xref>). For this purpose, within each block of descriptors we removed variables with constant or near-constant values (using a threshold value of 0.1&#x0025;, i.e., features for which less than 0.1&#x0025; of the values differed from the mode were removed). Features containing missing values were also removed, because models built with such features would likely not be applicable to some of the new compounds in virtual screening. Highly correlated features were also removed, using a threshold value of 0.80 for the correlation coefficient. For each subset, after such pre-processing we selected a maximum of seven features using two methods: i) RF importance (&#x2018;randomForest&#x2019; R package) (<xref rid="b24-ol-0-0-10068" ref-type="bibr">24</xref>); and ii) symmetrical uncertainty (&#x2018;FSelector&#x2019; R package) (<xref rid="b25-ol-0-0-10068" ref-type="bibr">25</xref>).</p>
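<p>The three filtering steps can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the R code used in the study: The helper names and descriptor values are hypothetical, the Pearson coefficient is assumed as the correlation measure, and the choice to discard the later member of a correlated pair is arbitrary, since the text does not state which member was kept.</p>
<preformat><![CDATA[
```python
from collections import Counter
from math import sqrt


def is_near_constant(values, min_fraction_differing=0.001):
    """True if fewer than 0.1% of the values differ from the mode."""
    mode_count = Counter(values).most_common(1)[0][1]
    return (len(values) - mode_count) / len(values) < min_fraction_differing


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def filter_descriptors(table, max_abs_correlation=0.80):
    """Drop descriptors with missing values, near-constant descriptors and
    (assumption) the later member of each highly correlated pair."""
    kept = {}
    for name, values in table.items():
        if any(v is None for v in values) or is_near_constant(values):
            continue
        if any(abs(pearson(values, other)) > max_abs_correlation
               for other in kept.values()):
            continue
        kept[name] = values
    return kept


# Hypothetical descriptor columns (names and values are made up).
table = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [2.0, 4.0, 6.0, 8.1],   # almost collinear with d1
    "d3": [5.0, 5.0, 5.0, 5.0],   # constant
    "d4": [0.3, None, 0.1, 0.7],  # contains a missing value
    "d5": [4.0, 1.0, 3.0, 2.0],   # weakly correlated with d1
}
selected = filter_descriptors(table)
```
]]></preformat>
<p>With these made-up columns only d1 and d5 survive the filters; feature selection (RF importance or symmetrical uncertainty) would then be applied to the retained descriptors.</p>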
</sec>
<sec>
<title>Classifiers</title>
<p>We made use of four machine learning algorithms to build classification models able to predict with reasonable accuracy the effect of substances against the SK-MEL-5 melanoma cell line: RF, BST, SVM, and KNN.</p>
<p>RFs, first proposed by Ho in 1995 (<xref rid="b26-ol-0-0-10068" ref-type="bibr">26</xref>) and improved by Breiman in 2001 (<xref rid="b27-ol-0-0-10068" ref-type="bibr">27</xref>), use a large number of decision trees (hence the name, &#x2018;forests&#x2019;), which are built on bootstrap samples and aggregated (bagging); predictions for unseen samples are made through averaging or a majority vote. The method has been described as &#x2018;among the most accurate methods&#x2019; in the field of QSAR (<xref rid="b28-ol-0-0-10068" ref-type="bibr">28</xref>). It is implemented in the R package &#x2018;randomForest&#x2019; (<xref rid="b24-ol-0-0-10068" ref-type="bibr">24</xref>).</p>
<p>Gradient boosting machines (GBMs) represent an algorithm able to combine weak learners into a strong one, building, in an iterative manner, additional base-learners that have a maximal correlation with the negative gradient of a cost function, a variety of such functions being available (<xref rid="b29-ol-0-0-10068" ref-type="bibr">29</xref>). In QSAR models, GBMs have shown good results with respect to prediction performance, speed and robustness (<xref rid="b30-ol-0-0-10068" ref-type="bibr">30</xref>). The algorithm was run under the &#x2018;mlr&#x2019; R package, based on the implementation in the &#x2018;bst&#x2019; (<xref rid="b31-ol-0-0-10068" ref-type="bibr">31</xref>) and &#x2018;rpart&#x2019; (<xref rid="b32-ol-0-0-10068" ref-type="bibr">32</xref>) R packages.</p>
<p>Support vector machines (SVMs), first proposed and developed by Vladimir Vapnik, make use of a hyperplane separating the data from the variable space into classes. Variables are first mapped into a high-dimensional space through one of a variety of kernel functions; the algorithm then identifies the maximal margin hyperplane in this space, thus separating the compounds into classes (<xref rid="b33-ol-0-0-10068" ref-type="bibr">33</xref>). Their chief advantage is the use of the structural risk minimization (SRM) principle, which is more efficient than the conventional empirical risk minimization (ERM) (<xref rid="b34-ol-0-0-10068" ref-type="bibr">34</xref>). We used the implementation of the algorithm available in the &#x2018;e1071&#x2019; R package (<xref rid="b35-ol-0-0-10068" ref-type="bibr">35</xref>).</p>
<p>KNN is a classification method in which the separation of observations into classes is performed using the nearest training observations from the variable space (<xref rid="b36-ol-0-0-10068" ref-type="bibr">36</xref>); more precisely, a test instance is classified by majority decision using the data of its k nearest neighbors, as computed from the learning set (<xref rid="b37-ol-0-0-10068" ref-type="bibr">37</xref>). The algorithm was run under the &#x2018;mlr&#x2019; R package, based on the implementation in the &#x2018;rknn&#x2019; R package (<xref rid="b38-ol-0-0-10068" ref-type="bibr">38</xref>).</p>
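<p>The majority-vote principle behind KNN can be illustrated with a short Python sketch. It shows plain KNN with Euclidean distance, not the random KNN variant provided by the &#x2018;rknn&#x2019; package; the function name, the value of k and the two-dimensional points are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up two-descriptor training data, for illustration only.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
train_labels = ["inactive", "inactive", "inactive", "active", "active"]
prediction = knn_predict(train_points, train_labels, (0.95, 0.9))
```
]]></preformat>
<p>Here two of the three nearest neighbors of the query point belong to the &#x2018;active&#x2019; class, so the query is labelled &#x2018;active&#x2019;.</p>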
</sec>
<sec>
<title>Performance measures and model validation</title>
<p>A nested (double) cross-validation method was used to tune the hyper-parameters for each algorithm and to assess the performance and robustness of the model thus developed (guiding the decision by the best performance in terms of Cohen&#x0027;s kappa). This is considered the most appropriate procedure for cross-validation: The data are partitioned into a learning subset and a test subset, the learning subset being used in the inner loop for model building and selection, whereas the test subset is used to assess the performance of the model picked in the inner loop. The inner loop used a 5-fold cross-validation, whereas the outer loop used a 10-fold cross-validation. The nested cross-validation method was performed on the 316 compounds constituting the initial training set (which was thus successively divided into training and test subsets). To externally assess the reliability of the model performance on data unseen by the model, we used the 106 compounds of the (initial) test set.</p>
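<p>The structure of the two loops can be made concrete with a minimal Python sketch that only generates the index partitions (10 outer folds, each with 5 inner folds) for 316 observations. It is a simplified illustration, not the &#x2018;mlr&#x2019; implementation: The function names and random seeds are hypothetical, and the folds are not stratified by class, which is an assumption.</p>
<preformat><![CDATA[
```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle the indices and split them into k near-equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=10, inner_k=5):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_validation) pairs drawn
    only from outer_train, so the outer test fold never influences tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed=i + 1)
        inner_splits = []
        for m, validation in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, validation))
        yield outer_train, outer_test, inner_splits


# 316 is the size of the training set reported in the text.
splits = list(nested_cv_splits(316))
```
]]></preformat>
<p>Hyper-parameters would be tuned on the inner splits and the selected configuration evaluated once on the corresponding outer test fold, so that each of the 316 compounds is used for performance assessment exactly once.</p>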
<p>The purpose of developing the models was to identify compounds with a high likelihood of being active; in other words, we were not equally interested in classifying both positive and negative observations correctly, but rather in avoiding false positives. Therefore, the most relevant performance measure was the selectivity (specificity; true negative rate, tnr), indicating the proportion of negative observations correctly classified in the negative category, which we sought to maximize; its complementary value (1-tnr) gives the false positive rate, which we sought to minimize. Sensitivity (true positive rate, tpr), defined as the proportion of observations in the positive class properly classified, is also relevant, although for our purposes it is preferable to have a higher selectivity and lower sensitivity than the other way round. The positive predictive value (PPV, precision), calculated as tp/(tp&#x002B;fp), where tp is the number of true positives (positive-class observations correctly classified) and fp the number of false positives (negative-class observations misclassified as positive), is a composite measure reflecting both selectivity and sensitivity. Although not the most important for our purposes, for a better understanding of performance we also looked at the balanced accuracy (defined as the mean of tpr and tnr) and mean misclassification error (MMCE), defined as the proportion of cases where the response (classification result for a particular observation) is different from the truth (the real class of a particular observation). All these measures are implemented in the mlr package (<xref rid="b23-ol-0-0-10068" ref-type="bibr">23</xref>).</p>
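<p>For clarity, the measures defined above can be written out from the four cells of a confusion matrix, as in the following Python sketch. The function name and the counts are hypothetical and do not correspond to any model reported here.</p>
<preformat><![CDATA[
```python
def classification_measures(tp, fp, tn, fn):
    """Compute the measures used in the text from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # selectivity (specificity)
    return {
        "tpr": tpr,
        "tnr": tnr,
        "fpr": 1 - tnr,                           # false positive rate
        "ppv": tp / (tp + fp),                    # precision
        "bac": (tpr + tnr) / 2,                   # balanced accuracy
        "mmce": (fp + fn) / (tp + fp + tn + fn),  # misclassification error
    }


# Made-up counts, chosen only to illustrate the trade-off discussed above.
measures = classification_measures(tp=40, fp=10, tn=90, fn=60)
```
]]></preformat>
<p>With these made-up counts the PPV is 0.8 and the selectivity 0.9 although the sensitivity is only 0.4, which is the kind of profile (few false positives at the cost of missed actives) that is acceptable for the screening purpose described above.</p>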
<p>Besides 10-fold nested cross-validation and external testing, Y-scrambling was applied to assess the robustness of the models, ruling out to a reasonable extent the possibility that the models were the result of chance associations. The GI<sub>50</sub> value was randomly scrambled using 500 permutations (R package &#x2018;gtools&#x2019;) (<xref rid="b39-ol-0-0-10068" ref-type="bibr">39</xref>), and several different models were then re-built from scratch (i.e., repeating the process of feature selection, so as to correspond to the new, scrambled activity values); the performance measures were computed for the models thus re-built.</p>
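<p>The logic of the Y-scrambling test can be sketched in Python as follows. This is a toy illustration, not the procedure run in R: The function names and data are hypothetical, and the &#x2018;model&#x2019; is a single-descriptor cut-off scored by resubstitution accuracy, standing in for the full feature selection, tuning and cross-validation pipeline that would be repeated for each permutation.</p>
<preformat><![CDATA[
```python
import random


def y_scrambling(features, labels, fit_and_score, n_permutations=500, seed=0):
    """Return the scores obtained after refitting on permuted labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_permutations):
        scrambled = list(labels)
        rng.shuffle(scrambled)
        scores.append(fit_and_score(features, scrambled))
    return scores


def threshold_accuracy(features, labels):
    """Toy stand-in for model building: best single cut-off on one feature,
    scored by resubstitution accuracy (optimistic by construction)."""
    best = 0.0
    for cut in features:
        predicted = [x >= cut for x in features]
        accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
        best = max(best, accuracy)
    return best


# Made-up data in which the descriptor separates the two classes perfectly.
features = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [False, False, False, False, True, True, True, True]
true_score = threshold_accuracy(features, labels)
scrambled_scores = y_scrambling(features, labels, threshold_accuracy)
mean_scrambled = sum(scrambled_scores) / len(scrambled_scores)
```
]]></preformat>
<p>A real structure-activity association is suggested when the score obtained with the true labels clearly exceeds the distribution of scores obtained with the scrambled labels.</p>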
<p>We assessed the applicability domain (AD) of the models developed employing the KNN approach developed by Sahigara <italic>et al</italic> (2013) (<xref rid="b40-ol-0-0-10068" ref-type="bibr">40</xref>) and the method proposed by Roy <italic>et al</italic> (2015) (<xref rid="b37-ol-0-0-10068" ref-type="bibr">37</xref>), which assumes normal distribution of the descriptor values, using code written by us in R. We have also explored the local density methods implemented in the R package &#x2018;ldbod&#x2019; (<xref rid="b41-ol-0-0-10068" ref-type="bibr">41</xref>), using arbitrary thresholds of 5 and 10&#x0025; for the ranked values of the local density-based outlier scores computed against the reference values of the train set. The same techniques were used to investigate and detect outliers among the train set values.</p>
</sec>
</sec>
</sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title/>
<sec>
<title>Assessment of the dataset chemical diversity</title>
<p>To ensure a reasonable predictive accuracy of QSAR models it is important to have a sufficiently diverse data set (<xref rid="b42-ol-0-0-10068" ref-type="bibr">42</xref>), and various ways of assessing chemical diversity have been used in the literature. We computed a dissimilarity matrix based on the Gower distance, which is an appropriate measure for data sets containing combinations of numerical and categorical or binary variables and returns a distance that is already scaled, i.e., is always a number between 0 (identical values, no dissimilarity) and 1 (very distinct values, maximal dissimilarity) (<xref rid="b43-ol-0-0-10068" ref-type="bibr">43</xref>). For the dissimilarity matrix we used all 1D and 2D descriptors computed by Dragon Program, v. 7.0 after minimal processing for the removal of constant and near-constant features (1,920 remaining descriptors). To get a quick understanding of the differences, a heat map of the dissimilarity matrix was drawn and examined (<xref rid="f1-ol-0-0-10068" ref-type="fig">Fig. 1</xref>). As indicated by the (smaller) density plot, most of the observations have a dissimilarity coefficient of 0.2&#x2013;0.6, i.e., there is a moderate chemical diversity in the whole dataset.</p>
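<p>For purely numerical descriptors, the Gower dissimilarity between two compounds reduces to the mean of the range-scaled absolute differences across descriptors, which the following Python sketch illustrates. The function name and the values are hypothetical; categorical or binary variables, which the Gower distance can also accommodate, are not handled, and at least one non-constant descriptor is assumed.</p>
<preformat><![CDATA[
```python
def gower_distance_matrix(rows):
    """Pairwise Gower dissimilarities for purely numerical descriptors."""
    n_features = len(rows[0])
    ranges = []
    for k in range(n_features):
        column = [row[k] for row in rows]
        ranges.append(max(column) - min(column))
    matrix = []
    for a in rows:
        line = []
        for b in rows:
            # Constant descriptors (zero range) carry no information and
            # are skipped; each remaining term lies between 0 and 1.
            parts = [abs(a[k] - b[k]) / ranges[k]
                     for k in range(n_features) if ranges[k] > 0]
            line.append(sum(parts) / len(parts))
        matrix.append(line)
    return matrix


# Three made-up compounds described by two numerical descriptors.
rows = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]
distances = gower_distance_matrix(rows)
```
]]></preformat>
<p>In this toy example the first and third compounds lie at opposite ends of both descriptor ranges and therefore have the maximal dissimilarity of 1, whereas the second compound has a dissimilarity of 0.5 to each of the others.</p>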
<p>We also used the technique of Xu <italic>et al</italic> (<xref rid="b42-ol-0-0-10068" ref-type="bibr">42</xref>), who used a scatter plot of the molecular weight and AlogP for the substances from the learning and test subsets to assess whether the latter were distributed in the same chemical space as the former. The graph showed that most test points were close to one or more train points, but there were also a few outliers which seemed to be outside the AD of the models (<xref rid="f2-ol-0-0-10068" ref-type="fig">Fig. 2</xref>).</p>
<p>The exploration of the AD for the seven best performing models with the first two methods (based on the KNN and local probability density) showed that, for most models, only a small proportion (3.77&#x2013;12.3&#x0025; for the different sets of features and depending on the method used for the assessment) of the test set observations were outside the AD; moreover, although those observations were outside the AD, most of them were predicted correctly. For instance, all nine values identified by the KNN-based method as outside the AD were predicted correctly by the RF model based on the first set of topological descriptors and oversampling, and 11 out of 13 values identified by the Roy method (<xref rid="b37-ol-0-0-10068" ref-type="bibr">37</xref>) as outside the AD were also correctly classified by this model. In the case of 2D-autocorrelations, three of the four values outside the AD according to the KNN method were correctly classified, all five values identified by the probability density methods at the 5&#x0025; threshold were correctly predicted, and four out of five identified by the Roy method were correctly labeled by this model.</p>
<p>In the case of information indices, the number of test observations outside the AD identified by the KNN method was surprisingly high (29.25&#x0025;, almost one in every three observations), and slightly more than half of those cases (51.61&#x0025;) were wrongly classified. The Roy method identified only five outliers, two of which were wrongly classified. The probability density methods suggested that slightly more than half of the values outside the AD for this model were wrongly classified (3 out of 5 and 6 out of 10 of the most extreme values based on the outlier scores were wrongly predicted).</p>
</sec>
<sec>
<title>Performance of nested cross validation</title>
<p>We attempted to use the connectivity indices, but all descriptors of this subset had missing values; we therefore discarded this subset and did not build classification models based on these descriptors.</p>
<p>Using four classifiers, 13 different sets of descriptors, as well as &#x2018;synthetic&#x2019; samples obtained by over-sampling or a combination of over- and under-sampling (&#x2018;smote&#x2019;), different models were built, the performance of which was assessed through nested cross-validation. Because we used two different algorithms for feature selection, which in most cases identified two partially different subsets of features (in rarer cases a single set of features), the total number of models evaluated was 186 (not counting those built with molecular properties, whose performance was poor). We report here only those models (n=28) with an acceptable performance [positive predictive value (PPV) higher than 75&#x0025; in both the nested cross-validation and on the previously unseen dataset] (<xref rid="tI-ol-0-0-10068" ref-type="table">Tables I</xref> and <xref rid="tII-ol-0-0-10068" ref-type="table">II</xref>). The performance of each model in the nested cross-validation and on the independent data set is shown in <xref rid="SD1-ol-0-0-10068" ref-type="supplementary-material">Tables SI</xref> and <xref rid="SD1-ol-0-0-10068" ref-type="supplementary-material">SII</xref>.</p>
<p>Among the 186 models reported in <xref rid="SD1-ol-0-0-10068" ref-type="supplementary-material">Tables SI</xref> and <xref rid="SD1-ol-0-0-10068" ref-type="supplementary-material">SII</xref>, none had a PPV higher than 0.90 in both nested cross-validation and on the external dataset, but seven models had a PPV higher than 0.85 in both evaluations, all seven using the RF algorithm as a classifier and topological descriptors, information indices, 2D-autocorrelation descriptors, P-VSA-like descriptors, and edge-adjacency descriptors as sets of features used for classification. For 16 models, PPV was higher than 80&#x0025; with the two assessment methods (cross-validation and external evaluation). Using the pool of all descriptors and two feature selection algorithms did not lead to better results than using smaller blocks of descriptors: None of the 16 models developed with the pool of all 1D and 2D descriptors had a PPV higher than 80&#x0025; in both cross-validation and external testing, and only two of those 16 models had a PPV higher than 75&#x0025; in both evaluations. We did not explore a larger range of feature selection options for this large pool of descriptors, but with the two methods also applied to the smaller blocks there was no clear advantage in starting from the larger number of descriptors. Thus, with respect to descriptors, more is not necessarily better; in our case, less was rather more.</p>
<p>The nitrogen percentage, oxygen atom numbers and oxygen percentage, number of multiple bonds, of heavy atoms, and of terminal atoms, as well as the average molecular weight, were the most important constitutional descriptors. The sense of the interactions between nitrogen percentage and average molecular weight, and between nitrogen percentage and number of terminal atoms in the RF model based on the unbalanced data is shown for exemplification in <xref rid="SD1-ol-0-0-10068" ref-type="supplementary-material">Figs. S1</xref> and <xref rid="SD1-ol-0-0-10068" ref-type="supplementary-material">S2</xref>. Among the ring descriptors, the first two most important were the molecular cyclized degree and aromatic ratio, both being easy to compute and easy to interpret; a sense of their interaction in an RF model is shown in <xref rid="SD1-ol-0-0-10068" ref-type="supplementary-material">Fig. S3</xref>.</p>
<p>The y-scrambling test was associated with considerably worse performance of the models re-built through the same steps as the initial models, with respect to all performance measures employed (e.g., PPV not higher than 0.50 and sensitivity lower than 5&#x0025;), thus strongly suggesting that the good performance of the models was not the result of chance, but rather of a real association between the cytotoxic effect on the melanoma cell line SK-MEL-5 and the descriptor blocks used in those models.</p>
</sec>
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>A small number of &#x2018;local&#x2019; QSAR models have been published (<xref rid="b44-ol-0-0-10068" ref-type="bibr">44</xref>&#x2013;<xref rid="b47-ol-0-0-10068" ref-type="bibr">47</xref>), focused on the cytotoxicity of a limited number of similar substances against one or several cancer cell lines, but such models have a narrow range of chemical structures and a narrow domain of applicability (<xref rid="b48-ol-0-0-10068" ref-type="bibr">48</xref>). Our study is one of the few where cytotoxicity assessed on a cancer cell line (SK-MEL-5) is explored through &#x2018;global&#x2019; QSAR modelling. Such an approach is more challenging, because even for a single therapeutic target (a protein) median efficacy values (such as IC<sub>50</sub>) are heterogeneous, likely to be affected by multiple sources of error, and liable to differ from one laboratory to another and from one experiment to another, depending on the experimental conditions. It is well known that assays based on MTT and analogues rarely give consistent IC<sub>50</sub> values. In the case of the cisplatin effect on the SKOV-3 cell line, the IC<sub>50</sub> values reported in 17 published studies varied between 2 and 40 &#x00B5;M, and although it was initially thought that those inconsistencies were related to the reagents and the way they were used in various laboratories, it was later discovered that IC<sub>50</sub> remained inconsistent even when the assay was carried out by the same researcher in the same laboratory (<xref rid="b49-ol-0-0-10068" ref-type="bibr">49</xref>). Moreover, as has been stated in the literature with respect to the methodology used in computing such efficacy values, &#x2018;just because a value is obtained does not mean it is accurate&#x2019; (<xref rid="b50-ol-0-0-10068" ref-type="bibr">50</xref>). 
For these reasons, QSAR modeling of IC<sub>50</sub> is more challenging and this was the reason why we preferred the use of classification techniques instead of modeling directly the IC<sub>50</sub> values through methods for continuous variables and our results show that developing QSAR models with reasonable performance in these conditions is feasible.</p>
<p>All seven best-performing models used the RF algorithm as a classifier, as did all 16 models with PPV higher than 80&#x0025; in both nested 10-fold cross-validation and external testing. Two BST models and one SVM model had PPV higher than 75&#x0025;, but the performance of these algorithms tended to be lower than that of RF. These classifiers were more prone to overfitting, performing well on the artificially balanced data sets (oversampling and SMOTE techniques) but rather poorly in the external evaluation. In an independent study, RF was also reported to perform better than BST (<xref rid="b51-ol-0-0-10068" ref-type="bibr">51</xref>), and in a comparative study BST was reported to be more sensitive to noise than other machine learning algorithms (<xref rid="b52-ol-0-0-10068" ref-type="bibr">52</xref>). Balancing the data, irrespective of the classifier used, tended to increase the sensitivity at a slight cost in specificity.</p>
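<p>As an illustration of what balancing a data set involves, the following Python sketch implements plain random oversampling of the minority class. It is a simplified stand-in and not the procedure used in this study: SMOTE generates synthetic minority samples rather than duplicating existing ones, and the function name &#x2018;random_oversample&#x2019; and the toy training fold are hypothetical. In such a scheme, balancing would normally be applied to the training data only, so that the external evaluation still reflects the original class distribution.</p>
<preformat>
```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    new_rows, new_labels = [], []
    for label, members in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        new_rows.extend(members + extra)
        new_labels.extend([label] * target)
    return new_rows, new_labels


# Hypothetical training fold: 2 'active' and 6 'inactive' compounds,
# each described by a single descriptor value.
train_rows = [[0.9], [1.1], [0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
train_labels = ["active", "active"] + ["inactive"] * 6

balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
print(Counter(balanced_labels))
```
</preformat>
<p>Because duplicated minority samples carry no new information, a classifier can fit them very closely, which is one plausible route to the overfitting on balanced data noted above.</p>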
<p>Of the thirteen descriptor blocks assessed by us to build the QSAR models, the best performing models (PPV higher than 80&#x0025; in both cross-validation and external testing) used five of these blocks: Topological descriptors, information indices, 2D-autocorrelation descriptors, P-VSA-like descriptors and edge adjacency indices.</p>
<p>Of the topological descriptors, the Balaban centric index (BAC) had the largest importance. It has been described as reflecting the molecular shape, but has had little importance in other models published up to now (<xref rid="b53-ol-0-0-10068" ref-type="bibr">53</xref>). Other important topological descriptors were: Path/walk 2-Randic shape index (PW2), which has been described as important in describing the antiviral activity of azolo-adamantanes (<xref rid="b54-ol-0-0-10068" ref-type="bibr">54</xref>); lopping centric index (LOC), which has been used previously in QSAR models for cytotoxic compounds on cancer cell lines (<xref rid="b55-ol-0-0-10068" ref-type="bibr">55</xref>,<xref rid="b56-ol-0-0-10068" ref-type="bibr">56</xref>); and Narumi harmonic topological index, which has also been shown to be useful in developing predictive cytotoxicity models (<xref rid="b57-ol-0-0-10068" ref-type="bibr">57</xref>).</p>
<p>The information indices best associated with the cytotoxic activity on the SK-MEL-5 cell line were the mean information content on the vertex degree equality (IVDE), which has previously been shown to be important in predicting COX-2 (<xref rid="b58-ol-0-0-10068" ref-type="bibr">58</xref>) and p56lck protein tyrosine kinase (<xref rid="b59-ol-0-0-10068" ref-type="bibr">59</xref>) inhibitory activities, and the Balaban U index, relevant in previous models for describing sweetness (<xref rid="b60-ol-0-0-10068" ref-type="bibr">60</xref>). The structural information content index (neighborhood symmetry of 0-order, SIC0), used earlier for COX-2 inhibition prediction (<xref rid="b61-ol-0-0-10068" ref-type="bibr">61</xref>) as well as in toxicity models (<xref rid="b62-ol-0-0-10068" ref-type="bibr">62</xref>), also turned out to be important in our models. Other information indices pertinent for the prediction of the anti-melanoma cell activity were the Balaban V index, shown to be relevant for the inhibitory effect on the MATE1 transporter (<xref rid="b63-ol-0-0-10068" ref-type="bibr">63</xref>); the mean information content on the distance equality (IDE), used beforehand in models for HDM2 inhibitors (<xref rid="b64-ol-0-0-10068" ref-type="bibr">64</xref>); the Balaban Y index; the Kier symmetry index; and the relative number of symmetry classes (rGES; not identified as important in other published QSAR models).</p>
<p>Among the 2D-autocorrelations, the most important descriptors were: Geary autocorrelation of lag 1 weighted by polarizability (GATS1p), used earlier to model cyclooxygenase-2 inhibitors (<xref rid="b65-ol-0-0-10068" ref-type="bibr">65</xref>); Moran autocorrelation of lag 3 weighted by Sanderson electronegativity (MATS3e), used previously to describe antimalarial activity (<xref rid="b66-ol-0-0-10068" ref-type="bibr">66</xref>); Geary autocorrelation of lag 3 weighted by Sanderson electronegativity (GATS3e), reported as significant in describing the antitubercular activity of 1,4-dihydropyridine-3,5-dicarboxamides (<xref rid="b67-ol-0-0-10068" ref-type="bibr">67</xref>); and Moran autocorrelation of lag 3 and 2, respectively, weighted by ionization potential (MATS3i and MATS2i), Geary autocorrelation of lag 2 weighted by mass (GATS2m) and Moran autocorrelation of lag 6 weighted by polarizability (MATS6p), which have not been identified in previous publications as important for other QSAR models.</p>
<p>P-VSA-like descriptors have rarely been used in QSAR models, and few published studies include them. Among this group of descriptors, the most important ones used by us in building models with reasonably good performance were: P_VSA-like on LogP, bin 5; P_VSA-like on mass, bin 4 (P_VSA_m_4); P_VSA-like on potential pharmacophore points, aromatic atoms; P_VSA-like on LogP, bin 1; P_VSA-like on potential pharmacophore points, L - lipophilic; P_VSA-like on molar refractivity, bin 1; and P_VSA-like on molar refractivity, bin 2. Of this group, only P_VSA_m_4 has been reported, in models of olfactory properties (<xref rid="b68-ol-0-0-10068" ref-type="bibr">68</xref>), whereas the remainder have not been reported in other QSAR models as significant features. The same is true for the relevant edge-adjacency descriptors used in building our models. Although a number of other studies reported the use of different edge-adjacency descriptors, none of those found by the feature selection algorithms applied by us were reported in published models: SpMAD_AEA(ed), spectral mean absolute deviation from augmented edge adjacency matrix weighted by edge degree; SpMAD_EA(bo), normalized leading eigenvalue from augmented edge adjacency matrix weighted by bond order; Eig02_AEA(bo), eigenvalue n. 2 from augmented edge adjacency matrix weighted by bond order; SpDiam_EA(bo), spectral diameter from edge adjacency matrix weighted by bond order; SpMAD_AEA(dm), spectral mean absolute deviation from augmented edge adjacency matrix weighted by dipole moment; SpDiam_EA(dm), spectral diameter from edge adjacency matrix weighted by dipole moment; and SpMaxA_EA(dm), normalized leading eigenvalue from edge adjacency matrix weighted by dipole moment.</p>
<p>Simpler, more easily interpretable descriptors, such as constitutional descriptors, ring descriptors or molecular properties, led to models with lower performance, although models with PPV higher than 70&#x0025; could still be built with the constitutional and ring descriptors.</p>
<p>By exploring a variety of descriptor blocks to produce QSAR models able to anticipate the cytotoxicity of chemical compounds on the cancer cell line SK-MEL-5, we were able to build models with good performance in terms of specificity and PPV, but with relatively low sensitivity. In other words, the models have a low rate of false positives, but at the cost of labelling about half of the active compounds as &#x2018;inactive&#x2019;. Of the four classification algorithms applied, RF was the most effective: all models with PPV higher than 85&#x0025; in both (nested) cross-validation and external evaluation were built with this classifier. The descriptors most appropriate to describe the effect on the cancer cell line SK-MEL-5 were topological descriptors, information indices, 2D-autocorrelation descriptors, P-VSA-like descriptors and edge adjacency indices. All these groups are rather hard to interpret in a simple manner, but simpler descriptors (e.g., constitutional descriptors, ring descriptors, molecular properties) led to less successful models.</p>
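<p>The trade-off between a low false-positive rate and a low sensitivity can be made concrete with a short Python sketch that computes PPV, sensitivity and specificity from the four cells of a confusion matrix. The counts used are hypothetical and are not results of this study; they were chosen only to mirror the pattern described above, in which about half of the active compounds are missed while very few inactive compounds are flagged as active. The function name &#x2018;confusion_metrics&#x2019; is likewise an assumption.</p>
<preformat>
```python
def confusion_metrics(tp, fp, tn, fn):
    """Return PPV, sensitivity and specificity from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # share of predicted actives that are truly active
        "sensitivity": tp / (tp + fn),  # share of true actives that are found
        "specificity": tn / (tn + fp),  # share of true inactives that are rejected
    }


# Hypothetical counts (not study data): 100 actives, of which 50 are found,
# and 900 inactives, of which only 5 are wrongly predicted as active.
metrics = confusion_metrics(tp=50, fp=5, tn=895, fn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```
</preformat>
<p>With such counts, a compound predicted as active is very likely to be truly active, which suits a screening setting where experimental follow-up is costly, even though many active compounds remain undetected.</p>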
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material id="SD1-ol-0-0-10068" content-type="local-data">
<caption>
<title>Supporting Data</title>
</caption>
<media mimetype="application" mime-subtype="pdf" xlink:href="Supplementary_Data.pdf"/>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>Not applicable.</p>
</ack>
<sec>
<title>Funding</title>
<p>This study was partially supported by a grant from the Romanian Ministry of Research and Innovation (CCCDI-UEFISCDI) (project no. 61PCCDI&#x2044;2018 PN-III-P1-1.2-PCCDI-2017-0341; Bucharest, Romania) within PNCDI&#x2013;III.</p>
</sec>
<sec>
<title>Availability of data and materials</title>
<p>The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.</p>
</sec>
<sec>
<title>Authors&#x0027; contributions</title>
<p>RA was responsible for the conception and design of the study, checked the primary data and performed the modelling. IN collected and analysed the primary data. MD, IN, FGL and DB contributed to the design of the study, the interpretation of the data and the writing of the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec>
<title>Ethics approval and consent to participate</title>
<p>Not applicable.</p>
</sec>
<sec>
<title>Patient consent for publication</title>
<p>Not applicable.</p>
</sec>
<sec>
<title>Competing interests</title>
<p>RA has received consultancy and speakers&#x0027; fees from various pharmaceutical companies. MD, IN, FGL and DB declare they have no competing interests.</p>
</sec>
<glossary>
<def-list>
<title>Abbreviations</title>
<def-item><term>BST</term><def><p>gradient boosting</p></def></def-item>
<def-item><term>ERM</term><def><p>empirical risk minimization</p></def></def-item>
<def-item><term>KNN</term><def><p>k-nearest neighbors</p></def></def-item>
<def-item><term>PPV</term><def><p>positive predictive value</p></def></def-item>
<def-item><term>QSAR</term><def><p>quantitative structure-activity relationship</p></def></def-item>
<def-item><term>RF</term><def><p>random forests</p></def></def-item>
<def-item><term>SRM</term><def><p>structural risk minimization</p></def></def-item>
<def-item><term>SVM</term><def><p>support vector machines</p></def></def-item>
</def-list>
</glossary>
<ref-list>
<title>References</title>
<ref id="b1-ol-0-0-10068"><label>1</label><element-citation publication-type="online"><collab collab-type="corp-author">European Chemical Agency (ECHA): Practical guide</collab><article-title>How to use and report (Q)SARs. Version 3.1</article-title><publisher-name>ECHA</publisher-name><publisher-loc>Helsinki</publisher-loc><year>2016</year><uri>https://echa.europa.eu/documents/10162/13655/pg_report_qsars_en.pdf</uri><date-in-citation content-type="access-date"><month>July</month><year>2016</year></date-in-citation></element-citation></ref>
<ref id="b2-ol-0-0-10068"><label>2</label><element-citation publication-type="journal"><person-group person-group-type="editor"><name><surname>Launer</surname><given-names>RL</given-names></name><name><surname>Wilkinson</surname><given-names>GN</given-names></name></person-group><article-title>Robustness in the strategy of scientific model building</article-title><source>Robustness in Statistics</source><edition>1st</edition><publisher-name>Elsevier</publisher-name><fpage>201</fpage><lpage>236</lpage><year>1979</year></element-citation></ref>
<ref id="b3-ol-0-0-10068"><label>3</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Aouidate</surname><given-names>A</given-names></name><name><surname>Ghaleb</surname><given-names>A</given-names></name><name><surname>Ghamali</surname><given-names>M</given-names></name><name><surname>Chtita</surname><given-names>S</given-names></name><name><surname>Ousaa</surname><given-names>A</given-names></name><name><surname>Choukrad</surname><given-names>M</given-names></name><name><surname>Sbai</surname><given-names>A</given-names></name><name><surname>Bouachrine</surname><given-names>M</given-names></name><name><surname>Lakhlifi</surname><given-names>T</given-names></name></person-group><article-title>QSAR study and rustic ligand-based virtual screening in a search for aminooxadiazole derivatives as PIM1 inhibitors</article-title><source>Chem Cent J</source><volume>12</volume><fpage>32</fpage><year>2018</year><pub-id pub-id-type="doi">10.1186/s13065-018-0401-x</pub-id><pub-id pub-id-type="pmid">29564572</pub-id></element-citation></ref>
<ref id="b4-ol-0-0-10068"><label>4</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lima</surname><given-names>MNN</given-names></name><name><surname>Melo-Filho</surname><given-names>CC</given-names></name><name><surname>Cassiano</surname><given-names>GC</given-names></name><name><surname>Neves</surname><given-names>BJ</given-names></name><name><surname>Alves</surname><given-names>VM</given-names></name><name><surname>Braga</surname><given-names>RC</given-names></name><name><surname>Cravo</surname><given-names>PVL</given-names></name><name><surname>Muratov</surname><given-names>EN</given-names></name><name><surname>Calit</surname><given-names>J</given-names></name><name><surname>Bargieri</surname><given-names>DY</given-names></name><etal/></person-group><article-title>QSAR-driven design and discovery of novel compounds with antiplasmodial and transmission blocking activities</article-title><source>Front Pharmacol</source><volume>9</volume><fpage>146</fpage><year>2018</year><pub-id pub-id-type="doi">10.3389/fphar.2018.00146</pub-id><pub-id pub-id-type="pmid">29559909</pub-id></element-citation></ref>
<ref id="b5-ol-0-0-10068"><label>5</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Qin</surname><given-names>L</given-names></name><name><surname>Zhang</surname><given-names>X</given-names></name><name><surname>Chen</surname><given-names>Y</given-names></name><name><surname>Mo</surname><given-names>L</given-names></name><name><surname>Zeng</surname><given-names>H</given-names></name><name><surname>Liang</surname><given-names>Y</given-names></name></person-group><article-title>Predictive QSAR models for the toxicity of disinfection byproducts</article-title><source>Molecules</source><volume>22</volume><fpage>1671</fpage><year>2017</year><pub-id pub-id-type="doi">10.3390/molecules22101671</pub-id></element-citation></ref>
<ref id="b6-ol-0-0-10068"><label>6</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname><given-names>L</given-names></name><name><surname>Yang</surname><given-names>H</given-names></name><name><surname>Li</surname><given-names>J</given-names></name><name><surname>Wang</surname><given-names>T</given-names></name><name><surname>Li</surname><given-names>W</given-names></name><name><surname>Liu</surname><given-names>G</given-names></name><name><surname>Tang</surname><given-names>Y</given-names></name></person-group><article-title>In silico prediction of compounds binding to human plasma proteins by QSAR models</article-title><source>ChemMedChem</source><volume>13</volume><fpage>572</fpage><lpage>581</lpage><year>2018</year><pub-id pub-id-type="doi">10.1002/cmdc.201700582</pub-id><pub-id pub-id-type="pmid">29057587</pub-id></element-citation></ref>
<ref id="b7-ol-0-0-10068"><label>7</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nembri</surname><given-names>S</given-names></name><name><surname>Grisoni</surname><given-names>F</given-names></name><name><surname>Consonni</surname><given-names>V</given-names></name><name><surname>Todeschini</surname><given-names>R</given-names></name></person-group><article-title>In silico prediction of cytochrome P450-drug interaction: QSARs for CYP3A4 and CYP2C9</article-title><source>Int J Mol Sci</source><volume>17</volume><fpage>914</fpage><year>2016</year><pub-id pub-id-type="doi">10.3390/ijms17060914</pub-id></element-citation></ref>
<ref id="b8-ol-0-0-10068"><label>8</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Garmpis</surname><given-names>N</given-names></name><name><surname>Damaskos</surname><given-names>C</given-names></name><name><surname>Garmpi</surname><given-names>A</given-names></name><name><surname>Dimitroulis</surname><given-names>D</given-names></name><name><surname>Spartalis</surname><given-names>E</given-names></name><name><surname>Margonis</surname><given-names>GA</given-names></name><name><surname>Schizas</surname><given-names>D</given-names></name><name><surname>Deskou</surname><given-names>I</given-names></name><name><surname>Doula</surname><given-names>C</given-names></name><name><surname>Magkouti</surname><given-names>E</given-names></name><etal/></person-group><article-title>Targeting histone deacetylases in malignant melanoma: A future therapeutic agent or just great expectations?</article-title><source>Anticancer Res</source><volume>37</volume><fpage>5355</fpage><lpage>5362</lpage><year>2017</year><pub-id pub-id-type="pmid">28982843</pub-id></element-citation></ref>
<ref id="b9-ol-0-0-10068"><label>9</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Stueven</surname><given-names>NA</given-names></name><name><surname>Schlaeger</surname><given-names>NM</given-names></name><name><surname>Monte</surname><given-names>AP</given-names></name><name><surname>Hwang</surname><given-names>SL</given-names></name><name><surname>Huang</surname><given-names>CC</given-names></name></person-group><article-title>A novel stilbene-like compound that inhibits melanoma growth by regulating melanocyte differentiation and proliferation</article-title><source>Toxicol Appl Pharmacol</source><volume>337</volume><fpage>30</fpage><lpage>38</lpage><year>2017</year><pub-id pub-id-type="doi">10.1016/j.taap.2017.10.008</pub-id><pub-id pub-id-type="pmid">29042215</pub-id></element-citation></ref>
<ref id="b10-ol-0-0-10068"><label>10</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Marra</surname><given-names>A</given-names></name><name><surname>Ferrone</surname><given-names>CR</given-names></name><name><surname>Fusciello</surname><given-names>C</given-names></name><name><surname>Scognamiglio</surname><given-names>G</given-names></name><name><surname>Ferrone</surname><given-names>S</given-names></name><name><surname>Pepe</surname><given-names>S</given-names></name><name><surname>Perri</surname><given-names>F</given-names></name><name><surname>Sabbatino</surname><given-names>F</given-names></name></person-group><article-title>Translational research in cutaneous melanoma: New therapeutic perspectives</article-title><source>Anticancer Agents Med Chem</source><volume>18</volume><fpage>166</fpage><lpage>181</lpage><year>2018</year><pub-id pub-id-type="doi">10.2174/1871520618666171219115335</pub-id><pub-id pub-id-type="pmid">29256359</pub-id></element-citation></ref>
<ref id="b11-ol-0-0-10068"><label>11</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Theodosakis</surname><given-names>N</given-names></name><name><surname>Micevic</surname><given-names>G</given-names></name><name><surname>Langdon</surname><given-names>CG</given-names></name><name><surname>Ventura</surname><given-names>A</given-names></name><name><surname>Means</surname><given-names>R</given-names></name><name><surname>Stern</surname><given-names>DF</given-names></name><name><surname>Bosenberg</surname><given-names>MW</given-names></name></person-group><article-title>p90RSK blockade inhibits dual BRAF and MEK inhibitor-resistant melanoma by targeting protein synthesis</article-title><source>J Invest Dermatol</source><volume>137</volume><fpage>2187</fpage><lpage>2196</lpage><year>2017</year><pub-id pub-id-type="doi">10.1016/j.jid.2016.12.033</pub-id><pub-id pub-id-type="pmid">28599981</pub-id></element-citation></ref>
<ref id="b12-ol-0-0-10068"><label>12</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mioc</surname><given-names>M</given-names></name><name><surname>Pavel</surname><given-names>IZ</given-names></name><name><surname>Ghiulai</surname><given-names>R</given-names></name><name><surname>Coricovac</surname><given-names>DE</given-names></name><name><surname>Farca&#x015F;</surname><given-names>C</given-names></name><name><surname>Mihali</surname><given-names>CV</given-names></name><name><surname>Oprean</surname><given-names>C</given-names></name><name><surname>Serafim</surname><given-names>V</given-names></name><name><surname>Popovici</surname><given-names>RA</given-names></name><name><surname>Dehelean</surname><given-names>CA</given-names></name><etal/></person-group><article-title>The cytotoxic effects of betulin-conjugated gold nanoparticles as stable formulations in normal and melanoma cells</article-title><source>Front Pharmacol</source><volume>9</volume><fpage>429</fpage><year>2018</year><pub-id pub-id-type="doi">10.3389/fphar.2018.00429</pub-id><pub-id pub-id-type="pmid">29773989</pub-id></element-citation></ref>
<ref id="b13-ol-0-0-10068"><label>13</label><element-citation publication-type="online"><collab collab-type="corp-author">Memorial Sloan Kettering Cancer Center</collab><article-title>SK-MEL-5: Human Melanoma Cell Line (ATCC HTB 70)</article-title><uri>https://www.mskcc.org/research-advantage/support/technology/tangible-material/human-melanoma-cell-line-sk-mel-5</uri><date-in-citation content-type="access-date"><month>August</month><day>30</day><year>2018</year></date-in-citation></element-citation></ref>
<ref id="b14-ol-0-0-10068"><label>14</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Al-Qathama</surname><given-names>A</given-names></name><name><surname>Gibbons</surname><given-names>S</given-names></name><name><surname>Prieto</surname><given-names>JM</given-names></name></person-group><article-title>Differential modulation of Bax/Bcl-2 ratio and onset of caspase-3/7 activation induced by derivatives of Justicidin B in human melanoma cells A375</article-title><source>Oncotarget</source><volume>8</volume><fpage>95999</fpage><lpage>96012</lpage><year>2017</year><pub-id pub-id-type="doi">10.18632/oncotarget.21625</pub-id><pub-id pub-id-type="pmid">29221182</pub-id></element-citation></ref>
<ref id="b15-ol-0-0-10068"><label>15</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Carbone</surname><given-names>C</given-names></name><name><surname>Martins-Gomes</surname><given-names>C</given-names></name><name><surname>Pepe</surname><given-names>V</given-names></name><name><surname>Silva</surname><given-names>AM</given-names></name><name><surname>Musumeci</surname><given-names>T</given-names></name><name><surname>Puglisi</surname><given-names>G</given-names></name><name><surname>Furneri</surname><given-names>PM</given-names></name><name><surname>Souto</surname><given-names>EB</given-names></name></person-group><article-title>Repurposing itraconazole to the benefit of skin cancer treatment: A combined azole-DDAB nanoencapsulation strategy</article-title><source>Colloids Surf B Biointerfaces</source><volume>167</volume><fpage>337</fpage><lpage>344</lpage><year>2018</year><pub-id pub-id-type="doi">10.1016/j.colsurfb.2018.04.031</pub-id><pub-id pub-id-type="pmid">29684903</pub-id></element-citation></ref>
<ref id="b16-ol-0-0-10068"><label>16</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Al-Sanea</surname><given-names>MM</given-names></name><name><surname>Ali Khan</surname><given-names>MS</given-names></name><name><surname>Abdelazem</surname><given-names>AZ</given-names></name><name><surname>Lee</surname><given-names>SH</given-names></name><name><surname>Mok</surname><given-names>PL</given-names></name><name><surname>Gamal</surname><given-names>M</given-names></name><name><surname>Shaker</surname><given-names>ME</given-names></name><name><surname>Afzal</surname><given-names>M</given-names></name><name><surname>Youssif</surname><given-names>BG</given-names></name><name><surname>Omar</surname><given-names>NN</given-names></name></person-group><article-title>Synthesis and in vitro antiproliferative activity of new 1-phenyl-3-(4-(pyridin-3-yl)phenyl)urea scaffold-based compounds</article-title><source>Molecules</source><volume>23</volume><fpage>297</fpage><year>2018</year><pub-id pub-id-type="doi">10.3390/molecules23020297</pub-id></element-citation></ref>
<ref id="b17-ol-0-0-10068"><label>17</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Plitzko</surname><given-names>B</given-names></name><name><surname>Kaweesa</surname><given-names>EN</given-names></name><name><surname>Loesgen</surname><given-names>S</given-names></name></person-group><article-title>The natural product mensacarcin induces mitochondrial toxicity and apoptosis in melanoma cells</article-title><source>J Biol Chem</source><volume>292</volume><fpage>21102</fpage><lpage>21116</lpage><year>2017</year><pub-id pub-id-type="doi">10.1074/jbc.M116.774836</pub-id><pub-id pub-id-type="pmid">29074620</pub-id></element-citation></ref>
<ref id="b18-ol-0-0-10068"><label>18</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kalliokoski</surname><given-names>T</given-names></name><name><surname>Kramer</surname><given-names>C</given-names></name><name><surname>Vulpetti</surname><given-names>A</given-names></name><name><surname>Gedeck</surname><given-names>P</given-names></name></person-group><article-title>Comparability of mixed IC<sub>50</sub> data: A statistical analysis</article-title><source>PLoS One</source><volume>8</volume><fpage>e61007</fpage><year>2013</year><pub-id pub-id-type="doi">10.1371/journal.pone.0061007</pub-id><pub-id pub-id-type="pmid">23613770</pub-id></element-citation></ref>
<ref id="b19-ol-0-0-10068"><label>19</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Niu</surname><given-names>AQ</given-names></name><name><surname>Xie</surname><given-names>LJ</given-names></name><name><surname>Wang</surname><given-names>H</given-names></name><name><surname>Zhu</surname><given-names>B</given-names></name><name><surname>Wang</surname><given-names>SQ</given-names></name></person-group><article-title>Prediction of selective estrogen receptor beta agonist using open data and machine learning approach</article-title><source>Drug Des Devel Ther</source><volume>10</volume><fpage>2323</fpage><lpage>2331</lpage><year>2016</year><pub-id pub-id-type="doi">10.2147/DDDT.S110603</pub-id><pub-id pub-id-type="pmid">27486309</pub-id></element-citation></ref>
<ref id="b20-ol-0-0-10068"><label>20</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Datta</surname><given-names>S</given-names></name><name><surname>Das</surname><given-names>S</given-names></name></person-group><article-title>Near-bayesian support vector machines for imbalanced data classification with equal or unequal misclassification costs</article-title><source>Neural Netw</source><volume>70</volume><fpage>39</fpage><lpage>52</lpage><year>2015</year><pub-id pub-id-type="doi">10.1016/j.neunet.2015.06.005</pub-id><pub-id pub-id-type="pmid">26210983</pub-id></element-citation></ref>
<ref id="b21-ol-0-0-10068"><label>21</label><element-citation publication-type="online"><person-group person-group-type="author"><name><surname>Cortez</surname><given-names>P</given-names></name></person-group><article-title>Package &#x2018;rminer&#x2019;: Data Mining Classification and Regression Methods. Version 1.4.2</article-title><uri>https://cran.r-project.org/web/packages/rminer/rminer.pdf</uri><date-in-citation content-type="access-date"><month>September</month><day>2</day><year>2016</year></date-in-citation></element-citation></ref>
<ref id="b22-ol-0-0-10068"><label>22</label><element-citation publication-type="book"><collab collab-type="corp-author">R Core Team</collab><article-title>R: A Language and Environment for Statistical Computing</article-title><publisher-name>R Foundation for Statistical Computing</publisher-name><publisher-loc>Vienna</publisher-loc><year>2018</year></element-citation></ref>
<ref id="b23-ol-0-0-10068"><label>23</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bischl</surname><given-names>B</given-names></name><name><surname>Lang</surname><given-names>M</given-names></name><name><surname>Kotthoff</surname><given-names>L</given-names></name><name><surname>Schiffner</surname><given-names>J</given-names></name><name><surname>Richter</surname><given-names>J</given-names></name><name><surname>Studerus</surname><given-names>E</given-names></name><name><surname>Casalicchio</surname><given-names>G</given-names></name><name><surname>Jones</surname><given-names>ZM</given-names></name></person-group><article-title>mlr: Machine learning in R</article-title><source>J Mach Learn Res</source><volume>17</volume><fpage>1</fpage><lpage>5</lpage><year>2016</year></element-citation></ref>
<ref id="b24-ol-0-0-10068"><label>24</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Liaw</surname><given-names>A</given-names></name><name><surname>Wiener</surname><given-names>M</given-names></name></person-group><article-title>Classification and regression by randomForest</article-title><source>R News</source><volume>2</volume><fpage>18</fpage><lpage>22</lpage><year>2002</year></element-citation></ref>
<ref id="b25-ol-0-0-10068"><label>25</label><element-citation publication-type="online"><person-group person-group-type="author"><name><surname>Romanski</surname><given-names>P</given-names></name><name><surname>Kotthoff</surname><given-names>L</given-names></name></person-group><article-title>FSelector: Selecting Attributes</article-title><source>R package</source><comment>version 0.31</comment><uri>https://cran.r-project.org/web/packages/FSelector/index.html</uri><date-in-citation content-type="access-date"><month>November</month><day>19</day><year>2018</year></date-in-citation></element-citation></ref>
<ref id="b26-ol-0-0-10068"><label>26</label><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Ho</surname><given-names>TK</given-names></name></person-group><article-title>Random decision forests</article-title><volume>1</volume><publisher-name>IEEE Computer Society Press</publisher-name><publisher-loc>Washington, DC</publisher-loc><fpage>278</fpage><lpage>282</lpage><year>1995</year></element-citation></ref>
<ref id="b27-ol-0-0-10068"><label>27</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Breiman</surname><given-names>L</given-names></name></person-group><article-title>Random forests</article-title><source>Mach Learn</source><volume>45</volume><fpage>5</fpage><lpage>32</lpage><year>2001</year><pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id></element-citation></ref>
<ref id="b28-ol-0-0-10068"><label>28</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Svetnik</surname><given-names>V</given-names></name><name><surname>Liaw</surname><given-names>A</given-names></name><name><surname>Tong</surname><given-names>C</given-names></name><name><surname>Culberson</surname><given-names>JC</given-names></name><name><surname>Sheridan</surname><given-names>RP</given-names></name><name><surname>Feuston</surname><given-names>BP</given-names></name></person-group><article-title>Random forest: A classification and regression tool for compound classification and QSAR modeling</article-title><source>J Chem Inf Comput Sci</source><volume>43</volume><fpage>1947</fpage><lpage>1958</lpage><year>2003</year><pub-id pub-id-type="doi">10.1021/ci034160g</pub-id><pub-id pub-id-type="pmid">14632445</pub-id></element-citation></ref>
<ref id="b29-ol-0-0-10068"><label>29</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Natekin</surname><given-names>A</given-names></name><name><surname>Knoll</surname><given-names>A</given-names></name></person-group><article-title>Gradient boosting machines, a tutorial</article-title><source>Front Neurorobot</source><volume>7</volume><fpage>21</fpage><year>2013</year><pub-id pub-id-type="doi">10.3389/fnbot.2013.00021</pub-id><pub-id pub-id-type="pmid">24409142</pub-id></element-citation></ref>
<ref id="b30-ol-0-0-10068"><label>30</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>He</surname><given-names>T</given-names></name><name><surname>Heidemeyer</surname><given-names>M</given-names></name><name><surname>Ban</surname><given-names>F</given-names></name><name><surname>Cherkasov</surname><given-names>A</given-names></name><name><surname>Ester</surname><given-names>M</given-names></name></person-group><article-title>SimBoost: A read-across approach for predicting drug-target binding affinities using gradient boosting machines</article-title><source>J Cheminform</source><volume>9</volume><fpage>24</fpage><year>2017</year><pub-id pub-id-type="doi">10.1186/s13321-017-0209-z</pub-id></element-citation></ref>
<ref id="b31-ol-0-0-10068"><label>31</label><element-citation publication-type="online"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>Z</given-names></name></person-group><article-title>Package &#x2018;bst&#x2019;: Gradient Boosting. Version 0.3&#x2013;15</article-title><uri>https://cran.r-project.org/web/packages/bst/bst.pdf</uri><date-in-citation content-type="access-date"><month>July</month><day>23</day><year>2018</year></date-in-citation></element-citation></ref>
<ref id="b32-ol-0-0-10068"><label>32</label><element-citation publication-type="online"><person-group person-group-type="author"><name><surname>Therneau</surname><given-names>T</given-names></name><name><surname>Atkinson</surname><given-names>B</given-names></name></person-group><article-title>Package &#x2018;rpart&#x2019;: Recursive Partitioning and Regression Trees. Version 4.1&#x2013;13</article-title><uri>https://cran.r-project.org/web/packages/rpart/rpart.pdf</uri><date-in-citation content-type="access-date"><month>February</month><day>23</day><year>2018</year></date-in-citation><date-in-citation content-type="access-date"><month>August</month><day>30</day><year>2018</year></date-in-citation></element-citation></ref>
<ref id="b33-ol-0-0-10068"><label>33</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname><given-names>M</given-names></name><name><surname>Wang</surname><given-names>XS</given-names></name><name><surname>Roth</surname><given-names>BL</given-names></name><name><surname>Golbraikh</surname><given-names>A</given-names></name><name><surname>Tropsha</surname><given-names>A</given-names></name></person-group><article-title>Application of quantitative structure-activity relationship models of 5-HT<sub>1A</sub> receptor binding to virtual screening identifies novel and potent 5-HT<sub>1A</sub> ligands</article-title><source>J Chem Inf Model</source><volume>54</volume><fpage>634</fpage><lpage>647</lpage><year>2014</year><pub-id pub-id-type="doi">10.1021/ci400460q</pub-id><pub-id pub-id-type="pmid">24410373</pub-id></element-citation></ref>
<ref id="b34-ol-0-0-10068"><label>34</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pourbasheer</surname><given-names>E</given-names></name><name><surname>Vahdani</surname><given-names>S</given-names></name><name><surname>Malekzadeh</surname><given-names>D</given-names></name><name><surname>Aalizadeh</surname><given-names>R</given-names></name><name><surname>Ebadi</surname><given-names>A</given-names></name></person-group><article-title>QSAR Study of 17&#x03B2;-HSD3 inhibitors by genetic algorithm-support vector machine as a target receptor for the treatment of prostate cancer</article-title><source>Iran J Pharm Res</source><volume>16</volume><fpage>966</fpage><lpage>980</lpage><year>2017</year><pub-id pub-id-type="pmid">29201087</pub-id></element-citation></ref>
<ref id="b35-ol-0-0-10068"><label>35</label><element-citation publication-type="online"><person-group person-group-type="author"><name><surname>Meyer</surname><given-names>D</given-names></name><name><surname>Dimitriadou</surname><given-names>E</given-names></name><name><surname>Hornik</surname><given-names>K</given-names></name><name><surname>Weingessel</surname><given-names>A</given-names></name><name><surname>Leisch</surname><given-names>F</given-names></name></person-group><article-title>e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien</article-title><comment>Version 1.6&#x2013;8</comment><uri>https://rdrr.io/rforge/e1071/</uri><date-in-citation content-type="access-date"><month>May</month><day>31</day><year>2017</year></date-in-citation></element-citation></ref>
<ref id="b36-ol-0-0-10068"><label>36</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cai</surname><given-names>C</given-names></name><name><surname>Fang</surname><given-names>J</given-names></name><name><surname>Guo</surname><given-names>P</given-names></name><name><surname>Wang</surname><given-names>Q</given-names></name><name><surname>Hong</surname><given-names>H</given-names></name><name><surname>Moslehi</surname><given-names>J</given-names></name><name><surname>Cheng</surname><given-names>F</given-names></name></person-group><article-title>In silico pharmacoepidemiologic evaluation of drug-induced cardiovascular complications using combined classifiers</article-title><source>J Chem Inf Model</source><volume>58</volume><fpage>943</fpage><lpage>956</lpage><year>2018</year><pub-id pub-id-type="doi">10.1021/acs.jcim.7b00641</pub-id><pub-id pub-id-type="pmid">29712429</pub-id></element-citation></ref>
<ref id="b37-ol-0-0-10068"><label>37</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Roy</surname><given-names>K</given-names></name><name><surname>Kar</surname><given-names>S</given-names></name><name><surname>Ambure</surname><given-names>P</given-names></name></person-group><article-title>On a simple approach for determining applicability domain of QSAR models</article-title><source>Chemometr Intell Lab Syst</source><volume>145</volume><fpage>22</fpage><lpage>29</lpage><year>2015</year><pub-id pub-id-type="doi">10.1016/j.chemolab.2015.04.013</pub-id></element-citation></ref>
<ref id="b38-ol-0-0-10068"><label>38</label><element-citation publication-type="online"><person-group person-group-type="author"><name><surname>Li</surname><given-names>S</given-names></name></person-group><article-title>Package &#x2018;rknn&#x2019;: Random KNN Classification and Regression</article-title><uri>https://cran.r-project.org/web/packages/rknn/rknn.pdf</uri><date-in-citation content-type="access-date"><month>June</month><day>7</day><year>2015</year></date-in-citation></element-citation></ref>
<ref id="b39-ol-0-0-10068"><label>39</label><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Warnes</surname><given-names>GR</given-names></name><name><surname>Bolker</surname><given-names>B</given-names></name><name><surname>Lumley</surname><given-names>T</given-names></name></person-group><article-title>gtools: various R programming tools</article-title><publisher-name>R Foundation for Statistical Computing</publisher-name><publisher-loc>Vienna</publisher-loc><year>2015</year></element-citation></ref>
<ref id="b40-ol-0-0-10068"><label>40</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sahigara</surname><given-names>F</given-names></name><name><surname>Ballabio</surname><given-names>D</given-names></name><name><surname>Todeschini</surname><given-names>R</given-names></name><name><surname>Consonni</surname><given-names>V</given-names></name></person-group><article-title>Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions</article-title><source>J Cheminform</source><volume>5</volume><fpage>27</fpage><year>2013</year><pub-id pub-id-type="doi">10.1186/1758-2946-5-27</pub-id><pub-id pub-id-type="pmid">23721648</pub-id></element-citation></ref>
<ref id="b41-ol-0-0-10068"><label>41</label><element-citation publication-type="online"><person-group person-group-type="author"><name><surname>Williams</surname><given-names>K</given-names></name></person-group><article-title>Package &#x2018;ldbod&#x2019;: Local Density-Based Outlier Detection. Version 0.1.2</article-title><uri>https://cran.r-project.org/web/packages/ldbod/ldbod.pdf</uri><date-in-citation content-type="access-date"><month>May</month><day>26</day><year>2017</year></date-in-citation></element-citation></ref>
<ref id="b42-ol-0-0-10068"><label>42</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname><given-names>C</given-names></name><name><surname>Cheng</surname><given-names>F</given-names></name><name><surname>Chen</surname><given-names>L</given-names></name><name><surname>Du</surname><given-names>Z</given-names></name><name><surname>Li</surname><given-names>W</given-names></name><name><surname>Liu</surname><given-names>G</given-names></name><name><surname>Lee</surname><given-names>PW</given-names></name><name><surname>Tang</surname><given-names>Y</given-names></name></person-group><article-title>In silico prediction of chemical Ames mutagenicity</article-title><source>J Chem Inf Model</source><volume>52</volume><fpage>2840</fpage><lpage>2847</lpage><year>2012</year><pub-id pub-id-type="doi">10.1021/ci300400a</pub-id><pub-id pub-id-type="pmid">23030379</pub-id></element-citation></ref>
<ref id="b43-ol-0-0-10068"><label>43</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gower</surname><given-names>JC</given-names></name></person-group><article-title>A general coefficient of similarity and some of its properties</article-title><source>Biometrics</source><volume>27</volume><fpage>857</fpage><year>1971</year><pub-id pub-id-type="doi">10.2307/2528823</pub-id></element-citation></ref>
<ref id="b44-ol-0-0-10068"><label>44</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Miladiyah</surname><given-names>I</given-names></name><name><surname>Jumina</surname><given-names>J</given-names></name><name><surname>Haryana</surname><given-names>SM</given-names></name><name><surname>Mustofa</surname><given-names>M</given-names></name></person-group><article-title>Biological activity, quantitative structure-activity relationship analysis, and molecular docking of xanthone derivatives as anticancer drugs</article-title><source>Drug Des Devel Ther</source><volume>12</volume><fpage>149</fpage><lpage>158</lpage><year>2018</year><pub-id pub-id-type="doi">10.2147/DDDT.S149973</pub-id><pub-id pub-id-type="pmid">29391779</pub-id></element-citation></ref>
<ref id="b45-ol-0-0-10068"><label>45</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yadav</surname><given-names>DK</given-names></name><name><surname>Kumar</surname><given-names>S</given-names></name><name><surname>Saloni</surname><given-names>S</given-names></name><name><surname>Singh</surname><given-names>H</given-names></name><name><surname>Kim</surname><given-names>MH</given-names></name><name><surname>Sharma</surname><given-names>P</given-names></name><name><surname>Misra</surname><given-names>S</given-names></name><name><surname>Khan</surname><given-names>F</given-names></name></person-group><article-title>Molecular docking, QSAR and ADMET studies of withanolide analogs against breast cancer</article-title><source>Drug Des Devel Ther</source><volume>11</volume><fpage>1859</fpage><lpage>1870</lpage><year>2017</year><pub-id pub-id-type="doi">10.2147/DDDT.S130601</pub-id><pub-id pub-id-type="pmid">28694686</pub-id></element-citation></ref>
<ref id="b46-ol-0-0-10068"><label>46</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gaikwad</surname><given-names>R</given-names></name><name><surname>Ghorai</surname><given-names>S</given-names></name><name><surname>Amin</surname><given-names>SA</given-names></name><name><surname>Adhikari</surname><given-names>N</given-names></name><name><surname>Patel</surname><given-names>T</given-names></name><name><surname>Das</surname><given-names>K</given-names></name><name><surname>Jha</surname><given-names>T</given-names></name><name><surname>Gayen</surname><given-names>S</given-names></name></person-group><article-title>Monte Carlo based modelling approach for designing and predicting cytotoxicity of 2-phenylindole derivatives against breast cancer cell line MCF7</article-title><source>Toxicol In Vitro</source><volume>52</volume><fpage>23</fpage><lpage>32</lpage><year>2018</year><pub-id pub-id-type="doi">10.1016/j.tiv.2018.05.016</pub-id><pub-id pub-id-type="pmid">29864472</pub-id></element-citation></ref>
<ref id="b47-ol-0-0-10068"><label>47</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Abdelhaleem</surname><given-names>EF</given-names></name><name><surname>Abdelhameid</surname><given-names>MK</given-names></name><name><surname>Kassab</surname><given-names>AE</given-names></name><name><surname>Kandeel</surname><given-names>MM</given-names></name></person-group><article-title>Design and synthesis of thienopyrimidine urea derivatives with potential cytotoxic and pro-apoptotic activity against breast cancer cell line MCF-7</article-title><source>Eur J Med Chem</source><volume>143</volume><fpage>1807</fpage><lpage>1825</lpage><year>2018</year><pub-id pub-id-type="doi">10.1016/j.ejmech.2017.10.075</pub-id><pub-id pub-id-type="pmid">29133058</pub-id></element-citation></ref>
<ref id="b48-ol-0-0-10068"><label>48</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Feher</surname><given-names>M</given-names></name><name><surname>Ewing</surname><given-names>T</given-names></name></person-group><article-title>Global or local QSAR: Is there a way out?</article-title><source>QSAR Comb Sci</source><volume>28</volume><fpage>850</fpage><lpage>855</lpage><year>2009</year><pub-id pub-id-type="doi">10.1002/qsar.200860186</pub-id></element-citation></ref>
<ref id="b49-ol-0-0-10068"><label>49</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>He</surname><given-names>Y</given-names></name><name><surname>Zhu</surname><given-names>Q</given-names></name><name><surname>Chen</surname><given-names>M</given-names></name><name><surname>Huang</surname><given-names>Q</given-names></name><name><surname>Wang</surname><given-names>W</given-names></name><name><surname>Li</surname><given-names>Q</given-names></name><name><surname>Huang</surname><given-names>Y</given-names></name><name><surname>Di</surname><given-names>W</given-names></name></person-group><article-title>The changing 50&#x0025; inhibitory concentration (IC50) of cisplatin: A pilot study on the artifacts of the MTT assay and the precise measurement of density-dependent chemoresistance in ovarian cancer</article-title><source>Oncotarget</source><volume>7</volume><fpage>70803</fpage><lpage>70821</lpage><year>2016</year><pub-id pub-id-type="doi">10.18632/oncotarget.12223</pub-id><pub-id pub-id-type="pmid">27683123</pub-id></element-citation></ref>
<ref id="b50-ol-0-0-10068"><label>50</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sebaugh</surname><given-names>JL</given-names></name></person-group><article-title>Guidelines for accurate EC50/IC50 estimation</article-title><source>Pharm Stat</source><volume>10</volume><fpage>128</fpage><lpage>134</lpage><year>2011</year><pub-id pub-id-type="doi">10.1002/pst.426</pub-id><pub-id pub-id-type="pmid">22328315</pub-id></element-citation></ref>
<ref id="b51-ol-0-0-10068"><label>51</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kryshchyshyn</surname><given-names>A</given-names></name><name><surname>Devinyak</surname><given-names>O</given-names></name><name><surname>Kaminskyy</surname><given-names>D</given-names></name><name><surname>Grellier</surname><given-names>P</given-names></name><name><surname>Lesyk</surname><given-names>R</given-names></name></person-group><article-title>Development of predictive QSAR models of 4-thiazolidinones antitrypanosomal activity using modern machine learning algorithms</article-title><source>Mol Inform</source><volume>37</volume><fpage>e1700078</fpage><year>2018</year><pub-id pub-id-type="doi">10.1002/minf.201700078</pub-id><pub-id pub-id-type="pmid">29134756</pub-id></element-citation></ref>
<ref id="b52-ol-0-0-10068"><label>52</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cortes-Ciriano</surname><given-names>I</given-names></name><name><surname>Bender</surname><given-names>A</given-names></name><name><surname>Malliavin</surname><given-names>TE</given-names></name></person-group><article-title>Comparing the influence of simulated experimental errors on 12 machine learning algorithms in bioactivity modeling using 12 diverse data sets</article-title><source>J Chem Inf Model</source><volume>55</volume><fpage>1413</fpage><lpage>1425</lpage><year>2015</year><pub-id pub-id-type="doi">10.1021/acs.jcim.5b00101</pub-id><pub-id pub-id-type="pmid">26038978</pub-id></element-citation></ref>
<ref id="b53-ol-0-0-10068"><label>53</label><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Dearden</surname><given-names>JC</given-names></name></person-group><article-title>The use of topological indices in QSAR and QSPR modeling</article-title><source>Advances in QSAR modeling</source><person-group person-group-type="editor"><name><surname>Roy</surname><given-names>K</given-names></name></person-group><volume>24</volume><publisher-name>Springer International Publishing</publisher-name><publisher-loc>Cham</publisher-loc><fpage>57</fpage><lpage>88</lpage><year>2017</year><pub-id pub-id-type="doi">10.1007/978-3-319-56850-8_2</pub-id></element-citation></ref>
<ref id="b54-ol-0-0-10068"><label>54</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Karbakhsh</surname><given-names>R</given-names></name><name><surname>Sabet</surname><given-names>R</given-names></name></person-group><article-title>Application of different chemometric tools in QSAR study of azolo-adamantanes against influenza A virus</article-title><source>Res Pharm Sci</source><volume>6</volume><fpage>23</fpage><lpage>33</lpage><year>2011</year><pub-id pub-id-type="pmid">22049275</pub-id></element-citation></ref>
<ref id="b55-ol-0-0-10068"><label>55</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Prachayasittikul</surname><given-names>V</given-names></name><name><surname>Pingaew</surname><given-names>R</given-names></name><name><surname>Anuwongcharoen</surname><given-names>N</given-names></name><name><surname>Worachartcheewan</surname><given-names>A</given-names></name><name><surname>Nantasenamat</surname><given-names>C</given-names></name><name><surname>Prachayasittikul</surname><given-names>S</given-names></name><name><surname>Ruchirawat</surname><given-names>S</given-names></name><name><surname>Prachayasittikul</surname><given-names>V</given-names></name></person-group><article-title>Discovery of novel 1,2,3-triazole derivatives as anticancer agents using QSAR and in silico structural modification</article-title><source>Springerplus</source><volume>4</volume><fpage>571</fpage><year>2015</year><pub-id pub-id-type="doi">10.1186/s40064-015-1352-5</pub-id><pub-id pub-id-type="pmid">26543706</pub-id></element-citation></ref>
<ref id="b56-ol-0-0-10068"><label>56</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fereidoonnezhad</surname><given-names>M</given-names></name><name><surname>Faghih</surname><given-names>Z</given-names></name><name><surname>Mojaddami</surname><given-names>A</given-names></name><name><surname>Rezaei</surname><given-names>Z</given-names></name><name><surname>Sakhteman</surname><given-names>A</given-names></name></person-group><article-title>A comparative QSAR analysis, molecular docking and PLIF studies of some N-arylphenyl-2,2-dichloroacetamide analogues as anticancer agents</article-title><source>Iran J Pharm Res</source><volume>16</volume><fpage>981</fpage><lpage>998</lpage><year>2017</year><pub-id pub-id-type="pmid">29535790</pub-id></element-citation></ref>
<ref id="b57-ol-0-0-10068"><label>57</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Edraki</surname><given-names>N</given-names></name><name><surname>Das</surname><given-names>U</given-names></name><name><surname>Hemateenejad</surname><given-names>B</given-names></name><name><surname>Dimmock</surname><given-names>JR</given-names></name><name><surname>Miri</surname><given-names>R</given-names></name></person-group><article-title>Comparative QSAR analysis of 3,5-bis(arylidene)-4-piperidone derivatives: The development of predictive cytotoxicity models</article-title><source>Iran J Pharm Res</source><volume>15</volume><fpage>425</fpage><lpage>437</lpage><year>2016</year><pub-id pub-id-type="pmid">27642313</pub-id></element-citation></ref>
<ref id="b58-ol-0-0-10068"><label>58</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Akbari</surname><given-names>S</given-names></name><name><surname>Zebardast</surname><given-names>T</given-names></name><name><surname>Zarghi</surname><given-names>A</given-names></name><name><surname>Hajimahdi</surname><given-names>Z</given-names></name></person-group><article-title>QSAR modeling of COX-2 inhibitory activity of some dihydropyridine and hydroquinoline derivatives using multiple linear regression (MLR) method</article-title><source>Iran J Pharm Res</source><volume>16</volume><fpage>525</fpage><lpage>532</lpage><year>2017</year><pub-id pub-id-type="pmid">28979307</pub-id></element-citation></ref>
<ref id="b59-ol-0-0-10068"><label>59</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fassihi</surname><given-names>A</given-names></name><name><surname>Sabet</surname><given-names>R</given-names></name></person-group><article-title>QSAR study of p56(lck) protein tyrosine kinase inhibitory activity of flavonoid derivatives using MLR and GA-PLS</article-title><source>Int J Mol Sci</source><volume>9</volume><fpage>1876</fpage><lpage>1892</lpage><year>2008</year><pub-id pub-id-type="doi">10.3390/ijms9091876</pub-id><pub-id pub-id-type="pmid">19325836</pub-id></element-citation></ref>
<ref id="b60-ol-0-0-10068"><label>60</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rojas</surname><given-names>C</given-names></name><name><surname>Todeschini</surname><given-names>R</given-names></name><name><surname>Ballabio</surname><given-names>D</given-names></name><name><surname>Mauri</surname><given-names>A</given-names></name><name><surname>Consonni</surname><given-names>V</given-names></name><name><surname>Tripaldi</surname><given-names>P</given-names></name><name><surname>Grisoni</surname><given-names>F</given-names></name></person-group><article-title>A QSTR-based expert system to predict sweetness of molecules</article-title><source>Front Chem</source><volume>5</volume><fpage>53</fpage><year>2017</year><pub-id pub-id-type="doi">10.3389/fchem.2017.00053</pub-id><pub-id pub-id-type="pmid">28791285</pub-id></element-citation></ref>
<ref id="b61-ol-0-0-10068"><label>61</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mohanapriya</surname><given-names>A</given-names></name><name><surname>Achuthan</surname><given-names>D</given-names></name></person-group><article-title>Comparative QSAR analysis of cyclo-oxygenase2 inhibiting drugs</article-title><source>Bioinformation</source><volume>8</volume><fpage>353</fpage><lpage>358</lpage><year>2012</year><pub-id pub-id-type="doi">10.6026/97320630008353</pub-id><pub-id pub-id-type="pmid">22570515</pub-id></element-citation></ref>
<ref id="b62-ol-0-0-10068"><label>62</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chavan</surname><given-names>S</given-names></name><name><surname>Nicholls</surname><given-names>IA</given-names></name><name><surname>Karlsson</surname><given-names>BC</given-names></name><name><surname>Rosengren</surname><given-names>AM</given-names></name><name><surname>Ballabio</surname><given-names>D</given-names></name><name><surname>Consonni</surname><given-names>V</given-names></name><name><surname>Todeschini</surname><given-names>R</given-names></name></person-group><article-title>Towards global QSAR model building for acute toxicity: Munro database case study</article-title><source>Int J Mol Sci</source><volume>15</volume><fpage>18162</fpage><lpage>18174</lpage><year>2014</year><pub-id pub-id-type="doi">10.3390/ijms151018162</pub-id><pub-id pub-id-type="pmid">25302621</pub-id></element-citation></ref>
<ref id="b63-ol-0-0-10068"><label>63</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wittwer</surname><given-names>MB</given-names></name><name><surname>Zur</surname><given-names>AA</given-names></name><name><surname>Khuri</surname><given-names>N</given-names></name><name><surname>Kido</surname><given-names>Y</given-names></name><name><surname>Kosaka</surname><given-names>A</given-names></name><name><surname>Zhang</surname><given-names>X</given-names></name><name><surname>Morrissey</surname><given-names>KM</given-names></name><name><surname>Sali</surname><given-names>A</given-names></name><name><surname>Huang</surname><given-names>Y</given-names></name><name><surname>Giacomini</surname><given-names>KM</given-names></name></person-group><article-title>Discovery of potent, selective multidrug and toxin extrusion transporter 1 (MATE1, SLC47A1) inhibitors through prescription drug profiling and computational modeling</article-title><source>J Med Chem</source><volume>56</volume><fpage>781</fpage><lpage>795</lpage><year>2013</year><pub-id pub-id-type="doi">10.1021/jm301302s</pub-id><pub-id pub-id-type="pmid">23241029</pub-id></element-citation></ref>
<ref id="b64-ol-0-0-10068"><label>64</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dai</surname><given-names>Y</given-names></name><name><surname>Chen</surname><given-names>N</given-names></name><name><surname>Wang</surname><given-names>Q</given-names></name><name><surname>Zheng</surname><given-names>H</given-names></name><name><surname>Zhang</surname><given-names>X</given-names></name><name><surname>Jia</surname><given-names>S</given-names></name><name><surname>Dong</surname><given-names>L</given-names></name><name><surname>Feng</surname><given-names>D</given-names></name></person-group><article-title>Docking analysis and multidimensional hybrid QSAR model of 1,4-benzodiazepine-2,5-diones as HDM2 antagonists</article-title><source>Iran J Pharm Res</source><volume>11</volume><fpage>807</fpage><lpage>830</lpage><year>2012</year><pub-id pub-id-type="pmid">24250508</pub-id></element-citation></ref>
<ref id="b65-ol-0-0-10068"><label>65</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sharma</surname><given-names>BK</given-names></name><name><surname>Singh</surname><given-names>P</given-names></name><name><surname>Pilania</surname><given-names>P</given-names></name><name><surname>Shekhawat</surname><given-names>M</given-names></name><name><surname>Prabhakar</surname><given-names>YS</given-names></name></person-group><article-title>QSAR of 2-(4-methylsulphonylphenyl)pyrimidine derivatives as cyclooxygenase-2 inhibitors: Simple structural fragments as potential modulators of activity</article-title><source>J Enzyme Inhib Med Chem</source><volume>27</volume><fpage>249</fpage><lpage>260</lpage><year>2012</year><pub-id pub-id-type="doi">10.3109/14756366.2011.587414</pub-id><pub-id pub-id-type="pmid">21679051</pub-id></element-citation></ref>
<ref id="b66-ol-0-0-10068"><label>66</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sharma</surname><given-names>BK</given-names></name><name><surname>Verma</surname><given-names>S</given-names></name><name><surname>Prabhakar</surname><given-names>YS</given-names></name></person-group><article-title>Topological and physicochemical characteristics of 1,2,3,4-tetrahydroacridin-9(10H)-ones and their antimalarial profiles: A composite insight to the structure-activity relation</article-title><source>Curr Comput Aided Drug Des</source><volume>9</volume><fpage>317</fpage><lpage>335</lpage><year>2013</year><pub-id pub-id-type="doi">10.2174/15734099113099990017</pub-id><pub-id pub-id-type="pmid">24010931</pub-id></element-citation></ref>
<ref id="b67-ol-0-0-10068"><label>67</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rasouli</surname><given-names>Y</given-names></name><name><surname>Davood</surname><given-names>A</given-names></name></person-group><article-title>Hybrid docking-QSAR studies of 1,4-dihydropyridine-3,5-dicarboxamides as potential antitubercular agents</article-title><source>Curr Comput Aided Drug Des</source><volume>14</volume><fpage>35</fpage><lpage>53</lpage><year>2018</year><pub-id pub-id-type="doi">10.2174/1573409913666170426154045</pub-id><pub-id pub-id-type="pmid">28462696</pub-id></element-citation></ref>
<ref id="b68-ol-0-0-10068"><label>68</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Li</surname><given-names>H</given-names></name><name><surname>Panwar</surname><given-names>B</given-names></name><name><surname>Omenn</surname><given-names>GS</given-names></name><name><surname>Guan</surname><given-names>Y</given-names></name></person-group><article-title>Accurate prediction of personalized olfactory perception from large-scale chemoinformatic features</article-title><source>Gigascience</source><volume>7</volume><fpage>7</fpage><year>2018</year><pub-id pub-id-type="doi">10.1093/gigascience/gix127</pub-id></element-citation></ref>
</ref-list>
</back>
<floats-group>
<fig id="f1-ol-0-0-10068" position="float">
<label>Figure 1.</label>
<caption><p>Heat map depicting the chemical diversity of the substances used in our study, based on the Gower distance. The left column shows the activity of each substance (active or inactive); in the heat map proper, darker regions correspond to higher dissimilarity and lighter regions to lower dissimilarity. The density plot shows the distribution of the scaled Gower distances (dissimilarities).</p></caption>
<graphic xlink:href="ol-17-05-4188-g00.jpg"/>
</fig>
<fig id="f2-ol-0-0-10068" position="float">
<label>Figure 2.</label>
<caption><p>Distribution of the two data sets (training, n=316; external test, n=106) in two-dimensional chemical space (molecular weight and atomic LogP). Triangles correspond to the training data set and circles to the external test data set.</p></caption>
<graphic xlink:href="ol-17-05-4188-g01.jpg"/>
</fig>
<table-wrap id="tI-ol-0-0-10068" position="float">
<label>Table I.</label>
<caption><p>Performance of selected classification models with a PPV higher than 75&#x0025; in 10-fold nested cross-validation.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Models</th>
<th align="center" valign="bottom">Specificity</th>
<th align="center" valign="bottom">Sensitivity</th>
<th align="center" valign="bottom">PPV</th>
<th align="center" valign="bottom">Balanced accuracy</th>
<th align="center" valign="bottom">MMCE</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Topological descriptors-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">0.9374</td>
<td align="center" valign="top">0.3583</td>
<td align="center" valign="top">0.8424</td>
<td align="center" valign="top">0.6479</td>
<td align="center" valign="top">0.3022</td>
</tr>
<tr>
<td align="left" valign="top">Topological descriptors-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9298</td>
<td align="center" valign="top">0.3628</td>
<td align="center" valign="top">0.7964</td>
<td align="center" valign="top">0.6463</td>
<td align="center" valign="top">0.3105</td>
</tr>
<tr>
<td align="left" valign="top">Topological descriptors-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), over</td>
<td align="center" valign="top">0.9148</td>
<td align="center" valign="top">0.5752</td>
<td align="center" valign="top">0.8749</td>
<td align="center" valign="top">0.745</td>
<td align="center" valign="top">0.2548</td>
</tr>
<tr>
<td align="left" valign="top">Topological descriptors-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), smote</td>
<td align="center" valign="top">0.8946</td>
<td align="center" valign="top">0.499</td>
<td align="center" valign="top">0.8158</td>
<td align="center" valign="top">0.6968</td>
<td align="center" valign="top">0.3086</td>
</tr>
<tr>
<td align="left" valign="top">Walk and path-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">0.9465</td>
<td align="center" valign="top">0.285</td>
<td align="center" valign="top">0.7587</td>
<td align="center" valign="top">0.6158</td>
<td align="center" valign="top">0.3231</td>
</tr>
<tr>
<td align="left" valign="top">Information indices-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">0.9486</td>
<td align="center" valign="top">0.3434</td>
<td align="center" valign="top">0.8368</td>
<td align="center" valign="top">0.646</td>
<td align="center" valign="top">0.3003</td>
</tr>
<tr>
<td align="left" valign="top">Information indices-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9685</td>
<td align="center" valign="top">0.3448</td>
<td align="center" valign="top">0.8848</td>
<td align="center" valign="top">0.6566</td>
<td align="center" valign="top">0.2878</td>
</tr>
<tr>
<td align="left" valign="top">Information indices-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), over</td>
<td align="center" valign="top">0.9022</td>
<td align="center" valign="top">0.634</td>
<td align="center" valign="top">0.8715</td>
<td align="center" valign="top">0.7681</td>
<td align="center" valign="top">0.2319</td>
</tr>
<tr>
<td align="left" valign="top">Information indices-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), smote</td>
<td align="center" valign="top">0.9023</td>
<td align="center" valign="top">0.5438</td>
<td align="center" valign="top">0.851</td>
<td align="center" valign="top">0.723</td>
<td align="center" valign="top">0.2776</td>
</tr>
<tr>
<td align="left" valign="top">Information indices-BST (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), smote</td>
<td align="center" valign="top">0.78</td>
<td align="center" valign="top">0.7536</td>
<td align="center" valign="top">0.7803</td>
<td align="center" valign="top">0.7668</td>
<td align="center" valign="top">0.2344</td>
</tr>
<tr>
<td align="left" valign="top">2D-autocorrelation-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">0.927</td>
<td align="center" valign="top">0.3414</td>
<td align="center" valign="top">0.776</td>
<td align="center" valign="top">0.6342</td>
<td align="center" valign="top">0.3063</td>
</tr>
<tr>
<td align="left" valign="top">2D-autocorrelation-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9687</td>
<td align="center" valign="top">0.3005</td>
<td align="center" valign="top">0.8707</td>
<td align="center" valign="top">0.6346</td>
<td align="center" valign="top">0.3063</td>
</tr>
<tr>
<td align="left" valign="top">2D-autocorrelation-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), over</td>
<td align="center" valign="top">0.9453</td>
<td align="center" valign="top">0.611</td>
<td align="center" valign="top">0.9201</td>
<td align="center" valign="top">0.7782</td>
<td align="center" valign="top">0.2289</td>
</tr>
<tr>
<td align="left" valign="top">2D-autocorrelation-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), smote</td>
<td align="center" valign="top">0.9174</td>
<td align="center" valign="top">0.4858</td>
<td align="center" valign="top">0.8583</td>
<td align="center" valign="top">0.7016</td>
<td align="center" valign="top">0.2993</td>
</tr>
<tr>
<td align="left" valign="top">Burden eigenvalues-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.941</td>
<td align="center" valign="top">0.3373</td>
<td align="center" valign="top">0.7943</td>
<td align="center" valign="top">0.6391</td>
<td align="center" valign="top">0.3063</td>
</tr>
<tr>
<td align="left" valign="top">Burden eigenvalues-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), over</td>
<td align="center" valign="top">0.8803</td>
<td align="center" valign="top">0.6373</td>
<td align="center" valign="top">0.8417</td>
<td align="center" valign="top">0.7588</td>
<td align="center" valign="top">0.2427</td>
</tr>
<tr>
<td align="left" valign="top">Burden eigenvalues-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), smote</td>
<td align="center" valign="top">0.8445</td>
<td align="center" valign="top">0.6265</td>
<td align="center" valign="top">0.8057</td>
<td align="center" valign="top">0.7355</td>
<td align="center" valign="top">0.2641</td>
</tr>
<tr>
<td align="left" valign="top">P-VSA-like-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">0.9327</td>
<td align="center" valign="top">0.3528</td>
<td align="center" valign="top">0.7825</td>
<td align="center" valign="top">0.6428</td>
<td align="center" valign="top">0.3058</td>
</tr>
<tr>
<td align="left" valign="top">P-VSA-like-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9332</td>
<td align="center" valign="top">0.3716</td>
<td align="center" valign="top">0.7996</td>
<td align="center" valign="top">0.6524</td>
<td align="center" valign="top">0.2967</td>
</tr>
<tr>
<td align="left" valign="top">P-VSA-like-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), over</td>
<td align="center" valign="top">0.9149</td>
<td align="center" valign="top">0.6159</td>
<td align="center" valign="top">0.8891</td>
<td align="center" valign="top">0.7654</td>
<td align="center" valign="top">0.2369</td>
</tr>
<tr>
<td align="left" valign="top">P-VSA-like-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), smote</td>
<td align="center" valign="top">0.8919</td>
<td align="center" valign="top">0.5541</td>
<td align="center" valign="top">0.8273</td>
<td align="center" valign="top">0.723</td>
<td align="center" valign="top">0.283</td>
</tr>
<tr>
<td align="left" valign="top">Eta indices-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9384</td>
<td align="center" valign="top">0.3807</td>
<td align="center" valign="top">0.8394</td>
<td align="center" valign="top">0.6596</td>
<td align="center" valign="top">0.2872</td>
</tr>
<tr>
<td align="left" valign="top">Edge adjacency-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">0.9412</td>
<td align="center" valign="top">0.3453</td>
<td align="center" valign="top">0.8242</td>
<td align="center" valign="top">0.6432</td>
<td align="center" valign="top">0.307</td>
</tr>
<tr>
<td align="left" valign="top">Edge adjacency-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9301</td>
<td align="center" valign="top">0.3652</td>
<td align="center" valign="top">0.8006</td>
<td align="center" valign="top">0.6477</td>
<td align="center" valign="top">0.3038</td>
</tr>
<tr>
<td align="left" valign="top">Edge adjacency-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), over</td>
<td align="center" valign="top">0.9031</td>
<td align="center" valign="top">0.6477</td>
<td align="center" valign="top">0.8635</td>
<td align="center" valign="top">0.7754</td>
<td align="center" valign="top">0.2239</td>
</tr>
<tr>
<td align="left" valign="top">Edge adjacency-SVM (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), over</td>
<td align="center" valign="top">0.7663</td>
<td align="center" valign="top">0.7113</td>
<td align="center" valign="top">0.7519</td>
<td align="center" valign="top">0.7388</td>
<td align="center" valign="top">0.2696</td>
</tr>
<tr>
<td align="left" valign="top">Global-BST (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), over</td>
<td align="center" valign="top">0.793</td>
<td align="center" valign="top">0.8137</td>
<td align="center" valign="top">0.7899</td>
<td align="center" valign="top">0.8034</td>
<td align="center" valign="top">0.1994</td>
</tr>
<tr>
<td align="left" valign="top">Global-BST (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), smote</td>
<td align="center" valign="top">0.7974</td>
<td align="center" valign="top">0.7957</td>
<td align="center" valign="top">0.7927</td>
<td align="center" valign="top">0.7966</td>
<td align="center" valign="top">0.202</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tfn1-ol-0-0-10068"><p>RF, random forest classifier; BST, gradient boosting classifier; SVM, support vector machines; PPV, positive predictive value; MMCE, mean misclassification error. Numbers in parentheses indicate the subset of features selected by the feature selection algorithms (1, random forest importance and information gain; 2, symmetrical uncertainty). over, training set balanced through oversampling; smote, training set balanced through the synthetic minority oversampling technique (SMOTE). The first term in the name of each model indicates the block of descriptors used to build it.</p></fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="tII-ol-0-0-10068" position="float">
<label>Table II.</label>
<caption><p>Performance of selected classification models with PPV higher than 75&#x0025; on the independent data set.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Models</th>
<th align="center" valign="bottom">Specificity</th>
<th align="center" valign="bottom">Sensitivity</th>
<th align="center" valign="bottom">PPV</th>
<th align="center" valign="bottom">Balanced accuracy</th>
<th align="center" valign="bottom">MMCE</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Topological descriptors-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">0.9194</td>
<td align="center" valign="top">0.5</td>
<td align="center" valign="top">0.8148</td>
<td align="center" valign="top">0.7097</td>
<td align="center" valign="top">0.2547</td>
</tr>
<tr>
<td align="left" valign="top">Topological descriptors-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9194</td>
<td align="center" valign="top">0.5227</td>
<td align="center" valign="top">0.8214</td>
<td align="center" valign="top">0.721</td>
<td align="center" valign="top">0.2453</td>
</tr>
<tr>
<td align="left" valign="top">Topological descriptors-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), over</td>
<td align="center" valign="top">0.9355</td>
<td align="center" valign="top">0.5682</td>
<td align="center" valign="top">0.8621</td>
<td align="center" valign="top">0.7518</td>
<td align="center" valign="top">0.217</td>
</tr>
<tr>
<td align="left" valign="top">Topological descriptors-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), smote</td>
<td align="center" valign="top">0.9516</td>
<td align="center" valign="top">0.5909</td>
<td align="center" valign="top">0.8966</td>
<td align="center" valign="top">0.7713</td>
<td align="center" valign="top">0.1981</td>
</tr>
<tr>
<td align="left" valign="top">Walk and path-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">0.9516</td>
<td align="center" valign="top">0.2727</td>
<td align="center" valign="top">0.8</td>
<td align="center" valign="top">0.6122</td>
<td align="center" valign="top">0.3302</td>
</tr>
<tr>
<td align="left" valign="top">Information indices-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0.5</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0.75</td>
<td align="center" valign="top">0.2075</td>
</tr>
<tr>
<td align="left" valign="top">Information indices-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9839</td>
<td align="center" valign="top">0.5227</td>
<td align="center" valign="top">0.9583</td>
<td align="center" valign="top">0.7533</td>
<td align="center" valign="top">0.2076</td>
</tr>
<tr>
<td align="left" valign="top">Information indices-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), over</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0.5227</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0.7614</td>
<td align="center" valign="top">0.1981</td>
</tr>
<tr>
<td align="left" valign="top">Information indices-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), smote</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0.5682</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">0.7841</td>
<td align="center" valign="top">0.1792</td>
</tr>
<tr>
<td align="left" valign="top">Information indices-BST (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), smote</td>
<td align="center" valign="top">0.9355</td>
<td align="center" valign="top">0.75</td>
<td align="center" valign="top">0.8919</td>
<td align="center" valign="top">0.8427</td>
<td align="center" valign="top">0.1415</td>
</tr>
<tr>
<td align="left" valign="top">2D-autocorrelation-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">0.9355</td>
<td align="center" valign="top">0.3864</td>
<td align="center" valign="top">0.8095</td>
<td align="center" valign="top">0.6609</td>
<td align="center" valign="top">0.2924</td>
</tr>
<tr>
<td align="left" valign="top">2D-autocorrelation-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9677</td>
<td align="center" valign="top">0.4091</td>
<td align="center" valign="top">0.9</td>
<td align="center" valign="top">0.6884</td>
<td align="center" valign="top">0.2642</td>
</tr>
<tr>
<td align="left" valign="top">2D-autocorrelation-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), over</td>
<td align="center" valign="top">0.9032</td>
<td align="center" valign="top">0.5</td>
<td align="center" valign="top">0.7857</td>
<td align="center" valign="top">0.7016</td>
<td align="center" valign="top">0.2642</td>
</tr>
<tr>
<td align="left" valign="top">2D-autocorrelation-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), smote</td>
<td align="center" valign="top">0.9194</td>
<td align="center" valign="top">0.4773</td>
<td align="center" valign="top">0.8077</td>
<td align="center" valign="top">0.6983</td>
<td align="center" valign="top">0.2642</td>
</tr>
<tr>
<td align="left" valign="top">Burden eigenvalues-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9516</td>
<td align="center" valign="top">0.4773</td>
<td align="center" valign="top">0.875</td>
<td align="center" valign="top">0.7144</td>
<td align="center" valign="top">0.2453</td>
</tr>
<tr>
<td align="left" valign="top">Burden eigenvalues-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), over</td>
<td align="center" valign="top">0.9516</td>
<td align="center" valign="top">0.5909</td>
<td align="center" valign="top">0.8966</td>
<td align="center" valign="top">0.7713</td>
<td align="center" valign="top">0.1981</td>
</tr>
<tr>
<td align="left" valign="top">Burden eigenvalues-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), smote</td>
<td align="center" valign="top">0.9355</td>
<td align="center" valign="top">0.5682</td>
<td align="center" valign="top">0.8621</td>
<td align="center" valign="top">0.7518</td>
<td align="center" valign="top">0.217</td>
</tr>
<tr>
<td align="left" valign="top">P-VSA-like-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">0.9783</td>
<td align="center" valign="top">0.6562</td>
<td align="center" valign="top">0.9545</td>
<td align="center" valign="top">0.8173</td>
<td align="center" valign="top">0.1538</td>
</tr>
<tr>
<td align="left" valign="top">P-VSA-like-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9783</td>
<td align="center" valign="top">0.6875</td>
<td align="center" valign="top">0.9565</td>
<td align="center" valign="top">0.8329</td>
<td align="center" valign="top">0.141</td>
</tr>
<tr>
<td align="left" valign="top">P-VSA-like-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), over</td>
<td align="center" valign="top">0.9783</td>
<td align="center" valign="top">0.7812</td>
<td align="center" valign="top">0.9615</td>
<td align="center" valign="top">0.8798</td>
<td align="center" valign="top">0.1026</td>
</tr>
<tr>
<td align="left" valign="top">P-VSA-like-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>), smote</td>
<td align="center" valign="top">0.9783</td>
<td align="center" valign="top">0.9062</td>
<td align="center" valign="top">0.9667</td>
<td align="center" valign="top">0.9423</td>
<td align="center" valign="top">0.0513</td>
</tr>
<tr>
<td align="left" valign="top">Eta indices-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9032</td>
<td align="center" valign="top">0.4318</td>
<td align="center" valign="top">0.76</td>
<td align="center" valign="top">0.6675</td>
<td align="center" valign="top">0.2924</td>
</tr>
<tr>
<td align="left" valign="top">Edge adjacency-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>)</td>
<td align="center" valign="top">0.9839</td>
<td align="center" valign="top">0.4545</td>
<td align="center" valign="top">0.9524</td>
<td align="center" valign="top">0.7192</td>
<td align="center" valign="top">0.2358</td>
</tr>
<tr>
<td align="left" valign="top">Edge adjacency-RF (<xref rid="b2-ol-0-0-10068" ref-type="bibr">2</xref>)</td>
<td align="center" valign="top">0.9839</td>
<td align="center" valign="top">0.3864</td>
<td align="center" valign="top">0.9444</td>
<td align="center" valign="top">0.6851</td>
<td align="center" valign="top">0.2642</td>
</tr>
<tr>
<td align="left" valign="top">Edge adjacency-RF (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), over</td>
<td align="center" valign="top">0.9516</td>
<td align="center" valign="top">0.4545</td>
<td align="center" valign="top">0.8696</td>
<td align="center" valign="top">0.7031</td>
<td align="center" valign="top">0.2547</td>
</tr>
<tr>
<td align="left" valign="top">Edge adjacency-SVM (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), over</td>
<td align="center" valign="top">0.9032</td>
<td align="center" valign="top">0.6364</td>
<td align="center" valign="top">0.8235</td>
<td align="center" valign="top">0.7698</td>
<td align="center" valign="top">0.2076</td>
</tr>
<tr>
<td align="left" valign="top">Global-BST (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), over</td>
<td align="center" valign="top">0.8871</td>
<td align="center" valign="top">0.9318</td>
<td align="center" valign="top">0.8542</td>
<td align="center" valign="top">0.9095</td>
<td align="center" valign="top">0.0943</td>
</tr>
<tr>
<td align="left" valign="top">Global-BST (<xref rid="b1-ol-0-0-10068" ref-type="bibr">1</xref>), smote</td>
<td align="center" valign="top">0.9032</td>
<td align="center" valign="top">0.9318</td>
<td align="center" valign="top">0.8723</td>
<td align="center" valign="top">0.9175</td>
<td align="center" valign="top">0.0849</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tfn2-ol-0-0-10068"><p>RF, random forest classifier; BST, gradient boosting classifier; SVM, support vector machines; PPV, positive predictive value; MMCE, mean misclassification error. Numbers in parentheses indicate the subset of features selected by the feature selection algorithms (1, random forest importance and information gain; 2, symmetrical uncertainty). over, training set balanced through oversampling; smote, training set balanced through the synthetic minority oversampling technique (SMOTE). The first term in the name of each model indicates the block of descriptors used to build it.</p></fn>
</table-wrap-foot>
</table-wrap>
</floats-group>
</article>
