<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
<article xml:lang="en" article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink">
<?release-delay 0|0?>
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">OL</journal-id>
<journal-title-group>
<journal-title>Oncology Letters</journal-title>
</journal-title-group>
<issn pub-type="ppub">1792-1074</issn>
<issn pub-type="epub">1792-1082</issn>
<publisher>
<publisher-name>D.A. Spandidos</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3892/ol.2017.6835</article-id>
<article-id pub-id-type="publisher-id">OL-0-0-6835</article-id>
<article-categories>
<subj-group>
<subject>Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Classification and survival prediction for early-stage lung adenocarcinoma and squamous cell carcinoma patients</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Tian</surname><given-names>Suyan</given-names></name>
<xref rid="af1-ol-0-0-6835" ref-type="aff">1</xref>
<xref rid="af2-ol-0-0-6835" ref-type="aff">2</xref>
<xref rid="c1-ol-0-0-6835" ref-type="corresp"/></contrib>
</contrib-group>
<aff id="af1-ol-0-0-6835"><label>1</label>Division of Clinical Research, The First Hospital of Jilin University, Changchun, Jilin 130021, P.R. China</aff>
<aff id="af2-ol-0-0-6835"><label>2</label>Center for Applied Statistical Research, School of Mathematics, Jilin University, Changchun, Jilin 130012, P.R. China</aff>
<author-notes>
<corresp id="c1-ol-0-0-6835"><italic>Correspondence to</italic>: Dr Suyan Tian, Division of Clinical Research, The First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, P.R. China, E-mail: <email>stian@rockefeller.edu</email>; <email>windytian@hotmail.com</email></corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>11</month>
<year>2017</year></pub-date>
<pub-date pub-type="epub">
<day>28</day>
<month>08</month>
<year>2017</year></pub-date>
<volume>14</volume>
<issue>5</issue>
<fpage>5464</fpage>
<lpage>5470</lpage>
<history>
<date date-type="received"><day>21</day><month>04</month><year>2017</year></date>
<date date-type="accepted"><day>04</day><month>08</month><year>2017</year></date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; Tian et al.</copyright-statement>
<copyright-year>2017</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by-nc-nd/4.0/">Creative Commons Attribution-NonCommercial-NoDerivs License</ext-link>, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.</license-p></license>
</permissions>
<abstract>
<p>Non-small cell lung cancer (NSCLC) is a leading cause of cancer-associated mortality worldwide. Adenocarcinoma (AC) and squamous cell carcinoma (SCC) are two primary histological subtypes of NSCLC, accounting for ~70&#x0025; of lung cancer cases. Increasing evidence suggests that AC and SCC differ in the composition of genes and molecular characteristics. Previous research has focused on distinguishing AC from SCC or predicting NSCLC patient survival rates using gene expression profiles, usually with the aid of a feature selection method. The present study conducted a pre-filtering step to identify the genes that have significant expression values and a high connectivity with other genes in the gene network, and then used the radial coordinate visualization method to identify relevant genes. By applying the proposed procedure to NSCLC data, it was demonstrated that there is a clear segmentation between AC and SCC, but not between patients with a good prognosis and those with a poor prognosis. The focus of discriminating AC from SCC differs from that of survival prediction, and there is almost no overlap between the two gene signatures. Overall, a supervised learning method is preferred, and future studies aiming to identify prognostic gene signatures with an increased prediction efficiency are required.</p>
</abstract>
<kwd-group>
<kwd>non-small cell lung cancer</kwd>
<kwd>GeneRank</kwd>
<kwd>radial coordinate visualization</kwd>
<kwd>prognosis</kwd>
<kwd>connectivity</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Non-small cell lung cancer (NSCLC) is a leading cause of cancer deaths in many countries (<xref rid="b1-ol-0-0-6835" ref-type="bibr">1</xref>). It can be categorized into three major histological subtypes, among which adenocarcinoma (AC) and squamous cell carcinoma (SCC) account for roughly 40 and 30&#x0025; of lung cancer (LC) cases, respectively (<xref rid="b2-ol-0-0-6835" ref-type="bibr">2</xref>). Increasing evidence indicates that AC and SCC differ in the composition of genes and molecular characteristics. For instance, Hou <italic>et al</italic> (<xref rid="b3-ol-0-0-6835" ref-type="bibr">3</xref>) found that AC-associated genes are highly enriched in tight junction and cell adhesion molecules, whereas SCC-associated genes are more closely related to cell communication. Therefore, AC and SCC are currently regarded as two distinct diseases.</p>
<p>Currently, treatment choices for NSCLC patients depend mainly on the stage at which the cancer was diagnosed, regardless of the histological subtype. For example, patients at stage IA usually undergo surgical resection and are rarely prescribed adjuvant chemotherapy. However, the recurrence rates of patients at the same stage are heterogeneous, which makes such uniform treatment choices questionable. It is therefore becoming critical to evaluate the risk profiles of patients using a reliable molecular/gene signature. Moreover, owing to the fundamental differences between AC and SCC, it is hypothesized that specific genes are related to recurrence/survival rates for each histological subtype (<xref rid="b4-ol-0-0-6835" ref-type="bibr">4</xref>&#x2013;<xref rid="b6-ol-0-0-6835" ref-type="bibr">6</xref>).</p>
<p>To deal with the high dimensionality of gene expression profiles, downsizing from thousands of genes to a minimal gene signature with maximal predictive ability is essential. In statistics, this process is referred to as feature selection (<xref rid="b7-ol-0-0-6835" ref-type="bibr">7</xref>). Efforts have been devoted to distinguishing AC from SCC using gene expression profiles and various feature selection algorithms (<xref rid="b8-ol-0-0-6835" ref-type="bibr">8</xref>&#x2013;<xref rid="b12-ol-0-0-6835" ref-type="bibr">12</xref>), and more recently to identifying prognostic markers for each specific subtype (<xref rid="b4-ol-0-0-6835" ref-type="bibr">4</xref>&#x2013;<xref rid="b6-ol-0-0-6835" ref-type="bibr">6</xref>).</p>
<p>Genes are highly correlated and can be grouped into gene sets accordingly. Depending on whether these group structures are taken into account, a feature selection algorithm may be classified as either a pathway-based or a gene-based method. Studies have demonstrated that, compared with its gene-based counterpart, a pathway-based feature selection algorithm, in which pathway information is used to assist the selection process, has better predictive performance, stability or biological interpretability (<xref rid="b13-ol-0-0-6835" ref-type="bibr">13</xref>&#x2013;<xref rid="b18-ol-0-0-6835" ref-type="bibr">18</xref>). Specifically for NSCLC applications, several pathway-based feature selection algorithms have been applied to distinguish its major subtypes and/or histological stages (<xref rid="b8-ol-0-0-6835" ref-type="bibr">8</xref>,<xref rid="b11-ol-0-0-6835" ref-type="bibr">11</xref>,<xref rid="b14-ol-0-0-6835" ref-type="bibr">14</xref>).</p>
<p>As a data visualization method, Radial Coordinate Visualization (RadViz) (<xref rid="b19-ol-0-0-6835" ref-type="bibr">19</xref>) can display more than two variables in a 2-dimensional projection. It can also be used to search for biologically interesting patterns and to select relevant genes highly associated with the phenotype of interest (<xref rid="b9-ol-0-0-6835" ref-type="bibr">9</xref>,<xref rid="b20-ol-0-0-6835" ref-type="bibr">20</xref>). In a RadViz projection, features such as genes are presented as anchor points spaced around the perimeter of a circle, while samples are shown as points inside the circle. Each point (i.e., a sample) is held in place by springs that are attached at the other end to the feature anchors (i.e., genes). The stiffness of each spring is proportional to the sample&#x0027;s corresponding gene expression value, and the point ends up at the position where the spring forces from all anchors are in equilibrium. When used for the purpose of feature selection, RadViz may be roughly regarded as a gene-based method since it does not account for any pathway information.</p>
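<p>The spring analogy can be made concrete with a short sketch. The function name radviz_point and the assumption that each value has already been rescaled to the range [0, 1] are ours for illustration; this is not the Orange implementation. With anchors a<sub>j</sub> and stiffnesses v<sub>j</sub>, setting the net spring force to zero gives the equilibrium position as the stiffness-weighted average of the anchors.</p>
<preformat><![CDATA[
```python
import math


def radviz_point(values):
    """Place one sample inside the unit circle.

    `values` holds the sample's feature values (assumed non-negative and
    already rescaled to [0, 1]); feature j is anchored at angle 2*pi*j/m.
    """
    m = len(values)
    anchors = [
        (math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
        for j in range(m)
    ]
    total = sum(values)
    if total == 0:
        # No spring pulls on the sample, so it rests at the centre.
        return (0.0, 0.0)
    # Equilibrium of the springs: stiffness-weighted mean of the anchors.
    x = sum(v * ax for v, (ax, _) in zip(values, anchors)) / total
    y = sum(v * ay for v, (_, ay) in zip(values, anchors)) / total
    return (x, y)
```
]]></preformat>
<p>In this sketch, a sample expressing only the first gene lands on that gene&#x0027;s anchor, whereas a sample with equal values for all genes lands at the centre of the circle.</p>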
<p>In this article, we first ordered genes using GeneRank (<xref rid="b21-ol-0-0-6835" ref-type="bibr">21</xref>), a ranking method from bioinformatics that ranks genes according not only to their expression levels but also to their connectivity with other genes in the gene-to-gene interaction network. We then restricted the genes under consideration to those ranked at the top by GeneRank and used RadViz to select relevant genes within this restricted search space. The proposed procedure thus combines pre-filtering with RadViz and also incorporates connectivity information. We applied the proposed procedure to a set of NSCLC data to establish diagnostic gene signatures for the classification between AC and SCC and prognostic signatures for the survival prediction of NSCLC patients.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and methods</title>
<sec>
<title/>
<sec>
<title>Experimental data</title>
<p>One microarray dataset and one RNA-Seq dataset were included in this study. The microarray data are available under accession number GSE50081 in the Gene Expression Omnibus (GEO: <uri xlink:href="http://www.ncbi.nlm.nih.gov/geo/">http://www.ncbi.nlm.nih.gov/geo/</uri>) repository. The samples, from 127 AC and 42 SCC patients, were hybridized on Affymetrix HGU133 Plus 2.0 chips. We excluded patients censored before 5 years and then stratified the remaining 133 patients into two categories: high-risk patients, who had died, and low-risk patients, who had survived more than 5 years. The microarray dataset was used as the training set to train the final statistical models (i.e., the diagnostic/prognostic signatures).</p>
<p>The RNA-Seq data were downloaded from The Cancer Genome Atlas (TCGA: <uri xlink:href="https://tcga-data.nci.nih.gov/tcga/">https://tcga-data.nci.nih.gov/tcga/</uri>) on August 13, 2014. After restricting the patients to those at early stages who were adjuvant treatment na&#x00EF;ve and had survival information, 70 AC and 55 SCC subjects remained. The RNA-Seq dataset was used as the test set to validate the performance of the resulting diagnostic/prognostic signatures.</p>
</sec>
<sec>
<title>Pre-processing procedures</title>
<p>For the microarray data, the expression values were obtained using the <italic>frma</italic> algorithm (<xref rid="b22-ol-0-0-6835" ref-type="bibr">22</xref>), and normalization across samples was carried out using quantile normalization. The resulting expression values were log<sub>2</sub> transformed and further standardized to have a mean of 0 and a standard deviation of 1 for each gene. For the NSCLC RNA-Seq data, counts-per-million (CPM) values were calculated and log<sub>2</sub> transformed using the voom function in R (<xref rid="b23-ol-0-0-6835" ref-type="bibr">23</xref>). The resulting values were then standardized in the same way.</p>
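<p>A minimal sketch of the two transformations is given below for illustration. The function names and the small offsets in log2_cpm are assumptions made here to keep zero counts finite; the sketch is a simplified stand-in for, not a reimplementation of, the R routines used in the study.</p>
<preformat><![CDATA[
```python
import math
import statistics


def log2_cpm(counts, prior=0.5):
    """log2 counts-per-million for the genes of one sample.

    `prior` is an illustrative offset that keeps zero counts finite.
    """
    library_size = sum(counts)
    return [
        math.log2((c + prior) / (library_size + 1.0) * 1e6) for c in counts
    ]


def standardize(values):
    """Scale one gene's values across samples to mean 0 and SD 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / sd for v in values]
```
]]></preformat>
<p>Here log2_cpm acts on one sample across genes, whereas standardize acts on one gene across samples, so that every gene contributes on a comparable scale.</p>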
</sec>
<sec>
<title>Statistical analysis</title>
<p>As mentioned in the Introduction section, RadViz is a visualization method that can also be used for feature selection and classification. To obtain a clear separation among classes using only several genes, RadViz needs to search over a myriad of possible gene combinations, which is tedious to do manually. To automate this search, an approach called VizRank has been proposed (<xref rid="b24-ol-0-0-6835" ref-type="bibr">24</xref>); it scores the visualization projections according to their degree of class separation and then finds those with the highest scores. In VizRank, features are ranked using the signal-to-noise ratio, and a subset of features is randomly chosen with a preference for higher-ranked features, given that such genes convey more information about the classification under investigation. Lastly, for a selected gene subset, VizRank exhaustively evaluates all possible projections defined by different permutations of the feature anchors on the circle to obtain the optimal projection.</p>
</sec>
<sec>
<title>GeneRank</title>
<p>The GeneRank method (<xref rid="b21-ol-0-0-6835" ref-type="bibr">21</xref>) ranks genes on the basis of both their expression values and their connectivity information. Specifically, it solves the following equation:</p>
<disp-formula>
<alternatives>
<mml:math id="umml1" display="block"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>I</mml:mi><mml:mo>&#x2013;</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="italic">dWD</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2013;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mspace width=".16em" /><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2013;</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mspace width=".16em" /><mml:mtext mathvariant="italic">exp</mml:mtext></mml:mrow></mml:math>
<graphic xlink:href="ol-14-05-5464-g00.tif"/>
</alternatives>
</disp-formula>
<p>Here, W denotes the adjacency matrix of genes, and D is a diagonal matrix recording the degrees of genes (i.e., the number of genes to which a specific gene is connected in the pathway graph). The gene expression value is represented by exp, and d is a tuning parameter that balances the influence of a gene&#x0027;s expression value against that of its connectivity information. The default value of 0.5 was used in this study. The GeneRank of each gene was calculated using the R pathClass package.</p>
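<p>To illustrate how this linear system can be solved, the sketch below rewrites it as the fixed-point iteration r = (1 &#x2013; d) exp + d W D<sup>&#x2013;1</sup> r and iterates until the ranks stop changing. The function name gene_rank, the stopping tolerance and the treatment of isolated genes (degree set to 1) are assumptions made for this example; the pathClass package may solve the system differently.</p>
<preformat><![CDATA[
```python
def gene_rank(adjacency, expression, d=0.5, tol=1e-12, max_iter=1000):
    """Solve (I - d W D^-1) r = (1 - d) exp by fixed-point iteration.

    adjacency  : symmetric 0/1 matrix W as a list of lists
    expression : per-gene expression summary (the vector exp)
    d          : weight of connectivity relative to expression
    """
    n = len(expression)
    degrees = [sum(row) for row in adjacency]
    # Assumption: an isolated gene gets degree 1 to avoid division by zero;
    # its rank then depends only on its own expression value.
    degrees = [deg if deg > 0 else 1 for deg in degrees]
    r = list(expression)
    for _ in range(max_iter):
        new_r = [
            (1 - d) * expression[i]
            + d * sum(adjacency[i][j] * r[j] / degrees[j] for j in range(n))
            for i in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new_r, r)) < tol
        r = new_r
        if converged:
            break
    return r
```
]]></preformat>
<p>For a three-gene chain in which the middle gene is connected to both ends and all expression values are equal, the middle gene receives the highest rank, which reflects the contribution of connectivity.</p>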
<p>In our proposed procedure, all genes under consideration were first ordered according to their GeneRanks. Then, using the first 200, 500 or 1,000 genes in this list, or all genes, we used RadViz to select the optimal gene subset with the best VizRank score (the maximum number of genes was set at 8). The proposed procedure is graphically illustrated in <xref rid="f1-ol-0-0-6835" ref-type="fig">Fig. 1</xref>.</p>
</sec>
<sec>
<title>Statistical metrics</title>
<p>To evaluate the performance of a resulting diagnostic signature, two metrics were considered: the Generalized Brier Score (GBS) and the misclassification error rate. GBS is defined as (<xref rid="b25-ol-0-0-6835" ref-type="bibr">25</xref>),</p>
<disp-formula>
<alternatives>
<mml:math id="umml2" display="block"><mml:mrow><mml:mtext mathvariant="italic">GBS</mml:mtext><mml:mspace width=".16em" /><mml:mo>=</mml:mo><mml:mspace width=".16em" /><mml:msubsup><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:msubsup><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mrow><mml:mtext mathvariant="italic">ik</mml:mtext></mml:mrow></mml:msub><mml:mo>&#x2013;</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mtext mathvariant="italic">ik</mml:mtext></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:mrow></mml:math>
<graphic xlink:href="ol-14-05-5464-g01.tif"/>
</alternatives>
</disp-formula>
<p>where Y<sub>ik</sub> is an indicator of whether subject <italic>i</italic> (i=1, 2, &#x2026;, n) is in class <italic>k</italic> (k=1, 2, &#x2026;, K), and p<sub>ik</sub> denotes the calculated probability of subject <italic>i</italic> belonging to class <italic>k</italic>. Of note, we normalized the GBS by the sample size n. As a result, the normalized GBS falls within [0, 1], with a value closer to 0 indicating a better separation among classes.</p>
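<p>The two diagnostic metrics can be sketched as follows. The function names and the encoding of class labels as integers 0, 1, &#x2026;, K&#x2013;1 are illustrative assumptions; predicted classes are taken to be those with the highest calculated probability.</p>
<preformat><![CDATA[
```python
def normalized_gbs(labels, probs):
    """Generalized Brier Score divided by the sample size n.

    labels : true class index of each subject (0 .. K-1)
    probs  : per-subject list of K class probabilities
    """
    total = 0.0
    for y, p in zip(labels, probs):
        for k, p_k in enumerate(p):
            indicator = 1.0 if y == k else 0.0  # Y_ik
            total += (indicator - p_k) ** 2 / 2.0
    return total / len(labels)


def misclassification_rate(labels, probs):
    """Share of subjects whose most probable class is not the true class."""
    wrong = 0
    for y, p in zip(labels, probs):
        predicted = max(range(len(p)), key=lambda k: p[k])
        if predicted != y:
            wrong += 1
    return wrong / len(labels)
```
]]></preformat>
<p>With these definitions, perfectly confident correct predictions give a normalized GBS of 0, perfectly confident wrong predictions give 1, and an uninformative two-class prediction of (0.5, 0.5) gives 0.25.</p>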
<p>For a resulting prognostic signature, we used the C-statistic over the follow-up period (0, &#x03C4;) to evaluate its performance. Specifically, the censoring-adjusted C-statistic is defined by (<xref rid="b26-ol-0-0-6835" ref-type="bibr">26</xref>) as,</p>
<disp-formula>
<alternatives>
<mml:math id="umml3" display="block"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>&#x03C4;</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>g</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x003E;</mml:mo><mml:mi>g</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x003C;</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x003C;</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>
<graphic xlink:href="ol-14-05-5464-g02.tif"/>
</alternatives>
</disp-formula>
<p>where g(X) is the risk score for a subject with predictor vector X, and T<sub>i</sub> and T<sub>j</sub> are the survival times of patients i and j, respectively. The C-statistic can be estimated using the R package survAUC, with a value closer to 1 indicating better performance.</p>
<p>Additionally, we fitted a multiple Cox regression model using the selected genes as covariates and calculated the risk score of each patient using the estimated coefficients in this model. Setting the mean of those risk scores as the threshold, we classified patients into a low-risk group or a high-risk group. We obtained Kaplan-Meier curves for the two groups and compared them using the log-rank test. The P-value of the log-rank test was the other metric used to compare the performance of the resulting prognostic signatures.</p>
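<p>The probability above can be approximated by counting concordant pairs, as in the simplified sketch below. It is deliberately cruder than the censoring-adjusted estimator cited above, because it ignores inverse-probability-of-censoring weights; the function name and the half credit given to tied risk scores are assumptions made for illustration.</p>
<preformat><![CDATA[
```python
def truncated_c_statistic(times, events, scores, tau):
    """Unweighted concordance over pairs with T_i < T_j and T_i < tau.

    times  : follow-up time of each subject
    events : 1 if the event was observed, 0 if censored
    scores : risk scores g(X); higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Subject i must have an observed event before tau.
        if not events[i] or times[i] >= tau:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # assumption: ties get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs before tau")
    return concordant / comparable
```
]]></preformat>
<p>A signature whose risk scores order the patients exactly by decreasing survival time yields a value of 1, and the reverse ordering yields 0.</p>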
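<p>The risk-score step can be sketched as below, assuming the Cox coefficients have already been estimated elsewhere. The function names and the coefficient values in the usage note are hypothetical and are not results from this study; the Kaplan-Meier and log-rank steps are not reproduced here.</p>
<preformat><![CDATA[
```python
def risk_scores(expression_rows, coefficients):
    """Linear predictor of a Cox model: sum of coefficient * expression."""
    return [
        sum(b * x for b, x in zip(coefficients, row))
        for row in expression_rows
    ]


def split_by_mean(scores):
    """Label each patient 'high' or 'low' risk using the mean score."""
    threshold = sum(scores) / len(scores)
    return ["high" if s > threshold else "low" for s in scores]
```
]]></preformat>
<p>For example, with hypothetical coefficients (0.5, &#x2013;0.25) and three patients with standardized expression values (1, 0), (0, 1) and (1, 1), the risk scores are 0.5, &#x2013;0.25 and 0.25, so the first and third patients fall into the high-risk group.</p>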
</sec>
<sec>
<title>Statistical language and packages</title>
<p>Statistical analysis including SVM, GeneRank, and performance metric calculation was carried out in the R language version 3.2 (<uri xlink:href="http://www.r-project.org">http://www.r-project.org</uri>). RadViz/VizRank analysis was conducted using the Orange software, version 2.7 (<uri xlink:href="http://www.orange.biolab.si">http://www.orange.biolab.si</uri>).</p>
</sec>
</sec>
</sec>
<sec sec-type="results">
<title>Results</title>
<p>We applied the proposed procedure to the NSCLC data and obtained two sets of gene signatures: one for AC/SCC segmentation and the other for high/low risk segmentation, herein referred to as the diagnostic signature and the prognostic signature, respectively. We considered four scenarios: the genes under consideration were restricted to the 200, 500 or 1,000 highest-ranked genes, or included all 1,952 genes. The corresponding RadViz projections with optimal gene subsets are presented in <xref rid="f2-ol-0-0-6835" ref-type="fig">Figs. 2</xref> and <xref rid="f3-ol-0-0-6835" ref-type="fig">3</xref>, from which we observed that the final diagnostic and prognostic gene signatures barely overlapped in any scenario. As discussed previously (<xref rid="b5-ol-0-0-6835" ref-type="bibr">5</xref>), it is unsurprising to observe no or only limited overlap between the diagnostic and prognostic signatures, since the outcomes under investigation for these two sets of signatures differ in nature.</p>
<p>The performance statistics of the resulting diagnostic signatures for both the training set and the test set are presented in <xref rid="tI-ol-0-0-6835" ref-type="table">Table I</xref>. Similarly, the performance statistics of the resulting prognostic signatures for both the training set and the test set are presented in <xref rid="tII-ol-0-0-6835" ref-type="table">Table II</xref>. There were no significant differences in predictive performance among these four scenarios for either the diagnostic or the prognostic signatures, indicating that a prescreening step that downsizes the genes under consideration to those important in terms of both pathway connectivity and expression differences does not degrade the predictive performance of the final signatures. Although the differences are small, the signatures constructed with the first 1,000 genes slightly outperformed those under the other scenarios, suggesting that 1,000 is the optimal cutoff for the number of genes under consideration in this study. The heatmaps of the 8-gene diagnostic signature and the 8-gene prognostic signature under the first 1,000-gene scenario are shown in <xref rid="f4-ol-0-0-6835" ref-type="fig">Fig. 4</xref>. Consistent with the previous observations, there was a clear separation between AC and SCC samples but not between the high-risk and low-risk patients. Instead, using hierarchical clustering (as shown in <xref rid="f4-ol-0-0-6835" ref-type="fig">Fig. 4</xref>), the samples may be classified into three clusters: the high-risk patients, the low-risk patients, and those with ambiguous labels.</p>
<p>Furthermore, since it has been demonstrated that several genes are adequate to discriminate between AC and SCC (<xref rid="b9-ol-0-0-6835" ref-type="bibr">9</xref>,<xref rid="b11-ol-0-0-6835" ref-type="bibr">11</xref>), we set the maximum number of genes in the RadViz projections to 3 and repeated the selection of relevant genes and the final model fitting. In contrast, previous studies (<xref rid="b5-ol-0-0-6835" ref-type="bibr">5</xref>,<xref rid="b27-ol-0-0-6835" ref-type="bibr">27</xref>) have shown that the identification of prognostic gene signatures is much more difficult than that of diagnostic gene signatures, and thus fewer than 10 genes might be incapable of separating patients with good prognosis from those with bad prognosis. To address this, we resorted to the strategy of using the genes with the highest frequencies in the RadViz projections (<xref rid="b9-ol-0-0-6835" ref-type="bibr">9</xref>,<xref rid="b20-ol-0-0-6835" ref-type="bibr">20</xref>). Here the final size was set at 40. The performance statistics for the 40-gene prognostic signature are tabulated in <xref rid="tII-ol-0-0-6835" ref-type="table">Table II</xref> as well.</p>
<p>The gene symbols of these 3-gene diagnostic signatures are presented in <xref rid="f5-ol-0-0-6835" ref-type="fig">Fig. 5</xref>. We found the 3-gene signatures to be very stable: one gene (33.3&#x0025;) is present in all four signatures, and 3 of the four signatures (75&#x0025;) share 2 common genes (66.7&#x0025;), providing further evidence that several gene biomarkers are sufficient to distinguish AC from SCC. For the prognosis analysis, when we increased the size of the final models to 40, a better separation between patients with good prognosis and those with bad prognosis was achieved compared with the 8-gene signatures. However, the performance of the 40-gene signatures was still unsatisfactory, which may be explained by the following reasons.</p>
<p>First, the patients were stratified into two categories, high-risk and low-risk, on the basis of their survival time. The risk status served as the outcome when training the prognostic signatures. Such an over-simplified stratification might lead to the predictive inferiority of a prognostic signature, as pointed out previously (<xref rid="b28-ol-0-0-6835" ref-type="bibr">28</xref>). Considering that the RadViz method is incapable of dealing with time-to-event outcomes, we plan to replace it with another feature selection algorithm, e.g., LASSO, and reanalyze this NSCLC dataset in future research.</p>
<p>Second, in this study we constructed an overall prognostic signature for NSCLC patients without considering their histological subtypes. As mentioned in the Introduction section, there may exist subtype-specific prognostic genes for AC and SCC. Since one major goal of this study was to illustrate that the diagnostic and prognostic gene signatures differ dramatically, a single prognostic signature for both AC and SCC was required. Construction of subtype-specific prognostic signatures, using either a separate survival analysis for each subtype or a suitable statistical method such as those in (<xref rid="b4-ol-0-0-6835" ref-type="bibr">4</xref>,<xref rid="b5-ol-0-0-6835" ref-type="bibr">5</xref>), is warranted in order to make better predictions and thus facilitate personalized treatment strategies for NSCLC patients.</p>
<p>Lastly, the gene expression profile alone might not convey all information about the prognosis of NSCLC patients. If this is true, other omics data such as copy number alteration and DNA methylation data need to be integrated in order to provide a better survival prediction for early-stage NSCLC patients.</p>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>In the present study, we used the same data to train both the diagnostic and the prognostic gene signatures with the aid of RadViz and SVM. The gene expression profiles may contain valuable information on AC/SCC segmentation, and also valuable information on prognosis. Nevertheless, genes that are informative for diagnosis might not be valuable for prognosis, and vice versa. It is unsurprising that the diagnostic and prognostic signatures share no or limited overlap, even though they were all trained on the same dataset.</p>
<p>No significant prognostic gene signatures were obtained in this study, and in the Results section we listed three possible reasons for this. Given that we obtained substantially better C-statistics using the same datasets and Cox models (unpublished work), we believe that stratifying patients into risk categories on the basis of their survival time may result in a substantial loss of information. Thus, we emphasize that such an over-simplification should be avoided in practice.</p>
<p>Depending on whether the membership/label information is taken into account, a machine learning method is classified as either an unsupervised or a supervised method. Because it does not consider the labels/dependent variables, the information captured by an unsupervised learning method might not be meaningful for either diagnosis or prognosis. Moreover, gene expression profiles contain many irrelevant and redundant genes that blur the signals from the relevant ones, so variable selection, in which the outcome/label is always taken into consideration, becomes imperative. Therefore, we prefer a supervised method over an unsupervised learning method.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="b1-ol-0-0-6835"><label>1</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jemal</surname><given-names>A</given-names></name><name><surname>Bray</surname><given-names>F</given-names></name><name><surname>Center</surname><given-names>MM</given-names></name><name><surname>Ferlay</surname><given-names>J</given-names></name><name><surname>Ward</surname><given-names>E</given-names></name><name><surname>Forman</surname><given-names>D</given-names></name></person-group><article-title>Global cancer statistics</article-title><source>CA Cancer J Clin</source><volume>61</volume><fpage>69</fpage><lpage>90</lpage><year>2011</year><pub-id pub-id-type="doi">10.3322/caac.20107</pub-id><pub-id pub-id-type="pmid">21296855</pub-id></element-citation></ref>
<ref id="b2-ol-0-0-6835"><label>2</label><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Lu</surname><given-names>C</given-names></name><name><surname>Onn</surname><given-names>A</given-names></name><name><surname>Vaporciyan</surname><given-names>A</given-names></name><etal/></person-group><chapter-title>78: Cancer of the lung</chapter-title><source>Holland-Frei Cancer Medicine</source><edition>8th</edition><publisher-name>People&#x0027;s Medical Publishing House</publisher-name><year>2010</year></element-citation></ref>
<ref id="b3-ol-0-0-6835"><label>3</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hou</surname><given-names>J</given-names></name><name><surname>Aerts</surname><given-names>J</given-names></name><name><surname>den Hamer</surname><given-names>B</given-names></name><name><surname>van Ijcken</surname><given-names>W</given-names></name><name><surname>den Bakker</surname><given-names>M</given-names></name><name><surname>Riegman</surname><given-names>P</given-names></name><name><surname>van der Leest</surname><given-names>C</given-names></name><name><surname>van der Spek</surname><given-names>P</given-names></name><name><surname>Foekens</surname><given-names>JA</given-names></name><name><surname>Hoogsteden</surname><given-names>HC</given-names></name><etal/></person-group><article-title>Gene expression-based classification of non-small cell lung carcinomas and survival prediction</article-title><source>PLoS One</source><volume>5</volume><fpage>e10312</fpage><year>2010</year><pub-id pub-id-type="doi">10.1371/journal.pone.0010312</pub-id><pub-id pub-id-type="pmid">20421987</pub-id></element-citation></ref>
<ref id="b4-ol-0-0-6835"><label>4</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tian</surname><given-names>S</given-names></name><name><surname>Wang</surname><given-names>C</given-names></name><name><surname>An</surname><given-names>MW</given-names></name></person-group><article-title>Test on existence of histology subtype-specific prognostic signatures among early stage lung adenocarcinoma and squamous cell carcinoma patients using a Cox-model based filter</article-title><source>Biol Direct</source><volume>10</volume><fpage>15</fpage><year>2015</year><pub-id pub-id-type="doi">10.1186/s13062-015-0051-z</pub-id><pub-id pub-id-type="pmid">25887039</pub-id></element-citation></ref>
<ref id="b5-ol-0-0-6835"><label>5</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tian</surname><given-names>S</given-names></name></person-group><article-title>Identification of subtype-specific prognostic genes for early-stage lung adenocarcinoma and squamous cell carcinoma patients using an embedded feature selection algorithm</article-title><source>PLoS One</source><volume>10</volume><fpage>e0134630</fpage><year>2015</year><pub-id pub-id-type="doi">10.1371/journal.pone.0134630</pub-id><pub-id pub-id-type="pmid">26226392</pub-id></element-citation></ref>
<ref id="b6-ol-0-0-6835"><label>6</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Skrzypski</surname><given-names>M</given-names></name><name><surname>Dziadziuszko</surname><given-names>R</given-names></name><name><surname>Jassem</surname><given-names>E</given-names></name><name><surname>Szymanowska-Narloch</surname><given-names>A</given-names></name><name><surname>Gulida</surname><given-names>G</given-names></name><name><surname>Rzepko</surname><given-names>R</given-names></name><name><surname>Biernat</surname><given-names>W</given-names></name><name><surname>Taron</surname><given-names>M</given-names></name><name><surname>Jelitto-G&#x00F3;rska</surname><given-names>M</given-names></name><name><surname>Marja&#x0144;ski</surname><given-names>T</given-names></name><etal/></person-group><article-title>Main histologic types of non-small-cell lung cancer differ in expression of prognosis-related genes</article-title><source>Clin Lung Cancer</source><volume>14</volume><fpage>666</fpage><lpage>673.e2</lpage><year>2013</year><pub-id pub-id-type="doi">10.1016/j.cllc.2013.04.010</pub-id><pub-id pub-id-type="pmid">23870818</pub-id></element-citation></ref>
<ref id="b7-ol-0-0-6835"><label>7</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Saeys</surname><given-names>Y</given-names></name><name><surname>Inza</surname><given-names>I</given-names></name><name><surname>Larra&#x00F1;aga</surname><given-names>P</given-names></name></person-group><article-title>A review of feature selection techniques in bioinformatics</article-title><source>Bioinformatics</source><volume>23</volume><fpage>2507</fpage><lpage>2517</lpage><year>2007</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btm344</pub-id><pub-id pub-id-type="pmid">17720704</pub-id></element-citation></ref>
<ref id="b8-ol-0-0-6835"><label>8</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>L</given-names></name><name><surname>Wang</surname><given-names>L</given-names></name><name><surname>Du</surname><given-names>B</given-names></name><name><surname>Wang</surname><given-names>T</given-names></name><name><surname>Tian</surname><given-names>P</given-names></name><name><surname>Tian</surname><given-names>S</given-names></name></person-group><article-title>Classification of non-small cell lung cancer using significance analysis of microarray-gene set reduction algorithm</article-title><source>Biomed Res Int</source><volume>2016</volume><fpage>2491671</fpage><year>2016</year><pub-id pub-id-type="pmid">27446945</pub-id></element-citation></ref>
<ref id="b9-ol-0-0-6835"><label>9</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>A</given-names></name><name><surname>Wang</surname><given-names>C</given-names></name><name><surname>Wang</surname><given-names>S</given-names></name><name><surname>Li</surname><given-names>L</given-names></name><name><surname>Liu</surname><given-names>Z</given-names></name><name><surname>Tian</surname><given-names>S</given-names></name></person-group><article-title>Visualization-aided classification ensembles discriminate lung adenocarcinoma and squamous cell carcinoma samples using their gene expression profiles</article-title><source>PLoS One</source><volume>9</volume><fpage>e110052</fpage><year>2014</year></element-citation></ref>
<ref id="b10-ol-0-0-6835"><label>10</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tian</surname><given-names>S</given-names></name><name><surname>Su&#x00E1;rez-Fari&#x00F1;as</surname><given-names>M</given-names></name></person-group><article-title>Hierarchical-TGDR: Combining biological hierarchy with a regularization method for multi-class classification of lung cancer samples via high-throughput gene-expression data</article-title><source>Syst Biomed</source><volume>4</volume><fpage>e25979</fpage><year>2013</year></element-citation></ref>
<ref id="b11-ol-0-0-6835"><label>11</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ben-Hamo</surname><given-names>R</given-names></name><name><surname>Boue</surname><given-names>S</given-names></name><name><surname>Martin</surname><given-names>F</given-names></name><name><surname>Talikka</surname><given-names>M</given-names></name><name><surname>Efroni</surname><given-names>S</given-names></name></person-group><article-title>Classification of lung adenocarcinoma and squamous cell carcinoma samples based on their gene expression profile in the sbv IMPROVER diagnostic signature challenge</article-title><source>Syst Biomed</source><volume>1</volume><fpage>83</fpage><lpage>92</lpage><year>2013</year></element-citation></ref>
<ref id="b12-ol-0-0-6835"><label>12</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>J</given-names></name><name><surname>Yang</surname><given-names>XY</given-names></name><name><surname>Shi</surname><given-names>WJ</given-names></name></person-group><article-title>Identifying differentially expressed genes and pathways in two types of non-small cell lung cancer: Adenocarcinoma and squamous cell carcinoma</article-title><source>Genet Mol Res</source><volume>13</volume><fpage>95</fpage><lpage>102</lpage><year>2014</year><pub-id pub-id-type="doi">10.4238/2014.January.8.8</pub-id><pub-id pub-id-type="pmid">24446291</pub-id></element-citation></ref>
<ref id="b13-ol-0-0-6835"><label>13</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Johannes</surname><given-names>M</given-names></name><name><surname>Brase</surname><given-names>JC</given-names></name><name><surname>Fr&#x00F6;hlich</surname><given-names>H</given-names></name><name><surname>Gade</surname><given-names>S</given-names></name><name><surname>Gehrmann</surname><given-names>M</given-names></name><name><surname>F&#x00E4;lth</surname><given-names>M</given-names></name><name><surname>S&#x00FC;ltmann</surname><given-names>H</given-names></name><name><surname>Beissbarth</surname><given-names>T</given-names></name></person-group><article-title>Integration of pathway knowledge into a reweighted recursive feature elimination approach for risk stratification of cancer patients</article-title><source>Bioinformatics</source><volume>26</volume><fpage>2136</fpage><lpage>2144</lpage><year>2010</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btq345</pub-id><pub-id pub-id-type="pmid">20591905</pub-id></element-citation></ref>
<ref id="b14-ol-0-0-6835"><label>14</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tian</surname><given-names>S</given-names></name><name><surname>Chang</surname><given-names>HH</given-names></name><name><surname>Wang</surname><given-names>C</given-names></name></person-group><article-title>Weighted-SAMGSR: Combining significance analysis of microarray-gene set reduction algorithm with pathway topology-based weights to select relevant genes</article-title><source>Biol Direct</source><volume>11</volume><fpage>50</fpage><year>2016</year><pub-id pub-id-type="doi">10.1186/s13062-016-0152-3</pub-id><pub-id pub-id-type="pmid">27681389</pub-id></element-citation></ref>
<ref id="b15-ol-0-0-6835"><label>15</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>L</given-names></name><name><surname>Xuan</surname><given-names>J</given-names></name><name><surname>Riggins</surname><given-names>RB</given-names></name><name><surname>Clarke</surname><given-names>R</given-names></name><name><surname>Wang</surname><given-names>Y</given-names></name></person-group><article-title>Identifying cancer biomarkers by network-constrained support vector machines</article-title><source>BMC Syst Biol</source><volume>5</volume><fpage>161</fpage><year>2011</year><pub-id pub-id-type="doi">10.1186/1752-0509-5-161</pub-id><pub-id pub-id-type="pmid">21992556</pub-id></element-citation></ref>
<ref id="b16-ol-0-0-6835"><label>16</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname><given-names>H</given-names></name><name><surname>Lin</surname><given-names>W</given-names></name><name><surname>Feng</surname><given-names>R</given-names></name><name><surname>Li</surname><given-names>H</given-names></name></person-group><article-title>Network-regularized high-dimensional Cox regression for analysis of genomic data</article-title><source>Stat Sin</source><volume>24</volume><fpage>1433</fpage><lpage>1459</lpage><year>2014</year><pub-id pub-id-type="pmid">26316678</pub-id></element-citation></ref>
<ref id="b17-ol-0-0-6835"><label>17</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pan</surname><given-names>W</given-names></name><name><surname>Xie</surname><given-names>B</given-names></name><name><surname>Shen</surname><given-names>X</given-names></name></person-group><article-title>Incorporating predictor network in penalized regression with application to microarray data</article-title><source>Biometrics</source><volume>66</volume><fpage>474</fpage><lpage>484</lpage><year>2010</year><pub-id pub-id-type="doi">10.1111/j.1541-0420.2009.01296.x</pub-id><pub-id pub-id-type="pmid">19645699</pub-id></element-citation></ref>
<ref id="b18-ol-0-0-6835"><label>18</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sokolov</surname><given-names>A</given-names></name><name><surname>Carlin</surname><given-names>DE</given-names></name><name><surname>Paull</surname><given-names>EO</given-names></name><name><surname>Baertsch</surname><given-names>R</given-names></name><name><surname>Stuart</surname><given-names>JM</given-names></name></person-group><article-title>Pathway-based genomics prediction using generalized elastic net</article-title><source>PLoS Comput Biol</source><volume>12</volume><fpage>e1004790</fpage><year>2016</year><pub-id pub-id-type="doi">10.1371/journal.pcbi.1004790</pub-id><pub-id pub-id-type="pmid">26960204</pub-id></element-citation></ref>
<ref id="b19-ol-0-0-6835"><label>19</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hoffman</surname><given-names>P</given-names></name><name><surname>Grinstein</surname><given-names>G</given-names></name><name><surname>Marx</surname><given-names>K</given-names></name><name><surname>Grosse</surname><given-names>I</given-names></name><name><surname>Stanley</surname><given-names>E</given-names></name></person-group><article-title>DNA visual and analytic data mining</article-title><source>Proceedings Visualization &#x0027;97</source><comment>(Cat No 97CB36155)</comment><year>1997</year></element-citation></ref>
<ref id="b20-ol-0-0-6835"><label>20</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mramor</surname><given-names>M</given-names></name><name><surname>Leban</surname><given-names>G</given-names></name><name><surname>Demsar</surname><given-names>J</given-names></name><name><surname>Zupan</surname><given-names>B</given-names></name></person-group><article-title>Visualization-based cancer microarray data classification analysis</article-title><source>Bioinformatics</source><volume>23</volume><fpage>2147</fpage><lpage>2154</lpage><year>2007</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btm312</pub-id><pub-id pub-id-type="pmid">17586552</pub-id></element-citation></ref>
<ref id="b21-ol-0-0-6835"><label>21</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Morrison</surname><given-names>JL</given-names></name><name><surname>Breitling</surname><given-names>R</given-names></name><name><surname>Higham</surname><given-names>DJ</given-names></name><name><surname>Gilbert</surname><given-names>DR</given-names></name></person-group><article-title>GeneRank: Using search engine technology for the analysis of microarray experiments</article-title><source>BMC Bioinformatics</source><volume>6</volume><fpage>233</fpage><year>2005</year><pub-id pub-id-type="doi">10.1186/1471-2105-6-233</pub-id><pub-id pub-id-type="pmid">16176585</pub-id></element-citation></ref>
<ref id="b22-ol-0-0-6835"><label>22</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>McCall</surname><given-names>MN</given-names></name><name><surname>Bolstad</surname><given-names>BM</given-names></name><name><surname>Irizarry</surname><given-names>RA</given-names></name></person-group><article-title>Frozen robust multiarray analysis (fRMA)</article-title><source>Biostatistics</source><volume>11</volume><fpage>242</fpage><lpage>253</lpage><year>2010</year><pub-id pub-id-type="doi">10.1093/biostatistics/kxp059</pub-id><pub-id pub-id-type="pmid">20097884</pub-id></element-citation></ref>
<ref id="b23-ol-0-0-6835"><label>23</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Law</surname><given-names>CW</given-names></name><name><surname>Chen</surname><given-names>Y</given-names></name><name><surname>Shi</surname><given-names>W</given-names></name><name><surname>Smyth</surname><given-names>GK</given-names></name></person-group><article-title>Voom: Precision weights unlock linear model analysis tools for RNA-seq read counts</article-title><source>Genome Biol</source><volume>15</volume><fpage>R29</fpage><year>2014</year><pub-id pub-id-type="doi">10.1186/gb-2014-15-2-r29</pub-id><pub-id pub-id-type="pmid">24485249</pub-id></element-citation></ref>
<ref id="b24-ol-0-0-6835"><label>24</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Leban</surname><given-names>G</given-names></name><name><surname>Bratko</surname><given-names>I</given-names></name><name><surname>Petrovic</surname><given-names>U</given-names></name><name><surname>Curk</surname><given-names>T</given-names></name><name><surname>Zupan</surname><given-names>B</given-names></name></person-group><article-title>VizRank: Finding informative data projections in functional genomics by machine learning</article-title><source>Bioinformatics</source><volume>21</volume><fpage>413</fpage><lpage>414</lpage><year>2005</year><pub-id pub-id-type="doi">10.1093/bioinformatics/bti016</pub-id><pub-id pub-id-type="pmid">15358614</pub-id></element-citation></ref>
<ref id="b25-ol-0-0-6835"><label>25</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yeung</surname><given-names>KY</given-names></name><name><surname>Bumgarner</surname><given-names>RE</given-names></name><name><surname>Raftery</surname><given-names>AE</given-names></name></person-group><article-title>Bayesian model averaging: Development of an improved multi-class, gene selection and classification tool for microarray data</article-title><source>Bioinformatics</source><volume>21</volume><fpage>2394</fpage><lpage>2402</lpage><year>2005</year><pub-id pub-id-type="doi">10.1093/bioinformatics/bti319</pub-id><pub-id pub-id-type="pmid">15713736</pub-id></element-citation></ref>
<ref id="b26-ol-0-0-6835"><label>26</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Uno</surname><given-names>H</given-names></name><name><surname>Cai</surname><given-names>T</given-names></name><name><surname>Pencina</surname><given-names>MJ</given-names></name><name><surname>D&#x0027;Agostino</surname><given-names>RB</given-names></name><name><surname>Wei</surname><given-names>LJ</given-names></name></person-group><article-title>On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data</article-title><source>Stat Med</source><volume>30</volume><fpage>1105</fpage><lpage>1117</lpage><year>2011</year><pub-id pub-id-type="pmid">21484848</pub-id></element-citation></ref>
<ref id="b27-ol-0-0-6835"><label>27</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname><given-names>SD</given-names></name><name><surname>Parmigiani</surname><given-names>G</given-names></name><name><surname>Huttenhower</surname><given-names>C</given-names></name><name><surname>Waldron</surname><given-names>L</given-names></name></person-group><article-title>M&#x00E1;s-o-menos: A simple sign averaging method for discrimination in genomic data analysis</article-title><source>Bioinformatics</source><volume>30</volume><fpage>3062</fpage><lpage>3069</lpage><year>2014</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btu488</pub-id><pub-id pub-id-type="pmid">25061068</pub-id></element-citation></ref>
<ref id="b28-ol-0-0-6835"><label>28</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Binder</surname><given-names>H</given-names></name><name><surname>Schumacher</surname><given-names>M</given-names></name></person-group><article-title>Comment on &#x2018;network-constrained regularization and variable selection for analysis of genomic data&#x2019;</article-title><source>Bioinformatics</source><volume>24</volume><fpage>2566</fpage><lpage>2569</lpage><year>2008</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btn412</pub-id><pub-id pub-id-type="pmid">18682424</pub-id></element-citation></ref>
</ref-list>
</back>
<floats-group>
<fig id="f1-ol-0-0-6835" position="float">
<label>Figure 1.</label>
<caption><p>Study schema.</p></caption>
<graphic xlink:href="ol-14-05-5464-g03.tif"/>
</fig>
<fig id="f2-ol-0-0-6835" position="float">
<label>Figure 2.</label>
<caption><p>Graphical illustrations of the best RadViz projections under four scenarios (the first 200/500/1,000 genes, or all genes) for the AC/SCC segmentation. The genes were ordered by decreasing GeneRank, so the first 200/500/1,000-gene scenarios comprise the 200/500/1,000 highest-ranked genes, respectively. The table below the projections gives the resulting gene lists and the predictive statistics obtained by 5-fold cross-validation.</p></caption>
<graphic xlink:href="ol-14-05-5464-g04.tif"/>
</fig>
<fig id="f3-ol-0-0-6835" position="float">
<label>Figure 3.</label>
<caption><p>Graphical illustrations of the best RadViz projections under four scenarios (the first 200/500/1,000 genes, or all genes) for the high-risk/low-risk segmentation. The table below the projections gives the predictive statistics of the resulting gene signatures obtained by 5-fold cross-validation.</p></caption>
<graphic xlink:href="ol-14-05-5464-g05.tif"/>
</fig>
<fig id="f4-ol-0-0-6835" position="float">
<label>Figure 4.</label>
<caption><p>Heatmaps of the resulting 8-gene diagnostic and prognostic signatures under the first 1,000-gene scenario. (A) Diagnostic signature. According to the hierarchical clustering, AC and SCC can be separated using the 8 diagnostic genes. (B) Prognostic signature. According to the hierarchical clustering, the samples could be stratified into three clusters: patients with a high risk of death, patients with a low risk, and patients with ambiguous labels.</p></caption>
<graphic xlink:href="ol-14-05-5464-g06.tif"/>
</fig>
<fig id="f5-ol-0-0-6835" position="float">
<label>Figure 5.</label>
<caption><p>Venn diagram of the 3-gene diagnostic signatures under the four scenarios. The Venn diagram illustrates that the stability of these 3-gene diagnostic signatures is also good. The numbers in brackets are the ranks of the corresponding genes given by the GeneRank method.</p></caption>
<graphic xlink:href="ol-14-05-5464-g07.tif"/>
</fig>
<table-wrap id="tI-ol-0-0-6835" position="float">
<label>Table I.</label>
<caption><p>Performance statistics for the AC/SCC subtype segmentation.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom" colspan="5">A, The maximum size of each projection is fixed at 8.</th>
</tr>
<tr>
<th align="left" valign="bottom" colspan="5"><hr/></th>
</tr>
<tr>
<th/>
<th align="center" valign="bottom" colspan="2">Training set (GSE50081)</th>
<th align="center" valign="bottom" colspan="2">Test set (RNA-Seq)</th>
</tr>
<tr>
<th/>
<th align="center" valign="bottom" colspan="2"><hr/></th>
<th align="center" valign="bottom" colspan="2"><hr/></th>
</tr>
<tr>
<th align="left" valign="bottom">Variable</th>
<th align="center" valign="bottom">Accuracy (&#x0025;)</th>
<th align="center" valign="bottom">GBS</th>
<th align="center" valign="bottom">Accuracy (&#x0025;)</th>
<th align="center" valign="bottom">GBS</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(200)</sub></td>
<td align="center" valign="top">90.98</td>
<td align="center" valign="top">0.092</td>
<td align="center" valign="top">79.4</td>
<td align="center" valign="top">0.188</td>
</tr>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(500)</sub></td>
<td align="center" valign="top">90.98</td>
<td align="center" valign="top">0.092</td>
<td align="center" valign="top">77.6</td>
<td align="center" valign="top">0.186</td>
</tr>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(1,000)</sub></td>
<td align="center" valign="top">92.48</td>
<td align="center" valign="top">0.088</td>
<td align="center" valign="top">76</td>
<td align="center" valign="top">0.180</td>
</tr>
<tr>
<td align="left" valign="top">All 1,952 genes</td>
<td align="center" valign="top">91.73</td>
<td align="center" valign="top">0.082</td>
<td align="center" valign="top">78.4</td>
<td align="center" valign="top">0.165</td>
</tr>
<tr>
<td align="left" valign="top" colspan="5"><hr/></td>
</tr>
<tr>
<td align="left" valign="top" colspan="5">B, The maximum size of each projection is fixed at 3.</td>
</tr>
<tr>
<td align="left" valign="top" colspan="5"><hr/></td>
</tr>
<tr>
<td/>
<td align="center" valign="top" colspan="2">Training set (GSE50081)</td>
<td align="center" valign="top" colspan="2">Test set (RNA-Seq)</td>
</tr>
<tr>
<td/>
<td align="center" valign="top" colspan="2"><hr/></td>
<td align="center" valign="top" colspan="2"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">Variable</td>
<td align="center" valign="top">Accuracy (&#x0025;)</td>
<td align="center" valign="top">GBS</td>
<td align="center" valign="top">Accuracy (&#x0025;)</td>
<td align="center" valign="top">GBS</td>
</tr>
<tr>
<td align="left" valign="top" colspan="5"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(200)</sub></td>
<td align="center" valign="top">89.47</td>
<td align="center" valign="top">0.087</td>
<td align="center" valign="top">76</td>
<td align="center" valign="top">0.202</td>
</tr>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(500)</sub></td>
<td align="center" valign="top">90.23</td>
<td align="center" valign="top">0.109</td>
<td align="center" valign="top">82.4</td>
<td align="center" valign="top">0.173</td>
</tr>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(1,000)</sub></td>
<td align="center" valign="top">90.23</td>
<td align="center" valign="top">0.117</td>
<td align="center" valign="top">84</td>
<td align="center" valign="top">0.164</td>
</tr>
<tr>
<td align="left" valign="top">All 1,952 genes</td>
<td align="center" valign="top">90.23</td>
<td align="center" valign="top">0.107</td>
<td align="center" valign="top">82.4</td>
<td align="center" valign="top">0.171</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tII-ol-0-0-6835" position="float">
<label>Table II.</label>
<caption><p>Performance statistics for the NSCLC high-risk/low-risk segmentation.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom" colspan="5">A, The maximum size of each projection is fixed at 8.</th>
</tr>
<tr>
<th align="left" valign="bottom" colspan="5"><hr/></th>
</tr>
<tr>
<th/>
<th align="center" valign="bottom" colspan="2">Training set (GSE50081)</th>
<th align="center" valign="bottom" colspan="2">Test set (RNA-Seq)</th>
</tr>
<tr>
<th/>
<th align="center" valign="bottom" colspan="2"><hr/></th>
<th align="center" valign="bottom" colspan="2"><hr/></th>
</tr>
<tr>
<th align="left" valign="bottom">Variable</th>
<th align="center" valign="bottom">C-stat</th>
<th align="center" valign="bottom">P-value (log rank)</th>
<th align="center" valign="bottom">C-stat</th>
<th align="center" valign="bottom">P-value</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(200)</sub></td>
<td align="center" valign="top">0.6276</td>
<td align="center" valign="top">6.47&#x00D7;10<sup>&#x2212;3</sup></td>
<td align="center" valign="top">0.4174</td>
<td align="center" valign="top">0.051</td>
</tr>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(500)</sub></td>
<td align="center" valign="top">0.5783</td>
<td align="center" valign="top">1.70&#x00D7;10<sup>&#x2212;3</sup></td>
<td align="center" valign="top">0.5097</td>
<td align="center" valign="top">0.59</td>
</tr>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(1,000)</sub></td>
<td align="center" valign="top">0.6687</td>
<td align="center" valign="top">8.11&#x00D7;10<sup>&#x2212;6</sup></td>
<td align="center" valign="top">0.4207</td>
<td align="center" valign="top">0.131</td>
</tr>
<tr>
<td align="left" valign="top">All genes</td>
<td align="center" valign="top">0.6045</td>
<td align="center" valign="top">4.13&#x00D7;10<sup>&#x2212;5</sup></td>
<td align="center" valign="top">0.2284</td>
<td align="center" valign="top">0.799</td>
</tr>
<tr>
<td align="left" valign="top" colspan="5"><hr/></td>
</tr>
<tr>
<td align="left" valign="top" colspan="5">B, 40 genes with the highest frequencies in RadViz projections</td>
</tr>
<tr>
<td align="left" valign="top" colspan="5"><hr/></td>
</tr>
<tr>
<td/>
<td align="center" valign="top" colspan="2">Training set (GSE50081)</td>
<td align="center" valign="top" colspan="2">Test set (RNA-Seq)</td>
</tr>
<tr>
<td/>
<td align="center" valign="top" colspan="2"><hr/></td>
<td align="center" valign="top" colspan="2"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">Variable</td>
<td align="center" valign="top">C-stat</td>
<td align="center" valign="top">P-value (log rank)</td>
<td align="center" valign="top">C-stat</td>
<td align="center" valign="top">P-value</td>
</tr>
<tr>
<td align="left" valign="top" colspan="5"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(200)</sub></td>
<td align="center" valign="top">0.7035</td>
<td align="center" valign="top">2.82&#x00D7;10<sup>&#x2212;4</sup></td>
<td align="center" valign="top">0.4693</td>
<td align="center" valign="top">0.161</td>
</tr>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(500)</sub></td>
<td align="center" valign="top">0.6965</td>
<td align="center" valign="top">4.57&#x00D7;10<sup>&#x2212;6</sup></td>
<td align="center" valign="top">0.5374</td>
<td align="center" valign="top">0.089</td>
</tr>
<tr>
<td align="left" valign="top">G<sub>(1)</sub> ~G<sub>(1,000)</sub></td>
<td align="center" valign="top">0.7244</td>
<td align="center" valign="top">5.23&#x00D7;10<sup>&#x2212;7</sup></td>
<td align="center" valign="top">0.5436</td>
<td align="center" valign="top">0.054</td>
</tr>
<tr>
<td align="left" valign="top">All genes</td>
<td align="center" valign="top">0.7018</td>
<td align="center" valign="top">3.34&#x00D7;10<sup>&#x2212;6</sup></td>
<td align="center" valign="top">0.4381</td>
<td align="center" valign="top">0.112</td>
</tr>
</tbody>
</table>
</table-wrap>
</floats-group>
</article>
