<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
<article xml:lang="en" article-type="research-article" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">OR</journal-id>
<journal-title-group>
<journal-title>Oncology Reports</journal-title></journal-title-group>
<issn pub-type="ppub">1021-335X</issn>
<issn pub-type="epub">1791-2431</issn>
<publisher>
<publisher-name>D.A. Spandidos</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3892/or.2012.1891</article-id>
<article-id pub-id-type="publisher-id">or-28-03-1036</article-id>
<article-categories>
<subj-group>
<subject>Articles</subject></subj-group></article-categories>
<title-group>
<article-title>Identification of candidate colon cancer biomarkers by applying a random forest approach on microarray data</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>YAN</surname><given-names>ZHI</given-names></name><xref rid="af1-or-28-03-1036" ref-type="aff">1</xref><xref rid="fn1-or-28-03-1036" ref-type="author-notes">&#x0002A;</xref></contrib>
<contrib contrib-type="author">
<name><surname>LI</surname><given-names>JIANGENG</given-names></name><xref rid="af2-or-28-03-1036" ref-type="aff">2</xref><xref rid="fn1-or-28-03-1036" ref-type="author-notes">&#x0002A;</xref></contrib>
<contrib contrib-type="author">
<name><surname>XIONG</surname><given-names>YIMIN</given-names></name><xref rid="af1-or-28-03-1036" ref-type="aff">1</xref></contrib>
<contrib contrib-type="author">
<name><surname>XU</surname><given-names>WEITIAN</given-names></name><xref rid="af1-or-28-03-1036" ref-type="aff">1</xref></contrib>
<contrib contrib-type="author">
<name><surname>ZHENG</surname><given-names>GUORONG</given-names></name><xref rid="af1-or-28-03-1036" ref-type="aff">1</xref><xref ref-type="corresp" rid="c1-or-28-03-1036"/></contrib></contrib-group>
<aff id="af1-or-28-03-1036">
<label>1</label>Department of Digestive Diseases, Wuhan General Hospital of Guangzhou Command, Wuhan, P.R. China</aff>
<aff id="af2-or-28-03-1036">
<label>2</label>Academy of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, P.R. China</aff>
<author-notes>
<corresp id="c1-or-28-03-1036"><italic>Correspondence to:</italic> Dr Guorong Zheng, Department of Digestive Diseases, Wuhan General Hospital of Guangzhou Command, Wuluo Road 627, Wuchang District, Wuhan 430070, P.R. China, E-mail: <email>guorongzheng@sina.com</email></corresp><fn id="fn1-or-28-03-1036">
<label>&#x0002A;</label>
<p>Contributed equally</p></fn></author-notes>
<pub-date pub-type="ppub">
<month>9</month>
<year>2012</year></pub-date>
<pub-date pub-type="epub">
<day>29</day>
<month>06</month>
<year>2012</year></pub-date>
<volume>28</volume>
<issue>3</issue>
<fpage>1036</fpage>
<lpage>1042</lpage>
<history>
<date date-type="received">
<day>13</day>
<month>03</month>
<year>2012</year></date>
<date date-type="accepted">
<day>08</day>
<month>06</month>
<year>2012</year></date></history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2012, Spandidos Publications</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0">
<license-p>This is an open-access article licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. The article may be redistributed, reproduced, and reused for non-commercial purposes, provided the original source is properly cited.</license-p></license></permissions>
<abstract>
<p>Colon cancer is the third most common cancer and one of the leading causes of cancer-related death in the world. Therefore, identifying biomarkers that capture its biological characteristics is a key problem for the early diagnosis of colon cancer. In this study, we used a random forest approach to discover biomarkers based on a set of oligonucleotide microarray data of colon cancer. Real-time PCR was used to validate the relative expression levels of the biomarkers selected by our approach. Furthermore, ROC curves were used to analyze the sensitivity and specificity of each biomarker in both training and test sample sets. Finally, we analyzed the clinical significance of each biomarker based on its differential expression. A single classifier consisting of 4 genes (IL8, WDR77, MYL9 and VIP) was selected by random forests, with an average sensitivity and specificity of 83.75 and 76.15&#x00025;, respectively. The differential expression level of each biomarker was validated by real-time PCR in 48 test colon cancer samples compared to the matched normal tissues. Patients with high expression of IL8 and WDR77, and low expression of MYL9 and VIP, had a significantly reduced median survival compared to other colon cancer patients. The results indicate that our approach can be employed for biomarker identification based on microarray data. The 4 genes identified by our approach have the potential to act as clinical biomarkers for the early diagnosis of colon cancer.</p></abstract>
<kwd-group>
<kwd>microarray</kwd>
<kwd>random forests</kwd>
<kwd>biomarker</kwd>
<kwd>colon cancer</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Colon cancer is the third most common cancer, and one of the leading causes of morbidity and mortality in the world (<xref rid="b1-or-28-03-1036" ref-type="bibr">1</xref>). According to United States statistics released in 2010, the incidence rate of colon cancer has decreased (<xref rid="b2-or-28-03-1036" ref-type="bibr">2</xref>). Over the last decade, many studies have proposed various kinds of statistical methods to analyze gene expression patterns and identify new biomarkers that provide prognostic and/or predictive information in relation to human diseases (<xref rid="b3-or-28-03-1036" ref-type="bibr">3</xref>,<xref rid="b4-or-28-03-1036" ref-type="bibr">4</xref>). However, most of the early studies applied unsupervised approaches, such as hierarchical clustering for class discovery, to mine data and identify differential gene expression profiles of certain diseases, taking an unbiased approach to searching for subgroups in the data (<xref rid="b5-or-28-03-1036" ref-type="bibr">5</xref>). As statistical methods spread through the field of biomedicine, many supervised analysis and machine learning approaches were adopted to analyze gene expression profiling data and to select the feature genes that carry the most information for classifying different diseases or subclasses of the same disease.</p>
<p>Various methods of statistics and machine learning, including clustering (<xref rid="b6-or-28-03-1036" ref-type="bibr">6</xref>,<xref rid="b7-or-28-03-1036" ref-type="bibr">7</xref>), Bayesian algorithms (<xref rid="b8-or-28-03-1036" ref-type="bibr">8</xref>), and support vector machines (<xref rid="b9-or-28-03-1036" ref-type="bibr">9</xref>), have been proposed to analyze microarray data generated through high-throughput experiments. Over the last few years, the technology of multiclassifier fusion has developed substantially and has become very successful in improving the accuracy of certain classifiers. Random forests (RF) (<xref rid="b10-or-28-03-1036" ref-type="bibr">10</xref>,<xref rid="b11-or-28-03-1036" ref-type="bibr">11</xref>), a tree-based method of classification and regression, is one of the most important methods of multiclassifier fusion. Besides the outcome of classification, RF also returns several measures of variable importance according to which feature genes can be selected. Since RF performs comparably to other methods, and in some cases better (<xref rid="b12-or-28-03-1036" ref-type="bibr">12</xref>), it is widely used, especially for microarray data (<xref rid="b13-or-28-03-1036" ref-type="bibr">13</xref>). Additionally, RF can be used not only as a supervised algorithm but also as an unsupervised one (<xref rid="b14-or-28-03-1036" ref-type="bibr">14</xref>), depending on whether the gene expression data come from known classes or not.</p>
<p>In this study, we adopted an RF-based method for feature gene selection incorporating deductive reasoning to process the differential gene expression profile of colon cancer. We thus selected 4 feature genes (IL8, WDR77, MYL9 and VIP) for colon cancer classification. Then, the differential expression level of each biomarker was validated by real-time PCR in 48 test colon cancer samples compared to their matched normal tissues, with high sensitivity and specificity. The results showed that our approach could identify genes of great importance based on microarray data, and that the genes selected by our approach classified colon cancer and matched normal samples with high accuracy.</p></sec>
<sec sec-type="methods">
<title>Materials and methods</title>
<sec>
<title>Microarray data set</title>
<p>In 1999, Alon <italic>et al</italic> (<xref rid="b15-or-28-03-1036" ref-type="bibr">15</xref>) profiled gene expression in 40 colon tumor and 22 normal samples using an Affymetrix oligonucleotide array (Hum6000) and applied a two-way clustering approach to classify genes into functional groups. The microarray data were downloaded from: <ext-link xlink:href="http://genomics-pubs.princeton.edu/oncology/affydata/index.html" ext-link-type="uri">http://genomics-pubs.princeton.edu/oncology/affydata/index.html</ext-link>. To further study this microarray data set and rediscover potential biomarkers that had not been fully mined, we used an RF-based machine learning method in our investigation.</p></sec>
<sec>
<title>RF algorithm</title>
<p>RF, one of the most important supervised methods, was used for data mining in this study. A reliable measure of variable importance is based on the decrease in classification accuracy when the values of a variable in a node of a tree are randomly permuted (<xref rid="b16-or-28-03-1036" ref-type="bibr">16</xref>). All training set observations were assigned to different terminal nodes in a tree, and distinct split values were determined through several criteria such as the Gini index. The majority class of the training set observations in a terminal node was selected as the class of that node. We selected a small set of genes with which the classifiers produced the smallest out-of-bag (OOB) errors and the highest classification scores.</p>
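<p>As an illustration of the split criterion and the node labeling described above, the following Python sketch computes the Gini index for candidate thresholds on a single gene and assigns the majority class to a node. It is a minimal sketch rather than the implementation used in this study: the function names (gini_impurity, best_split, majority_class) and the expression values are hypothetical, and an actual RF evaluates a random subset of genes at every node of every tree.</p>
<preformat><![CDATA[
```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split on one gene.

    Candidate thresholds are midpoints between consecutive distinct
    expression values; samples with value <= threshold go to the left.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini_impurity(labels))  # (no split, impurity of the node)
    for k in range(1, n):
        lo, hi = pairs[k - 1][0], pairs[k][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        score = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)
    return best


def majority_class(labels):
    """Class assigned to a terminal node: the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical expression values of one gene in six samples.
expr = [0.2, 0.4, 0.5, 1.1, 1.3, 1.6]
cls = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
threshold, impurity = best_split(expr, cls)
print(threshold, impurity)
```
]]></preformat>
<p>In this toy example the two classes are perfectly separable on the gene, so the selected split leaves both child nodes pure. With real expression data the weighted Gini index of the best split is usually greater than zero.</p>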
<p>For sample j, we defined mr<sub>j</sub> as the difference between its correct classification rate and its misclassification rate over the trees for which it is out-of-bag. Additionally, we defined the mean decrease in accuracy for gene g as MDA(g). The formulas for mr<sub>j</sub> and MDA(g) are as follows:</p>
<disp-formula id="fd1">
<label>Eq. 1</label>
<mml:math id="m1" display='block'>
<mml:semantics id="sm1">
<mml:mtable columnalign='left' columnspacing="2pt">
<mml:mtr>
<mml:mtd>
<mml:mi>m</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo>&#x003D;</mml:mo>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x02211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>e</mml:mi></mml:mrow></mml:msubsup>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>B</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x02211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>e</mml:mi></mml:mrow></mml:msubsup>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>B</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow></mml:mrow></mml:mfrac>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x002D;</mml:mo>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x002D;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x02211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>e</mml:mi></mml:mrow></mml:msubsup>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>B</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x02211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>e</mml:mi></mml:mrow></mml:msubsup>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>B</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow></mml:mrow></mml:mfrac>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>&#x003D;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x02211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>e</mml:mi></mml:mrow></mml:msubsup>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>B</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x02211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>e</mml:mi></mml:mrow></mml:msubsup>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>B</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow></mml:mrow></mml:mfrac>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x002D;</mml:mo>
<mml:mn>1</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:semantics></mml:math></disp-formula>
<disp-formula id="fd2">
<label>Eq. 2</label>
<mml:math id="m2" display='block'>
<mml:semantics id="sm2">
<mml:mtable columnalign='left' columnspacing="2pt">
<mml:mtr>
<mml:mtd>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
<mml:mi>A</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi></mml:mfrac>
<mml:munderover>
<mml:mo>&#x02211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>N</mml:mi></mml:munderover>
<mml:mrow>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>m</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo>&#x002D;</mml:mo>
<mml:mi>m</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>&#x003D;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi></mml:mfrac>
<mml:munderover>
<mml:mo>&#x02211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>N</mml:mi></mml:munderover>
<mml:mrow>
<mml:mfrac>
<mml:mn>2</mml:mn>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x02211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>e</mml:mi></mml:mrow></mml:msubsup>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>B</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>&#x000D7;</mml:mo>
<mml:munderover>
<mml:mo>&#x02211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>e</mml:mi></mml:mrow></mml:munderover>
<mml:mrow>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>&#x002D;</mml:mo>
<mml:mi>B</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>B</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:semantics></mml:math></disp-formula>
<disp-formula id="fd3">
<label>Eq. 3</label>
<mml:math id="m3" display='block'>
<mml:semantics id="sm3">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi></mml:mrow>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="fd4">
<label>Eq. 4</label>
<mml:math id="m4" display='block'>
<mml:semantics id="sm4">
<mml:mrow>
<mml:mi>B</mml:mi>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>I</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>V</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow>
<mml:mo>&#x02032;</mml:mo></mml:msubsup>
<mml:mo stretchy='false'>&#x0028;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo>
<mml:mo>&#x003D;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo stretchy='false'>&#x0029;</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>I(&#x000B7;) denotes the indicator function; ntree is the number of tree classifiers; N is the total number of samples; OOB<sub>j</sub>(i) &#x0003D; T indicates that sample j is in the OOB data set of tree i. If j is correctly classified by tree i, V<sub>j</sub>(i) &#x0003D; Tclass. Similarly, if j is correctly classified by tree i after the values of gene g are randomly permuted, V&#x02032;<sub>j(g)</sub>(i) &#x0003D; Tclass. Accordingly, mr<sub>j</sub>(g) is the value of mr<sub>j</sub> computed from the votes obtained after permuting gene g.</p></sec>
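<sec>
<title>Illustrative sketch of the MDA calculation</title>
<p>The following Python sketch evaluates Eqs. 1 and 2 directly from per-tree votes. It is a minimal sketch under stated assumptions, not the code used in this study: the function names (margin, mean_decrease_accuracy) and the toy vote matrices are hypothetical, the votes are assumed to have been produced already by a fitted forest, and every sample is assumed to be out-of-bag for at least one tree.</p>
<preformat><![CDATA[
```python
def margin(correct, oob):
    """mr_j for every sample j (Eq. 1).

    correct[i][j] is True when tree i assigns sample j to its true class;
    oob[i][j] is True when sample j is out-of-bag for tree i.
    """
    ntree, n = len(oob), len(oob[0])
    margins = []
    for j in range(n):
        n_oob = sum(1 for i in range(ntree) if oob[i][j])
        if n_oob == 0:
            raise ValueError("sample %d is never out-of-bag" % j)
        n_hit = sum(1 for i in range(ntree) if oob[i][j] and correct[i][j])
        # accuracy rate minus misclassification rate = 2 * accuracy - 1
        margins.append(2.0 * n_hit / n_oob - 1.0)
    return margins


def mean_decrease_accuracy(correct, correct_permuted, oob):
    """MDA(g) (Eq. 2): mean over samples of mr_j - mr_j(g)."""
    before = margin(correct, oob)
    after = margin(correct_permuted, oob)
    return sum(b - a for b, a in zip(before, after)) / len(before)


# Hypothetical toy forest: 4 trees (rows) and 3 samples (columns).
oob = [[True, False, True],
       [True, True, False],
       [False, True, True],
       [True, True, True]]
correct = [[True, True, True],
           [True, True, True],
           [True, False, True],
           [True, True, True]]
# Votes after the values of a hypothetical gene g were permuted.
correct_permuted = [[False, True, True],
                    [True, False, True],
                    [True, False, False],
                    [False, True, True]]

mda = mean_decrease_accuracy(correct, correct_permuted, oob)
print(mda)
```
]]></preformat>
<p>A gene whose permutation leaves the votes unchanged yields an MDA of zero, whereas a gene on which many trees rely yields a large positive value. Ranking genes by this quantity is one way to select feature genes.</p></sec>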
<sec>
<title>RNA isolation and real-time PCR</title>
<p>A total of 48 colon cancer tissues and their matched normal tissues from Wuhan General Hospital of Guangzhou Command were used in this study for the real-time PCR experiments. Total RNA was extracted from the tissue samples according to a standard TRIzol protocol (Invitrogen, Carlsbad, CA, USA). Total RNA (5 &#x003BC;g) was reverse transcribed to cDNA with 200 U M-MLV reverse transcriptase (Promega, Madison, WI, USA) according to the manufacturer&#x02019;s standard protocol. The reverse transcription reaction conditions were 37&#x000B0;C for 60 min, followed by 72&#x000B0;C for 10 min. Real-time PCR was performed in a total reaction mixture of 20 &#x003BC;l containing 2 &#x003BC;l of cDNA, 0.6 &#x003BC;l of 20X EvaGreen (CapitalBio, Beijing, China), 0.5 &#x003BC;l each of 10 &#x003BC;M forward and reverse primers, 0.5 &#x003BC;l of 2.5 mM dNTP, 1.5 U Cap Taq polymerase (CapitalBio), 10 &#x003BC;l of 2X PCR buffer for EvaGreen and 6.1 &#x003BC;l of ddH<sub>2</sub>O. Quantification of differentially expressed genes was conducted with an RT-Cycle&#x02122; 2.0 system (CapitalBio). Real-time PCR was carried out with the following program: heating at 9&#x000B0;C for 5 min followed by 40 cycles of a 3-stage temperature profile of 95&#x000B0;C for 30 sec, 57&#x000B0;C for 30 sec and 72&#x000B0;C for 30 sec. All reactions were run in triplicate, and the final Ct value was taken as the mean Ct of the three replicates. The melting curve of each PCR reaction was carefully analyzed to rule out non-specific amplification products. The expression of each gene was calculated using the 2<sup>-&#x00394;&#x00394;Ct</sup> formula and normalized to &#x003B2;-actin expression (<xref rid="b17-or-28-03-1036" ref-type="bibr">17</xref>).</p></sec>
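<sec>
<title>Illustrative sketch of the relative quantification</title>
<p>The relative quantification described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis script of this study: the function name relative_expression and all Ct values are hypothetical, and the 2<sup>-&#x00394;&#x00394;Ct</sup> formula itself assumes that the target and reference genes amplify with approximately 100&#x00025; efficiency.</p>
<preformat><![CDATA[
```python
from statistics import mean


def relative_expression(ct_target_tumor, ct_ref_tumor,
                        ct_target_normal, ct_ref_normal):
    """Fold change of a target gene in tumor vs matched normal tissue.

    Each argument is a list of replicate Ct values. The reference gene
    (beta-actin in the text) normalizes for the amount of input cDNA.
    """
    d_ct_tumor = mean(ct_target_tumor) - mean(ct_ref_tumor)
    d_ct_normal = mean(ct_target_normal) - mean(ct_ref_normal)
    dd_ct = d_ct_tumor - d_ct_normal
    return 2.0 ** (-dd_ct)


# Hypothetical replicate Ct values for one tumor/normal pair.
fold = relative_expression(
    ct_target_tumor=[22.0, 22.1, 21.9],
    ct_ref_tumor=[18.0, 18.0, 18.0],
    ct_target_normal=[24.0, 24.1, 23.9],
    ct_ref_normal=[18.0, 18.0, 18.0],
)
print(fold)
```
]]></preformat>
<p>A fold change above 1 indicates higher expression in the tumor than in the matched normal tissue, and a value below 1 indicates lower expression. Because a lower Ct corresponds to more template, the sign of &#x00394;&#x00394;Ct is opposite to the direction of the expression change.</p></sec>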
<sec>
<title>Receiver operating characteristic (ROC) curve and statistical analysis</title>
<p>ROC curve analysis was conducted using the MedCalc software package (version 8.2.1.0; Mariakerke, Belgium). The area under the ROC curve (AUC) provided a measure of the overall performance of a diagnostic test. The ratio of gene signal intensities and the Ct value of each gene were used for ROC calculation in the training and test samples, respectively. The clinical data were analyzed using the Chi-square test. Cumulative survival curves were compared using the log-rank test. For all analyses, P&lt;0.05 was considered to indicate a statistically significant difference.</p></sec></sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title>Biomarker rediscovery by the RF approach</title>
<p>We processed the microarray data of colon cancer using an RF-based algorithm. Based on the OOB error rate, we identified a 4-gene classifier that distinguishes colon cancer from normal samples; it comprises two upregulated genes, IL8 and WDR77, and two downregulated genes, MYL9 and VIP (<xref rid="tI-or-28-03-1036" ref-type="table">Tables I</xref> and <xref rid="tII-or-28-03-1036" ref-type="table">II</xref>). The classification accuracy of the 4-gene classifier was 91.94&#x00025; (<xref rid="f1-or-28-03-1036" ref-type="fig">Fig. 1A</xref>). The average expression level of each gene and the clustering graphical overview are shown in <xref rid="f1-or-28-03-1036" ref-type="fig">Fig. 1B and C</xref>.</p></sec>
<sec>
<title>Real-time PCR and IHC staining validation</title>
<p>cDNA from 48 colon cancer tissues and their matched normal tissues was used for the real-time PCR experiments. The results showed that IL8 was upregulated in 37 of 48 cancer samples (77.1&#x00025;) compared to the matched normal tissues, with a P-value of 0.032. Similarly, WDR77 was upregulated in 34 colon cancer samples (70.8&#x00025;) with a P-value of 0.046. In contrast, MYL9 was downregulated in 35 of 48 cancer samples (72.9&#x00025;) with a P-value of 0.028, and VIP was downregulated in 33 colon cancer samples (68.8&#x00025;) with a P-value of 0.177 (<xref rid="f2-or-28-03-1036" ref-type="fig">Fig. 2</xref>).</p></sec>
<sec>
<title>ROC curve analysis</title>
<p>To analyze the classification sensitivity and specificity of the candidate biomarkers, we performed ROC analysis on both the training and the test sample data. We observed a high sensitivity and specificity of the biomarkers and consistent results in the training and test samples. The AUC values of IL8, WDR77, MYL9 and VIP were 0.853, 0.875, 0.826 and 0.812 in the training group (<xref rid="f3-or-28-03-1036" ref-type="fig">Fig. 3A</xref>, <xref rid="tIII-or-28-03-1036" ref-type="table">Table III</xref>), and 0.869, 0.867, 0.898 and 0.845 in the test group, respectively (<xref rid="f3-or-28-03-1036" ref-type="fig">Fig. 3B</xref>, <xref rid="tIII-or-28-03-1036" ref-type="table">Table III</xref>).</p></sec>
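<sec>
<title>Illustrative sketch of the AUC calculation</title>
<p>For readers who wish to see how an AUC value of the kind reported above is obtained, the following Python sketch computes the AUC by pairwise comparison of cases and controls, together with the sensitivity and specificity at a single cutoff. It is a minimal sketch and not the MedCalc procedure used in this study: the function names (auc, sensitivity_specificity), the expression ratios and the cutoff are hypothetical.</p>
<preformat><![CDATA[
```python
def auc(cases, controls):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case scores higher
    than a randomly chosen control, counting ties as one half.
    """
    if not cases or not controls:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(cases, controls, cutoff):
    """Call a sample positive when its score is >= cutoff."""
    tp = sum(1 for x in cases if x >= cutoff)
    tn = sum(1 for y in controls if y < cutoff)
    return tp / len(cases), tn / len(controls)


# Hypothetical expression ratios of an upregulated gene.
tumor = [2.1, 3.4, 1.8, 0.9, 2.7]
normal = [1.0, 0.8, 1.9, 0.7]
print(auc(tumor, normal))
print(sensitivity_specificity(tumor, normal, cutoff=1.5))
```
]]></preformat>
<p>For a downregulated gene, or when Ct values are used as the score (a lower Ct means higher expression), the scores would need to be negated or the comparison reversed so that a larger score still points towards the cancer class.</p></sec>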
<sec>
<title>Clinical significance of the biomarkers</title>
<p>The expression levels of IL8, WDR77, MYL9 and VIP were compared with several clinical indicators in the 48 colon cancer patients. A significant difference was observed between the two groups defined by positive and negative expression of IL8, denoted IL8(&#x0002B;) and IL8(-). Patients with IL8(&#x0002B;) had significantly reduced median survival compared to those with IL8(-) (P&lt;0.001). Meanwhile, we observed that positive expression of IL8 was associated with the gender (P&#x0003D;0.029), clinical stage (P&lt;0.001) and survival status (P&lt;0.001) of colon cancer patients (<xref rid="tIV-or-28-03-1036" ref-type="table">Table IV</xref>). The expression levels of WDR77 were associated with the clinical stage (P&#x0003D;0.008), the number of emboli (P&#x0003D;0.035) and the survival time of the patients. In contrast, negative expression of MYL9 and VIP was associated with the median survival time of colon cancer patients (<xref rid="tIV-or-28-03-1036" ref-type="table">Table IV</xref>). In addition, negative expression of VIP was associated with the differentiation status of the cancer cells (P&#x0003D;0.026) and the recurrence risk (P&#x0003D;0.019) of colon cancer patients (<xref rid="tIV-or-28-03-1036" ref-type="table">Table IV</xref>). The details of the clinical significance of all the candidate biomarkers are shown in <xref rid="tIV-or-28-03-1036" ref-type="table">Table IV</xref>.</p></sec></sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>Colon cancer is one of the most common diseases in the world, but only a few tumor-specific gene products have been identified that could serve as targets to aid in its diagnosis. Its high prevalence and poor prognosis encourage researchers to find new biomarkers for the diagnosis and treatment of colon cancer. The microarray technique provides an effective method for identifying candidate biomarkers on a large scale. Gene expression, methylation and microRNA profiling of colon cancer have been performed (<xref rid="b18-or-28-03-1036" ref-type="bibr">18</xref>&#x02013;<xref rid="b20-or-28-03-1036" ref-type="bibr">20</xref>).</p>
<p>High-throughput microarray technologies have generated a large amount of data, and various statistical and machine learning methods have been adopted to analyze these data in order to find gene or protein expression patterns and to search for new biomarkers of human diseases. Microarray data analysis involves selecting the biomarkers that contain the information necessary for the molecular classification of human diseases and for establishing a gene expression profile. In this study, we present a concise investigative approach to feature gene selection. We used RF, a supervised machine learning algorithm, to select a gene classifier based on differential gene expression profiling. A series of biological experiments was used to validate the results obtained from the high-throughput data.</p>
<p>RF is an effective algorithm whose classification quality is comparable to that of other methods such as support vector machines (SVM) (<xref rid="b10-or-28-03-1036" ref-type="bibr">10</xref>). It can also select feature genes that are differentially expressed among different samples. We applied RF to a colon cancer data set and identified 4 genes of great biological significance. The classifier composed of these 4 genes produced a high accuracy on both the training and the test samples. Bootstrap aggregating, a resampling technique, is used when building the RF. Because of this technique, RF does not require pruning, unlike other tree-based classification algorithms. Furthermore, RF can avoid over-fitting effectively, although the mechanism is not currently clear. Besides the analysis of gene expression microarray data, RF has been used extensively in other areas of biomedicine. In recent years, RF has been widely adopted to analyze single-nucleotide polymorphism data (<xref rid="b21-or-28-03-1036" ref-type="bibr">21</xref>) and to investigate the construction of gene pathways (<xref rid="b22-or-28-03-1036" ref-type="bibr">22</xref>).</p>
<p>To identify biomarkers with high sensitivity and specificity, laboratory verification and testing in new clinical samples are important. Real-time PCR and tissue microarray-based IHC staining provide convenient and precise approaches for detecting the expression levels of candidate biomarkers. Our results also showed that real-time PCR was sensitive and specific for the validation of gene expression levels. The PCR-based detection method therefore appears to provide an easy approach to the early clinical diagnosis of human cancer.</p>
<p>The function and clinical significance of IL8 and VIP have been reported. There are 1,182 studies describing the gene function of IL8, including its biological mechanisms in the progression of many kinds of human cancer, such as glioblastoma, gastric carcinoma, small cell lung cancer, prostate cancer, esophageal squamous cell carcinoma, acute myelogenous leukemia, and colon cancer (<xref rid="b23-or-28-03-1036" ref-type="bibr">23</xref>&#x02013;<xref rid="b29-or-28-03-1036" ref-type="bibr">29</xref>). It was confirmed that IL8 is differentially expressed in colon cancer, and is associated with proliferation, migration, angiogenesis and chemosensitivity in colon cancer cell line models (<xref rid="b30-or-28-03-1036" ref-type="bibr">30</xref>). The VIP gene has also been the focus of investigation in many studies relating to human cancer (<xref rid="b31-or-28-03-1036" ref-type="bibr">31</xref>&#x02013;<xref rid="b36-or-28-03-1036" ref-type="bibr">36</xref>). WDR77, also known as p44, was reported to be related to differentiation and proliferation in the prostate epithelium (<xref rid="b37-or-28-03-1036" ref-type="bibr">37</xref>). Its differential expression was observed in ovarian cancer (<xref rid="b38-or-28-03-1036" ref-type="bibr">38</xref>). However, there is no report associating WDR77 with colon cancer. Thus, WDR77 is a novel potential biomarker of colon cancer. Similarly to WDR77, MYL9 has not been well documented as being functionally associated with human cancer, including colon cancer. Therefore, we reconfirmed its expression at both the RNA and protein levels by PCR and IHC, respectively.</p>
<p>In summary, we used an RF-based method to process a differential gene expression profile of colon cancer and selected 4 feature genes as candidate biomarkers of colon cancer. We validated these biomarkers in clinical colon cancer samples by real-time PCR. Our results showed that, based on microarray data, this approach identified genes of great importance, such as IL8 and VIP, as well as new genes, such as WDR77 and MYL9, with the potential to act as cancer-related biomarkers.</p></sec></body>
<back>
<ack>
<title>Acknowledgements</title>
<p>This research was supported by the National Natural Science Foundation of China (no. 61075110) and the Scientific Plan of Beijing Municipal Commission of Education (JC002011200903). The authors wish to acknowledge Mr. Zhikun Gao (Beijing University of Technology) for providing assistance in processing the data with a machine learning algorithm.</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-or-28-03-1036"><label>1</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>West</surname><given-names>NP</given-names></name><name><surname>Morris</surname><given-names>EJ</given-names></name><name><surname>Rotimi</surname><given-names>O</given-names></name><name><surname>Cairns</surname><given-names>A</given-names></name><name><surname>Finan</surname><given-names>PJ</given-names></name><name><surname>Quirke</surname><given-names>P</given-names></name></person-group><article-title>Pathology grading of colon cancer surgical resection and its association with survival: a retrospective observational study</article-title><source>Lancet Oncol</source><volume>9</volume><fpage>857</fpage><lpage>865</lpage><year>2008</year></element-citation></ref>
<ref id="b2-or-28-03-1036"><label>2</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jemal</surname><given-names>A</given-names></name><name><surname>Siegel</surname><given-names>R</given-names></name><name><surname>Xu</surname><given-names>J</given-names></name><name><surname>Ward</surname><given-names>E</given-names></name></person-group><article-title>Cancer statistics, 2010</article-title><source>CA Cancer J Clin</source><volume>60</volume><fpage>277</fpage><lpage>300</lpage><year>2010</year></element-citation></ref>
<ref id="b3-or-28-03-1036"><label>3</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kosari</surname><given-names>F</given-names></name><name><surname>Parker</surname><given-names>AS</given-names></name><name><surname>Kube</surname><given-names>DM</given-names></name><name><surname>Lohse</surname><given-names>CM</given-names></name><name><surname>Leibovich</surname><given-names>BC</given-names></name><name><surname>Blute</surname><given-names>ML</given-names></name><name><surname>Cheville</surname><given-names>JC</given-names></name><name><surname>Vasmatzis</surname><given-names>G</given-names></name></person-group><article-title>Clear cell renal cell carcinoma: gene expression analyses identify a potential signature for tumor aggressiveness</article-title><source>Clin Cancer Res</source><volume>11</volume><fpage>5128</fpage><lpage>5139</lpage><year>2005</year></element-citation></ref>
<ref id="b4-or-28-03-1036"><label>4</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>X</given-names></name><name><surname>Yan</surname><given-names>Z</given-names></name><name><surname>Zhang</surname><given-names>J</given-names></name><name><surname>Gong</surname><given-names>L</given-names></name><name><surname>Li</surname><given-names>W</given-names></name><name><surname>Cui</surname><given-names>J</given-names></name><name><surname>Liu</surname><given-names>Y</given-names></name><name><surname>Gao</surname><given-names>Z</given-names></name><name><surname>Li</surname><given-names>J</given-names></name><name><surname>Shen</surname><given-names>L</given-names></name><name><surname>Lu</surname><given-names>Y</given-names></name></person-group><article-title>Combination of hsa-miR-375 and hsa-miR-142&#x02013;5p as a predictor for recurrence risk in gastric cancer patients following surgical resection</article-title><source>Ann Oncol</source><volume>22</volume><fpage>2257</fpage><lpage>2266</lpage><year>2011</year></element-citation></ref>
<ref id="b5-or-28-03-1036"><label>5</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Quackenbush</surname><given-names>J</given-names></name></person-group><article-title>Microarray analysis and tumor classification</article-title><source>N Engl J Med</source><volume>354</volume><fpage>2463</fpage><lpage>2472</lpage><year>2006</year></element-citation></ref>
<ref id="b6-or-28-03-1036"><label>6</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname><given-names>GP</given-names></name><name><surname>Chan</surname><given-names>KC</given-names></name><name><surname>Wong</surname><given-names>AK</given-names></name></person-group><article-title>Unsupervised fuzzy pattern discovery in gene expression data</article-title><source>BMC Bioinformatics</source><volume>12</volume><issue>Suppl 5</issue><fpage>S5</fpage><year>2011</year></element-citation></ref>
<ref id="b7-or-28-03-1036"><label>7</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Broom</surname><given-names>BM</given-names></name><name><surname>Sulman</surname><given-names>EP</given-names></name><name><surname>Do</surname><given-names>KA</given-names></name><name><surname>Edgerton</surname><given-names>ME</given-names></name><name><surname>Aldape</surname><given-names>KD</given-names></name></person-group><article-title>Bagged gene shaving for the robust clustering of high-throughput data</article-title><source>Int J Bioinform Res Appl</source><volume>6</volume><fpage>326</fpage><lpage>343</lpage><year>2010</year></element-citation></ref>
<ref id="b8-or-28-03-1036"><label>8</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pique-Regi</surname><given-names>R</given-names></name><name><surname>Monso-Varona</surname><given-names>J</given-names></name><name><surname>Ortega</surname><given-names>A</given-names></name><name><surname>Seeger</surname><given-names>RC</given-names></name><name><surname>Triche</surname><given-names>TJ</given-names></name><name><surname>Asgharzadeh</surname><given-names>S</given-names></name></person-group><article-title>Sparse representation and Bayesian detection of genome copy number alterations from microarray data</article-title><source>Bioinformatics</source><volume>24</volume><fpage>309</fpage><lpage>318</lpage><year>2008</year></element-citation></ref>
<ref id="b9-or-28-03-1036"><label>9</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>L</given-names></name><name><surname>Xuan</surname><given-names>J</given-names></name><name><surname>Riggins</surname><given-names>RB</given-names></name><name><surname>Clarke</surname><given-names>R</given-names></name><name><surname>Wang</surname><given-names>Y</given-names></name></person-group><article-title>Identifying cancer biomarkers by network-constrained support vector machines</article-title><source>BMC Syst Biol</source><volume>5</volume><fpage>161</fpage><year>2011</year></element-citation></ref>
<ref id="b10-or-28-03-1036"><label>10</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Statnikov</surname><given-names>A</given-names></name><name><surname>Wang</surname><given-names>L</given-names></name><name><surname>Aliferis</surname><given-names>CF</given-names></name></person-group><article-title>A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification</article-title><source>BMC Bioinformatics</source><volume>9</volume><fpage>319</fpage><year>2008</year></element-citation></ref>
<ref id="b11-or-28-03-1036"><label>11</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Manilich</surname><given-names>EA</given-names></name><name><surname>&#x000D6;zsoyo&#x0011F;lu</surname><given-names>ZM</given-names></name><name><surname>Trubachev</surname><given-names>V</given-names></name><name><surname>Radivoyevitch</surname><given-names>T</given-names></name></person-group><article-title>Classification of large microarray datasets using fast random forest construction</article-title><source>J Bioinform Comput Biol</source><volume>9</volume><fpage>251</fpage><lpage>267</lpage><year>2011</year></element-citation></ref>
<ref id="b12-or-28-03-1036"><label>12</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Qi</surname><given-names>Y</given-names></name><name><surname>Bar-Joseph</surname><given-names>Z</given-names></name><name><surname>Klein-Seetharaman</surname><given-names>J</given-names></name></person-group><article-title>Evaluation of different biological data and computational classification methods for use in protein interaction prediction</article-title><source>Proteins</source><volume>63</volume><fpage>490</fpage><lpage>500</lpage><year>2006</year></element-citation></ref>
<ref id="b13-or-28-03-1036"><label>13</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>D&#x000ED;az-Uriarte</surname><given-names>R</given-names></name><name><surname>Alvarez de Andr&#x000E9;s</surname><given-names>S</given-names></name></person-group><article-title>Gene selection and classification of microarray data using random forest</article-title><source>BMC Bioinformatics</source><volume>7</volume><fpage>3</fpage><year>2006</year></element-citation></ref>
<ref id="b14-or-28-03-1036"><label>14</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cutler</surname><given-names>A</given-names></name><name><surname>Stevens</surname><given-names>JR</given-names></name></person-group><article-title>Random forests for microarrays</article-title><source>Methods Enzymol</source><volume>411</volume><fpage>422</fpage><lpage>432</lpage><year>2006</year></element-citation></ref>
<ref id="b15-or-28-03-1036"><label>15</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Alon</surname><given-names>U</given-names></name><name><surname>Barkai</surname><given-names>N</given-names></name><name><surname>Notterman</surname><given-names>DA</given-names></name><name><surname>Gish</surname><given-names>K</given-names></name><name><surname>Ybarra</surname><given-names>S</given-names></name><name><surname>Mack</surname><given-names>D</given-names></name><name><surname>Levine</surname><given-names>AJ</given-names></name></person-group><article-title>Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays</article-title><source>Proc Natl Acad Sci USA</source><volume>96</volume><fpage>6745</fpage><lpage>6750</lpage><year>1999</year></element-citation></ref>
<ref id="b16-or-28-03-1036"><label>16</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Strobl</surname><given-names>C</given-names></name><name><surname>Boulesteix</surname><given-names>AL</given-names></name><name><surname>Zeileis</surname><given-names>A</given-names></name><name><surname>Hothorn</surname><given-names>T</given-names></name></person-group><article-title>Bias in random forest variable importance measures: Illustrations, sources and a solution</article-title><source>BMC Bioinformatics</source><volume>8</volume><fpage>25</fpage><year>2007</year></element-citation></ref>
<ref id="b17-or-28-03-1036"><label>17</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Livak</surname><given-names>KJ</given-names></name><name><surname>Schmittgen</surname><given-names>TD</given-names></name></person-group><article-title>Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) method</article-title><source>Methods</source><volume>25</volume><fpage>402</fpage><lpage>408</lpage><year>2001</year></element-citation></ref>
<ref id="b18-or-28-03-1036"><label>18</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Alves</surname><given-names>PM</given-names></name><name><surname>L&#x000E9;vy</surname><given-names>N</given-names></name><name><surname>Stevenson</surname><given-names>BJ</given-names></name><name><surname>Bouzourene</surname><given-names>H</given-names></name><name><surname>Theiler</surname><given-names>G</given-names></name><name><surname>Bricard</surname><given-names>G</given-names></name><name><surname>Viatte</surname><given-names>S</given-names></name><name><surname>Ayyoub</surname><given-names>M</given-names></name><name><surname>Vuilleumier</surname><given-names>H</given-names></name><name><surname>Givel</surname><given-names>JC</given-names></name><etal/></person-group><article-title>Identification of tumor-associated antigens by large-scale analysis of genes expressed in human colorectal cancer</article-title><source>Cancer Immun</source><volume>8</volume><fpage>11</fpage><year>2008</year></element-citation></ref>
<ref id="b19-or-28-03-1036"><label>19</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schetter</surname><given-names>AJ</given-names></name><name><surname>Leung</surname><given-names>SY</given-names></name><name><surname>Sohn</surname><given-names>JJ</given-names></name><name><surname>Zanetti</surname><given-names>KA</given-names></name><name><surname>Bowman</surname><given-names>ED</given-names></name><name><surname>Yanaihara</surname><given-names>N</given-names></name><name><surname>Yuen</surname><given-names>ST</given-names></name><name><surname>Chan</surname><given-names>TL</given-names></name><name><surname>Kwong</surname><given-names>DL</given-names></name><name><surname>Au</surname><given-names>GK</given-names></name><etal/></person-group><article-title>MicroRNA expression profiles associated with prognosis and therapeutic outcome in colon adenocarcinoma</article-title><source>JAMA</source><volume>299</volume><fpage>425</fpage><lpage>436</lpage><year>2008</year></element-citation></ref>
<ref id="b20-or-28-03-1036"><label>20</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chung</surname><given-names>W</given-names></name><name><surname>Kwabi-Addo</surname><given-names>B</given-names></name><name><surname>Ittmann</surname><given-names>M</given-names></name><name><surname>Jelinek</surname><given-names>J</given-names></name><name><surname>Shen</surname><given-names>L</given-names></name><name><surname>Yu</surname><given-names>Y</given-names></name><name><surname>Issa</surname><given-names>JP</given-names></name></person-group><article-title>Identification of novel tumor markers in prostate, colon and breast cancer by unbiased methylation profiling</article-title><source>PLoS One</source><volume>3</volume><fpage>e2079</fpage><year>2008</year></element-citation></ref>
<ref id="b21-or-28-03-1036"><label>21</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nicodemus</surname><given-names>KK</given-names></name><name><surname>Wang</surname><given-names>W</given-names></name><name><surname>Shugart</surname><given-names>YY</given-names></name></person-group><article-title>Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene x gene and gene x environment interactions</article-title><source>BMC Proc</source><volume>1</volume><issue>Suppl 1</issue><fpage>S58</fpage><year>2007</year></element-citation></ref>
<ref id="b22-or-28-03-1036"><label>22</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pang</surname><given-names>H</given-names></name><name><surname>Zhao</surname><given-names>H</given-names></name></person-group><article-title>Building pathway clusters from Random Forests classification using class votes</article-title><source>BMC Bioinformatics</source><volume>9</volume><fpage>87</fpage><year>2008</year></element-citation></ref>
<ref id="b23-or-28-03-1036"><label>23</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>de la Iglesia</surname><given-names>N</given-names></name><name><surname>Konopka</surname><given-names>G</given-names></name><name><surname>Lim</surname><given-names>KL</given-names></name><name><surname>Nutt</surname><given-names>CL</given-names></name><name><surname>Bromberg</surname><given-names>JF</given-names></name><name><surname>Frank</surname><given-names>DA</given-names></name><name><surname>Mischel</surname><given-names>PS</given-names></name><name><surname>Louis</surname><given-names>DN</given-names></name><name><surname>Bonni</surname><given-names>A</given-names></name></person-group><article-title>Deregulation of a STAT3-interleukin 8 signaling pathway promotes human glioblastoma cell proliferation and invasiveness</article-title><source>J Neurosci</source><volume>28</volume><fpage>5870</fpage><lpage>5878</lpage><year>2008</year></element-citation></ref>
<ref id="b24-or-28-03-1036"><label>24</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Canedo</surname><given-names>P</given-names></name><name><surname>Castanheira-Vale</surname><given-names>AJ</given-names></name><name><surname>Lunet</surname><given-names>N</given-names></name><name><surname>Pereira</surname><given-names>F</given-names></name><name><surname>Figueiredo</surname><given-names>C</given-names></name><name><surname>Gioia-Patricola</surname><given-names>L</given-names></name><name><surname>Canzian</surname><given-names>F</given-names></name><name><surname>Moreira</surname><given-names>H</given-names></name><name><surname>Suriano</surname><given-names>G</given-names></name><name><surname>Barros</surname><given-names>H</given-names></name><etal/></person-group><article-title>The interleukin-8&#x02013;251<sup>&#x0002A;</sup>T/<sup>&#x0002A;</sup>A polymorphism is not associated with risk for gastric carcinoma development in a Portuguese population</article-title><source>Eur J Cancer Prev</source><volume>17</volume><fpage>28</fpage><lpage>32</lpage><year>2008</year></element-citation></ref>
<ref id="b25-or-28-03-1036"><label>25</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yoshida</surname><given-names>C</given-names></name><name><surname>Niiya</surname><given-names>K</given-names></name><name><surname>Niiya</surname><given-names>M</given-names></name><name><surname>Shibakura</surname><given-names>M</given-names></name><name><surname>Asaumi</surname><given-names>N</given-names></name><name><surname>Tanimoto</surname><given-names>M</given-names></name></person-group><article-title>Induction of urokinase-type plasminogen activator, interleukin-8 and early growth response-1 by STI571 through activating mitogen activated protein kinase in human small cell lung cancer cells</article-title><source>Blood Coagul Fibrinolysis</source><volume>18</volume><fpage>425</fpage><lpage>433</lpage><year>2007</year></element-citation></ref>
<ref id="b26-or-28-03-1036"><label>26</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Araki</surname><given-names>S</given-names></name><name><surname>Omori</surname><given-names>Y</given-names></name><name><surname>Lyn</surname><given-names>D</given-names></name><name><surname>Singh</surname><given-names>RK</given-names></name><name><surname>Meinbach</surname><given-names>DM</given-names></name><name><surname>Sandman</surname><given-names>Y</given-names></name><name><surname>Lokeshwar</surname><given-names>VB</given-names></name><name><surname>Lokeshwar</surname><given-names>BL</given-names></name></person-group><article-title>Interleukin-8 is a molecular determinant of androgen independence and progression in prostate cancer</article-title><source>Cancer Res</source><volume>67</volume><fpage>6854</fpage><lpage>6862</lpage><year>2007</year></element-citation></ref>
<ref id="b27-or-28-03-1036"><label>27</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Savage</surname><given-names>SA</given-names></name><name><surname>Abnet</surname><given-names>CC</given-names></name><name><surname>Mark</surname><given-names>SD</given-names></name><name><surname>Qiao</surname><given-names>YL</given-names></name><name><surname>Dong</surname><given-names>ZW</given-names></name><name><surname>Dawsey</surname><given-names>SM</given-names></name><name><surname>Taylor</surname><given-names>PR</given-names></name><name><surname>Chanock</surname><given-names>SJ</given-names></name></person-group><article-title>Variants of the IL8 and IL8RB genes and risk for gastric cardia adenocarcinoma and esophageal squamous cell carcinoma</article-title><source>Cancer Epidemiol Biomarkers Prev</source><volume>13</volume><fpage>2251</fpage><lpage>2257</lpage><year>2004</year></element-citation></ref>
<ref id="b28-or-28-03-1036"><label>28</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bruserud</surname><given-names>&#x000D8;</given-names></name><name><surname>Ryningen</surname><given-names>A</given-names></name><name><surname>Wergeland</surname><given-names>L</given-names></name><name><surname>Glenjen</surname><given-names>NI</given-names></name><name><surname>Gjertsen</surname><given-names>BT</given-names></name></person-group><article-title>Osteoblasts increase proliferation and release of pro-angiogenic interleukin 8 by native human acute myelogenous leukemia blasts</article-title><source>Haematologica</source><volume>89</volume><fpage>391</fpage><lpage>402</lpage><year>2004</year></element-citation></ref>
<ref id="b29-or-28-03-1036"><label>29</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Landi</surname><given-names>S</given-names></name><name><surname>Moreno</surname><given-names>V</given-names></name><name><surname>Gioia-Patricola</surname><given-names>L</given-names></name><name><surname>Guino</surname><given-names>E</given-names></name><name><surname>Navarro</surname><given-names>M</given-names></name><name><surname>de Oca</surname><given-names>J</given-names></name><name><surname>Capella</surname><given-names>G</given-names></name><name><surname>Canzian</surname><given-names>F</given-names></name></person-group><collab>Bellvitge Colorectal Cancer Study Group</collab><article-title>Association of common polymorphisms in inflammatory genes interleukin (IL)6, IL8, tumor necrosis factor alpha, NFKB1, and peroxisome proliferator-activated receptor gamma with colorectal cancer</article-title><source>Cancer Res</source><volume>63</volume><fpage>3560</fpage><lpage>3566</lpage><year>2003</year></element-citation></ref>
<ref id="b30-or-28-03-1036"><label>30</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ning</surname><given-names>Y</given-names></name><name><surname>Manegold</surname><given-names>PC</given-names></name><name><surname>Hong</surname><given-names>YK</given-names></name><name><surname>Zhang</surname><given-names>W</given-names></name><name><surname>Pohl</surname><given-names>A</given-names></name><name><surname>Lurje</surname><given-names>G</given-names></name><name><surname>Winder</surname><given-names>T</given-names></name><name><surname>Yang</surname><given-names>D</given-names></name><name><surname>LaBonte</surname><given-names>MJ</given-names></name><name><surname>Wilson</surname><given-names>PM</given-names></name><etal/></person-group><article-title>Interleukin-8 is associated with proliferation, migration, angiogenesis and chemosensitivity in vitro and in vivo in colon cancer cell line models</article-title><source>Int J Cancer</source><volume>128</volume><fpage>2038</fpage><lpage>2049</lpage><year>2011</year></element-citation></ref>
<ref id="b31-or-28-03-1036"><label>31</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ogasawara</surname><given-names>M</given-names></name><name><surname>Murata</surname><given-names>J</given-names></name><name><surname>Ayukawa</surname><given-names>K</given-names></name><name><surname>Saiki</surname><given-names>I</given-names></name></person-group><article-title>Differential effect of intestinal neuropeptides on invasion and migration of colon carcinoma cells in vitro</article-title><source>Cancer Lett</source><volume>119</volume><fpage>125</fpage><lpage>130</lpage><year>1997</year></element-citation></ref>
<ref id="b32-or-28-03-1036"><label>32</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Singh</surname><given-names>AT</given-names></name><name><surname>Jaggi</surname><given-names>M</given-names></name><name><surname>Prasad</surname><given-names>S</given-names></name><name><surname>Dutt</surname><given-names>S</given-names></name><name><surname>Singh</surname><given-names>G</given-names></name><name><surname>Datta</surname><given-names>K</given-names></name><name><surname>Rajendran</surname><given-names>P</given-names></name><name><surname>Sanna</surname><given-names>VK</given-names></name><name><surname>Mukherjee</surname><given-names>R</given-names></name><name><surname>Burman</surname><given-names>AC</given-names></name></person-group><article-title>Modulation of key signal transduction molecules by a novel peptide combination effective for the treatment of gastrointestinal carcinomas</article-title><source>Invest New Drugs</source><volume>26</volume><fpage>505</fpage><lpage>516</lpage><year>2008</year></element-citation></ref>
<ref id="b33-or-28-03-1036"><label>33</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Valdehita</surname><given-names>A</given-names></name><name><surname>Carmena</surname><given-names>MJ</given-names></name><name><surname>Collado</surname><given-names>B</given-names></name><name><surname>Prieto</surname><given-names>JC</given-names></name><name><surname>Bajo</surname><given-names>AM</given-names></name></person-group><article-title>Vasoactive intestinal peptide (VIP) increases vascular endothelial growth factor (VEGF) expression and secretion in human breast cancer cells</article-title><source>Regul Pept</source><volume>144</volume><fpage>101</fpage><lpage>108</lpage><year>2007</year></element-citation></ref>
<ref id="b34-or-28-03-1036"><label>34</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Haberl</surname><given-names>I</given-names></name><name><surname>Frei</surname><given-names>K</given-names></name><name><surname>Ramsebner</surname><given-names>R</given-names></name><name><surname>Doberer</surname><given-names>D</given-names></name><name><surname>Petkov</surname><given-names>V</given-names></name><name><surname>Albinni</surname><given-names>S</given-names></name><name><surname>Lang</surname><given-names>I</given-names></name><name><surname>Lucas</surname><given-names>T</given-names></name><name><surname>Mosgoeller</surname><given-names>W</given-names></name></person-group><article-title>Vasoactive intestinal peptide gene alterations in patients with idiopathic pulmonary arterial hypertension</article-title><source>Eur J Hum Genet</source><volume>15</volume><fpage>18</fpage><lpage>22</lpage><year>2007</year></element-citation></ref>
<ref id="b35-or-28-03-1036"><label>35</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Absood</surname><given-names>A</given-names></name><name><surname>Hu</surname><given-names>B</given-names></name><name><surname>Bassily</surname><given-names>N</given-names></name><name><surname>Colletti</surname><given-names>L</given-names></name></person-group><article-title>VIP inhibits human HepG2 cell proliferation in vitro</article-title><source>Regul Pept</source><volume>146</volume><fpage>285</fpage><lpage>292</lpage><year>2008</year></element-citation></ref>
<ref id="b36-or-28-03-1036"><label>36</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Collado</surname><given-names>B</given-names></name><name><surname>S&#x000E1;nchez-Chapado</surname><given-names>M</given-names></name><name><surname>Prieto</surname><given-names>JC</given-names></name><name><surname>Carmena</surname><given-names>MJ</given-names></name></person-group><article-title>Hypoxia regulation of expression and angiogenic effects of vasoactive intestinal peptide (VIP) and VIP receptors in LNCaP prostate cancer cells</article-title><source>Mol Cell Endocrinol</source><volume>249</volume><fpage>116</fpage><lpage>122</lpage><year>2006</year></element-citation></ref>
<ref id="b37-or-28-03-1036"><label>37</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gao</surname><given-names>S</given-names></name><name><surname>Wu</surname><given-names>H</given-names></name><name><surname>Wang</surname><given-names>F</given-names></name><name><surname>Wang</surname><given-names>Z</given-names></name></person-group><article-title>Altered differentiation and proliferation of prostate epithelium in mice lacking the androgen receptor cofactor p44/WDR77</article-title><source>Endocrinology</source><volume>151</volume><fpage>3941</fpage><lpage>3953</lpage><year>2010</year></element-citation></ref>
<ref id="b38-or-28-03-1036"><label>38</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ligr</surname><given-names>M</given-names></name><name><surname>Patwa</surname><given-names>RR</given-names></name><name><surname>Daniels</surname><given-names>G</given-names></name><name><surname>Pan</surname><given-names>L</given-names></name><name><surname>Wu</surname><given-names>X</given-names></name><name><surname>Li</surname><given-names>Y</given-names></name><name><surname>Tian</surname><given-names>L</given-names></name><name><surname>Wang</surname><given-names>Z</given-names></name><name><surname>Xu</surname><given-names>R</given-names></name><name><surname>Wu</surname><given-names>J</given-names></name><etal/></person-group><article-title>Expression and function of androgen receptor coactivator p44/Mep50/WDR77 in ovarian cancer</article-title><source>PLoS One</source><volume>6</volume><fpage>e26250</fpage><year>2011</year></element-citation></ref></ref-list></back>
<floats-group>
<fig id="f1-or-28-03-1036" position="float">
<label>Figure 1</label>
<caption>
<p>Identification of 4 genes by RF based on gene expression profiling. (A) The expression levels showed that IL8 and WDR77 were downregulated in the 22 normal samples and upregulated in the 40 tumor samples; MYL9 and VIP were downregulated in the 40 tumor samples and upregulated in the 22 normal samples. (B) The smallest OOB error rate was obtained when only 4 genes were retained. The numbers of retained genes were 320, 290, 260, 230, 200, 170, 140, 110, 80, 50, 20, 10, 5, 4, 3, 2 and 1. (C) Graphical overview of these 4 genes. Hierarchical clustering of the data matrix consisting of the 4 differentially expressed genes across 40 colon cancer and 22 matched normal tissues. Columns represent samples and rows represent genes (black, green and red correspond to no change, downregulated and upregulated, respectively). T, tumor; N, normal.</p></caption>
<graphic xlink:href="OR-28-03-1036-g04.gif"/></fig>
<fig id="f2-or-28-03-1036" position="float">
<label>Figure 2</label>
<caption>
<p>Relative expression levels of the candidate biomarkers validated by real-time PCR. (A) The 2<sup>-&#x00394;&#x00394;Ct</sup> method was used to analyze the relative expression levels of the genes after real-time PCR. Quantitative real-time PCR results showed that IL8 and WDR77 were upregulated in colon cancer samples, with P-values of 0.032 and 0.046, respectively; MYL9 and VIP were downregulated in colon cancer samples, with P-values of 0.028 and 0.177, respectively. (B) Semi-quantitative PCR showed the same results.</p></caption>
<graphic xlink:href="OR-28-03-1036-g05.gif"/></fig>
<fig id="f3-or-28-03-1036" position="float">
<label>Figure 3</label>
<caption>
<p>ROC curve analysis of candidate biomarkers. (A) ROC analysis based on microarray data. The AUC values of IL8, WDR77, MYL9 and VIP were 0.853, 0.875, 0.826 and 0.812 in the training group, respectively. (B) ROC analysis based on real-time PCR data (Ct value). The AUC values of IL8, WDR77, MYL9 and VIP were 0.869, 0.867, 0.898 and 0.845 in the test group, respectively.</p></caption>
<graphic xlink:href="OR-28-03-1036-g06.gif"/></fig>
<table-wrap id="tI-or-28-03-1036" position="float">
<label>Table I</label>
<caption>
<p>Top 20 genes with the highest classification scores assigned by the random forest algorithm.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">GenBank ID</th>
<th align="center" valign="top">Accession no.</th>
<th align="center" valign="top">Gene symbol</th>
<th align="center" valign="top">Score</th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>M26383</bold></td>
<td align="left" valign="top"><bold>NM_000584</bold></td>
<td align="left" valign="top"><bold>IL8</bold></td>
<td align="left" valign="top"><bold>0.8776</bold></td></tr>
<tr>
<td align="left" valign="top"><bold>H08393</bold></td>
<td align="left" valign="top"><bold>NM_024102</bold></td>
<td align="left" valign="top"><bold>WDR77</bold></td>
<td align="left" valign="top"><bold>0.8520</bold></td></tr>
<tr>
<td align="left" valign="top"><bold>J02854</bold></td>
<td align="left" valign="top"><bold>BM473095</bold></td>
<td align="left" valign="top"><bold>MYL9</bold></td>
<td align="left" valign="top"><bold>0.8263</bold></td></tr>
<tr>
<td align="left" valign="top"><bold>M36634</bold></td>
<td align="left" valign="top"><bold>NM_003381</bold></td>
<td align="left" valign="top"><bold>VIP</bold></td>
<td align="left" valign="top"><bold>0.8124</bold></td></tr>
<tr>
<td align="left" valign="top">J05032</td>
<td align="left" valign="top">NM_001349</td>
<td align="left" valign="top">DARS</td>
<td align="left" valign="top">0.8108</td></tr>
<tr>
<td align="left" valign="top">T92451</td>
<td align="left" valign="top">CR590682</td>
<td align="left" valign="top">TPM2</td>
<td align="left" valign="top">0.8065</td></tr>
<tr>
<td align="left" valign="top">R36977</td>
<td align="left" valign="top">AK057993</td>
<td align="left" valign="top">GTF3A</td>
<td align="left" valign="top">0.8065</td></tr>
<tr>
<td align="left" valign="top">M22382</td>
<td align="left" valign="top">BC047350</td>
<td align="left" valign="top">HSPD1</td>
<td align="left" valign="top">0.8065</td></tr>
<tr>
<td align="left" valign="top">U25138</td>
<td align="left" valign="top">BC025707</td>
<td align="left" valign="top">KCNMB1</td>
<td align="left" valign="top">0.8065</td></tr>
<tr>
<td align="left" valign="top">D00860</td>
<td align="left" valign="top">NM_002764</td>
<td align="left" valign="top">PRPS1</td>
<td align="left" valign="top">0.8007</td></tr>
<tr>
<td align="left" valign="top">H43887</td>
<td align="left" valign="top">BQ712715</td>
<td align="left" valign="top">CFD</td>
<td align="left" valign="top">0.8007</td></tr>
<tr>
<td align="left" valign="top">X63629</td>
<td align="left" valign="top">NM_001793</td>
<td align="left" valign="top">CDH3</td>
<td align="left" valign="top">0.8007</td></tr>
<tr>
<td align="left" valign="top">T51571</td>
<td align="left" valign="top">BQ683841</td>
<td align="left" valign="top">S100A11</td>
<td align="left" valign="top">0.7963</td></tr>
<tr>
<td align="left" valign="top">Z50753</td>
<td align="left" valign="top">NM_007102</td>
<td align="left" valign="top">GUCA2B</td>
<td align="left" valign="top">0.7963</td></tr>
<tr>
<td align="left" valign="top">T96873</td>
<td align="left" valign="top">CR627338</td>
<td align="left" valign="top">CBWD1</td>
<td align="left" valign="top">0.7786</td></tr>
<tr>
<td align="left" valign="top">H64489</td>
<td align="left" valign="top">NM_005727</td>
<td align="left" valign="top">TSPAN1</td>
<td align="left" valign="top">0.7786</td></tr>
<tr>
<td align="left" valign="top">T60155</td>
<td align="left" valign="top">BX647362</td>
<td align="left" valign="top">ACTA2</td>
<td align="left" valign="top">0.7786</td></tr>
<tr>
<td align="left" valign="top">D14812</td>
<td align="left" valign="top">BC035249</td>
<td align="left" valign="top">MORF4L2</td>
<td align="left" valign="top">0.7786</td></tr>
<tr>
<td align="left" valign="top">T54303</td>
<td align="left" valign="top">CR607281</td>
<td align="left" valign="top">KRT8</td>
<td align="left" valign="top">0.7692</td></tr>
<tr>
<td align="left" valign="top">L41559</td>
<td align="left" valign="top">BM550965</td>
<td align="left" valign="top">PCBD1</td>
<td align="left" valign="top">0.7692</td></tr></tbody></table>
<table-wrap-foot><fn id="tfn1-or-28-03-1036">
<p>Bold indicates the genes selected as a classifier of colon cancer.</p></fn></table-wrap-foot></table-wrap>
<table-wrap id="tII-or-28-03-1036" position="float">
<label>Table II</label>
<caption>
<p>Four genes selected by the random forest method based on microarray data.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Expression level</th>
<th align="center" valign="top">Gene symbol</th>
<th align="center" valign="top">GenBank ID</th>
<th align="center" valign="top">Fold-change</th>
<th align="center" valign="top">Q-value (&#x00025;)</th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top">Upregulated</td>
<td align="left" valign="top">IL8</td>
<td align="left" valign="top">M26383</td>
<td align="left" valign="top">2.444</td>
<td align="center" valign="top">0.665</td></tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">WDR77</td>
<td align="left" valign="top">H08393</td>
<td align="left" valign="top">1.638</td>
<td align="center" valign="top">3.547</td></tr>
<tr>
<td align="left" valign="top">Downregulated</td>
<td align="left" valign="top">MYL9</td>
<td align="left" valign="top">J02854</td>
<td align="left" valign="top">0.011</td>
<td align="center" valign="top">0</td></tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">VIP</td>
<td align="left" valign="top">M36634</td>
<td align="left" valign="top">0.203</td>
<td align="center" valign="top">0</td></tr></tbody></table></table-wrap>
<table-wrap id="tIII-or-28-03-1036" position="float">
<label>Table III</label>
<caption>
<p>ROC analyses of the sensitivity and specificity of candidate biomarkers.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Sample sets</th>
<th align="center" valign="top">Biomarkers</th>
<th align="center" valign="top">Sensitivity (&#x00025;)</th>
<th align="center" valign="top">Specificity (&#x00025;)</th>
<th align="center" valign="top">AUC</th>
<th align="center" valign="top">95&#x00025; CI</th>
<th align="center" valign="top">SE</th>
<th align="center" valign="top">P-value</th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top">Training samples</td>
<td align="left" valign="top">IL8</td>
<td align="right" valign="top">100.0</td>
<td align="center" valign="top">63.6</td>
<td align="center" valign="top">0.853</td>
<td align="center" valign="top">0.740&#x02013;0.930</td>
<td align="center" valign="top">0.0472</td>
<td align="center" valign="top">0.0001</td></tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">WDR77</td>
<td align="right" valign="top">70.0</td>
<td align="center" valign="top">95.5</td>
<td align="center" valign="top">0.875</td>
<td align="center" valign="top">0.766&#x02013;0.945</td>
<td align="center" valign="top">0.0434</td>
<td align="center" valign="top">0.0001</td></tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">MYL9</td>
<td align="right" valign="top">90.0</td>
<td align="center" valign="top">68.2</td>
<td align="center" valign="top">0.826</td>
<td align="center" valign="top">0.709&#x02013;0.910</td>
<td align="center" valign="top">0.0596</td>
<td align="center" valign="top">0.0001</td></tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">VIP</td>
<td align="right" valign="top">75.0</td>
<td align="center" valign="top">77.3</td>
<td align="center" valign="top">0.812</td>
<td align="center" valign="top">0.693&#x02013;0.900</td>
<td align="center" valign="top">0.0614</td>
<td align="center" valign="top">0.0001</td></tr>
<tr>
<td align="left" valign="top">Test samples</td>
<td align="left" valign="top">IL8</td>
<td align="right" valign="top">93.8</td>
<td align="center" valign="top">77.1</td>
<td align="center" valign="top">0.869</td>
<td align="center" valign="top">0.785&#x02013;0.929</td>
<td align="center" valign="top">0.0374</td>
<td align="center" valign="top">0.0001</td></tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">WDR77</td>
<td align="right" valign="top">81.2</td>
<td align="center" valign="top">83.3</td>
<td align="center" valign="top">0.867</td>
<td align="center" valign="top">0.782&#x02013;0.928</td>
<td align="center" valign="top">0.0377</td>
<td align="center" valign="top">0.0001</td></tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">MYL9</td>
<td align="right" valign="top">91.7</td>
<td align="center" valign="top">79.2</td>
<td align="center" valign="top">0.898</td>
<td align="center" valign="top">0.820&#x02013;0.951</td>
<td align="center" valign="top">0.0330</td>
<td align="center" valign="top">0.0001</td></tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">VIP</td>
<td align="right" valign="top">72.9</td>
<td align="center" valign="top">87.5</td>
<td align="center" valign="top">0.845</td>
<td align="center" valign="top">0.757&#x02013;0.911</td>
<td align="center" valign="top">0.0405</td>
<td align="center" valign="top">0.0001</td></tr></tbody></table>
<table-wrap-foot><fn id="tfn2-or-28-03-1036">
<p>95&#x00025; CI, 95&#x00025; confidence interval; SE, standard error.</p></fn></table-wrap-foot></table-wrap>
<table-wrap id="tIV-or-28-03-1036" position="float">
<label>Table IV</label>
<caption>
<p>Statistical analyses of the association between biomarker expression and the clinical characteristics of colon cancer patients.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top"/>
<th colspan="3" align="center" valign="top">IL8</th>
<th colspan="3" align="center" valign="top">WDR77</th>
<th colspan="3" align="center" valign="top">MYL9</th>
<th colspan="3" align="center" valign="top">VIP</th></tr>
<tr>
<th align="left" valign="top"/>
<th colspan="3" align="left" valign="top">
<hr/></th>
<th colspan="3" align="left" valign="top">
<hr/></th>
<th colspan="3" align="left" valign="top">
<hr/></th>
<th colspan="3" align="left" valign="top">
<hr/></th></tr>
<tr>
<th align="left" valign="top">Characteristics</th>
<th align="center" valign="top">(&#x0002B;) (n&#x0003D;37)</th>
<th align="center" valign="top">(&#x02212;) (n&#x0003D;11)</th>
<th align="center" valign="top">P-value</th>
<th align="center" valign="top">(&#x0002B;) (n&#x0003D;34)</th>
<th align="center" valign="top">(&#x02212;) (n&#x0003D;14)</th>
<th align="center" valign="top">P-value</th>
<th align="center" valign="top">(&#x0002B;) (n&#x0003D;13)</th>
<th align="center" valign="top">(&#x02212;) (n&#x0003D;35)</th>
<th align="center" valign="top">P-value</th>
<th align="center" valign="top">(&#x0002B;) (n&#x0003D;15)</th>
<th align="center" valign="top">(&#x02212;) (n&#x0003D;33)</th>
<th align="center" valign="top">P-value</th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top">Gender</td>
<td align="right" valign="top"/>
<td align="center" valign="top"/>
<td align="right" valign="top"><bold>0.029</bold></td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.338</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.234</td>
<td align="center" valign="top"/>
<td align="right" valign="top"/>
<td align="center" valign="top">0.133</td></tr>
<tr>
<td align="left" valign="top">&#x02003;Male</td>
<td align="right" valign="top">28</td>
<td align="center" valign="top">5</td>
<td align="right" valign="top"/>
<td align="center" valign="top">24</td>
<td align="center" valign="top">9</td>
<td align="center" valign="top"/>
<td align="center" valign="top">10</td>
<td align="center" valign="top">23</td>
<td align="center" valign="top"/>
<td align="center" valign="top">12</td>
<td align="right" valign="top">21</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">&#x02003;Female</td>
<td align="right" valign="top">9</td>
<td align="center" valign="top">6</td>
<td align="right" valign="top"/>
<td align="center" valign="top">10</td>
<td align="center" valign="top">5</td>
<td align="center" valign="top"/>
<td align="center" valign="top">3</td>
<td align="center" valign="top">12</td>
<td align="center" valign="top"/>
<td align="center" valign="top">3</td>
<td align="right" valign="top">12</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">Age (years)</td>
<td align="right" valign="top"/>
<td align="center" valign="top"/>
<td align="right" valign="top">0.082</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.063</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.101</td>
<td align="center" valign="top"/>
<td align="right" valign="top"/>
<td align="center" valign="top">0.348</td></tr>
<tr>
<td align="left" valign="top">&#x02003;Median</td>
<td align="right" valign="top">64</td>
<td align="center" valign="top">51</td>
<td align="right" valign="top"/>
<td align="center" valign="top">64</td>
<td align="center" valign="top">51</td>
<td align="center" valign="top"/>
<td align="center" valign="top">58</td>
<td align="center" valign="top">62</td>
<td align="center" valign="top"/>
<td align="center" valign="top">64</td>
<td align="right" valign="top">58</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">&#x02003;Average</td>
<td align="right" valign="top">60</td>
<td align="center" valign="top">54</td>
<td align="right" valign="top"/>
<td align="center" valign="top">60</td>
<td align="center" valign="top">53</td>
<td align="center" valign="top"/>
<td align="center" valign="top">55</td>
<td align="center" valign="top">60</td>
<td align="center" valign="top"/>
<td align="center" valign="top">59</td>
<td align="right" valign="top">58</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">Differentiation</td>
<td align="right" valign="top"/>
<td align="center" valign="top"/>
<td align="right" valign="top">0.404</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.146</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.462</td>
<td align="center" valign="top"/>
<td align="right" valign="top"/>
<td align="center" valign="top"><bold>0.026</bold></td></tr>
<tr>
<td align="left" valign="top">&#x02003;Poor</td>
<td align="right" valign="top">22</td>
<td align="center" valign="top">7</td>
<td align="right" valign="top"/>
<td align="center" valign="top">22</td>
<td align="center" valign="top">7</td>
<td align="center" valign="top"/>
<td align="center" valign="top">8</td>
<td align="center" valign="top">21</td>
<td align="center" valign="top"/>
<td align="center" valign="top">6</td>
<td align="right" valign="top">23</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">&#x02003;Moderate/well</td>
<td align="right" valign="top">15</td>
<td align="center" valign="top">4</td>
<td align="right" valign="top"/>
<td align="center" valign="top">12</td>
<td align="center" valign="top">7</td>
<td align="center" valign="top"/>
<td align="center" valign="top">5</td>
<td align="center" valign="top">14</td>
<td align="center" valign="top"/>
<td align="center" valign="top">9</td>
<td align="right" valign="top">10</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">Lymph node resection</td>
<td align="right" valign="top"/>
<td align="center" valign="top"/>
<td align="right" valign="top">0.351</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.298</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.066</td>
<td align="center" valign="top"/>
<td align="right" valign="top"/>
<td align="center" valign="top">0.376</td></tr>
<tr>
<td align="left" valign="top">&#x02003;&lt;12</td>
<td align="right" valign="top">8</td>
<td align="center" valign="top">3</td>
<td align="right" valign="top"/>
<td align="center" valign="top">7</td>
<td align="center" valign="top">4</td>
<td align="center" valign="top"/>
<td align="center" valign="top">1</td>
<td align="center" valign="top">10</td>
<td align="center" valign="top"/>
<td align="center" valign="top">3</td>
<td align="right" valign="top">8</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">&#x02003;&gt;12</td>
<td align="right" valign="top">29</td>
<td align="center" valign="top">8</td>
<td align="right" valign="top"/>
<td align="center" valign="top">27</td>
<td align="center" valign="top">10</td>
<td align="center" valign="top"/>
<td align="center" valign="top">12</td>
<td align="center" valign="top">25</td>
<td align="center" valign="top"/>
<td align="center" valign="top">12</td>
<td align="right" valign="top">25</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">Clinical stage</td>
<td align="right" valign="top"/>
<td align="center" valign="top"/>
<td align="right" valign="top"><bold>&lt;0.001</bold></td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top"><bold>0.008</bold></td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.066</td>
<td align="center" valign="top"/>
<td align="right" valign="top"/>
<td align="center" valign="top"><bold>0.037</bold></td></tr>
<tr>
<td align="left" valign="top">&#x02003;I/II</td>
<td align="right" valign="top">6</td>
<td align="center" valign="top">9</td>
<td align="right" valign="top"/>
<td align="center" valign="top">7</td>
<td align="center" valign="top">8</td>
<td align="center" valign="top"/>
<td align="center" valign="top">1</td>
<td align="center" valign="top">10</td>
<td align="center" valign="top"/>
<td align="center" valign="top">1</td>
<td align="right" valign="top">10</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">&#x02003;III/IV</td>
<td align="right" valign="top">31</td>
<td align="center" valign="top">2</td>
<td align="right" valign="top"/>
<td align="center" valign="top">27</td>
<td align="center" valign="top">6</td>
<td align="center" valign="top"/>
<td align="center" valign="top">12</td>
<td align="center" valign="top">25</td>
<td align="center" valign="top"/>
<td align="center" valign="top">14</td>
<td align="right" valign="top">23</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">Embolus</td>
<td align="right" valign="top"/>
<td align="center" valign="top"/>
<td align="right" valign="top">0.086</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top"><bold>0.035</bold></td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.428</td>
<td align="center" valign="top"/>
<td align="right" valign="top"/>
<td align="center" valign="top">0.108</td></tr>
<tr>
<td align="left" valign="top">&#x02003;With</td>
<td align="right" valign="top">11</td>
<td align="center" valign="top">1</td>
<td align="right" valign="top"/>
<td align="center" valign="top">11</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top"/>
<td align="center" valign="top">3</td>
<td align="center" valign="top">9</td>
<td align="center" valign="top"/>
<td align="center" valign="top">2</td>
<td align="right" valign="top">10</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">&#x02003;Without</td>
<td align="right" valign="top">26</td>
<td align="center" valign="top">10</td>
<td align="right" valign="top"/>
<td align="center" valign="top">23</td>
<td align="center" valign="top">13</td>
<td align="center" valign="top"/>
<td align="center" valign="top">10</td>
<td align="center" valign="top">26</td>
<td align="center" valign="top"/>
<td align="center" valign="top">13</td>
<td align="right" valign="top">23</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">Adjuvant chemotherapy</td>
<td align="right" valign="top"/>
<td align="center" valign="top"/>
<td align="right" valign="top">0.142</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.070</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.410</td>
<td align="center" valign="top"/>
<td align="right" valign="top"/>
<td align="center" valign="top">0.054</td></tr>
<tr>
<td align="left" valign="top">&#x02003;Performed</td>
<td align="right" valign="top">9</td>
<td align="center" valign="top">1</td>
<td align="right" valign="top"/>
<td align="center" valign="top">9</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top"/>
<td align="center" valign="top">3</td>
<td align="center" valign="top">7</td>
<td align="center" valign="top"/>
<td align="center" valign="top">1</td>
<td align="right" valign="top">9</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">&#x02003;Not performed</td>
<td align="right" valign="top">28</td>
<td align="center" valign="top">10</td>
<td align="right" valign="top"/>
<td align="center" valign="top">25</td>
<td align="center" valign="top">13</td>
<td align="center" valign="top"/>
<td align="center" valign="top">10</td>
<td align="center" valign="top">28</td>
<td align="center" valign="top"/>
<td align="center" valign="top">14</td>
<td align="right" valign="top">24</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">Recurrence</td>
<td align="right" valign="top"/>
<td align="center" valign="top"/>
<td align="right" valign="top">0.479</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.097</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.120</td>
<td align="center" valign="top"/>
<td align="right" valign="top"/>
<td align="center" valign="top"><bold>0.019</bold></td></tr>
<tr>
<td align="left" valign="top">&#x02003;Recurrence</td>
<td align="right" valign="top">7</td>
<td align="center" valign="top">2</td>
<td align="right" valign="top"/>
<td align="center" valign="top">8</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top"/>
<td align="center" valign="top">1</td>
<td align="center" valign="top">8</td>
<td align="center" valign="top"/>
<td align="center" valign="top">0</td>
<td align="right" valign="top">9</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">&#x02003;Non-recurrence</td>
<td align="right" valign="top">30</td>
<td align="center" valign="top">9</td>
<td align="right" valign="top"/>
<td align="center" valign="top">26</td>
<td align="center" valign="top">13</td>
<td align="center" valign="top"/>
<td align="center" valign="top">12</td>
<td align="center" valign="top">27</td>
<td align="center" valign="top"/>
<td align="center" valign="top">15</td>
<td align="right" valign="top">24</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">Patient status</td>
<td align="right" valign="top"/>
<td align="center" valign="top"/>
<td align="right" valign="top"><bold>&lt;0.001</bold></td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.162</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top">0.217</td>
<td align="center" valign="top"/>
<td align="right" valign="top"/>
<td align="center" valign="top">0.125</td></tr>
<tr>
<td align="left" valign="top">&#x02003;Survival</td>
<td align="right" valign="top">14</td>
<td align="center" valign="top">8</td>
<td align="right" valign="top"/>
<td align="center" valign="top">14</td>
<td align="center" valign="top">8</td>
<td align="center" valign="top"/>
<td align="center" valign="top">5</td>
<td align="center" valign="top">17</td>
<td align="center" valign="top"/>
<td align="center" valign="top">5</td>
<td align="right" valign="top">17</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">&#x02003;Death</td>
<td align="right" valign="top">25</td>
<td align="center" valign="top">1</td>
<td align="right" valign="top"/>
<td align="center" valign="top">20</td>
<td align="center" valign="top">6</td>
<td align="center" valign="top"/>
<td align="center" valign="top">8</td>
<td align="center" valign="top">18</td>
<td align="center" valign="top"/>
<td align="center" valign="top">10</td>
<td align="right" valign="top">16</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="left" valign="top">Median survival time (months)</td>
<td align="right" valign="top">30.3</td>
<td align="center" valign="top">80.4</td>
<td align="right" valign="top"><bold>&lt;0.001</bold></td>
<td align="center" valign="top">35.3</td>
<td align="center" valign="top">41.8</td>
<td align="center" valign="top"><bold>0.039</bold></td>
<td align="center" valign="top">31.8</td>
<td align="center" valign="top">41.8</td>
<td align="center" valign="top"><bold>0.038</bold></td>
<td align="center" valign="top">30.3</td>
<td align="right" valign="top">44</td>
<td align="center" valign="top"><bold>0.014</bold></td></tr></tbody></table>
<table-wrap-foot><fn id="tfn3-or-28-03-1036">
<p>Bold indicates P-values &lt;0.05.</p></fn></table-wrap-foot></table-wrap></floats-group></article>
