<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
<article xml:lang="en" article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink">
<?release-delay 0|0?>
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">OL</journal-id>
<journal-title-group>
<journal-title>Oncology Letters</journal-title>
</journal-title-group>
<issn pub-type="ppub">1792-1074</issn>
<issn pub-type="epub">1792-1082</issn>
<publisher>
<publisher-name>D.A. Spandidos</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3892/ol.2016.4604</article-id>
<article-id pub-id-type="publisher-id">OL-0-0-4604</article-id>
<article-categories>
<subj-group>
<subject>Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Functional annotation of noncoding variants and prioritization of cancer-associated lncRNAs in lung cancer</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>LI</surname><given-names>HUA</given-names></name>
<xref rid="af1-ol-0-0-4604" ref-type="aff"/></contrib>
<contrib contrib-type="author"><name><surname>LV</surname><given-names>XIN</given-names></name>
<xref rid="af1-ol-0-0-4604" ref-type="aff"/>
<xref rid="c1-ol-0-0-4604" ref-type="corresp"/></contrib>
</contrib-group>
<aff id="af1-ol-0-0-4604">Department of Anesthesiology, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200072, P.R. China</aff>
<author-notes>
<corresp id="c1-ol-0-0-4604"><italic>Correspondence to</italic>: Mr. Xin Lv, Department of Anesthesiology, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, 507 Zheng Min Road, Shanghai 200072, P.R. China, E-mail: <email>18621710790@163.com</email></corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>07</month>
<year>2016</year></pub-date>
<pub-date pub-type="epub">
<day>18</day>
<month>05</month>
<year>2016</year></pub-date>
<volume>12</volume>
<issue>1</issue>
<fpage>222</fpage>
<lpage>230</lpage>
<history>
<date date-type="received"><day>12</day><month>03</month><year>2015</year></date>
<date date-type="accepted"><day>01</day><month>04</month><year>2016</year></date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2016, Spandidos Publications</copyright-statement>
<copyright-year>2016</copyright-year>
</permissions>
<abstract>
<p>Multiple computational tools have been widely applied to the detection of coding driver mutations in cancer; however, the prioritization of pathogenic non-coding variants remains a difficult and demanding task. The present study aimed to distinguish disease-causing non-coding variants from neutral ones, and to prioritize potential cancer-associated long non-coding RNAs (lncRNAs) in lung cancer, using a logistic regression model. The model used pathogenic status as the binary response variable, contrasting 19,153 disease-associated ClinVar and Human Gene Mutation Database variants with neutral single nucleotide polymorphisms (SNPs), and non-coding features as the predictor variables. The model was validated with genome-wide association study (GWAS) disease- or trait-associated SNPs and recurrent somatic mutations. High-scoring regions were characterized with respect to their distribution across features and gene classes, and potential cancer-associated lncRNA candidates were prioritized by combining the fraction of high-scoring regions with the average score predicted by the model. H3K79me2 was the most negatively informative feature in the model, whereas conserved regions were the most positively informative. The area under the receiver operating characteristic curve of the model was 0.89. The model assigned significantly higher scores to GWAS SNPs than to neutral SNPs (mean, 5.9012 vs. 5.5238; P&#x003C;0.001, Mann-Whitney U test) and to recurrent somatic mutations than to non-recurrent mutations (mean, 5.4677 vs. 5.2277; P&#x003C;0.001, Mann-Whitney U test). Features including splice sites and untranslated regions, and gene classes including cancer genes and cancer-associated lncRNAs, showed an increased enrichment of high-scoring regions. In total, 2,679 candidate cancer-associated lncRNAs were prioritized and characterized, 104 of which were differentially expressed between lung cancer and normal specimens. The logistic regression model is a useful and efficient scoring system for prioritizing non-coding pathogenic variants and lncRNAs, and may provide a basis for detecting non-coding driver lncRNAs in lung cancer.</p>
</abstract>
<kwd-group>
<kwd>lung cancer</kwd>
<kwd>non-coding genome</kwd>
<kwd>pathogenic variant</kwd>
<kwd>logistic regression model</kwd>
<kwd>long non-coding RNA</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Cancer is caused by the accumulation of genomic alterations and the consequent disruption of biological processes (<xref rid="b1-ol-0-0-4604" ref-type="bibr">1</xref>). The rapid development and wide application of sequencing technologies have enabled the identification of hundreds of thousands of somatic variants in cancer (<xref rid="b2-ol-0-0-4604" ref-type="bibr">2</xref>). A central issue in cancer genomics is distinguishing driver mutations, which are critical to oncogenesis, from passenger mutations, which have little role in cancer initiation and progression (<xref rid="b3-ol-0-0-4604" ref-type="bibr">3</xref>). The development of reliable and efficient approaches to the functional annotation of variants has been a consistent focus of cancer research, and multiple computational tools have been developed and widely used to predict pathogenic mutations in the coding portion of the human genome, including the &#x2018;sorting intolerant from tolerant&#x2019; (SIFT) algorithm (<xref rid="b4-ol-0-0-4604" ref-type="bibr">4</xref>) and the &#x2018;polymorphism phenotyping&#x2019; (PolyPhen) tool (<xref rid="b5-ol-0-0-4604" ref-type="bibr">5</xref>). As an increasing number of non-coding pathogenic variants are detected and annotated, there is a great demand for computational tools that prioritize non-coding drivers in the cancer genome (<xref rid="b6-ol-0-0-4604" ref-type="bibr">6</xref>,<xref rid="b7-ol-0-0-4604" ref-type="bibr">7</xref>). However, few studies have been conducted in this field.</p>
<p>The recent completion of high-throughput projects, including the Encyclopedia of DNA Elements (ENCODE) (<xref rid="b8-ol-0-0-4604" ref-type="bibr">8</xref>), the 29 Mammals Project (<xref rid="b9-ol-0-0-4604" ref-type="bibr">9</xref>) and the NIH Roadmap Epigenomics Project (<xref rid="b10-ol-0-0-4604" ref-type="bibr">10</xref>), has made non-coding variants more interpretable. In particular, the ENCODE project has provided genome-wide maps of histone modifications, DNase I hypersensitive sites, formaldehyde-assisted isolation of regulatory elements, transcription factor binding sites, RNA-seq and replication timing data across a number of cell lines (<xref rid="b8-ol-0-0-4604" ref-type="bibr">8</xref>). An increasing number of tools have taken advantage of these annotations of human functional elements to investigate disease-implicated non-coding variants or drivers in cancer, including RegulomeDB (<xref rid="b11-ol-0-0-4604" ref-type="bibr">11</xref>), HaploReg (<xref rid="b12-ol-0-0-4604" ref-type="bibr">12</xref>) and Funseq (<xref rid="b13-ol-0-0-4604" ref-type="bibr">13</xref>); however, these approaches rely primarily on empirical scoring schemes rather than statistically rigorous models (<xref rid="b14-ol-0-0-4604" ref-type="bibr">14</xref>).</p>
<p>Previous studies have used machine-learning algorithms to better predict and score the functionality of non-coding variants (<xref rid="b15-ol-0-0-4604" ref-type="bibr">15</xref>&#x2013;<xref rid="b17-ol-0-0-4604" ref-type="bibr">17</xref>). Kircher <italic>et al</italic> (<xref rid="b18-ol-0-0-4604" ref-type="bibr">18</xref>) contrasted the annotations of fixed or nearly fixed human-derived alleles with those of simulated variants to develop Combined Annotation-Dependent Depletion (CADD), a measure of deleteriousness that can be computed systematically across the genome. Implemented as a support vector machine, CADD successfully differentiated 14.7 million high-frequency human-derived alleles from 14.7 million simulated variants (<xref rid="b18-ol-0-0-4604" ref-type="bibr">18</xref>). Fu <italic>et al</italic> (<xref rid="b19-ol-0-0-4604" ref-type="bibr">19</xref>) developed FunSeq2, a computational framework that integrates large-scale genomic resources (including 1000 Genomes and ENCODE data) and cancer resources with a high-throughput variant prioritization pipeline to annotate and prioritize somatic alterations, particularly regulatory non-coding mutations.</p>
<p>Long non-coding RNAs (lncRNAs) are a class of mRNA-like transcripts ranging from 200 bp to 100 kbp in length. Because they lack protein-coding capacity, they were initially regarded as transcriptional noise in the human genome. Over the past decade, increasing evidence has indicated that lncRNAs have a variety of roles in numerous physiological processes (<xref rid="b19-ol-0-0-4604" ref-type="bibr">19</xref>&#x2013;<xref rid="b25-ol-0-0-4604" ref-type="bibr">25</xref>). Although they do not encode proteins, lncRNAs may regulate gene expression at various levels, including chromatin architecture, transcription, RNA splicing, and protein translation and turnover (<xref rid="b26-ol-0-0-4604" ref-type="bibr">26</xref>,<xref rid="b27-ol-0-0-4604" ref-type="bibr">27</xref>). Consequently, deregulation of lncRNAs may have a significant role in carcinogenesis (<xref rid="b28-ol-0-0-4604" ref-type="bibr">28</xref>&#x2013;<xref rid="b31-ol-0-0-4604" ref-type="bibr">31</xref>).</p>
<p>In the present study, conservation, regulatory feature, expression and replication timing data were collected, primarily from the ENCODE project, to create a lung cancer-specific annotation, and a logistic regression model was constructed based on ClinVar and Human Gene Mutation Database (HGMD) pathogenic variants to functionally score non-coding variants in the lung cancer genome. This scoring system was then applied to prioritize potential cancer-associated lncRNA candidates.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and methods</title>
<sec>
<title/>
<sec>
<title>Cancer mutation and pathogenic variant data</title>
<p>A total of 1,623,250 somatic mutations detected by whole-genome sequencing of 24 pairs of lung cancer and normal specimens were obtained from the supplementary data of a previous study (<xref rid="b32-ol-0-0-4604" ref-type="bibr">32</xref>). Recurrent mutations were defined as mutations occurring at the same site in two or more samples (n=14,515 mutations), and non-recurrent mutations as those occurring in only one patient. Germline polymorphism data comprising 38,248,779 single nucleotide polymorphisms (SNPs) were downloaded from the 1000 Genomes Project pilot 1 (<uri xlink:href="http://www.1000genomes.org">www.1000genomes.org</uri>) (<xref rid="b33-ol-0-0-4604" ref-type="bibr">33</xref>). SNPs with derived allele frequencies &#x003E;0.01 were considered neutral SNPs, and SNPs with allele frequencies &#x003C;0.01 were considered rare SNPs. Disease-associated variants from ClinVar (<uri xlink:href="http://www.ncbi.nlm.nih.gov/clinvar">www.ncbi.nlm.nih.gov/clinvar</uri>) and the Human Gene Mutation Database (HGMD; <uri xlink:href="http://www.hgmd.cf.ac.uk">www.hgmd.cf.ac.uk</uri>) are published variants responsible for inherited human diseases (<xref rid="b34-ol-0-0-4604" ref-type="bibr">34</xref>,<xref rid="b35-ol-0-0-4604" ref-type="bibr">35</xref>). Trait- or disease-associated SNPs were obtained from genome-wide association studies (GWAS; <uri xlink:href="http://www.gwascentral.org">www.gwascentral.org</uri>) (<xref rid="b36-ol-0-0-4604" ref-type="bibr">36</xref>).</p>
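<p>The recurrence and allele-frequency definitions above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the mutation record layout (sample, chromosome, position, reference and alternate alleles) and the function names are hypothetical, and a site is keyed on chromosome and position only.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def split_by_recurrence(mutations):
    """Split mutations into recurrent and non-recurrent lists.

    mutations: iterable of (sample_id, chrom, pos, ref, alt) tuples
    (a hypothetical layout). A site counts as recurrent when it is
    mutated in two or more distinct samples.
    """
    mutations = list(mutations)
    samples_per_site = defaultdict(set)
    for sample_id, chrom, pos, _ref, _alt in mutations:
        samples_per_site[(chrom, pos)].add(sample_id)

    recurrent, non_recurrent = [], []
    for mutation in mutations:
        _sample_id, chrom, pos, _ref, _alt = mutation
        if len(samples_per_site[(chrom, pos)]) >= 2:
            recurrent.append(mutation)
        else:
            non_recurrent.append(mutation)
    return recurrent, non_recurrent


def classify_snp(derived_allele_freq, threshold=0.01):
    """Label a SNP 'neutral' (DAF above threshold) or 'rare' (below it)."""
    if derived_allele_freq > threshold:
        return "neutral"
    if derived_allele_freq < threshold:
        return "rare"
    return "unclassified"  # exactly at the threshold; not assigned in the text


# Toy example: chr1:100 is mutated in two samples, chr2:50 in one.
muts = [
    ("s1", "chr1", 100, "A", "T"),
    ("s2", "chr1", 100, "A", "T"),
    ("s1", "chr2", 50, "G", "C"),
]
recurrent, non_recurrent = split_by_recurrence(muts)
print(len(recurrent), len(non_recurrent))  # 2 1
```
]]></preformat>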
</sec>
<sec>
<title>Genome-wide data resources</title>
<p>Human genome annotations, including protein-coding genes, exons, introns, untranslated regions (UTRs) and non-coding exons, were obtained from Gencode (<uri xlink:href="http://www.gencodegenes.org/">www.gencodegenes.org/</uri>) (<xref rid="b37-ol-0-0-4604" ref-type="bibr">37</xref>). lncRNA annotations were acquired primarily from three sources: Gencode (<xref rid="b37-ol-0-0-4604" ref-type="bibr">37</xref>); the Human Body Map large intergenic non-coding RNAs and transcripts of uncertain coding potential, generated from 4 billion RNA-seq reads across 24 tissues and cell types (<xref rid="b38-ol-0-0-4604" ref-type="bibr">38</xref>); and the RefSeq annotation (<uri xlink:href="http://www.ncbi.nlm.nih.gov/refseq/">www.ncbi.nlm.nih.gov/refseq/</uri>) (<xref rid="b39-ol-0-0-4604" ref-type="bibr">39</xref>). In total, 39,952 lncRNA annotations were collected from these three databases. The 5&#x2032; splice sites were defined as the 10 nucleotides at the 5&#x2032; end of each intron (<xref rid="b40-ol-0-0-4604" ref-type="bibr">40</xref>), and the 3&#x2032; splice sites as the 50 nucleotides at the 3&#x2032; end of each intron (<xref rid="b41-ol-0-0-4604" ref-type="bibr">41</xref>). Evolutionarily conserved bases were identified using a recently published analysis of 46 mammalian genomes (<xref rid="b42-ol-0-0-4604" ref-type="bibr">42</xref>). Genome-wide phastCons scores were obtained from Siepel <italic>et al</italic> (<xref rid="b16-ol-0-0-4604" ref-type="bibr">16</xref>) (<uri xlink:href="http://hgdownload.cse.ucsc.edu/goldenPath/phastConsPaper/vertebrate-scores/">hgdownload.cse.ucsc.edu/goldenPath/phastConsPaper/vertebrate-scores/</uri>). Sensitive regions, as defined by Khurana <italic>et al</italic> (<xref rid="b13-ol-0-0-4604" ref-type="bibr">13</xref>), consist of binding sites or motifs of important transcription factors and contain an increased fraction of rare SNPs.
Evolutionarily conserved structures were RNA secondary structures predicted by comparative structure prediction algorithms based on multiple genomes (<xref rid="b42-ol-0-0-4604" ref-type="bibr">42</xref>). Promoters, defined as regions within 2.5 kb of transcription start sites (TSS), were obtained from the Gerstein lab (<uri xlink:href="http://funseq.gersteinlab.org/data">http://funseq.gersteinlab.org/data</uri>) (<xref rid="b13-ol-0-0-4604" ref-type="bibr">13</xref>). RNA-seq data in BAM format, transcription factor binding sites (TFBS), DNase I hypersensitive sites and histone modification data (H3K4me1, H3K9ac and others) for the A549 cell line were acquired from ENCODE (<xref rid="b8-ol-0-0-4604" ref-type="bibr">8</xref>). Conserved TFBS, i.e. transcription factor binding sites conserved in the human/mouse/rat alignment, were obtained directly from the University of California, Santa Cruz (<xref rid="b41-ol-0-0-4604" ref-type="bibr">41</xref>). The expression level of each protein-coding gene and lncRNA was calculated as reads per kilobase per million mapped reads (RPKM); genes with RPKM &#x003E;20 or &#x003C;0.25 were defined as highly and lowly expressed regions, respectively. For replication timing, a wavelet-smoothed, weighted average signal was used, in which high and low signal values correspond to early and late replication during S phase, respectively (<uri xlink:href="http://genome.ucsc.edu/ENCODE">genome.ucsc.edu/ENCODE</uri>, &#x2018;Repli-seq track&#x2019;) (<xref rid="b8-ol-0-0-4604" ref-type="bibr">8</xref>). Genome-wide replication timing was mapped to protein-coding genes and lncRNAs, and an early-to-late ratio was calculated as (G1b&#x002B;S1)/(S4&#x002B;G2) for each gene (<xref rid="b43-ol-0-0-4604" ref-type="bibr">43</xref>). Genes with a ratio &#x003E;1 were considered early replicated, and genes with a ratio &#x003C;1 late replicated.</p>
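<p>The expression and replication-timing classes defined above reduce to simple arithmetic. The sketch below assumes that per-gene read counts, gene lengths, library sizes and mean Repli-seq fraction signals (G1b, S1, S4 and G2) have already been computed; the function names are hypothetical, and genes falling exactly on a threshold are left unclassified because the text does not assign them.</p>
<preformat preformat-type="code"><![CDATA[
```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count / (length_bp / 1e3) / (total_mapped_reads / 1e6)


def expression_class(rpkm_value, high=20.0, low=0.25):
    """'high' if RPKM > 20, 'low' if RPKM < 0.25, otherwise 'intermediate'."""
    if rpkm_value > high:
        return "high"
    if rpkm_value < low:
        return "low"
    return "intermediate"


def replication_class(g1b, s1, s4, g2):
    """Early-to-late ratio (G1b + S1) / (S4 + G2) and the resulting label."""
    ratio = (g1b + s1) / (s4 + g2)
    if ratio > 1:
        return "early", ratio
    if ratio < 1:
        return "late", ratio
    return "unclassified", ratio  # ratio exactly 1 is not assigned in the text


value = rpkm(read_count=500, length_bp=2000, total_mapped_reads=10_000_000)
print(value, expression_class(value))             # 25.0 high
print(replication_class(30.0, 20.0, 10.0, 15.0))  # ('early', 2.0)
```
]]></preformat>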
<p>The cancer lncRNA set comprised 25 mammalian lncRNAs that have been experimentally demonstrated to be associated with a variety of cancer types. The list of Cancer Gene Census genes was obtained from the Catalogue of Somatic Mutations in Cancer (COSMIC) version 71 (<uri xlink:href="http://cancer.sanger.ac.uk/cosmic">cancer.sanger.ac.uk/cosmic</uri>) (<xref rid="b44-ol-0-0-4604" ref-type="bibr">44</xref>).</p>
</sec>
<sec>
<title>Logistic regression model training and validation</title>
<p>The disease-implicated set comprised 19,153 non-coding pathogenic variants from the ClinVar and HGMD databases. As controls, neutral variants with a minor allele frequency &#x2265;1&#x0025; were used to reduce the possibility of including functional rare SNPs; a total of 15,789,242 potential control SNPs were included in the model. The non-coding genome was summarized as a matrix of 425,565 rows, each representing one unique combination of features. Variant status (disease-causing variants from the HGMD and ClinVar databases vs. neutral SNPs) was used as the binary response variable, and the 25 genomic features served as predictor variables to estimate the likelihood of a variant being disease-associated. The logistic regression model was fitted as a generalized linear model. The receiver operating characteristic (ROC) curve was generated in R (version 2.15.3; <uri xlink:href="http://www.r-project.org">www.r-project.org</uri>). Scores were predicted with the model for GWAS SNPs, neutral SNPs, and non-recurrent and recurrent somatic mutations in lung cancer, and were subsequently scaled using the following formula: scaled score=log(predicted score &#x00D7; 10<sup>6</sup>).</p>
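<p>Because each row of the feature matrix is a unique combination of features, the model can be written as a grouped binomial regression in which every row carries a count of pathogenic and neutral variants. The sketch below fits such a model by Newton-Raphson, which for the logit link is the same update as the iteratively reweighted least squares used by R's <monospace>glm</monospace>, and then applies the scaling formula. It assumes binary features and assumes that &#x2018;log&#x2019; denotes the natural logarithm (the default of R's <monospace>log</monospace>); the row layout, feature names and toy counts are hypothetical and illustrative only.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_grouped_logistic(rows, iterations=25):
    """Fit a binomial GLM with logit link by Newton-Raphson.

    rows: list of (features, n_pathogenic, n_neutral), one row per unique
    feature combination (hypothetical layout). Returns [intercept, w1, ..., wk].
    """
    k = len(rows[0][0]) + 1
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for feats, n_path, n_neut in rows:
            x = [1.0] + list(feats)
            n = n_path + n_neut
            p = _sigmoid(sum(bj * xj for bj, xj in zip(beta, x)))
            resid = n_path - n * p        # score (gradient) contribution
            weight = n * p * (1.0 - p)    # Fisher information weight
            for i in range(k):
                grad[i] += resid * x[i]
                for j in range(k):
                    hess[i][j] += weight * x[i] * x[j]
        step = _solve(hess, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
    return beta


def predict(beta, feats):
    x = [1.0] + list(feats)
    return _sigmoid(sum(bj * xj for bj, xj in zip(beta, x)))


def scaled_score(p):
    # Natural logarithm assumed (R's default log()); the text writes only "log".
    return math.log(p * 1e6)


# Toy rows with two hypothetical binary features (feature_a, feature_b).
rows = [((0, 0), 1, 99), ((1, 0), 10, 90), ((0, 1), 1, 199), ((1, 1), 5, 95)]
beta = fit_grouped_logistic(rows)
print([round(b, 3) for b in beta])
print(round(scaled_score(predict(beta, (1, 0))), 3))
```
]]></preformat>
<p>Fitting on grouped counts rather than on individual variants keeps the problem small: the cost per iteration scales with the number of unique feature combinations, not with the millions of control SNPs.</p>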
</sec>
<sec>
<title>Prioritization of cancer-associated lncRNA candidates</title>
<p>Cancer-associated lncRNA candidates were prioritized as follows. First, the logistic regression model was used to score each nucleotide of the lncRNAs, and the average score was calculated for each lncRNA. Second, the 100 Mb of non-coding sequence with scores &#x003E;8.4149 was defined as high-scoring regions, and the fraction of high-scoring regions was calculated for each lncRNA. Finally, the candidate subset was defined as the overlap between the top 10&#x0025; of lncRNAs ranked by average score and the top 10&#x0025; ranked by fraction of high-scoring regions.</p>
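<p>The three prioritization steps can be expressed compactly. In the sketch below, per-nucleotide scaled scores are assumed to be available as a list for each lncRNA; the function name is hypothetical, and the handling of ties in the rankings and the rounding of the top-10&#x0025; cut-off are assumptions, since the text does not specify them.</p>
<preformat preformat-type="code"><![CDATA[
```python
from statistics import mean

HIGH_SCORE_CUTOFF = 8.4149  # scaled-score cutoff stated in the text


def prioritize_lncrnas(scores_by_lncrna, top_fraction=0.10, cutoff=HIGH_SCORE_CUTOFF):
    """Return lncRNAs in the top fraction by both average score and high-scoring fraction.

    scores_by_lncrna: dict mapping lncRNA id -> list of per-nucleotide scaled scores.
    """
    average = {name: mean(s) for name, s in scores_by_lncrna.items()}
    frac_high = {
        name: sum(v > cutoff for v in s) / len(s)
        for name, s in scores_by_lncrna.items()
    }
    # Rounding the top-10% size down (minimum 1) is an assumption.
    n_top = max(1, int(len(scores_by_lncrna) * top_fraction))
    top_by_average = set(sorted(average, key=average.get, reverse=True)[:n_top])
    top_by_fraction = set(sorted(frac_high, key=frac_high.get, reverse=True)[:n_top])
    return top_by_average & top_by_fraction


# Toy data: 18 background lncRNAs plus three illustrative cases.
toy = {f"bg{i}": [5.0] * 10 for i in range(18)}
toy["a"] = [9.0] * 10              # high average, every position high-scoring
toy["b"] = [9.5] * 2 + [5.0] * 8   # modest average, 20% high-scoring
toy["c"] = [8.0] * 10              # high average, no position above the cutoff
print(prioritize_lncrnas(toy))     # {'a'}
```
]]></preformat>
<p>Requiring membership in both top lists favors lncRNAs that score highly throughout their length, rather than those whose average is raised by a few extreme positions (as for <monospace>b</monospace>) or those that are uniformly moderate (as for <monospace>c</monospace>).</p>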
</sec>
<sec>
<title>RNA-seq data processing and expression analyses of lncRNAs</title>
<p>RNA-seq data for 161 samples, including 76 normal lung samples and 85 cancer samples, were obtained from the study by Ju <italic>et al</italic> (<xref rid="b45-ol-0-0-4604" ref-type="bibr">45</xref>) via the European Bioinformatics Institute. Reads were mapped to the hg19 genome using the STAR aligner (<uri xlink:href="https://github.com/alexdobin/STAR/releases">https://github.com/alexdobin/STAR/releases</uri>) (<xref rid="b46-ol-0-0-4604" ref-type="bibr">46</xref>). Read counts were calculated for each lncRNA with bedtools version 2.22.1 (<uri xlink:href="http://bedtools.readthedocs.org/en/latest/#">bedtools.readthedocs.org/en/latest/#</uri>) (<xref rid="b47-ol-0-0-4604" ref-type="bibr">47</xref>). Expression levels in fragments per kilobase of transcript per million mapped reads (FPKM) were calculated with Cufflinks version 2.2.1 (<uri xlink:href="http://cole-trapnell-lab.github.io/cufflinks/">cole-trapnell-lab.github.io/cufflinks/</uri>) (<xref rid="b48-ol-0-0-4604" ref-type="bibr">48</xref>) and log-scaled for each lncRNA. DESeq2 (Bioconductor release 3.0; <uri xlink:href="http://bioconductor.org/packages/release/bioc/html/DESeq2.html">bioconductor.org/packages/release/bioc/html/DESeq2.html</uri>) (<xref rid="b49-ol-0-0-4604" ref-type="bibr">49</xref>) was used to identify transcripts differentially expressed between tumor and normal samples, with cutoffs of false discovery rate (FDR) &#x2264;10<sup>&#x2212;4</sup> and absolute fold change &#x2265;2.</p>
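<p>The differential-expression thresholds can be applied to a DESeq2 results table, whose <monospace>log2FoldChange</monospace> and <monospace>padj</monospace> columns hold the log2 fold change and the Benjamini-Hochberg adjusted P-value. The sketch below assumes these values have been exported into a Python dictionary (a hypothetical layout, with hypothetical lncRNA identifiers); an absolute fold change of at least 2 corresponds to an absolute log2 fold change of at least 1.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def is_differentially_expressed(log2_fold_change, padj, fdr_cutoff=1e-4, min_fold_change=2.0):
    """Apply FDR <= 1e-4 and |fold change| >= 2 to one DESeq2-style result row.

    padj may be None or NaN (DESeq2 reports NA for genes excluded by its
    filtering); such rows are treated as not differentially expressed.
    """
    if padj is None or math.isnan(padj):
        return False
    return padj <= fdr_cutoff and abs(log2_fold_change) >= math.log2(min_fold_change)


def differentially_expressed(results):
    """results: dict mapping lncRNA id -> (log2FoldChange, padj)."""
    return {
        name
        for name, (lfc, padj) in results.items()
        if is_differentially_expressed(lfc, padj)
    }


toy_results = {
    "lnc1": (1.5, 1e-6),    # passes both cutoffs
    "lnc2": (-2.0, 1e-5),   # down-regulated, passes
    "lnc3": (0.5, 1e-8),    # fold change below 2
    "lnc4": (3.0, 0.01),    # FDR too high
    "lnc5": (1.2, None),    # NA adjusted P-value
}
candidates = {"lnc2", "lnc4"}  # hypothetical prioritized candidates
de = differentially_expressed(toy_results)
print(sorted(de), sorted(de & candidates))  # ['lnc1', 'lnc2'] ['lnc2']
```
]]></preformat>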
</sec>
<sec>
<title>Statistical analyses</title>
<p>Data are presented as the mean &#x00B1; standard deviation. Differences between groups were assessed with the two-sided Mann-Whitney U test or Fisher&#x0027;s exact test in R (version 2.15.3; <uri xlink:href="http://www.r-project.org">www.r-project.org</uri>). P&#x003C;0.05 was considered to indicate a statistically significant difference.</p>
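<p>For readers reproducing the comparisons outside R, a two-sided Mann-Whitney U test with the large-sample normal approximation can be sketched as follows. This version applies a tie correction but no continuity correction; R's <monospace>wilcox.test</monospace> uses an exact test for small samples without ties and a continuity correction by default, so P-values may differ. It is an illustration, not the code used in the study.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample x, two-sided P-value). No continuity correction.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share the mean rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # all values tied: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, min(1.0, p)


print(mann_whitney_u([1, 2, 3], [4, 5, 6])[0])  # 0.0
```
]]></preformat>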
</sec>
</sec>
</sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title/>
<sec>
<title>Distinction of disease-associated non-coding variants from neutral ones with the logistic regression model</title>
<p>Estimates of the densities of ClinVar and HGMD disease-causing variants revealed that these densities varied greatly across non-coding features (<xref rid="f1-ol-0-0-4604" ref-type="fig">Fig. 1A</xref>). Features including conserved regions, conserved TFBS, UTRs, promoters and highly expressed regions showed the highest enrichment of pathogenic variants, whereas features including H3K9me3, late replicated regions, H3K27me3, evolutionarily conserved structures and H2az had low densities of disease-causing variants, suggesting that non-coding features differ in their relevance to the functionality of non-coding variants. Conserved regions, early replicated regions, promoters, H3K36me3 and conserved TFBS contributed most positively to the model, whereas H3K79me2, H3K4me2, H3K9me3, H3K9ac and lowly expressed regions were the most negatively informative (<xref rid="f1-ol-0-0-4604" ref-type="fig">Fig. 1B</xref>). The area under the ROC curve of the logistic regression model was 0.89 (<xref rid="f1-ol-0-0-4604" ref-type="fig">Fig. 1C</xref>), indicating that the model discriminated well between disease-implicated and control variants.</p>
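<p>The area under the ROC curve equals the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen control variant, with ties counted as one half; equivalently, it is the Mann-Whitney U statistic divided by the product of the two group sizes. The sketch below computes it in this rank-based form, which avoids comparing every pair and so scales to large control sets. The function name is hypothetical, and this is not the R script used in the study.</p>
<preformat preformat-type="code"><![CDATA[
```python
from bisect import bisect_left, bisect_right


def roc_auc(pathogenic_scores, control_scores):
    """Area under the ROC curve as P(score_pathogenic > score_control), ties = 0.5."""
    controls = sorted(control_scores)
    wins = 0.0
    for s in pathogenic_scores:
        below = bisect_left(controls, s)          # controls scoring strictly lower
        ties = bisect_right(controls, s) - below  # controls with an equal score
        wins += below + 0.5 * ties
    return wins / (len(pathogenic_scores) * len(controls))


print(roc_auc([0.8, 0.4], [0.6, 0.2]))  # 0.75
```
]]></preformat>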
<p>To investigate whether the model could be applied to prioritize candidate functional variants, disease- or trait-associated variants from GWAS were used for independent validation. Non-coding GWAS SNPs had a significantly higher average score than 1 million random neutral control SNPs (mean, 5.9012 vs. 5.5238; P&#x003C;0.001, two-sided Mann-Whitney U test; <xref rid="f1-ol-0-0-4604" ref-type="fig">Fig. 1D</xref>). Recurrence is considered a potential sign of positive selection in tumors and is more likely to be associated with driver events (<xref rid="b50-ol-0-0-4604" ref-type="bibr">50</xref>). The present study therefore compared recurrent mutations, which occurred at the exact same site in two or more samples, with non-recurrent mutations identified by whole-genome sequencing of the 24 lung cancer samples. The same-site recurrent mutations (n=14,515 mutations) had significantly higher scores than the non-recurrent mutations (mean, 5.4677 vs. 5.2277; P&#x003C;0.001, Mann-Whitney U test; <xref rid="f1-ol-0-0-4604" ref-type="fig">Fig. 1D</xref>), suggesting that this approach may be useful for identifying non-coding driver mutations in lung cancer.</p>
</sec>
<sec>
<title>Definition and characterization of high-scoring regions in the non-coding genome</title>
<p>The 100 Mb of non-coding sequence scoring &#x003E;8.4149 was defined as high-scoring regions, and the fraction of high-scoring regions was analyzed across a variety of feature types. The 5&#x2032; and 3&#x2032; splice sites and UTRs were among the features with the highest fraction of high-scoring regions; by contrast, intergenic regions, lncRNA introns and lncRNAs had the lowest fractions (<xref rid="f2-ol-0-0-4604" ref-type="fig">Fig. 2A</xref>). The model assigned a higher average score to splice sites than to adjacent intronic regions in both protein-coding genes (mean, 9.4374 vs. 8.3959; P&#x003C;0.001, Mann-Whitney U test; <xref rid="f2-ol-0-0-4604" ref-type="fig">Fig. 2B</xref>) and lncRNAs (mean, 8.1802 vs. 7.8146; P&#x003C;0.001, Mann-Whitney U test; <xref rid="f2-ol-0-0-4604" ref-type="fig">Fig. 2B</xref>). Comparisons across gene classes showed that known cancer genes from COSMIC had a significantly increased fraction of high-scoring regions compared with other genes (mean, 0.0817 vs. 0.0596; P&#x003C;0.001, Fisher&#x0027;s exact test; <xref rid="f2-ol-0-0-4604" ref-type="fig">Fig. 2C</xref>). Similarly, cancer-associated lncRNAs collected from recent publications had a significantly increased fraction of high-scoring regions compared with other lncRNAs (mean, 0.1112 vs. 0.0590; P&#x003C;0.001, Fisher&#x0027;s exact test; <xref rid="f2-ol-0-0-4604" ref-type="fig">Fig. 2C</xref>); for example, HOX transcript antisense RNA (HOTAIR), metastasis associated lung adenocarcinoma transcript 1 (MALAT1), growth arrest-specific 5 (GAS5) and lung cancer associated transcript 1 are among the top 10&#x0025; of lncRNAs with respect to high-scoring coverage (<xref rid="f2-ol-0-0-4604" ref-type="fig">Fig. 2D</xref>).</p>
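<p>The Fisher&#x0027;s exact tests above compare the proportion of high-scoring sequence between two gene classes in a 2&#x00D7;2 table. The sketch below implements the two-sided test by summing the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table, the convention used by R's <monospace>fisher.test</monospace>. The table layout is an assumption, and exact binomial coefficients make this sketch practical only for modest counts, not for genome-scale base-pair totals.</p>
<preformat preformat-type="code"><![CDATA[
```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the observed
    margins whose probability does not exceed that of the observed table.
    """
    row1, col1 = a + b, a + c
    n = a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x given the fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical layout: rows = gene class (cancer genes, other genes),
# columns = (high-scoring positions, other positions).
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```
]]></preformat>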
</sec>
<sec>
<title>Prioritization of lung cancer-associated lncRNAs with the scoring system</title>
<p>To prioritize lung cancer-implicated lncRNAs, the fraction of high-scoring regions and the average score were calculated for each lncRNA, and the overlap was determined between the top 10&#x0025; of lncRNAs with the highest fraction of high-scoring regions and the top 10&#x0025; with the highest average score. A total of 2,679 lncRNAs were selected as functional candidates, including experimentally characterized cancer-associated lncRNAs such as MALAT1, HOTAIR and GAS5. This subset of lncRNA candidates had a significantly increased fraction of conserved regions (mean, 0.1741 vs. 0.0528; P&#x003C;0.001, Mann-Whitney U test; <xref rid="f3-ol-0-0-4604" ref-type="fig">Fig. 3A</xref>) and a higher average phastCons score (mean, 0.2770 vs. 0.2602; P&#x003C;0.001, Mann-Whitney U test; <xref rid="f3-ol-0-0-4604" ref-type="fig">Fig. 3B</xref>) compared with control lncRNAs, indicating that the candidates were more conserved. The candidates additionally showed an increased enrichment of disease- or trait-associated GWAS SNPs (mean, 6.2106 vs. 4.0618 SNPs/Mb; P&#x003C;0.001, Fisher&#x0027;s exact test; <xref rid="f3-ol-0-0-4604" ref-type="fig">Fig. 3C</xref>) and a lower somatic mutation density (mean, 329.8380 vs. 573.2742 mutations/Mb; P&#x003C;0.001, Fisher&#x0027;s exact test; <xref rid="f3-ol-0-0-4604" ref-type="fig">Fig. 3D</xref>) compared with control lncRNAs. RNA-seq data of 76 normal lung samples and 85 cancer samples were obtained from the study by Ju <italic>et al</italic> (<xref rid="b45-ol-0-0-4604" ref-type="bibr">45</xref>), which is publicly available from the European Bioinformatics Institute. Reads were aligned with the STAR aligner, coverage was calculated for each lncRNA with bedtools, and DESeq2 was used to investigate the differential expression of lncRNAs between lung cancer and normal samples.
The lncRNA candidates showed significantly higher expression than control lncRNAs across cancer and normal samples (log-scaled FPKM, 1.8924 vs. 1.1386; P&#x003C;2.2&#x00D7;10<sup>&#x2212;16</sup>, Mann-Whitney U test; <xref rid="f3-ol-0-0-4604" ref-type="fig">Fig. 3E</xref>). Differentially expressed lncRNAs were defined as those with FDR &#x003C;10<sup>&#x2212;4</sup> and absolute fold change &#x003E;2. In total, 2,208 lncRNAs were differentially expressed, 104 of which were among the potentially cancer-associated lncRNA candidates (<xref rid="f4-ol-0-0-4604" ref-type="fig">Fig. 4</xref>).</p>
</sec>
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>In the present study, a logistic regression model combining pathogenic variants from the ClinVar and HGMD databases with lung cancer-specific features was developed and used to predict &#x2018;high-impact&#x2019; somatic alterations. The scoring model has two main advantages. First, it incorporates all non-coding pathogenic variants from HGMD and ClinVar, two widely used databases of disease-associated variants, allowing the potential damaging impact of any non-coding variant in the human genome to be assessed. Second, a large number of the annotation features, including histone modifications, TFBS, replication timing and expression data, are lung cancer-specific, which enables variants to be scored in a lung cancer-specific manner.</p>
<p>Non-coding features that contributed most positively to the model included conserved regions, early replicated regions, promoters, H3K36me3, H3K4me3, conserved TFBS, TFBS and sensitive regions. Among these features, H3K36me3 is associated with actively transcribed genes, and H3K4me3 is a hallmark of actively transcribed protein-coding promoters in eukaryotes (<xref rid="b51-ol-0-0-4604" ref-type="bibr">51</xref>). These findings support the view that conserved and regulatory elements are critical to the functional impact of pathogenic variants in the non-coding genome (<xref rid="b52-ol-0-0-4604" ref-type="bibr">52</xref>). The area under the ROC curve of 0.89 compares favorably with that of two well-known tools, CADD and FunSeq2 (<xref rid="b14-ol-0-0-4604" ref-type="bibr">14</xref>); however, a more stringent head-to-head comparison is required before a firm conclusion can be drawn. Furthermore, the model distinguished GWAS variants and recurrent cancer mutations from benign SNPs and non-recurrent mutations, supporting its reliability.</p>
<p>Consistent with the greater evolutionary conservation of splice sites and UTRs across mammals (<xref rid="b53-ol-0-0-4604" ref-type="bibr">53</xref>), these regions had a higher fraction of high-scoring regions, and splice sites scored higher than intronic regions. With respect to gene classes, known cancer genes and cancer-associated lncRNAs showed an increased enrichment of high-scoring regions compared with other genes. On the basis of these findings, the fraction of high-scoring regions and the average score of each lncRNA were combined to select a subset of functional lncRNA candidates, which contained a number of well-characterized cancer lncRNAs. For example, HOTAIR expression is elevated in lung cancer and correlates with metastasis and poor prognosis (<xref rid="b54-ol-0-0-4604" ref-type="bibr">54</xref>), and MALAT1 has been implicated in tumorigenesis and progression in a variety of cancer types (<xref rid="b55-ol-0-0-4604" ref-type="bibr">55</xref>&#x2013;<xref rid="b57-ol-0-0-4604" ref-type="bibr">57</xref>). A total of 104 of the functional lncRNA candidates were differentially expressed between lung cancer and normal samples; these lncRNAs are important candidates for further experimental validation and characterization in future studies.</p>
<p>In conclusion, the present scoring system provides an opportunity to identify cancer-driving mutations in the vast non-coding portion of the human genome, and prioritizes a number of lncRNA candidates for cancer research. This scoring system may assist with the identification of non-coding driver genes and thereby inform clinical decision-making in the future.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>The present research was made possible with financial support from the National Natural Science Foundation of China (Beijing, China; grant no., 81272142).</p>
</ack>
<ref-list>
<title>References</title>
<ref id="b1-ol-0-0-4604"><label>1</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Roschke</surname><given-names>AV</given-names></name><name><surname>Rozenblum</surname><given-names>E</given-names></name></person-group><article-title>Multi-layered cancer chromosomal instability phenotype</article-title><source>Front Oncol</source><volume>3</volume><fpage>1</fpage><lpage>13</lpage><year>2013</year><pub-id pub-id-type="doi">10.3389/fonc.2013.00302</pub-id><pub-id pub-id-type="pmid">23373009</pub-id></element-citation></ref>
<ref id="b2-ol-0-0-4604"><label>2</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Robison</surname><given-names>K</given-names></name></person-group><article-title>Application of second-generation sequencing to cancer genomics</article-title><source>Brief Bioinform</source><volume>11</volume><fpage>524</fpage><lpage>534</lpage><year>2010</year><pub-id pub-id-type="doi">10.1093/bib/bbq013</pub-id><pub-id pub-id-type="pmid">20427421</pub-id></element-citation></ref>
<ref id="b3-ol-0-0-4604"><label>3</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Greenman</surname><given-names>C</given-names></name><name><surname>Stephens</surname><given-names>P</given-names></name><name><surname>Smith</surname><given-names>R</given-names></name><name><surname>Dalgliesh</surname><given-names>GL</given-names></name><name><surname>Hunter</surname><given-names>C</given-names></name><name><surname>Bignell</surname><given-names>G</given-names></name><name><surname>Davies</surname><given-names>H</given-names></name><name><surname>Teague</surname><given-names>J</given-names></name><name><surname>Butler</surname><given-names>A</given-names></name><name><surname>Stevens</surname><given-names>C</given-names></name><etal/></person-group><article-title>Patterns of somatic mutation in human cancer genomes</article-title><source>Nature</source><volume>446</volume><fpage>153</fpage><lpage>158</lpage><year>2007</year><pub-id pub-id-type="doi">10.1038/nature05610</pub-id><pub-id pub-id-type="pmid">17344846</pub-id></element-citation></ref>
<ref id="b4-ol-0-0-4604"><label>4</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sim</surname><given-names>NL</given-names></name><name><surname>Kumar</surname><given-names>P</given-names></name><name><surname>Hu</surname><given-names>J</given-names></name><name><surname>Henikoff</surname><given-names>S</given-names></name><name><surname>Schneider</surname><given-names>G</given-names></name><name><surname>Ng</surname><given-names>PC</given-names></name></person-group><article-title>SIFT web server: Predicting effects of amino acid substitutions on proteins</article-title><source>Nucleic Acids Res</source><volume>40</volume><fpage>W452</fpage><lpage>W457</lpage><year>2012</year><pub-id pub-id-type="doi">10.1093/nar/gks539</pub-id><pub-id pub-id-type="pmid">22689647</pub-id></element-citation></ref>
<ref id="b5-ol-0-0-4604"><label>5</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Adzhubei</surname><given-names>IA</given-names></name><name><surname>Schmidt</surname><given-names>S</given-names></name><name><surname>Peshkin</surname><given-names>L</given-names></name><name><surname>Ramensky</surname><given-names>VE</given-names></name><name><surname>Gerasimova</surname><given-names>A</given-names></name><name><surname>Bork</surname><given-names>P</given-names></name><name><surname>Kondrashov</surname><given-names>AS</given-names></name><name><surname>Sunyaev</surname><given-names>SR</given-names></name></person-group><article-title>A method and server for predicting damaging missense mutations</article-title><source>Nat Methods</source><volume>7</volume><fpage>248</fpage><lpage>249</lpage><year>2010</year><pub-id pub-id-type="doi">10.1038/nmeth0410-248</pub-id><pub-id pub-id-type="pmid">20354512</pub-id></element-citation></ref>
<ref id="b6-ol-0-0-4604"><label>6</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname><given-names>FW</given-names></name><name><surname>Hodis</surname><given-names>E</given-names></name><name><surname>Xu</surname><given-names>MJ</given-names></name><name><surname>Kryukov</surname><given-names>GV</given-names></name><name><surname>Chin</surname><given-names>L</given-names></name><name><surname>Garraway</surname><given-names>LA</given-names></name></person-group><article-title>Highly recurrent TERT promoter mutations in human melanoma</article-title><source>Science</source><volume>339</volume><fpage>957</fpage><lpage>959</lpage><year>2013</year><pub-id pub-id-type="doi">10.1126/science.1229259</pub-id><pub-id pub-id-type="pmid">23348506</pub-id></element-citation></ref>
<ref id="b7-ol-0-0-4604"><label>7</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Horn</surname><given-names>S</given-names></name><name><surname>Figl</surname><given-names>A</given-names></name><name><surname>Rachakonda</surname><given-names>PS</given-names></name><name><surname>Fischer</surname><given-names>C</given-names></name><name><surname>Sucker</surname><given-names>A</given-names></name><name><surname>Gast</surname><given-names>A</given-names></name><name><surname>Kadel</surname><given-names>S</given-names></name><name><surname>Moll</surname><given-names>I</given-names></name><name><surname>Nagore</surname><given-names>E</given-names></name><name><surname>Hemminki</surname><given-names>K</given-names></name><etal/></person-group><article-title>TERT promoter mutations in familial and sporadic melanoma</article-title><source>Science</source><volume>339</volume><fpage>959</fpage><lpage>961</lpage><year>2013</year><pub-id pub-id-type="doi">10.1126/science.1230062</pub-id><pub-id pub-id-type="pmid">23348503</pub-id></element-citation></ref>
<ref id="b8-ol-0-0-4604"><label>8</label><element-citation publication-type="journal"><person-group person-group-type="author"><collab>ENCODE Project Consortium</collab></person-group><article-title>An integrated encyclopedia of DNA elements in the human genome</article-title><source>Nature</source><volume>489</volume><fpage>57</fpage><lpage>74</lpage><year>2012</year><pub-id pub-id-type="doi">10.1038/nature11247</pub-id><pub-id pub-id-type="pmid">22955616</pub-id></element-citation></ref>
<ref id="b9-ol-0-0-4604"><label>9</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lowe</surname><given-names>CB</given-names></name><name><surname>Haussler</surname><given-names>D</given-names></name></person-group><article-title>29 mammalian genomes reveal novel exaptations of mobile elements for likely regulatory functions in the human genome</article-title><source>PLoS One</source><volume>7</volume><fpage>e43128</fpage><year>2012</year><pub-id pub-id-type="doi">10.1371/journal.pone.0043128</pub-id><pub-id pub-id-type="pmid">22952639</pub-id></element-citation></ref>
<ref id="b10-ol-0-0-4604"><label>10</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bernstein</surname><given-names>BE</given-names></name><name><surname>Stamatoyannopoulos</surname><given-names>JA</given-names></name><name><surname>Costello</surname><given-names>JF</given-names></name><name><surname>Ren</surname><given-names>B</given-names></name><name><surname>Milosavljevic</surname><given-names>A</given-names></name><name><surname>Meissner</surname><given-names>A</given-names></name><name><surname>Kellis</surname><given-names>M</given-names></name><name><surname>Marra</surname><given-names>MA</given-names></name><name><surname>Beaudet</surname><given-names>AL</given-names></name><name><surname>Ecker</surname><given-names>JR</given-names></name><etal/></person-group><article-title>The NIH roadmap epigenomics mapping consortium</article-title><source>Nat Biotechnol</source><volume>28</volume><fpage>1045</fpage><lpage>1048</lpage><year>2010</year><pub-id pub-id-type="doi">10.1038/nbt1010-1045</pub-id><pub-id pub-id-type="pmid">20944595</pub-id></element-citation></ref>
<ref id="b11-ol-0-0-4604"><label>11</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Boyle</surname><given-names>AP</given-names></name><name><surname>Hong</surname><given-names>EL</given-names></name><name><surname>Hariharan</surname><given-names>M</given-names></name><name><surname>Cheng</surname><given-names>Y</given-names></name><name><surname>Schaub</surname><given-names>MA</given-names></name><name><surname>Kasowski</surname><given-names>M</given-names></name><name><surname>Karczewski</surname><given-names>KJ</given-names></name><name><surname>Park</surname><given-names>J</given-names></name><name><surname>Hitz</surname><given-names>BC</given-names></name><name><surname>Weng</surname><given-names>S</given-names></name><etal/></person-group><article-title>Annotation of functional variation in personal genomes using RegulomeDB</article-title><source>Genome Res</source><volume>22</volume><fpage>1790</fpage><lpage>1797</lpage><year>2012</year><pub-id pub-id-type="doi">10.1101/gr.137323.112</pub-id><pub-id pub-id-type="pmid">22955989</pub-id></element-citation></ref>
<ref id="b12-ol-0-0-4604"><label>12</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ward</surname><given-names>LD</given-names></name><name><surname>Kellis</surname><given-names>M</given-names></name></person-group><article-title>HaploReg: A resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants</article-title><source>Nucleic Acids Res</source><volume>40</volume><fpage>D930</fpage><lpage>D934</lpage><year>2012</year><pub-id pub-id-type="doi">10.1093/nar/gkr917</pub-id><pub-id pub-id-type="pmid">22064851</pub-id></element-citation></ref>
<ref id="b13-ol-0-0-4604"><label>13</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Khurana</surname><given-names>E</given-names></name><name><surname>Fu</surname><given-names>Y</given-names></name><name><surname>Colonna</surname><given-names>V</given-names></name><name><surname>Mu</surname><given-names>XJ</given-names></name><name><surname>Kang</surname><given-names>HM</given-names></name><name><surname>Lappalainen</surname><given-names>T</given-names></name><name><surname>Sboner</surname><given-names>A</given-names></name><name><surname>Lochovsky</surname><given-names>L</given-names></name><name><surname>Chen</surname><given-names>J</given-names></name><name><surname>Harmanci</surname><given-names>A</given-names></name><etal/></person-group><article-title>Integrative annotation of variants from 1092 humans: Application to cancer genomics</article-title><source>Science</source><volume>342</volume><fpage>1235587</fpage><year>2013</year><pub-id pub-id-type="doi">10.1126/science.1235587</pub-id><pub-id pub-id-type="pmid">24092746</pub-id></element-citation></ref>
<ref id="b14-ol-0-0-4604"><label>14</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Li</surname><given-names>J</given-names></name><name><surname>Drubay</surname><given-names>D</given-names></name><name><surname>Michiels</surname><given-names>S</given-names></name><name><surname>Gautheret</surname><given-names>D</given-names></name></person-group><article-title>Mining the coding and non-coding genome for cancer drivers</article-title><source>Cancer Lett</source><volume>369</volume><fpage>307</fpage><lpage>315</lpage><year>2015</year><pub-id pub-id-type="doi">10.1016/j.canlet.2015.09.015</pub-id><pub-id pub-id-type="pmid">26433158</pub-id></element-citation></ref>
<ref id="b15-ol-0-0-4604"><label>15</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cooper</surname><given-names>GM</given-names></name><name><surname>Stone</surname><given-names>EA</given-names></name><name><surname>Asimenos</surname><given-names>G</given-names></name><collab>NISC Comparative Sequencing Program</collab><name><surname>Green</surname><given-names>ED</given-names></name><name><surname>Batzoglou</surname><given-names>S</given-names></name><name><surname>Sidow</surname><given-names>A</given-names></name></person-group><article-title>Distribution and intensity of constraint in mammalian genomic sequence</article-title><source>Genome Res</source><volume>15</volume><fpage>901</fpage><lpage>913</lpage><year>2005</year><pub-id pub-id-type="doi">10.1101/gr.3577405</pub-id><pub-id pub-id-type="pmid">15965027</pub-id></element-citation></ref>
<ref id="b16-ol-0-0-4604"><label>16</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Siepel</surname><given-names>A</given-names></name><name><surname>Bejerano</surname><given-names>G</given-names></name><name><surname>Pedersen</surname><given-names>JS</given-names></name><name><surname>Hinrichs</surname><given-names>AS</given-names></name><name><surname>Hou</surname><given-names>M</given-names></name><name><surname>Rosenbloom</surname><given-names>K</given-names></name><name><surname>Clawson</surname><given-names>H</given-names></name><name><surname>Spieth</surname><given-names>J</given-names></name><name><surname>Hillier</surname><given-names>LW</given-names></name><name><surname>Richards</surname><given-names>S</given-names></name><etal/></person-group><article-title>Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes</article-title><source>Genome Res</source><volume>15</volume><fpage>1034</fpage><lpage>1050</lpage><year>2005</year><pub-id pub-id-type="doi">10.1101/gr.3715005</pub-id><pub-id pub-id-type="pmid">16024819</pub-id></element-citation></ref>
<ref id="b17-ol-0-0-4604"><label>17</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pollard</surname><given-names>KS</given-names></name><name><surname>Hubisz</surname><given-names>MJ</given-names></name><name><surname>Rosenbloom</surname><given-names>KR</given-names></name><name><surname>Siepel</surname><given-names>A</given-names></name></person-group><article-title>Detection of nonneutral substitution rates on mammalian phylogenies</article-title><source>Genome Res</source><volume>20</volume><fpage>110</fpage><lpage>121</lpage><year>2010</year><pub-id pub-id-type="doi">10.1101/gr.097857.109</pub-id><pub-id pub-id-type="pmid">19858363</pub-id></element-citation></ref>
<ref id="b18-ol-0-0-4604"><label>18</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kircher</surname><given-names>M</given-names></name><name><surname>Witten</surname><given-names>DM</given-names></name><name><surname>Jain</surname><given-names>P</given-names></name><name><surname>O&#x0027;Roak</surname><given-names>BJ</given-names></name><name><surname>Cooper</surname><given-names>GM</given-names></name><name><surname>Shendure</surname><given-names>J</given-names></name></person-group><article-title>A general framework for estimating the relative pathogenicity of human genetic variants</article-title><source>Nat Genet</source><volume>46</volume><fpage>310</fpage><lpage>315</lpage><year>2014</year><pub-id pub-id-type="doi">10.1038/ng.2892</pub-id><pub-id pub-id-type="pmid">24487276</pub-id></element-citation></ref>
<ref id="b19-ol-0-0-4604"><label>19</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fu</surname><given-names>Y</given-names></name><name><surname>Liu</surname><given-names>Z</given-names></name><name><surname>Lou</surname><given-names>S</given-names></name><name><surname>Bedford</surname><given-names>J</given-names></name><name><surname>Mu</surname><given-names>XJ</given-names></name><name><surname>Yip</surname><given-names>KY</given-names></name><name><surname>Khurana</surname><given-names>E</given-names></name><name><surname>Gerstein</surname><given-names>M</given-names></name></person-group><article-title>FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer</article-title><source>Genome Biol</source><volume>15</volume><fpage>480</fpage><year>2014</year><pub-id pub-id-type="doi">10.1186/s13059-014-0480-5</pub-id><pub-id pub-id-type="pmid">25273974</pub-id></element-citation></ref>
<ref id="b20-ol-0-0-4604"><label>20</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jeon</surname><given-names>Y</given-names></name><name><surname>Sarma</surname><given-names>K</given-names></name><name><surname>Lee</surname><given-names>JT</given-names></name></person-group><article-title>New and Xisting regulatory mechanisms of X chromosome inactivation</article-title><source>Curr Opin Genet Dev</source><volume>22</volume><fpage>62</fpage><lpage>71</lpage><year>2012</year><pub-id pub-id-type="doi">10.1016/j.gde.2012.02.007</pub-id><pub-id pub-id-type="pmid">22424802</pub-id></element-citation></ref>
<ref id="b21-ol-0-0-4604"><label>21</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mattick</surname><given-names>JS</given-names></name><name><surname>Amaral</surname><given-names>PP</given-names></name><name><surname>Dinger</surname><given-names>ME</given-names></name><name><surname>Mercer</surname><given-names>TR</given-names></name><name><surname>Mehler</surname><given-names>MF</given-names></name></person-group><article-title>RNA regulation of epigenetic processes</article-title><source>Bioessays</source><volume>31</volume><fpage>51</fpage><lpage>59</lpage><year>2009</year><pub-id pub-id-type="doi">10.1002/bies.080099</pub-id><pub-id pub-id-type="pmid">19154003</pub-id></element-citation></ref>
<ref id="b22-ol-0-0-4604"><label>22</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wapinski</surname><given-names>O</given-names></name><name><surname>Chang</surname><given-names>HY</given-names></name></person-group><article-title>Long noncoding RNAs and human disease</article-title><source>Trends Cell Biol</source><volume>21</volume><fpage>354</fpage><lpage>361</lpage><year>2011</year><pub-id pub-id-type="doi">10.1016/j.tcb.2011.04.001</pub-id><pub-id pub-id-type="pmid">21550244</pub-id></element-citation></ref>
<ref id="b23-ol-0-0-4604"><label>23</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>&#x00D8;rom</surname><given-names>UA</given-names></name><name><surname>Derrien</surname><given-names>T</given-names></name><name><surname>Beringer</surname><given-names>M</given-names></name><name><surname>Gumireddy</surname><given-names>K</given-names></name><name><surname>Gardini</surname><given-names>A</given-names></name><name><surname>Bussotti</surname><given-names>G</given-names></name><name><surname>Lai</surname><given-names>F</given-names></name><name><surname>Zytnicki</surname><given-names>M</given-names></name><name><surname>Notredame</surname><given-names>C</given-names></name><name><surname>Huang</surname><given-names>Q</given-names></name><etal/></person-group><article-title>Long noncoding RNAs with enhancer-like function in human cells</article-title><source>Cell</source><volume>143</volume><fpage>46</fpage><lpage>58</lpage><year>2010</year><pub-id pub-id-type="doi">10.1016/j.cell.2010.09.001</pub-id><pub-id pub-id-type="pmid">20887892</pub-id></element-citation></ref>
<ref id="b24-ol-0-0-4604"><label>24</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Clark</surname><given-names>MB</given-names></name><name><surname>Mattick</surname><given-names>JS</given-names></name></person-group><article-title>Long noncoding RNAs in cell biology</article-title><source>Semin Cell Dev Biol</source><volume>22</volume><fpage>366</fpage><lpage>376</lpage><year>2011</year><pub-id pub-id-type="doi">10.1016/j.semcdb.2011.01.001</pub-id><pub-id pub-id-type="pmid">21256239</pub-id></element-citation></ref>
<ref id="b25-ol-0-0-4604"><label>25</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rando</surname><given-names>TA</given-names></name><name><surname>Chang</surname><given-names>HY</given-names></name></person-group><article-title>Aging, rejuvenation, and epigenetic reprogramming: Resetting the aging clock</article-title><source>Cell</source><volume>148</volume><fpage>46</fpage><lpage>57</lpage><year>2012</year><pub-id pub-id-type="doi">10.1016/j.cell.2012.01.003</pub-id><pub-id pub-id-type="pmid">22265401</pub-id></element-citation></ref>
<ref id="b26-ol-0-0-4604"><label>26</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nie</surname><given-names>L</given-names></name><name><surname>Wu</surname><given-names>HJ</given-names></name><name><surname>Hsu</surname><given-names>JM</given-names></name><name><surname>Chang</surname><given-names>SS</given-names></name><name><surname>Labaff</surname><given-names>AM</given-names></name><name><surname>Li</surname><given-names>CW</given-names></name><name><surname>Wang</surname><given-names>Y</given-names></name><name><surname>Hsu</surname><given-names>JL</given-names></name><name><surname>Hung</surname><given-names>MC</given-names></name></person-group><article-title>Long non-coding RNAs: Versatile master regulators of gene expression and crucial players in cancer</article-title><source>Am J Transl Res</source><volume>4</volume><fpage>127</fpage><lpage>150</lpage><year>2012</year><pub-id pub-id-type="pmid">22611467</pub-id></element-citation></ref>
<ref id="b27-ol-0-0-4604"><label>27</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gutschner</surname><given-names>T</given-names></name><name><surname>Diederichs</surname><given-names>S</given-names></name></person-group><article-title>The hallmarks of cancer: A long non-coding RNA point of view</article-title><source>RNA Biol</source><volume>9</volume><fpage>703</fpage><lpage>719</lpage><year>2012</year><pub-id pub-id-type="doi">10.4161/rna.20481</pub-id><pub-id pub-id-type="pmid">22664915</pub-id></element-citation></ref>
<ref id="b28-ol-0-0-4604"><label>28</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fang</surname><given-names>Z</given-names></name><name><surname>Wu</surname><given-names>L</given-names></name><name><surname>Wang</surname><given-names>L</given-names></name><name><surname>Yang</surname><given-names>Y</given-names></name><name><surname>Meng</surname><given-names>Y</given-names></name><name><surname>Yang</surname><given-names>H</given-names></name></person-group><article-title>Increased expression of the long non-coding RNA UCA1 in tongue squamous cell carcinomas: A possible correlation with cancer metastasis</article-title><source>Oral Surg Oral Med Oral Pathol Oral Radiol</source><volume>117</volume><fpage>89</fpage><lpage>95</lpage><year>2014</year><pub-id pub-id-type="doi">10.1016/j.oooo.2013.09.007</pub-id><pub-id pub-id-type="pmid">24332332</pub-id></element-citation></ref>
<ref id="b29-ol-0-0-4604"><label>29</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gupta</surname><given-names>RA</given-names></name><name><surname>Shah</surname><given-names>N</given-names></name><name><surname>Wang</surname><given-names>KC</given-names></name><name><surname>Kim</surname><given-names>J</given-names></name><name><surname>Horlings</surname><given-names>HM</given-names></name><name><surname>Wong</surname><given-names>DJ</given-names></name><name><surname>Tsai</surname><given-names>MC</given-names></name><name><surname>Hung</surname><given-names>T</given-names></name><name><surname>Argani</surname><given-names>P</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name><etal/></person-group><article-title>Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis</article-title><source>Nature</source><volume>464</volume><fpage>1071</fpage><lpage>1076</lpage><year>2010</year><pub-id pub-id-type="doi">10.1038/nature08975</pub-id><pub-id pub-id-type="pmid">20393566</pub-id></element-citation></ref>
<ref id="b30-ol-0-0-4604"><label>30</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Guffanti</surname><given-names>A</given-names></name><name><surname>Iacono</surname><given-names>M</given-names></name><name><surname>Pelucchi</surname><given-names>P</given-names></name><name><surname>Kim</surname><given-names>N</given-names></name><name><surname>Sold&#x00E0;</surname><given-names>G</given-names></name><name><surname>Croft</surname><given-names>LJ</given-names></name><name><surname>Taft</surname><given-names>RJ</given-names></name><name><surname>Rizzi</surname><given-names>E</given-names></name><name><surname>Askarian-Amiri</surname><given-names>M</given-names></name><name><surname>Bonnal</surname><given-names>RJ</given-names></name><etal/></person-group><article-title>A transcriptional sketch of a primary human breast cancer by 454 deep sequencing</article-title><source>BMC Genomics</source><volume>10</volume><fpage>163</fpage><year>2009</year><pub-id pub-id-type="doi">10.1186/1471-2164-10-163</pub-id><pub-id pub-id-type="pmid">19379481</pub-id></element-citation></ref>
<ref id="b31-ol-0-0-4604"><label>31</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Garding</surname><given-names>A</given-names></name><name><surname>Bhattacharya</surname><given-names>N</given-names></name><name><surname>Claus</surname><given-names>R</given-names></name><name><surname>Ruppel</surname><given-names>M</given-names></name><name><surname>Tschuch</surname><given-names>C</given-names></name><name><surname>Filarsky</surname><given-names>K</given-names></name><name><surname>Idler</surname><given-names>I</given-names></name><name><surname>Zucknick</surname><given-names>M</given-names></name><name><surname>Caudron-Herger</surname><given-names>M</given-names></name><name><surname>Oakes</surname><given-names>C</given-names></name><etal/></person-group><article-title>Epigenetic upregulation of lncRNAs at 13q14.3 in leukemia is linked to the In Cis downregulation of a gene cluster that targets NF-kB</article-title><source>PLoS Genet</source><volume>9</volume><fpage>e1003373</fpage><year>2013</year><pub-id pub-id-type="doi">10.1371/journal.pgen.1003373</pub-id><pub-id pub-id-type="pmid">23593011</pub-id></element-citation></ref>
<ref id="b32-ol-0-0-4604"><label>32</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Alexandrov</surname><given-names>LB</given-names></name><name><surname>Nik-Zainal</surname><given-names>S</given-names></name><name><surname>Wedge</surname><given-names>DC</given-names></name><name><surname>Aparicio</surname><given-names>SA</given-names></name><name><surname>Behjati</surname><given-names>S</given-names></name><name><surname>Biankin</surname><given-names>AV</given-names></name><name><surname>Bignell</surname><given-names>GR</given-names></name><name><surname>Bolli</surname><given-names>N</given-names></name><name><surname>Borg</surname><given-names>A</given-names></name><name><surname>B&#x00F8;rresen-Dale</surname><given-names>AL</given-names></name><etal/></person-group><article-title>Signatures of mutational processes in human cancer</article-title><source>Nature</source><volume>500</volume><fpage>415</fpage><lpage>421</lpage><year>2013</year><pub-id pub-id-type="doi">10.1038/nature12477</pub-id><pub-id pub-id-type="pmid">23945592</pub-id></element-citation></ref>
<ref id="b33-ol-0-0-4604"><label>33</label><element-citation publication-type="journal"><person-group person-group-type="author"><collab>1000 Genomes Project Consortium</collab><name><surname>Abecasis</surname><given-names>GR</given-names></name><name><surname>Auton</surname><given-names>A</given-names></name><name><surname>Brooks</surname><given-names>LD</given-names></name><name><surname>DePristo</surname><given-names>MA</given-names></name><name><surname>Durbin</surname><given-names>RM</given-names></name><name><surname>Handsaker</surname><given-names>RE</given-names></name><name><surname>Kang</surname><given-names>HM</given-names></name><name><surname>Marth</surname><given-names>GT</given-names></name><name><surname>McVean</surname><given-names>GA</given-names></name></person-group><article-title>An integrated map of genetic variation from 1,092 human genomes</article-title><source>Nature</source><volume>491</volume><fpage>56</fpage><lpage>65</lpage><year>2012</year><pub-id pub-id-type="doi">10.1038/nature11632</pub-id><pub-id pub-id-type="pmid">23128226</pub-id></element-citation></ref>
<ref id="b34-ol-0-0-4604"><label>34</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Landrum</surname><given-names>MJ</given-names></name><name><surname>Lee</surname><given-names>JM</given-names></name><name><surname>Riley</surname><given-names>GR</given-names></name><name><surname>Jang</surname><given-names>W</given-names></name><name><surname>Rubinstein</surname><given-names>WS</given-names></name><name><surname>Church</surname><given-names>DM</given-names></name><name><surname>Maglott</surname><given-names>DR</given-names></name></person-group><article-title>ClinVar: Public archive of relationships among sequence variation and human phenotype</article-title><source>Nucleic Acids Res</source><volume>42</volume><fpage>D980</fpage><lpage>D985</lpage><year>2014</year><pub-id pub-id-type="doi">10.1093/nar/gkt1113</pub-id><pub-id pub-id-type="pmid">24234437</pub-id></element-citation></ref>
<ref id="b35-ol-0-0-4604"><label>35</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Stenson</surname><given-names>PD</given-names></name><name><surname>Mort</surname><given-names>M</given-names></name><name><surname>Ball</surname><given-names>EV</given-names></name><name><surname>Howells</surname><given-names>K</given-names></name><name><surname>Phillips</surname><given-names>AD</given-names></name><name><surname>Thomas</surname><given-names>NS</given-names></name><name><surname>Cooper</surname><given-names>DN</given-names></name></person-group><article-title>The human gene mutation database: 2008 update</article-title><source>Genome Med</source><volume>1</volume><fpage>13</fpage><year>2009</year><pub-id pub-id-type="doi">10.1186/gm13</pub-id><pub-id pub-id-type="pmid">19348700</pub-id></element-citation></ref>
<ref id="b36-ol-0-0-4604"><label>36</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Beck</surname><given-names>T</given-names></name><name><surname>Hastings</surname><given-names>RK</given-names></name><name><surname>Gollapudi</surname><given-names>S</given-names></name><name><surname>Free</surname><given-names>RC</given-names></name><name><surname>Brookes</surname><given-names>AJ</given-names></name></person-group><article-title>GWAS central: A comprehensive resource for the comparison and interrogation of genome-wide association studies</article-title><source>Eur J Hum Genet</source><volume>22</volume><fpage>949</fpage><lpage>952</lpage><year>2014</year><pub-id pub-id-type="doi">10.1038/ejhg.2013.274</pub-id><pub-id pub-id-type="pmid">24301061</pub-id></element-citation></ref>
<ref id="b37-ol-0-0-4604"><label>37</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Harrow</surname><given-names>J</given-names></name><name><surname>Frankish</surname><given-names>A</given-names></name><name><surname>Gonzalez</surname><given-names>JM</given-names></name><name><surname>Tapanari</surname><given-names>E</given-names></name><name><surname>Diekhans</surname><given-names>M</given-names></name><name><surname>Kokocinski</surname><given-names>F</given-names></name><name><surname>Aken</surname><given-names>BL</given-names></name><name><surname>Barrell</surname><given-names>D</given-names></name><name><surname>Zadissa</surname><given-names>A</given-names></name><name><surname>Searle</surname><given-names>S</given-names></name><etal/></person-group><article-title>GENCODE: The reference human genome annotation for the ENCODE project</article-title><source>Genome Res</source><volume>22</volume><fpage>1760</fpage><lpage>1774</lpage><year>2012</year><pub-id pub-id-type="doi">10.1101/gr.135350.111</pub-id><pub-id pub-id-type="pmid">22955987</pub-id></element-citation></ref>
<ref id="b38-ol-0-0-4604"><label>38</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cabili</surname><given-names>MN</given-names></name><name><surname>Trapnell</surname><given-names>C</given-names></name><name><surname>Goff</surname><given-names>L</given-names></name><name><surname>Koziol</surname><given-names>M</given-names></name><name><surname>Tazon-Vega</surname><given-names>B</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name></person-group><article-title>Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses</article-title><source>Genes Dev</source><volume>25</volume><fpage>1915</fpage><lpage>1927</lpage><year>2011</year><pub-id pub-id-type="doi">10.1101/gad.17446611</pub-id><pub-id pub-id-type="pmid">21890647</pub-id></element-citation></ref>
<ref id="b39-ol-0-0-4604"><label>39</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pruitt</surname><given-names>KD</given-names></name><name><surname>Tatusova</surname><given-names>T</given-names></name><name><surname>Maglott</surname><given-names>DR</given-names></name></person-group><article-title>NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins</article-title><source>Nucleic Acids Res</source><volume>35</volume><fpage>D61</fpage><lpage>D65</lpage><year>2007</year><pub-id pub-id-type="doi">10.1093/nar/gkl842</pub-id><pub-id pub-id-type="pmid">17130148</pub-id></element-citation></ref>
<ref id="b40-ol-0-0-4604"><label>40</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ward</surname><given-names>AJ</given-names></name><name><surname>Cooper</surname><given-names>TA</given-names></name></person-group><article-title>The pathobiology of splicing</article-title><source>J Pathol</source><volume>220</volume><fpage>152</fpage><lpage>163</lpage><year>2010</year><pub-id pub-id-type="pmid">19918805</pub-id></element-citation></ref>
<ref id="b41-ol-0-0-4604"><label>41</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Karolchik</surname><given-names>D</given-names></name><name><surname>Barber</surname><given-names>GP</given-names></name><name><surname>Casper</surname><given-names>J</given-names></name><name><surname>Clawson</surname><given-names>H</given-names></name><name><surname>Cline</surname><given-names>MS</given-names></name><name><surname>Diekhans</surname><given-names>M</given-names></name><name><surname>Dreszer</surname><given-names>TR</given-names></name><name><surname>Fujita</surname><given-names>PA</given-names></name><name><surname>Guruvadoo</surname><given-names>L</given-names></name><name><surname>Haeussler</surname><given-names>M</given-names></name><etal/></person-group><article-title>The UCSC genome browser database: 2014 update</article-title><source>Nucleic Acids Res</source><volume>42</volume><fpage>D764</fpage><lpage>D770</lpage><year>2014</year><pub-id pub-id-type="doi">10.1093/nar/gkt1168</pub-id><pub-id pub-id-type="pmid">24270787</pub-id></element-citation></ref>
<ref id="b42-ol-0-0-4604"><label>42</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname><given-names>MA</given-names></name><name><surname>Gesell</surname><given-names>T</given-names></name><name><surname>Stadler</surname><given-names>PF</given-names></name><name><surname>Mattick</surname><given-names>JS</given-names></name></person-group><article-title>Widespread purifying selection on RNA structure in mammals</article-title><source>Nucleic Acids Res</source><volume>41</volume><fpage>8220</fpage><lpage>8236</lpage><year>2013</year><pub-id pub-id-type="doi">10.1093/nar/gkt596</pub-id><pub-id pub-id-type="pmid">23847102</pub-id></element-citation></ref>
<ref id="b43-ol-0-0-4604"><label>43</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schuster-B&#x00F6;ckler</surname><given-names>B</given-names></name><name><surname>Lehner</surname><given-names>B</given-names></name></person-group><article-title>Chromatin organization is a major influence on regional mutation rates in human cancer cells</article-title><source>Nature</source><volume>488</volume><fpage>504</fpage><lpage>507</lpage><year>2012</year><pub-id pub-id-type="doi">10.1038/nature11273</pub-id><pub-id pub-id-type="pmid">22820252</pub-id></element-citation></ref>
<ref id="b44-ol-0-0-4604"><label>44</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Forbes</surname><given-names>SA</given-names></name><name><surname>Bindal</surname><given-names>N</given-names></name><name><surname>Bamford</surname><given-names>S</given-names></name><name><surname>Cole</surname><given-names>C</given-names></name><name><surname>Kok</surname><given-names>CY</given-names></name><name><surname>Beare</surname><given-names>D</given-names></name><name><surname>Jia</surname><given-names>M</given-names></name><name><surname>Shepherd</surname><given-names>R</given-names></name><name><surname>Leung</surname><given-names>K</given-names></name><name><surname>Menzies</surname><given-names>A</given-names></name><etal/></person-group><article-title>COSMIC: Mining complete cancer genomes in the catalogue of somatic mutations in cancer</article-title><source>Nucleic Acids Res</source><volume>39</volume><fpage>D945</fpage><lpage>D950</lpage><year>2011</year><pub-id pub-id-type="doi">10.1093/nar/gkq929</pub-id><pub-id pub-id-type="pmid">20952405</pub-id></element-citation></ref>
<ref id="b45-ol-0-0-4604"><label>45</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ju</surname><given-names>YS</given-names></name><name><surname>Lee</surname><given-names>WC</given-names></name><name><surname>Shin</surname><given-names>JY</given-names></name><name><surname>Lee</surname><given-names>S</given-names></name><name><surname>Bleazard</surname><given-names>T</given-names></name><name><surname>Won</surname><given-names>JK</given-names></name><name><surname>Kim</surname><given-names>YT</given-names></name><name><surname>Kim</surname><given-names>JI</given-names></name><name><surname>Kang</surname><given-names>JH</given-names></name><name><surname>Seo</surname><given-names>JS</given-names></name></person-group><article-title>Fusion of KIF5B and RET transforming gene in lung adenocarcinoma revealed from whole-genome and transcriptome sequencing</article-title><source>Genome Res</source><volume>22</volume><fpage>436</fpage><lpage>445</lpage><year>2012</year><pub-id pub-id-type="doi">10.1101/gr.133645.111</pub-id><pub-id pub-id-type="pmid">22194472</pub-id></element-citation></ref>
<ref id="b46-ol-0-0-4604"><label>46</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dobin</surname><given-names>A</given-names></name><name><surname>Davis</surname><given-names>CA</given-names></name><name><surname>Schlesinger</surname><given-names>F</given-names></name><name><surname>Drenkow</surname><given-names>J</given-names></name><name><surname>Zaleski</surname><given-names>C</given-names></name><name><surname>Jha</surname><given-names>S</given-names></name><name><surname>Batut</surname><given-names>P</given-names></name><name><surname>Chaisson</surname><given-names>M</given-names></name><name><surname>Gingeras</surname><given-names>TR</given-names></name></person-group><article-title>STAR: Ultrafast universal RNA-seq aligner</article-title><source>Bioinformatics</source><volume>29</volume><fpage>15</fpage><lpage>21</lpage><year>2013</year><pub-id pub-id-type="doi">10.1093/bioinformatics/bts635</pub-id><pub-id pub-id-type="pmid">23104886</pub-id></element-citation></ref>
<ref id="b47-ol-0-0-4604"><label>47</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Quinlan</surname><given-names>AR</given-names></name><name><surname>Hall</surname><given-names>IM</given-names></name></person-group><article-title>BEDTools: A flexible suite of utilities for comparing genomic features</article-title><source>Bioinformatics</source><volume>26</volume><fpage>841</fpage><lpage>842</lpage><year>2010</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btq033</pub-id><pub-id pub-id-type="pmid">20110278</pub-id></element-citation></ref>
<ref id="b48-ol-0-0-4604"><label>48</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Trapnell</surname><given-names>C</given-names></name><name><surname>Roberts</surname><given-names>A</given-names></name><name><surname>Goff</surname><given-names>L</given-names></name><name><surname>Pertea</surname><given-names>G</given-names></name><name><surname>Kim</surname><given-names>D</given-names></name><name><surname>Kelley</surname><given-names>DR</given-names></name><name><surname>Pimentel</surname><given-names>H</given-names></name><name><surname>Salzberg</surname><given-names>SL</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name><name><surname>Pachter</surname><given-names>L</given-names></name></person-group><article-title>Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks</article-title><source>Nat Protoc</source><volume>7</volume><fpage>562</fpage><lpage>578</lpage><year>2012</year><pub-id pub-id-type="doi">10.1038/nprot.2012.016</pub-id><pub-id pub-id-type="pmid">22383036</pub-id></element-citation></ref>
<ref id="b49-ol-0-0-4604"><label>49</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Love</surname><given-names>MI</given-names></name><name><surname>Huber</surname><given-names>W</given-names></name><name><surname>Anders</surname><given-names>S</given-names></name></person-group><article-title>Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2</article-title><source>Genome Biol</source><volume>15</volume><fpage>550</fpage><year>2014</year><pub-id pub-id-type="doi">10.1186/s13059-014-0550-8</pub-id><pub-id pub-id-type="pmid">25516281</pub-id></element-citation></ref>
<ref id="b50-ol-0-0-4604"><label>50</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dees</surname><given-names>ND</given-names></name><name><surname>Zhang</surname><given-names>Q</given-names></name><name><surname>Kandoth</surname><given-names>C</given-names></name><name><surname>Wendl</surname><given-names>MC</given-names></name><name><surname>Schierding</surname><given-names>W</given-names></name><name><surname>Koboldt</surname><given-names>DC</given-names></name><name><surname>Mooney</surname><given-names>TB</given-names></name><name><surname>Callaway</surname><given-names>MB</given-names></name><name><surname>Dooling</surname><given-names>D</given-names></name><name><surname>Mardis</surname><given-names>ER</given-names></name><etal/></person-group><article-title>MuSiC: Identifying mutational significance in cancer genomes</article-title><source>Genome Res</source><volume>22</volume><fpage>1589</fpage><lpage>1598</lpage><year>2012</year><pub-id pub-id-type="doi">10.1101/gr.134635.111</pub-id><pub-id pub-id-type="pmid">22759861</pub-id></element-citation></ref>
<ref id="b51-ol-0-0-4604"><label>51</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hon</surname><given-names>GC</given-names></name><name><surname>Hawkins</surname><given-names>RD</given-names></name><name><surname>Ren</surname><given-names>B</given-names></name></person-group><article-title>Predictive chromatin signatures in the mammalian genome</article-title><source>Hum Mol Genet</source><volume>18</volume><fpage>R195</fpage><lpage>R201</lpage><year>2009</year><pub-id pub-id-type="doi">10.1093/hmg/ddp409</pub-id><pub-id pub-id-type="pmid">19808796</pub-id></element-citation></ref>
<ref id="b52-ol-0-0-4604"><label>52</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ritchie</surname><given-names>GRS</given-names></name><name><surname>Dunham</surname><given-names>I</given-names></name><name><surname>Zeggini</surname><given-names>E</given-names></name><name><surname>Flicek</surname><given-names>P</given-names></name></person-group><article-title>Functional annotation of noncoding sequence variants</article-title><source>Nat Methods</source><volume>11</volume><fpage>294</fpage><lpage>296</lpage><year>2014</year><pub-id pub-id-type="doi">10.1038/nmeth.2832</pub-id><pub-id pub-id-type="pmid">24487584</pub-id></element-citation></ref>
<ref id="b53-ol-0-0-4604"><label>53</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Washietl</surname><given-names>S</given-names></name><name><surname>Kellis</surname><given-names>M</given-names></name><name><surname>Garber</surname><given-names>M</given-names></name></person-group><article-title>Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals</article-title><source>Genome Res</source><volume>24</volume><fpage>616</fpage><lpage>628</lpage><year>2014</year></element-citation></ref>
<ref id="b54-ol-0-0-4604"><label>54</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Loewen</surname><given-names>G</given-names></name><name><surname>Jayawickramarajah</surname><given-names>J</given-names></name><name><surname>Zhuo</surname><given-names>Y</given-names></name><name><surname>Shan</surname><given-names>B</given-names></name></person-group><article-title>Functions of lncRNA HOTAIR in lung cancer</article-title><source>J Hematol Oncol</source><volume>7</volume><fpage>90</fpage><year>2014</year><pub-id pub-id-type="doi">10.1186/s13045-014-0090-4</pub-id><pub-id pub-id-type="pmid">25491133</pub-id></element-citation></ref>
<ref id="b55-ol-0-0-4604"><label>55</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname><given-names>MH</given-names></name><name><surname>Hu</surname><given-names>ZY</given-names></name><name><surname>Xu</surname><given-names>C</given-names></name><name><surname>Xie</surname><given-names>LY</given-names></name><name><surname>Wang</surname><given-names>XY</given-names></name><name><surname>Chen</surname><given-names>SY</given-names></name><name><surname>Li</surname><given-names>ZG</given-names></name></person-group><article-title>MALAT1 promotes colorectal cancer cell proliferation/migration/invasion via PRKA kinase anchor protein 9</article-title><source>Biochim Biophys Acta</source><volume>1852</volume><fpage>166</fpage><lpage>174</lpage><year>2015</year><pub-id pub-id-type="doi">10.1016/j.bbadis.2014.11.013</pub-id><pub-id pub-id-type="pmid">25446987</pub-id></element-citation></ref>
<ref id="b56-ol-0-0-4604"><label>56</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Shen</surname><given-names>L</given-names></name><name><surname>Chen</surname><given-names>L</given-names></name><name><surname>Wang</surname><given-names>Y</given-names></name><name><surname>Jiang</surname><given-names>X</given-names></name><name><surname>Xia</surname><given-names>H</given-names></name><name><surname>Zhuang</surname><given-names>Z</given-names></name></person-group><article-title>Long noncoding RNA MALAT1 promotes brain metastasis by inducing epithelial-mesenchymal transition in lung cancer</article-title><source>J Neurooncol</source><volume>121</volume><fpage>101</fpage><lpage>108</lpage><year>2015</year><pub-id pub-id-type="doi">10.1007/s11060-014-1613-0</pub-id><pub-id pub-id-type="pmid">25217850</pub-id></element-citation></ref>
<ref id="b57-ol-0-0-4604"><label>57</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Okugawa</surname><given-names>Y</given-names></name><name><surname>Toiyama</surname><given-names>Y</given-names></name><name><surname>Hur</surname><given-names>K</given-names></name><name><surname>Toden</surname><given-names>S</given-names></name><name><surname>Saigusa</surname><given-names>S</given-names></name><name><surname>Tanaka</surname><given-names>K</given-names></name><name><surname>Inoue</surname><given-names>Y</given-names></name><name><surname>Mohri</surname><given-names>Y</given-names></name><name><surname>Kusunoki</surname><given-names>M</given-names></name><name><surname>Boland</surname><given-names>CR</given-names></name><name><surname>Goel</surname><given-names>A</given-names></name></person-group><article-title>Metastasis-associated long non-coding RNA drives gastric cancer development and promotes peritoneal metastasis</article-title><source>Carcinogenesis</source><volume>35</volume><fpage>2731</fpage><lpage>2739</lpage><year>2014</year><pub-id pub-id-type="doi">10.1093/carcin/bgu200</pub-id><pub-id pub-id-type="pmid">25280565</pub-id></element-citation></ref>
</ref-list>
</back>
<floats-group>
<fig id="f1-ol-0-0-4604" position="float">
<label>Figure 1.</label>
<caption><p>Fitting and validation of the logistic regression model. (A) Densities of ClinVar and Human Gene Mutation Database pathogenic variants for all 25 non-coding features (red line, average density in the human genome). (B) Regression estimates for all features used in the logistic regression model. (C) Receiver operating characteristic curve for the model. (D) Scaled scores for GWAS SNPs, neutral SNPs (1 million randomly selected neutral SNPs), and non-recurrent and recurrent lung cancer mutations. CR, conserved region; TFBS, transcription factor binding site; cTFBS, conserved TFBS; UTR, untranslated region; HE, highly expressed region; SNP, single nucleotide polymorphism; Sensitive, known transcription factor binding sites or motifs with a high ratio of rare SNPs (allele frequency &#x003C;0.01); ncExon, non-coding exon; H3K4me1, H3K9ac, etc., histone modification data; ER, early replicated region; Dnase, DNase I hypersensitive site; LE, low expressed region; ECS, evolutionarily conserved structure; LR, late replicated region; TPR, true positive rate; FPR, false positive rate; GWAS, genome-wide association study.</p></caption>
<graphic xlink:href="ol-12-01-0222-g00.jpg"/>
</fig>
<fig id="f2-ol-0-0-4604" position="float">
<label>Figure 2.</label>
<caption><p>Characterization of high-scoring regions in lung cancer. (A) Fraction of high-scoring regions in various non-coding features. (B) Average scores in introns of protein-coding genes and lncRNAs near the 5&#x2032; splice site (left panel) and the 3&#x2032; splice site (right panel). (C) Fraction of high-scoring regions in various gene classes. (D) Density plot of the fraction of high-scoring regions in lncRNAs. lncRNA, long non-coding RNA; PC, protein-coding; 5&#x2032;SS, 5&#x2032; splice site, within 10 nucleotides of the 5&#x2032; end of introns; 3&#x2032;SS, 3&#x2032; splice site, within 50 nucleotides of the 3&#x2032; end of introns; UTR, untranslated region; LUCAT1, lung cancer associated transcript 1; MALAT1, metastasis associated lung adenocarcinoma transcript 1; GAS5, growth arrest-specific 5; HOTAIR, HOX transcript antisense RNA.</p></caption>
<graphic xlink:href="ol-12-01-0222-g01.jpg"/>
</fig>
<fig id="f3-ol-0-0-4604" position="float">
<label>Figure 3.</label>
<caption><p>Characterization of functional lncRNA candidates in lung cancer. (A) Fraction of conserved regions in functional lncRNA candidates (candidate), control lncRNAs (control) and IR. (B) Average phastCons scores for functional lncRNA candidates (candidate), control lncRNAs (control) and IR. (C) Average densities of GWAS disease- or trait-related SNPs for functional lncRNA candidates (candidate), control lncRNAs (control) and IR. (D) Average densities of somatic mutations for functional lncRNA candidates (candidate), control lncRNAs (control) and IR. (E) Relative expression (log-scaled FPKM) for functional lncRNA candidates (candidate), control lncRNAs (control) and IR. lncRNA, long non-coding RNA; IR, intergenic regions; GWAS, genome-wide association studies; SNP, single nucleotide polymorphism; FPKM, fragments per kilobase of transcript per million mapped reads.</p></caption>
<graphic xlink:href="ol-12-01-0222-g02.jpg"/>
</fig>
<fig id="f4-ol-0-0-4604" position="float">
<label>Figure 4.</label>
<caption><p>Expression changes for differentially expressed lncRNA candidates between lung cancer and normal samples. lncRNA, long non-coding RNA.</p></caption>
<graphic xlink:href="ol-12-01-0222-g03.tif"/>
</fig>
</floats-group>
</article>
