Functional annotation of noncoding variants and prioritization of cancer-associated lncRNAs in lung cancer

  • Authors:
    • Hua Li
    • Xin Lv

  • Published online on: May 18, 2016     https://doi.org/10.3892/ol.2016.4604
  • Pages: 222-230


Abstract

Multiple computational tools have been widely applied to the detection of coding driver mutations in cancer; however, the prioritization of pathogenic non-coding variants remains a difficult and demanding task. The present study was performed to distinguish non-coding disease-causing mutations from neutral ones, and to prioritize potential cancer-associated long non-coding RNAs (lncRNAs) in lung cancer using a logistic regression model. The model was constructed with 19,153 disease-associated pathogenic variants from ClinVar and the Human Gene Mutation Database as the response variable and non-coding features as the predictor variables. The model was validated with genome-wide association study (GWAS) disease- or trait-associated single nucleotide polymorphisms (SNPs) and recurrent somatic mutations. High-scoring regions were characterized with respect to their distribution across various features and gene classes, and potential cancer-associated lncRNA candidates were prioritized by combining the fraction of high-scoring regions with the average score predicted by the model. H3K79me2 was the strongest negative contributor to the model, while conserved regions were the most positively informative feature. The area under the receiver operating characteristic curve of the model was 0.89. The model assigned significantly higher scores to GWAS SNPs than to neutral SNPs (mean, 5.9012 vs. 5.5238; P<0.001, Mann-Whitney U test) and to recurrent somatic mutations than to non-recurrent mutations (mean, 5.4677 vs. 5.2277; P<0.001, Mann-Whitney U test). Regions including splicing sites and untranslated regions, and gene classes including cancer genes and cancer-associated lncRNAs, were enriched for high-scoring regions. In total, 2,679 cancer-associated lncRNAs were identified and characterized, of which 104 were differentially expressed between lung cancer and normal specimens. The logistic regression model is a useful and efficient scoring system to prioritize non-coding pathogenic variants and lncRNAs, and may provide the basis for detecting non-coding driver lncRNAs in lung cancer.
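
The scoring and ranking steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The feature weights, intercept, toy variants, score threshold and the rule for combining the fraction of high-scoring regions with the mean score (a simple product) are all assumptions made for illustration; only the signs of the two weights follow the abstract (conserved regions positive, H3K79me2 negative). The scores here are plain probabilities, which may not match the scale of the scores reported above.

```python
import math
from statistics import mean


def logistic_score(features, weights, intercept):
    """Logistic-regression probability for one variant (toy weights)."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a positive
    outscores a negative (ties count 0.5). This equals the Mann-Whitney U
    statistic divided by n_pos * n_neg."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def rank_lncrnas(region_scores, threshold):
    """Rank lncRNAs by (fraction of high-scoring regions) * (mean score).
    The product is an assumed combination rule, not taken from the paper."""
    ranked = []
    for name, scores in region_scores.items():
        frac_high = sum(s >= threshold for s in scores) / len(scores)
        ranked.append((name, frac_high * mean(scores)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical coefficients: only their signs reflect the abstract.
weights = {"conserved": 2.0, "H3K79me2": -1.5}
intercept = -0.5

# Toy binary feature vectors for pathogenic and neutral variants.
pathogenic = [
    {"conserved": 1, "H3K79me2": 0},
    {"conserved": 1, "H3K79me2": 1},
]
neutral = [
    {"conserved": 0, "H3K79me2": 1},
    {"conserved": 0, "H3K79me2": 0},
    {"conserved": 1, "H3K79me2": 1},
]

pos = [logistic_score(v, weights, intercept) for v in pathogenic]
neg = [logistic_score(v, weights, intercept) for v in neutral]
model_auc = auc(pos, neg)

# Toy per-region scores for two hypothetical lncRNAs.
ranking = rank_lncrnas(
    {"lncA": [0.9, 0.8, 0.2], "lncB": [0.3, 0.4, 0.1]},
    threshold=0.5,
)
```

In this sketch `model_auc` measures how well the toy scores separate the two variant classes, and `ranking` lists the hypothetical lncRNAs from highest to lowest combined score. A real analysis would fit the coefficients to labeled variants and use a statistics library for the Mann-Whitney U test P-values.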

July-2016
Volume 12 Issue 1

Print ISSN: 1792-1074
Online ISSN: 1792-1082

Spandidos Publications style
Li H and Lv X: Functional annotation of noncoding variants and prioritization of cancer-associated lncRNAs in lung cancer. Oncol Lett 12: 222-230, 2016
APA
Li, H., & Lv, X. (2016). Functional annotation of noncoding variants and prioritization of cancer-associated lncRNAs in lung cancer. Oncology Letters, 12, 222-230. https://doi.org/10.3892/ol.2016.4604
MLA
Li, H., Lv, X. "Functional annotation of noncoding variants and prioritization of cancer-associated lncRNAs in lung cancer". Oncology Letters 12.1 (2016): 222-230.
Chicago
Li, H., Lv, X. "Functional annotation of noncoding variants and prioritization of cancer-associated lncRNAs in lung cancer". Oncology Letters 12, no. 1 (2016): 222-230. https://doi.org/10.3892/ol.2016.4604