Smoking alters the evolutionary trajectory of non‑small cell lung cancer
- Authors:
- Published online on: August 29, 2019 https://doi.org/10.3892/etm.2019.7958
- Pages: 3315-3324
Copyright: © Yu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License.
Abstract
Introduction
Lung cancer accounts for ~14% of newly diagnosed cancer cases and is the second most common cancer worldwide (1). Of these cases, ~85% are non-small cell lung cancer (NSCLC) (2). Lung cancer has not only a high incidence but also a high mortality rate, and it imposes a substantial healthcare and economic burden on both developing and developed countries.
Many factors may contribute to the genesis of lung cancer (2). Genetics can explain a large proportion of lung cancer occurrence: genome-wide association studies have identified many single nucleotide polymorphisms associated with lung cancer susceptibility (3). Environmental factors, such as air pollution (4), fine particulate matter (PM2.5) (5) and smoking, can promote epigenetic dysfunction, which interacts with genetic changes to trigger tumorigenesis (2,6–9). Cigarette smoke contains over 5,000 compounds (10), such as nicotine, free radicals, benzo[a]pyrene, catechols, polonium-210 and heavy metals (11). Many of these compounds are potent carcinogens (12) that can interfere with DNA mismatch repair and cause somatic mutations. Cigarette smoking accounts for 87% of lung cancer deaths (13) and is the leading risk factor for the disease.
Unfortunately, the genetic mechanisms by which smoking leads to lung carcinogenesis are largely unknown, and many observations are contradictory (10). For example, benzo[a]pyrene, a carcinogen in tobacco smoke, induces lung tumors in mice but not in rats (14). At the molecular level, several well-established signaling pathways have been reported to have important roles, including cyclooxygenase and its derived prostanoids, peroxisome proliferator-activated receptor γ, arachidonate 15-lipoxygenase, epidermal growth factor receptor (EGFR), PI3K/AKT/mTOR and the vascular endothelial growth factor-dependent angiogenic pathway (10). Lung cancer is a complex systems disease (2) whose dysfunctions are dynamic; tracing the evolution of smoking-induced lung cancer, i.e. its series of genetic events, can give a more realistic picture of tumorigenesis. With the rapid development of next-generation sequencing, somatic mutations in cancer patients can be identified more easily, and the evolutionary trajectories of tumors can be reconstructed from somatic mutation data. Caravagna et al (15) developed an algorithm called Pipeline for Cancer Inference (PiCnIc) and applied it to colon adenocarcinoma and rectum adenocarcinoma (COAD/READ) somatic mutation data from The Cancer Genome Atlas project. They reconstructed the underlying somatic evolution based on Suppes' probabilistic causation (16) and found that mutations in APC regulator of WNT signaling pathway (APC), KRAS proto-oncogene (KRAS) and tumor protein p53 were primary events in microsatellite-stable COAD/READ tumors, consistent with previous literature. Brown et al (17) performed phylogenetic analysis on whole-exome sequencing and copy number profiling data from primary and metastatic breast cancer samples and inferred the phylogeny of genomic alterations during breast cancer progression.
That study used the Dollo parsimony method and the branch-and-bound exhaustive search algorithm described by Felsenstein (18) to reconstruct the phylogenetic tree.
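Suppes' probabilistic causation, on which PiCnIc builds, requires two conditions before an event a can be treated as a prima facie cause of an event b: a must precede b, and a must raise the probability of b. The Python sketch below checks both conditions under stated assumptions: each patient is represented as a set of mutated genes, and because cross-sectional mutation data carry no timing, marginal mutation frequency is used as a proxy for temporal order. The function name `suppes_prima_facie` is hypothetical. PiCnIc additionally assesses these conditions statistically and prunes spurious relations, which this sketch does not attempt.

```python
def suppes_prima_facie(patients, a, b):
    """Check Suppes' two prima facie conditions for 'a causes b'.

    patients: list of sets, each holding the genes mutated in one patient.
    Temporal priority is approximated by marginal frequency: P(a) > P(b).
    Probability raising requires P(b | a) > P(b | not a).
    """
    n = len(patients)
    with_a = [s for s in patients if a in s]
    without_a = [s for s in patients if a not in s]
    if not with_a or not without_a:
        return False  # one of the conditional probabilities is undefined
    p_a = len(with_a) / n
    p_b = sum(b in s for s in patients) / n
    p_b_given_a = sum(b in s for s in with_a) / len(with_a)
    p_b_given_not_a = sum(b in s for s in without_a) / len(without_a)
    return p_a > p_b and p_b_given_a > p_b_given_not_a


# Hypothetical toy cohort: A is more frequent than B, and B only occurs alongside A.
cohort = [{"A"}, {"A", "B"}, {"A", "B"}, set()]
print(suppes_prima_facie(cohort, "A", "B"))  # True
print(suppes_prima_facie(cohort, "B", "A"))  # False: B is rarer than A
```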
To investigate the genomic alterations triggered by smoking, the present study analyzed the somatic mutations of 100 NSCLC patients. Genomic alterations that differed amongst non-smokers, ex-smokers and smokers were identified, and the most frequent alterations in each smoking subgroup were used to construct oncogenetic trees, which revealed the evolutionary trajectories of NSCLC in relation to smoking. The present results provided novel insights into how smoking shapes NSCLC development and identified potential intervention targets for treating NSCLC patients.
Materials and methods
NSCLC somatic mutation dataset
The TRAcking Cancer Evolution through therapy (TRACERx) Consortium is a multi-million pound project funded by Cancer Research UK that aims to better understand the genetic risks of lung cancer by exploring the human genome. The present study obtained somatic mutation data and smoking status data for 100 NSCLC patients from Jamal-Hanjani et al (19). The clinical information for these 100 patients is provided in Table SI. The dataset consists of 12 never-smokers, 48 ex-smokers who had quit smoking for >20 years and 40 current smokers or recent ex-smokers. Somatic mutations were annotated to genes, and a gene was scored as mutated ('1') in a patient if it carried at least one non-synonymous exonic alteration, and as '0' otherwise. A total of 11,345 genes were mutated in at least 1 of the 100 NSCLC patients, giving an 11,345×100 binary matrix in which rows denote genes, columns denote patients and each entry indicates whether that gene is mutated in that patient.
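The binarization step described above can be sketched in Python as follows. This is a minimal illustration, assuming mutation calls arrive as hypothetical (patient, gene, variant_class) records; the set of variant classes treated as non-synonymous exonic changes is an assumption and may not match the annotation categories used for the TRACERx data. Only genes with at least one qualifying alteration become rows, mirroring the 11,345-gene matrix.

```python
# Hypothetical set of variant classes counted as non-synonymous exonic changes;
# the categories actually used for the TRACERx annotation may differ.
NON_SYNONYMOUS = {"missense", "nonsense", "frameshift", "inframe_indel", "stop_loss"}


def binary_mutation_matrix(records, patients):
    """Build a gene x patient 0/1 matrix from (patient, gene, variant_class) records.

    A gene gets a row only if at least one patient carries a qualifying alteration.
    """
    mutated = {}  # gene -> set of patients carrying a qualifying alteration
    for patient, gene, variant_class in records:
        if variant_class in NON_SYNONYMOUS:
            mutated.setdefault(gene, set()).add(patient)
    genes = sorted(mutated)
    matrix = [[1 if p in mutated[g] else 0 for p in patients] for g in genes]
    return genes, matrix


records = [
    ("P1", "EGFR", "missense"),
    ("P2", "TTN", "synonymous"),  # ignored: synonymous change
    ("P2", "TTN", "nonsense"),
    ("P3", "KRAS", "synonymous"),  # KRAS never qualifies, so it gets no row
]
genes, matrix = binary_mutation_matrix(records, ["P1", "P2", "P3"])
print(genes)   # ['EGFR', 'TTN']
print(matrix)  # [[1, 0, 0], [0, 1, 0]]
```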
Unlike the TRACERx study by Jamal-Hanjani et al (19), which analyzed intratumor heterogeneity by constructing phylogenetic trees for each patient, the present study aimed to characterize the general mutation patterns within patient subgroups.
Identifying the mutated genes amongst different smoking status groups
To identify genes whose mutation status differed amongst smoking status groups, Fisher's exact test (20) was applied, for each gene, to the contingency table of mutation status versus smoking status. P<0.05 was considered to indicate statistical significance.
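With three smoking groups, each gene yields a 2×3 contingency table (mutated or not, by group). The text does not state which implementation handled the three-group case; one standard option is the Freeman-Halton extension of Fisher's exact test, which R's `fisher.test`, for example, applies to r×c tables. The Python sketch below is an assumption-labeled illustration of that extension for a 2×k table, not the authors' code; the function names and the example counts are hypothetical. With both margins fixed, the probability of a table is the product of C(n_j, x_j) over groups divided by C(N, m), and the two-sided P-value sums the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def _tables(total, sizes):
    """Yield every split of `total` mutated patients across groups of the given sizes."""
    if len(sizes) == 1:
        if total <= sizes[0]:
            yield (total,)
        return
    for x in range(min(total, sizes[0]) + 1):
        for rest in _tables(total - x, sizes[1:]):
            yield (x,) + rest


def fisher_exact_2xk(mutated, sizes):
    """Two-sided exact P-value for a 2 x k table (Freeman-Halton extension).

    mutated[j]: patients in group j carrying the mutation; sizes[j]: size of group j.
    Integer weights prod C(n_j, x_j) keep the 'no more likely' comparison exact.
    """
    m = sum(mutated)

    def weight(xs):
        w = 1
        for x, n in zip(xs, sizes):
            w *= comb(n, x)
        return w

    observed = weight(mutated)
    tail = sum(w for w in map(weight, _tables(m, sizes)) if w <= observed)
    return tail / comb(sum(sizes), m)


# Hypothetical counts for one gene: mutated in 0/12 never-, 10/48 ex- and 18/40 current smokers.
p = fisher_exact_2xk((0, 10, 18), (12, 48, 40))
print(f"P = {p:.3g}")
```

Enumerating every table is cheap here because the group sizes (12, 48 and 40) keep the number of candidate tables in the tens of thousands at most.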
Construction of the evolutionary trajectories for different smoking status groups
The evolution of the most frequently mutated genes in each smoking status group was analyzed using Oncotree (21,22), a widely used method for inferring oncogenetic trees (23).
In an oncogenetic tree model, the evolutionary trajectory of tumorigenesis is simplified: genetic alteration events are assumed to occur sequentially, so that an event can occur only after its parent event has occurred. In addition, each such parent-to-child dependency is assumed to act independently of all the others.
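The two assumptions above fully determine how likely any mutation pattern is under a given tree. The Python sketch below illustrates this for the basic model of Desper et al (22): each event has one parent, and occurs with its own edge probability only if that parent has occurred. The tree, the edge probabilities and the function name are hypothetical, and the sketch omits the false-positive and false-negative rates that Szabo and Boucher (21) add to the model.

```python
ROOT = "Root"  # artificial root: the state before any alteration


def pattern_probability(parent, q, pattern):
    """Probability of observing exactly `pattern` (a set of events) under an oncogenetic tree.

    parent: {event: parent_event}, with ROOT as the parent of the earliest events.
    q: {event: P(event occurs | its parent has occurred)}; edges act independently.
    """
    prob = 1.0
    for event, par in parent.items():
        parent_present = par == ROOT or par in pattern
        if event in pattern:
            if not parent_present:
                return 0.0  # an event cannot occur without its parent
            prob *= q[event]
        elif parent_present:
            prob *= 1 - q[event]
        # parent absent and event absent: the event could not occur, factor 1
    return prob


# Hypothetical tree Root -> A -> B with illustrative edge probabilities.
parent = {"A": ROOT, "B": "A"}
q = {"A": 0.75, "B": 0.5}
for pattern in [set(), {"A"}, {"A", "B"}, {"B"}]:
    print(sorted(pattern), pattern_probability(parent, q, pattern))
```

The four printed probabilities sum to 1, and the pattern {B} gets probability 0 because B cannot occur without its parent A.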
The Oncotree method involves several steps. First, a set of the most relevant genetic events is selected. In the present study, the 10 most frequent genetic alterations in each smoking status group were considered relevant to tumor progression in that group and were therefore selected for modeling. Next, each ordered pair of these events was assigned a weight based on their individual and joint occurrence probabilities. Finally, based on these weights, the optimal oncogenetic tree was inferred as the maximum-weight branching (21,22).
In the present study, the method was applied using the R package Oncotree (http://cran.r-project.org/web/packages/Oncotree/).
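The weighting and branching steps can be sketched in Python to show what the fitting does. This is not the Oncotree package's code: the edge weight w(i→j) = log(p_i/(p_i+p_j)) + log(p_ij/(p_i·p_j)) is the one given by Desper et al (22), and we assume, without having checked the package source, that Oncotree uses the same weights. The root is treated as an event present in every patient, so its edge weight to event j reduces to -log(1+p_j). The maximum-weight branching is computed here with the Chu-Liu/Edmonds algorithm, one standard choice whose tie-breaking may differ from the package. All function names are hypothetical.

```python
from math import log

ROOT = "Root"  # artificial root node, present in every patient


def desper_weights(patients, events):
    """Edge weights w(i -> j) = log(p_i / (p_i + p_j)) + log(p_ij / (p_i * p_j)).

    patients: list of sets of mutated genes; events: genes to model.
    Pairs that never co-occur get no edge (their weight would be -infinity).
    """
    n = len(patients)
    p = {e: sum(e in s for s in patients) / n for e in events}
    edges = {}
    for j in events:
        if p[j] > 0:
            edges[(ROOT, j)] = -log(1 + p[j])  # the formula with p_root = 1, p_root,j = p_j
    for i in events:
        for j in events:
            if i == j:
                continue
            p_ij = sum(i in s and j in s for s in patients) / n
            if p_ij > 0:
                edges[(i, j)] = log(p[i] / (p[i] + p[j])) + log(p_ij / (p[i] * p[j]))
    return edges


def _find_cycle(parent):
    """Return the nodes of a cycle in a {child: parent} map, or None if there is none."""
    for start in parent:
        path, node = [], start
        while node in parent and node not in path:
            path.append(node)
            node = parent[node]
        if node in path:
            return path[path.index(node):]
    return None


def max_branching(edges, root=ROOT):
    """Chu-Liu/Edmonds maximum-weight branching; returns {child: parent}.

    edges: {(parent, child): weight}. Assumes the root has an edge to every other
    node, which desper_weights guarantees for every event seen at least once.
    """
    best = {}  # highest-weight incoming edge for each non-root node
    for (u, v), w in edges.items():
        if v != root and u != v and (v not in best or w > edges[(best[v], v)]):
            best[v] = u
    cycle = _find_cycle(best)
    if cycle is None:
        return best
    in_cycle, c = set(cycle), object()  # c stands for the contracted cycle
    contracted, origin = {}, {}
    for (u, v), w in edges.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:  # entering the cycle: charge for the cycle edge it replaces
            key, w = (u, c), w - edges[(best[v], v)]
        elif u in in_cycle:
            key = (c, v)
        else:
            key = (u, v)
        if key not in contracted or w > contracted[key]:
            contracted[key], origin[key] = w, (u, v)
    parent = {}
    for child, par in max_branching(contracted, root).items():
        u, v = origin[(par, child)]
        parent[v] = u
    for v in cycle:
        parent.setdefault(v, best[v])  # keep every cycle edge except the one broken on entry
    return parent


cohort = [{"A"}, {"A", "B"}, {"A", "B"}, set()]  # hypothetical patients
print(max_branching(desper_weights(cohort, ["A", "B"])))  # {'A': 'Root', 'B': 'A'}
```

In the study's setting, `events` would be the 10 most frequently mutated genes of one smoking status group, and the returned parent map is the oncogenetic tree.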
Annotation of the biological function of the mutated genes
WebGestalt was used to annotate the biological functions of the mutated genes (24). WebGestalt is a widely used online enrichment analysis tool that supports model organisms including human, mouse, rat, yeast, fruit fly and Caenorhabditis elegans. It integrates many annotation databases, including Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, DrugBank and Online Mendelian Inheritance in Man. P-values from the over-representation enrichment analysis were adjusted for multiple testing and expressed as false discovery rates (FDR). In the present study, enriched categories with FDR<0.2 were considered significant.
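Over-representation analysis and FDR adjustment can be sketched in Python as below. This assumes, rather than confirms, that the WebGestalt run used its usual hypergeometric test and the Benjamini-Hochberg procedure for FDR; the function names and the example P-values are hypothetical. The filter at the end applies the study's FDR<0.2 cutoff.

```python
from math import comb


def ora_pvalue(k, n, K, N):
    """Upper-tail hypergeometric P(X >= k).

    N: annotated genes in the reference set; K: genes in the category;
    n: genes in the input list; k: input genes that fall in the category.
    """
    upper = min(n, K)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical category P-values; keep categories with FDR < 0.2, as in the study.
raw = [0.01, 0.04, 0.03, 0.2]
fdr = benjamini_hochberg(raw)
print([round(x, 4) for x in fdr])  # [0.04, 0.0533, 0.0533, 0.2]
print([i for i, x in enumerate(fdr) if x < 0.2])  # [0, 1, 2]
```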
Results and Discussion
A total of 68 genes demonstrate different mutation patterns amongst smoking status groups
Fisher's exact test was used to identify genes with different mutation patterns amongst the smoking status groups. A total of 68 genes were significantly associated with smoking status (P<0.05; Table I). OncoPrinter plots of these 68 genes in the three smoking status groups (non-smokers, ex-smokers and smokers) are displayed in Fig. 1, with genes ranked by their mutation frequency across all lung cancer patients. Zinc finger homeobox 4 (ZFHX4), usherin (USH2A), CUB and Sushi multiple domains 1 (CSMD1), CUB and Sushi multiple domains 2 (CSMD2), spectrin α erythrocytic 1 (SPTA1), pappalysin 2 (PAPPA2), dynein axonemal heavy chain 9 (DNAH9), contactin-associated protein like 5 (CNTNAP5) and additional sex combs like 3 (ASXL3) were frequently mutated in ex-smokers and smokers, but not in non-smokers. The mutation rate was associated with smoking status, with current smokers showing the highest rate of mutated genes. There were also several non-smoker-specific mutations, such as lysine demethylase 8 (KDM8), zinc finger protein 677 (ZNF677), TEA domain transcription factor 1 (TEAD1) and phosphatidylinositol glycan anchor biosynthesis class M (PIGM). These non-smoker-specific mutations suggest that lung tumorigenesis in non-smokers differs from that in smokers.
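The descriptive patterns above (mutated in ex-smokers and smokers but not in non-smokers, or only in non-smokers) and the frequency ranking used in Fig. 1 can be made explicit with a short Python sketch. The group labels 'never', 'ex' and 'current', the gene names and the matrix are hypothetical placeholders, and the labels are purely descriptive: statistical significance still comes from the per-gene Fisher's exact test.

```python
def group_frequencies(row, groups):
    """Fraction of patients in each group whose entry in `row` (0/1 per patient) is 1."""
    counts, sizes = {}, {}
    for value, group in zip(row, groups):
        sizes[group] = sizes.get(group, 0) + 1
        counts[group] = counts.get(group, 0) + value
    return {g: counts[g] / sizes[g] for g in sizes}


def describe_pattern(row, groups, reference="never"):
    """Label a gene as smoker-associated, non-smoker-specific or shared."""
    freqs = group_frequencies(row, groups)
    others = [g for g in freqs if g != reference]
    if freqs[reference] == 0 and all(freqs[g] > 0 for g in others):
        return "smoker-associated"
    if freqs[reference] > 0 and all(freqs[g] == 0 for g in others):
        return "non-smoker-specific"
    return "shared"


def rank_by_frequency(genes, matrix):
    """Order genes by how many patients carry a mutation, most frequent first."""
    ranked = sorted(zip(genes, matrix), key=lambda gm: sum(gm[1]), reverse=True)
    return [gene for gene, _ in ranked]


groups = ["never", "never", "ex", "current"]  # hypothetical patient labels
genes = ["GENE_X", "GENE_Y", "GENE_Z"]        # hypothetical gene names
matrix = [[0, 0, 1, 1], [1, 0, 0, 0], [1, 0, 1, 1]]
print(rank_by_frequency(genes, matrix))  # ['GENE_Z', 'GENE_X', 'GENE_Y']
for gene, row in zip(genes, matrix):
    print(gene, describe_pattern(row, groups))
```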
Table I. A total of 68 genes that demonstrated different mutation patterns amongst non-smokers, ex-smokers and smokers.
Biological functions of the 68 gene mutations associated with smoking status
The 68 gene mutations associated with smoking status were annotated using the Gene Ontology (GO) biological process (BP), cellular component (CC) and molecular function (MF) categories (Fig. 2). Numerous genes were annotated as membrane proteins involved in biological regulation, metabolic processes and response to stimulus (Fig. 2). These results were expected, since smoke is a xenobiotic stimulus to the human body and its chemicals can affect normal metabolic processes and alter biological regulation. To investigate gene function further, enrichment significance was formally tested using WebGestalt (24); the significantly enriched BP, CC and MF categories are listed in Table II, Table III and Table IV, respectively. The organ development, morphogenesis of an epithelial fold, muscle tissue morphogenesis and muscle organ morphogenesis categories were enriched (Table II). These genes may serve an important role in tumor initiation and help transform normal lung tissue into tumor tissue. Proteins associated with the plasma membrane were enriched (Table III), consistent with the preliminary functional analysis (Fig. 2) and with involvement of the mutated genes in stimulus response. In addition, enrichment of proteins associated with muscle/fiber functions suggested that the mutated genes may alter lung muscle structure. Significant enrichment of multiple binding functions (Table IV) suggested that the mutated genes may be key players in signal transduction and regulation, which could amplify dysfunction and accelerate tumorigenesis.
Table II. Significantly enriched GO biological process categories of the 68 mutated genes associated with smoking status.
Table III. Significantly enriched GO cellular component categories of the 68 mutated genes associated with smoking status.
Table IV. Significantly enriched GO molecular function categories of the 68 mutated genes associated with smoking status.
Evolutionary trajectories of non-smoker, ex-smoker and smoker lung cancer patients
Cancer is a complex multigene, multiprocess disease. The tumorigenesis of colorectal cancer is well studied (25,26) and serves as a useful example of how mutations cause pathway dysfunction. The process includes several steps (25): i) mutation of mismatch-repair (MMR) genes; ii) microsatellite instability (MSI) pathway dysfunction caused by MMR mutation; iii) transformation of normal epithelium into small adenoma; iv) chromosomal instability and mutations in KRAS and BRAF; v) serrated adenoma pathway dysfunction triggered by BRAF mutation; vi) progression of small adenoma to large adenoma; and vii) mutations of PIK3CA, PTEN, tumor protein p53 (TP53), BAX, SMAD4 and transforming growth factor β receptor 2, which accelerate progression from large adenoma to cancer.
Similarly, lung cancer is likely to involve several mutational events that occur sequentially to initiate and accelerate tumorigenesis. Smoking is a major risk factor that can cause genetic and epigenetic changes and thereby alter the course of tumorigenesis. Characterizing this process may help explain the mechanistic differences between smoker and non-smoker lung cancer patients.
The Oncotree method was used to construct oncogenetic trees of the 10 most frequently mutated genes in non-smoker, ex-smoker and smoker lung cancer patients (Fig. 3). In non-smokers, the early events were EGFR and titin (TTN) mutations. EGFR was followed by mutations of PIGM and ZNF677, whereas TTN was followed by mutations of TEAD1, olfactory receptor family 6 subfamily P member 1, catenin β 1, huntingtin interacting protein 1, protocadherin γ subfamily A 8 and SUMO1/sentrin specific peptidase 7. In ex-smokers, TTN was also an early event, but more early events were detected than in non-smokers, including mutations of ryanodine receptor 2, ZFHX4 and CSMD1. Smokers had the highest number of early events, including mutations of TTN, ryanodine receptor 2, USH2A, SPTA1 and CSMD1. These results suggest that smoking increases the number of mutations and produces more complex oncogenetic trees. In non-smokers, EGFR was the primary mutation, whereas in ex-smokers and smokers the importance of TTN increased. Almost all smokers carried a TTN mutation.
Oncogenetic differences between non-smoker, ex-smoker and smoker lung cancer patients
Based on the oncogenetic trees of non-smoker, ex-smoker and smoker lung cancer patients (Fig. 3), the key driver gene of non-smoker lung cancer patients was EGFR, whilst the key driver gene of smoker lung cancer patients was TTN.
EGFR is a well-known oncogene that acts through the PI3K and RAS pathways to promote cell growth and survival (27). EGFR is expressed in >60% of NSCLC cases and is a clinically relevant target of tyrosine kinase inhibitors (TKIs). EGFR mutations are more frequent in Asian patients, women, non-smokers and lung adenocarcinomas (28,29). The present finding that EGFR was the key driver gene in non-smoker lung cancer patients is in agreement with the literature (28,29).
TTN encodes a protein of striated muscle that is a key component for striated muscle assembly and function. TTN is frequently mutated across most cancer types and has the second highest mutation rate, after TP53, in The Cancer Genome Atlas dataset (30). In the present study, 65 patients had a TTN mutation and 35 did not. The 65 patients with TTN mutations comprised 2 adenosquamous carcinomas, 2 carcinosarcomas, 31 invasive adenocarcinomas, 1 large cell carcinoma and 29 squamous cell carcinomas; the 35 patients without TTN mutations comprised 1 adenosquamous carcinoma, 30 invasive adenocarcinomas, 1 large cell neuroendocrine carcinoma and 3 squamous cell carcinomas. Although its mechanisms remain largely unknown, TTN merits further investigation given its potential roles in tumorigenesis and progression (30). Based on the oncogenetic trees (Fig. 3), TTN may function by regulating DNAH9, USH2A, SPTA1 or CSMD2. An oncogenetic tree only describes the order in which genetic alterations occur; it can hint at functional regulation, but this needs to be confirmed. To explore possible regulatory mechanisms of TTN, the protein functional association network STRING (31,32) was queried at medium confidence (score >0.4). TTN was found to be connected to SPTA1 through calmodulin 2 (CALM2) and troponin C1 (TNNC1; Fig. 4). The STRING confidence scores of these interactions (Table SII) were 0.722 for TTN and CALM2, 0.962 for SPTA1 and CALM2, 0.965 for TTN and TNNC1 and 0.537 for SPTA1 and TNNC1. These results provide insight into how TTN may function in lung cancer of smoking patients, or even in other types of cancer.
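The bridging step, finding proteins linked to both TTN and SPTA1 above the medium-confidence threshold, can be sketched in Python using only the four scores quoted above. This is an illustration, not a STRING query: the full network contains many more edges, and reporting each bridge by the weaker of its two links is a hypothetical 'bottleneck' heuristic, not a score defined by STRING. The function name is also hypothetical.

```python
# Combined STRING scores quoted above (Table SII); other network edges are not listed here.
SCORES = {
    frozenset({"TTN", "CALM2"}): 0.722,
    frozenset({"SPTA1", "CALM2"}): 0.962,
    frozenset({"TTN", "TNNC1"}): 0.965,
    frozenset({"SPTA1", "TNNC1"}): 0.537,
}


def bridging_partners(scores, a, b, threshold=0.4):
    """Proteins linked to both a and b with score > threshold.

    Each bridge is reported with the weaker of its two links (an illustrative
    bottleneck heuristic, not a STRING-defined score).
    """
    neighbours = {}
    for pair, score in scores.items():
        if score <= threshold:
            continue
        x, y = tuple(pair)
        neighbours.setdefault(x, {})[y] = score
        neighbours.setdefault(y, {})[x] = score
    shared = neighbours.get(a, {}).keys() & neighbours.get(b, {}).keys()
    return {p: min(neighbours[a][p], neighbours[b][p]) for p in sorted(shared)}


print(bridging_partners(SCORES, "TTN", "SPTA1"))  # {'CALM2': 0.722, 'TNNC1': 0.537}
print(bridging_partners(SCORES, "TTN", "SPTA1", threshold=0.7))  # {'CALM2': 0.722}
```

Raising the threshold shows which bridge is more robust: the route through TNNC1 depends on the 0.537 SPTA1-TNNC1 link, whereas both links through CALM2 exceed 0.7.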
The oncogenetic tree model has limitations. First, the model is based on association rather than causality, so the results cannot be treated as actual biological regulation and should be investigated further with experimental methods. Second, the model cannot handle a large number of genes: input genes must be carefully selected based on mutation frequency or the biological literature so that only the most plausible candidates are analyzed, and it is not a general method that can be applied genome-wide. Finally, the sample size must be large enough to capture the associations, so results generated from small datasets need to be interpreted with caution.
In conclusion, lung cancer is a complex multigene, multiprocess disease with genetic and environmental risk factors. Smoking is the leading risk factor and can alter the genetics and epigenetics of lung tissue, leading to cancer; smokers have a much greater chance of developing lung cancer. The present study compared the mutation patterns of non-smoker, ex-smoker and smoker lung cancer patients and identified 68 genes that were significantly differentially mutated amongst smoking status groups. Oncogenetic trees of the 10 most frequently mutated genes in each group were then constructed and analyzed. In non-smoker lung cancer patients the key driver gene was EGFR, whilst in smoker lung cancer patients the key driver gene was TTN. The EGFR finding in non-smokers is in line with previous literature, and a potential mechanism for the frequently mutated gene TTN in tumorigenesis was proposed. The present study provides novel insights into how smoking alters the evolutionary trajectory and progression of lung cancer.
Supplementary Material
Supporting Data
Acknowledgements
Not applicable.
Funding
No funding was received.
Availability of data and materials
The datasets generated and/or analyzed during the present study are available from the corresponding author on reasonable request.
Authors' contributions
FMZ designed the experiment and XJY performed the experiment. GC, JY, GCY and PFZ performed the data analysis. ZKJ, KF, YL and BB contributed to the study design. KF and YL wrote the article. ZKJ and BB revised the article. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Patient consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
References
Siegel R, Naishadham D and Jemal A: Cancer statistics, 2012. CA Cancer J Clin. 62:10–29. 2012.
Huang T, Jiang M, Kong X and Cai YD: Dysfunctions associated with methylation, microRNA expression and gene expression in lung cancer. PLoS One. 7:e43441. 2012.
Bossé Y and Amos CI: A decade of GWAS results in lung cancer. Cancer Epidemiol Biomarkers Prev. 27:363–379. 2018.
Jiang CL, He SW, Zhang YD, Duan HX, Huang T, Huang YC, Li GF, Wang P, Ma LJ, Zhou GB and Cao Y: Air pollution and DNA methylation alterations in lung cancer: A systematic and comparative study. Oncotarget. 8:1369–1391. 2017.
Shu Y, Zhu L, Yuan F, Kong X, Huang T and Cai YD: Analysis of the relationship between PM2.5 and lung cancer based on protein-protein interactions. Comb Chem High Throughput Screen. 19:100–108. 2016.
Liu C, Zhang YH, Huang T and Cai Y: Identification of transcription factors that may reprogram lung adenocarcinoma. Artif Intell Med. 83:52–57. 2017.
Li BQ, You J, Chen L, Zhang J, Zhang N, Li HP, Huang T, Kong XY and Cai YD: Identification of lung-cancer-related genes with the shortest path approach in a protein-protein interaction network. Biomed Res Int. 2013:267375. 2013.
Li BQ, You J, Huang T and Cai YD: Classification of non-small cell lung cancer based on copy number alterations. PLoS One. 9:e88300. 2014.
Huang T, Yang J and Cai YD: Novel candidate key drivers in the integrative network of genes, microRNAs, methylations and copy number variations in squamous cell lung carcinoma. Biomed Res Int. 2015:358125. 2015.
Tonini G, D'Onofrio L, Dell'Aquila E and Pezzuto A: New molecular insights in tobacco-induced lung cancer. Future Oncol. 9:649–655. 2013.
Hecht SS: More than 500 trillion molecules of strong carcinogens per cigarette: Use in product labelling? Tob Control. 20:387. 2011.
Chen L, Chu C, Lu J, Kong X, Huang T and Cai YD: A computational method for the identification of new candidate carcinogenic and non-carcinogenic chemicals. Mol Biosyst. 11:2541–2550. 2015.
Zon RT, Goss E, Vogel VG, Chlebowski RT, Jatoi I, Robson ME, Wollins DS, Garber JE, Brown P and Kramer BS; American Society of Clinical Oncology: American Society of Clinical Oncology policy statement: The role of the oncologist in cancer prevention and risk assessment. J Clin Oncol. 27:986–993. 2009.
Nesnow S, Ross JA, Stoner GD and Mass MJ: Mechanistic linkage between DNA adducts, mutations in oncogenes and tumorigenesis of carcinogenic environmental polycyclic aromatic hydrocarbons in strain A/J mice. Toxicology. 105:403–413. 1995.
Caravagna G, Graudenzi A, Ramazzotti D, Sanz-Pamplona R, De Sano L, Mauri G, Moreno V, Antoniotti M and Mishra B: Algorithmic methods to infer the evolutionary trajectories in cancer progression. Proc Natl Acad Sci USA. 113:E4025–E4034. 2016.
Suppes P: A Probabilistic Theory of Causality. North-Holland Pub. Co.; Amsterdam: 1970.
Brown D, Smeets D, Székely B, Larsimont D, Szász AM, Adnet PY, Rothé F, Rouas G, Nagy ZI, Faragó Z, et al: Phylogenetic analysis of metastatic progression in breast cancer using somatic mutations and copy number aberrations. Nat Commun. 8:14944. 2017.
Rohlf FJ: Felsenstein J, Inferring Phylogenies. Sinauer Associates Inc.; Sunderland, MA: 2004.
Jamal-Hanjani M, Wilson GA, McGranahan N, Birkbak NJ, Watkins TBK, Veeriah S, Shafi S, Johnson DH, Mitter R, Rosenthal R, et al: Tracking the evolution of non-small-cell lung cancer. N Engl J Med. 376:2109–2121. 2017.
Fisher RA: The logic of inductive inference. J Royal Stat Soc. 98:39–82. 1935.
Szabo A and Boucher K: Estimating an oncogenetic tree when false negatives and positives are present. Math Biosci. 176:219–236. 2002.
Desper R, Jiang F, Kallioniemi OP, Moch H, Papadimitriou CH and Schäffer AA: Inferring tree models for oncogenesis from comparative genome hybridization data. J Comput Biol. 6:37–51. 1999.
Li XC, Liu C, Huang T and Zhong Y: The occurrence of genetic alterations during the progression of breast carcinoma. Biomed Res Int. 2016:5237827. 2016.
Zhang B, Kirov S and Snoddy J: WebGestalt: An integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 33 (Web Server Issue):W741–W748. 2005.
Markowitz SD and Bertagnolli MM: Molecular origins of cancer: Molecular basis of colorectal cancer. N Engl J Med. 361:2449–2460. 2009.
Calvert PM and Frucht H: The genetics of colorectal cancer. Ann Intern Med. 137:603–612. 2002.
Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA Jr and Kinzler KW: Cancer genome landscapes. Science. 339:1546–1558. 2013.
Proceedings from the 10th annual meeting of molecularly targeted therapy in non-small cell lung cancer. J Thorac Oncol. 5 (12 Suppl 6):S433–S496. 2010.
Tokumo M, Toyooka S, Kiura K, Shigematsu H, Tomii K, Aoe M, Ichimura K, Tsuda T, Yano M, Tsukuda K, et al: The relationship between epidermal growth factor receptor mutations and clinicopathologic features in non-small cell lung cancers. Clin Cancer Res. 11:1167–1173. 2005.
Kim N, Hong Y, Kwon D and Yoon S: Somatic mutaome profile in human cancer tissues. Genomics Inform. 11:239–244. 2013.
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al: STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43 (Database Issue):D447–D452. 2015.
Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, et al: STRING v11: Protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47:D607–D613. 2019.