Comprehensive genomic analyses of a metastatic colon cancer to the lung by whole exome sequencing and gene expression analysis
- Li Tai Fang
- Sharon Lee
- Helen Choi
- Hong Kwan Kim
- Gregory Jew
- Hio Chung Kang
- Lin Chen
- David Jablons
- Il-Jin Kim
- Corresponding author:
Published online on: Friday, October 25, 2013
- Pages: 211-221
- DOI: 10.3892/ijo.2013.2150
Metastasis to distant organs is an acquired characteristic of cancer cells (1) and is a major cause of deaths from various human cancers. Metastatic progression from the primary tumor site involves multiple factors such as accumulation of genetic and epigenetic events and aquiring ability to colonize at the distant host organs (2). Although understanding the biology of primary tumors have been a main strategy to treat metastatic tumors, how these primary tumor cells evolve in the course of spreading to the mestatic organ is poorly understood.
Lung and colorectal cancer are the top and third most common cancers in the world, with over 1.6 million and 1.2 million new cases each year, respectively (3). Mutation status of EGFR, K-ras and EML4-ALK are important factors determining therapeutic regimens in non-small cell lung cancer (NSCLC) patients (4–5). Colon cancer is a genetically well-characterized human cancer. Since the multi-step carcinogenesis model was suggested, sequential genetic alterations such as mutations of APC, K-ras, Mismatch repair (MMRs), TP53 and loss of 18q have been reported in colorectal cancers (6,7). Genotype-phenotype correlations are also well-characterized in colorectal cancer. Notwithstanding these well-established genetic characteristics of colorectal cancer, the genetic mechanism causing metastasis to other organs is unclear. Liver and lung are the most common metastatic sites from colorectal cancer. Although metastasis to the lung from different primary tumors is a frequent event considering its dense connectivity with lymph nodes and blood vessels and its physical location, genetic alterations and underlying mechanisms would also affect its frequency and biological characteristics.
In this study, we describe a comprehensive genomic analysis of a metastatic colon cancer to the lung. Mutation analysis targeting whole exome is becoming prevalent and several papers describing somatic mutations at exome level have been published in primary lung and colorectal cancers (8–10). Nontheless, mutation or genetic information underlying metastatic colorectal cancer to the lung is very limited. To gain insights into genetic alterations associated with lung metastasis from colon cancer, we performed whole exome sequencing and genome-wide expression analysis in a patient with metastatic colon cancer to the lung. Among the identified mutation candidates, we chose 16 mutations for further validation using Ion Torrent PGM customized panel. Our pathway analyses identified that most of mutations were related with lung caner pathogenesis rather than a primary colon cancer, while gene expression signatures of the metastatic tumor were more correlated with the primary colon cancer. Undiscovered potential tumor supressor and oncogenes were also identified by combined exome sequencing and copy number analysis.
Materials and methods
Samples were collected under the IRB approval (11-06107) granted by the CHR of the University of California San Francisco (UCSF) along with written informed consent from the patient. The tumor and matched normal lung tissue studied came from a 74-year-old Asian female never-smoker who had colorectal cancer 4 years prior to this incident. She visited the hospital for chest discomfort. A chest X-ray revealed a mass-like lesion in the right lung. Chest computed tomography (CT) scan showed a 2 cm sized tumor in the right upper lobe. A positron emission tomography (PET) scan showed hypermetabolic fluorodeoxyglucose uptake with no evidence of other metastasis. She was suspected to have a lung metastasis, because of her history of colon cancer. She underwent a wedge resection of the right upper lobe via thoracotomy and had an uneventful recovery after the operation. The pathologic examination confirmed the lesion was metastatic adenocarcinoma originating from the primary colon cancer. She lived for 1 1/2 years after the lung metastasis operation.
Whole exome sequencing
Whole exome sequencing was done on SOLiD 5500 (Life Technologies). Samples were prepared according to protocols suggested by the manufacturer. Fragment library preparation began with 3 μg DNA and sheared with the Covaris S220 system. Resulting DNA fragments were end repaired, size selected with Agencourt Ampure XP reagent, dA-tailed, adaptor and barcode ligated, and then amplified with 6 PCR cycles. The library was then quantified with Agilent Technologies High Sensitivity DNA chip. A total of 500 ng of DNA was used to capture the exome regions using TargetSeq Exome Enrichment kit, for about 72 h. Exome DNA was amplified and then quantified with the SOLiD Library TaqMan Quantitation kit. The final exome library was diluted to 500 pM, templated to beads, amplified through emulsion PCR and enriched using the SOLiD EZ Bead Emulsifier, Amplifier and Enricher. The resulting libraries were then loaded onto the Flow Chip with a total of 6 multiplexed samples in 6 lanes.
Ion Torrent AmpliSeq sequencing for validation
Sixteen mutation candidates from the exome sequencing were validated by deep sequencing (more than 1,000 x coverage) using the Ion Torrent PGM AmpliSeq Custom Panel for the selected targets. In brief, the 200 bp Standard DNA option was used for the AmpliSeq primer design. Sample DNA was diluted to a concentration of 10 ng/μl. A total of 1 μl of the diluted DNA was used for amplicon library preparation according to the Ion Torrent protocol. Target sequences were amplified using the custom primers, followed by a partial digestion of the primers. Adapters and barcodes were ligated to the amplicons and purified using Agencourt AMPure XP reagent. The library was quantified by qPCR using the Ion Library Quantitation kit according to the protocol. The libraries were combined, templated onto beads and run through emulsion PCR using the OneTouch2 and ES machines. Samples were loaded into the 318 chips and run on the Ion Torrent PGM.
Microarray gene expression analysis
We also measured the mRNA expression level of samples using Affymetrix GeneTitan Gene ST 1.1 array. The detailed protocol has been previously described (11). In short, total RNA was extracted from the matched tumor and normal tissues, amplified into cRNA, and then made into cDNA. The cDNA was then fragmented and labeled. The labeled cDNA was added into the hybridization cocktail. The samples were then put onto hybridization trays and loaded into the Affymetrix GeneTitan MC for hybridization, washing and scanning. The Log2-scale expression data were extracted using the built-in Robust Multi-array Average (RMA) algorithm in the Affymetrix software (12).
Data analysis: somatic mutation detections
For the whole exome sequencing performed by SOLiD 5500, the color space raw data (XSQ) were converted to sequences and aligned to reference human genome hg19 using LifeScope Genomic Analysis Software v2.5.1, creating a BAM file for each barcode in each lane. The paired-end targeted resequencing workflow of the software was used for the task, which also includes the DiBayes algorithm for Single Nucleotide Variation (SNV) detection. In the tumor tissue, 40,314 SNPs and 2,287 InDels were called, of which 37,456 and 1,990 were in concordance with dbSNP132 (92.91 and 80.03%, respectively). The transition/transversion ratio of the SNPs in the tumor was 2.60. In the adjacent normal tissue, 41,738 SNPs and 2,570 InDels were called, of which 39,069 and 2,096 were in concordance with dbSNP132 (93.61 and 81.56%, respectively). The transition/transversion ratio of the SNPs in the matched normal was 2.61.
BAM files for each barcode and each lane were merged into a single BAM file for every sample (the tumor and the matched normal) using Picard 1.87 (http://picard.sourceforge.net). We used Strelka (13) and MuTect (14) for small somatic mutation detections. Both algorithms adopt a Bayesian probability model comparing the tumor with its matched normal data, taken into consideration that tumor samples are often heterogeneous and impure. Default configurations were kept when using both software, e.g., prior probability of somatic mutation is 10−6. The mutation calls were output in a VCF (variant calling format) file. Each variant was annotated by ANNOVAR, which annotates the gene symbol, chromosome position and the type of variant (e.g., synonymous, missense, nonsense, etc.) (15).
Ion Torrent PGM generated fastq files were aligned to hg19 using the MAP2 alignment algorithm of Torrent Mapping Alignment Program (TMAP 3.4.0). The number of reference and novel allele counts for the 16 sites were tallied using the Samtools 0.1.18 mpileup function (16).
Copy number analysis
We used ExomeCNV, an R package, to conduct a copy number variation (CNV) analysis to the exome sequencing data (17). ExomeCNV infers copy number alteration based on the normalized Log2 ratio of the depth of coverage between the tumor and its matched normal tissue (17). The highest novel-to-reference (i.e., mutant to wild-type) allele ratio in our mutation data was about 1:2. If we assume that a mutation occurs in all tumor cells and none in normal cells, then in the tumor tissue, we expect a 50-50 ratio of novel/reference counts for heterozygous mutations. The extra reference counts would come from normal cell contamination, contributing for a total of 1/3 of the coverage. Hence, we estimated a normal contamination of 1/3 in this tumor sample. The estimated normal contamination value 0.3 was applied to the copy number analysis.
Copy number of the JAZF1 gene, identified as being amplified in whole exome sequencing data was chosen for further individual validation by TaqMan Copy Number assay (Life Technologies). Genomic DNAs from matched tumor and normal were amplified in ABI 7900HT according to the manufacturer’s instruction and analyzed by copy number analysis algorithm as previously described (18). Copy number of RNase P gene was simultaneously analyzed to use as an endogenous normalization contol. Additional genomic DNAs from 8 matched primary lung tumors and normal tissues were included.
Identification of somatic mutations: a comparison of Strelka and MuTect
To retrieve somatic mutation calls from the whole exome sequencing data, Strelka (13) and MuTect (14) software was used. Strelka is a more stringent somatic mutation caller, identifying a total of 310 somatic variants including 135 non-synonymous variants. MuTect is more permissive, having identified a total of 7,341 somatic variants including 1,393 non-synonymous variants. Among the 135 identified non-synonymous candidate sites by Strelka, 100 of those non-synonymous variants were overlapped with MuTect’s calls (74.1% of concordance rate). Among the 310 total Strelka identified candidate sites, 221 of them were in common with MuTect’s calls (71.3% of the Strelka findings). On the other hand, well under 10% of MuTect calls were in common with Strelka. The mutation calls in common were much more likely to be true mutations, with an average novel allele (i.e., mutant) frequency of nearly 0.30, and novel allele counts of about 15. There was also no statistically significant difference between the variant and non-synonymous variants.
Pathway analysis using 71 high-confidence somatic mutations
Among 135 non-synonymous somatic mutation calls by Strelka, there were 70 mutation candidates with a somatic quality score (QSS) greater than 30, and a mutant allele frequency greater than 10% (Table I). We defined those 70 mutation calls as high-confidence mutations. In addition to those 70 high-confidence mutations, JAZF1 was a validated mutation with a QSS of 27 and mutant allele frequency of 25.2%, so a total of 71 genes were used for pathway analysis (Ingenuity Pathway Analysis, IPA) to get a general overview of their genetic pathways and functions (Table I). Perhaps unsurprisingly, due to the fact that those are somatic mutations from a metastatic colon cancer to the lung, the top two ‘Diseases and Disorders Networks’ were cancer and respiratory disease. The top 3 functional networks within Cancer were lung cancer (p=1.3×10−6, 35 genes), lung adenocarcinoma (p=5.4×10−6), and carcinoma in lung (p=7.7×10−6). Colon-related cancer network was not included in the top 10 networks (Table II).
The top 3 IPA Canonical Networks were ‘Phospholipase C Signaling’ (a total of 263 genes in this network, 6 of which are our somatic mutation submissions, or 6/263, p=2.2×10−4), ‘Molecular Mechanisms of Cancer’ (7/381, p=2.5×10−4), and ‘Colorectal Cancer Metastasis Signaling’ (6/262, p=2.6×10−4) (Table II). Knowing this tumor was a metastatic colon cancer, the Colorectal Cancer Metastasis Signaling network was of particular interest. The 6 genes (with somatic mutations in our sample) involved in Colorectal Cancer Metastasis Signaling were ADCY2, ADCY9, APC, GNB5, K-ras and LRP6. Aside from APC and K-ras which are known cancer markers, ADCY2, ADCY9, GNB5 and LRP6 also appear in Phospholipase C Signaling and Molecular Mechanisms of Cancer networks, indicating their potential role in colon cancer metastasis to the lung.
Mutation validation by targeted high coverage deep sequencing
We selected 16 candidates for deep sequencing validation by Ion Torrent PGM. Out of 16 candidates, 9 were either kinases or genes identified in the COSMIC (Catalog of Somatic Mutations in Cancer) Cancer Gene Census (http://cancer.sanger.ac.uk/cancergenome/projects/census/). For the remaining 7, all but 1 mutation calls were identified by both Strelka and MuTect.
Out of 16 mutation candidates, 15 were sucessfully amplified by targeted ampliseq panel (Ion Torrent PGM) and 13 out of 15 (87%) were validated as true mutations (Table III). For the validated mutations, the mutant allele counts ranged from several hundreds to over a thousand in the tumor, and less than 10 in the matched normal tissue (average coverage >2,000X). The 13 validated mutations include two mutations in APC and K-ras, well-known genes frequently mutated in colon cancers, three nonsense mutations in OR52K2, TMPRSS15, and SLITRK4, one frameshift deletion in HDAC6, and seven missense mutations in EIF4G3, MYLK, EPHB1, ROS1, JAZF1, TSC1 and C1orf173.
Copy number analysis of genes mutated in metastatic colon cancer to the lung
We performed a copy number analysis using whole exome sequencing data in order to have additional information on the genes mutated in metastastic colon cancer to the lung. The Copy Number Variation (CNV) calls for the 71 high-confidence and 13 validated mutation genes are shown in Tables I and III, respectively. Among the genes mutated and either amplified or deleted, copy number of JAZF1 harboring a missense mutation and being amplified in a metastatic colon tumor was further validated by TaqMan copy number assay. In accordance with the CNV information from whole exome sequencing data, the validation assay confirmed that JAZF1 was highly amplified in a metastastic colon cancer to the lung (Fig. 1).
A validated somatic mutation and copy number of JAZF1 from the exome sequencing in a metastatic colon cancer to the lung. (A) A missense mutation of JAZF1 (c.212G>A or R71Q) is shown by Integrative Genomics Viewer (IGV) (reverse strand). Six out of 26 total coverage were the mutant call in tumor and none out of 39 were the mutant call in normal. (B) Validation of copy number analysis of JAZF1 by TaqMan Copy Number assay. Copy number of genomic DNA from a metastatic colon tumor to the lung (223T, white bar) showed high amplification of the JAZF1 gene (copy number, 4.3). Additional DNA samples from matched primary lung tumor and normal tissues were simutaneously analyzed (black bars).
Genes differentially expressed in metastatic colon cancer to the lung and their network analysis
Of more than 20,000 genes in the Gene ST 1.1 array, 3,231 genes showed 2-fold differential expression between the metastatic tumor and the matched normal tissues. There were 786 genes with at least a 4-fold difference, and 247 genes with at least an 8-fold difference (Table IV). We entered these genes in 3 separate IPA analyses (analysis with the 3,231, 786 and 247 genes, respectively). In each case, the top ‘Disease and Disorder Network’ was Cancer category. Interestingly, unlike the networks of mutation signatures earlier described, top-ranked expression signatures of this metastatic tumor, in each analysis, was Gastrointestinal Disease followed by Respiratory Disease in the Disease and Disorder Network. Within the Cancer network, colorectal cancer, intestinal cancer and digestive tumor all rank above lung cancer. This analysis suggested that expression profiles of metastatic colon cancer to the lung represents more their primary tumor site (colon) retaining its initial characteristics of tumor development.
A total of 247 genes with over 8-fold differential expression between normal and metastatic colon cancer to the lung.
In this study, a comprehesive genomic analysis was performed on a patient with metastatic colon tumor to the lung. Whole exome sequencing and genome-wide expression analysis revealed characteristics of both somatic mutations and genes differentially expressed aquired during primary tumor progression to the metastasis. Exome analysis covers over 38 million base pairs. In exome screening, we found hundreds of mutation candidates, with varying confidence scores and allele frequencies. Thus, it is critical to know by experimentation what cutoff level gives a high-confidence threshold. To achieve this, we needed orthogonal method different from the original platform, to eliminate the possibility of false positives created by any platform specific artifacts. Thus, customized deep-sequencing panel was further designed and this allowed us to get validated mutations both in a quantitative and qualitiative way.
We did pathway analyses using both mutation and gene expression information from the same metastatic tumor. Interestingly, mutation data more closely reflected a characteristic of lung cancer (metastasized site) while gene expression data showed signatures related with colorectal cancer (primary cancer). There was only one colorectal cancer-related network by mutations. Six genes (ADCY2, ADCY9, APC, GNB5, K-ras and LRP6) were identified to be associated with colorectal cancer metastasis. ADCY2, ADCY9, GNB5 and LRP6 were also involved in Phospholipase C Signaling. It was reported that Phospholipase C ɛ (PLCE1) inhibited proliferation of colorectal cancer (19). Reduced expression of PLCD1 and PLCE1 was also reported in colorectal cancer biopsies and cell lines (20). Thus, it seems that genes involved in Phospholipase C Signaling are playing an important role in colorectal cancer progression and metastasis.
While the mutation signatures of the metastatic tumor were more closely related with lung cancer, differentially expressed genes with a significant change seemed to be more associated with primary colon cancer. The most significant networks by gene expression analysis were gastrointestinal and colorectal cancer groups (8.78×10−26<p<1.37×10−20). This suggests that metastatic tumors still preserved its original mRNA gene expression pattern while tumor cells acquired new mutations in a metastatic site. Out of 13 validated mutations, only APC seemed to be clearly related with colon cancer development. We could not investigate further due to unavailability of the primary tumor samples whether the identified K-ras mutation came from its primary colon cancer or metastatic tumor to the lung.
In addition to mutation and gene expression analysis, we also performed copy number analysis from the exome data. Copy number alteration is a common genetic variation in the human genome (21). Traditionally, it has been difficult to estimate the copy number of a gene based on the regional coverage in exome sequencing, because different exome regions have different capture efficiencies, leading to different base coverage in different chromosome regions. However, the ExomeCNV model (17) assumes that the same region of two different samples, due to their similarity (nearly identical) in sequences, have the same capture efficiency. However, the results would be different if the tumor tissue is mixed with normal cells. The specificity for CNV calls inevitably decreases if the tumor tissue is contaminated with a large portion of normal cells. In our case, we have relatively low normal tissue contamination rate, 0.3. Validation assay of the JAZF1 gene using TaqMan copy number assay showed a good correlation between the exome copy number analysis and Taqman data. This suggests that whole exome sequencing data can be used to generate mutation as well as copy number information.
We identified a missense mutation at codon 71 (R71Q) in JAZF1, its gain-of-copy being validated. JAZF1 encoding a nuclear protein with three zinc finger domains is a well-characterized genetic susceptibility gene for type II diabetes (22–23) and lupus erythematosus (24). Various types of fusions such as JAZF1-SUZ12, JAZF1-JJAZ1 and JAZF1-PHF1 have been reported in ESS (endometrial stromal sarcoma) (25). Although variations of JAZF1 have been involved in many diseases, mutations in the JAZF1 gene have barely been described in the literature. Our copy number analysis showed that this gene was amplified in the tumor. In lung cancer, EML4-ALK oncogenic fusion was first identified in 2007 (26), and its inhibitor targeting MET-ALK (crizotinib) was approved for treatment of patients with EML4-ALK. Based on the high frequency of JAZF1 fusions in another type of cancer and our amplification data in a metastatic tumor to the lung, JAZF1 may function as an oncogene leading to lung metastasis from colon cancer. Further functional studies are required to validate roles of JAZF1 in tumor progression and a lung metastasis from colon cancer.
TMPRSS15 (transmembrane protease, serine 15) had a nonsense mutation (R871X). TMPRSS15 is an enteropeptidase or an enterokinase activating pancreatic trypsin for releasing digestive enzyme (27). Interestingly, nonsense and frameshift mutations of TMPRSS15 (enteropeptidase/enterokinase) were found in families with congenital enteropeptidase deficiency (27). One of the reported nonsense mutations occurred at codon 857 (R857X), which is close to the nonsense mutation site (R871X) of our patient. This suggests that the region containing codons 857 and 871 may be susceptible for stop-causing mutation.
ACTR5 [ARP5 actin-related protein 5 homolog (yeast)] is another noteworthy gene (Table I). There are scarce data on the function of this gene in human cancer. In our data, ACTR5 has a missense mutation and an amplification. The mRNA expression was also higher by more than 2-fold in tumor (log2 value = 5.22) than normal (log2 value = 4.2). Although no information is available for this gene in either lung or colon cancers at this time, a novel missense mutation that is highly expressed and amplified suggests that this gene could also be important in either colon or lung cancer development and progression.
Like ALK and JAZF1, a rearrangement or a fusion is common in ROS1 ranging from 1 to 3% of NSCLC (28,29). It was recenlty reported that a secondary missense mutation at codon 2032 (G2032R) of ROS1 was involved in a resistance to crizotinib in a lung cancer patient with CD74-ROS1 (30). Our mutation (F2103S) is located near the identified G2032R and the L2026 gatekeeper residue of ROS1. Moreover, based on the crystal structure model by Awad et al (30), the F2103 seems to be a critical residue for crizotinib binding. We confirmed that our patient did not receive crizotinib treament. Thus, it will be meaningful to investigate the characteristics of the ROS1 mutations in lung or colon cancer patients with and without crizotinib treatment.
Taken together, we performed whole exome analysis in addition to the copy number and genome-wide expression analysis in a metastatic tumor. Pathway analyses of the genomic information identified different enrichment of mutation and gene expression levels in a metastatic colon cancer to the lung. Furthermore, ultra-high coverage NGS sequencing (>1,000X) confirmed the accuracy of exome sequencing, and have led to potentially novel cancer-associated genes which may lead to a metastasis from a primary organ.
This study was supported by the Barbara Isackson Lung Cancer Research Fund (D.J.), the Eileen D. Ludwig Endowed Fund for Thoracic Oncology Research (D.J.), Uniting Against Lung Cancer (UALC) (I.-J.K.), and Mesothelioma Applied Research Foundation (MARF) (I.-J.K.).
Choi H, Kratz J, Pham P, et al: Development of a rapid and practical mutation screening assay for human lung adenocarcinoma. Int J Oncol. 40:1900–1906. 2012.PubMed/NCBI
Rudin CM, Durinck S, Stawiski EW, et al: Comprehensive genomic analysis identifies SOX2 as a frequently amplified gene in small-cell lung cancer. Nat Genet. 44:1111–1116. 2012. View Article : Google Scholar : PubMed/NCBI
Saunders CT, Wong WS, Swamy S, et al: Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics. 28:1811–1817. 2012. View Article : Google Scholar : PubMed/NCBI
Cibulskis K, Lawrence MS, Carter SL, et al: Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 31:213–219. 2013. View Article : Google Scholar : PubMed/NCBI
Sathirapongsasuti JF, Lee H, Horst BA, et al: Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics. 27:2648–2654. 2011. View Article : Google Scholar : PubMed/NCBI
Danielsen SA, Cekaite L, Ågesen TH, et al: Phospholipase c isozymes are deregulated in colorectal cancer-insights gained from gene set enrichment analysis of the transcriptome. PLoS One. 6:e244192011. View Article : Google Scholar : PubMed/NCBI
Grarup N, Andersen G, Krarup NT, et al: Association testing of novel type 2 diabetes risk alleles in the jazf1, cdc123/camk1d, tspan8, thada, adamts9, and notch2 loci with insulin release, insulin sensitivity, and obesity in a population-based sample of 4,516 glucose-tolerant middle-aged Danes. Diabetes. 57:2534–2540. 2008. View Article : Google Scholar
Zeggini E, Scott LJ, Saxena R, et al: Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet. 40:638–645. 2008. View Article : Google Scholar : PubMed/NCBI
Gateva V, Sandling JK, Hom G, et al: A large-scale replication study identifies tnip1, prdm1, jazf1, uhrf1bp1 and il10 as risk loci for systemic lupus erythematosus. Nat Genet. 41:1228–1233. 2009. View Article : Google Scholar : PubMed/NCBI
Holzinger A, Maier EM, Bück C, et al: Mutations in the proenteropeptidase gene are the molecular cause of congenital enteropeptidase deficiency. Am J Hum Genet. 70:20–25. 2002. View Article : Google Scholar : PubMed/NCBI