Introduction

Oncology Letters

1792-1074 1792-1082

D.A. Spandidos

10.3892/ol.2020.11976

OL-0-0-11976

Articles

Integrated analysis of whole genome and transcriptome sequencing in a young patient with gastric cancer provides insights for precision therapy

Kongwang

1 Yu

Weiqiang

2 Ajayi

Olugbenga Emmanuel

2 Li

Longlong

1 Huang

Zhiguo

1 Rong

Qiqi

2 Wang

Shuaili

2 Wu

Qing-Fa

1Division of Gastrointestinal Surgery, Department of General Surgery, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui 230022, P.R. China 2Hefei National Laboratory for Physical Sciences at Microscale, CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, P.R. China

Correspondence to: Professor Qing-Fa Wu, Hefei National Laboratory for Physical Sciences at Microscale, CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, 443 HuangShan Road, Hefei, Anhui 230026, P.R. China, E-mail: wuqf@ustc.edu.cn

10 2020

12 08 2020

20 4

115

04062019 17072020

2020

This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

Gastric cancer is a leading cause of cancer-associated deaths worldwide and is considered to be an age-related disease. In younger patients, gastric cancer is biologically more aggressive, and prognosis is worse compared with that in elderly patients. In the present case report, the whole genome and transcriptome was sequenced in a 26-year-old patient with gastric cancer who presented with gastric cancer-related symptoms and was admitted to the First Affiliated Anhui Medical Hospital (Hefei, China) in December 2016. In total, 9 germline and 4 somatic mutations were identified in the patient, and there were more deleterious sites in the germline mutated genes. Genes with somatic mutations, such as MUC2, MUC4, SLC8A2, and with structural variations, including CCND3, FGFR2 and FGFR3, were found to be differentially expressed. Cancer-associated pathways, such as the ‘calcium signaling pathway’, ‘cGMP-PKG signaling pathway’ and ‘transcriptional mis-regulation’ were also enriched at both the genomic and transcriptomic levels. The genes found to have germline (SFRP4), somatic (MUC2, MUC4, SLC8A2) mutations, or structural variations (CCND3, FGFR2 and FGFR3) were differentially expressed in the patient and could be promising precision therapy targets.

gastric cancer young age germline mutations somatic mutations TCGA STAD

Introduction

Gastric cancer (GC) is a leading cause of cancer-related deaths worldwide and is known to be an age-related disease (1). The mean age of patients with GC at diagnosis is ~60 years and <3% of GC cases are reported in patients <30 years of age (2,3). Several reports have found that younger patients (~30 years) are often diagnosed with advanced stages of GC and have worse prognosis compared with that in elderly patients (~60 years) (4,5). In younger patients, the cancer was found to spread rapidly and was biologically more aggressive (6). Integrative genomic approaches, which associate genomic and transcriptomic analysis have been found to increase the identification of numerous events and processes associated with tumor aggressiveness and may assist in selecting candidate genes from normal and pathological samples for targeted therapy (7). A previous study aimed to investigate the biological and genomic mechanisms in the progression of GC has led to the development of target-oriented therapy for advanced GC to be used in a clinical setting, with base substitution mutations in genes such as tumor protein p53 (TP53), AT-rich interaction domain 1A (ARID1A), KRAS proto-oncogene, GTPase (KRAS), phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha (PIK3CA), ring finger protein 43 (RNF43), APC regulator of WNT signaling pathway (APC), RAS p21 protein activator 1 (RASA1) and erb-b2 receptor tyrosine kinase 2 (ERBB2) identified as important therapeutic targets in anti-cancer treatments (8). Developing personalized therapy for optimal individual cancer patient outcome has become more feasible due to the rapid advancement in next-generation sequencing techniques and technologies that enable fast and comprehensive characterizations of tumors at the molecular level (9–12). As every patient harbors a unique combination of variants that influence the risk, onset, and progression of the disease, an effective personalized therapy is dependent on a well-profiled genome from the individual patient with cancer and understanding the oncogenic mechanisms that regulate the progression of the tumor (13). With a view to identifying potential clinically actionable therapeutic targets that may inform individualized treatment strategies, the whole genome was sequenced in a 26-year-old patient with advanced stage GC, by integrating the whole genome and whole transcriptome sequencing data.

Materials and methods <sec> <title>Sample collection and preservation

A 26-year-old female patient was admitted to the First Affiliated Anhui Medical Hospital in December 2016 having presented with GC-related symptoms. The patient felt uncomfortable in the upper left quadrant of her abdomen and had repeated black melena for one month. This symptom recurred and became more serious for 2 weeks. The patient felt full and uncomfortable in the upper left quadrant of her abdomen repeatedly following a meal, one month prior to the development of black melena. She also had nausea, although there was no hematemesis or hematochezia. The abdominal discomfort worsened and the patient received treatment in Luan Jinkai Hospital. Thereafter, the patient visited the First Affiliated Hospital of Anhui Medical University where the present study was conducted. The gastroscopy of the patient indicated GC with obstruction, while pathology indicated antrum considered as carcinoma mucocellulare. Pathological examination of the samples was performed using hematoxylin and eosin staining. Briefly, tissues were sliced with a dimension of 1.5×1.5×0.2-0.3 cm and soaked in 40°C warm water. The sections were deparaffinized in xylene and were gradually hydrated through graded ethanol (100, 95, 80 and 70%) each for 3 min at 25°C. The sections were stained in hematoxylin solution for 10 min at 25°C and differentiated in 1% hydrochloric alcohol. Then the slides were rinsed with tap water and distilled water until the nuclei became blue, and dehydrated in 95% ethanol. The sections were counterstained in 1% eosin solution for 5 min at 25°C, washed with 70% ethanol twice and absolute ethanol, and were cleared in two washes of xylene. The sections were mounted with neutral balsam and observed under a light microscope magnification, ×400 (×40 objective and ×10 ocular magnification; BX-42; Olympus). An abdominal computer tomography scan + enhancement showed that both armpits had multiple small lymph nodes and the gastric antrum had thickened walls along with multiple swollen lymph nodes in the surrounding area. The patient was diagnosed with a T3N3bM0 (advanced stage IIIC) GC using AJCC TNM staging system 7th edition (14).

The patient underwent a gastrostomy for tumor removal. Tumor tissue and adjacent normal tissue ~5-cm away from the tumor were resected and preserved in liquid nitrogen until use.

DNA and RNA extraction and library preparation

Genomic DNA was isolated from the resected tissues using a genomic DNA isolation kit (Bioo Scientific; http://www.biooscientific.com) according to the manufacturer's recommendations. Extracted DNA was quantified using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies; Thermo Fisher Scientific, Inc.) and the integrity was assessed using 1% agarose gel electrophoresis and visualized in gel imager (Tanon; http://www.biotanon.com). RNA was extracted from the tumor and adjacent normal tissue using TRIzol^® (Invitrogen; Thermo Fisher Scientific, Inc.), and eluted in RNAse-free water. RNA quantity and quality were assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies Inc.). DNA and RNA libraries were prepared according to standard protocols according to Bioo Scientific.

Whole transcriptome sequencing and pre-analysis

For whole transcriptome sequencing, raw sequencing reads were produced using an Illumina X10 sequencer (Illumina Inc.) with 150-bp paired-end reads according to the manufacturer's instructions. Raw RNA-Sequencing (RNA-Seq) data was trimmed by removing the adaptors and low quality reads using trim_galore v0.4.4 software (15). The data was subsequently aligned to the human reference genome GRCh37.p13 using STAR v2.6.1a software (16) in a 2-pass mapping. Read count in the tumor and adjacent normal tissues was calculated using htseq-count v0.10.0 software (17), and differentially expressed genes were detected using edgeR v3.24.3 software (18). Genes with P≤0.001 and log₂ fold change ≥1 were considered as differentially expressed.

Whole genome sequencing and pre-analysis

Whole genome sequencing of the prepared libraries was performed using the Illumina X10 sequencer (Illumina Inc.) with 150-bp paired-end reads according to the manufacturer's instructions. The raw fastq data was mapped to the human reference genome (b37) provided by the Broad Institute with the Burrows-Wheeler Aligner v.7.0.12 (19). Read duplications were marked using the Picard tool v.2.10.10; local realignment and base quality score recalibration was performed using Genome Analysis ToolKit (GATK) v. 3.8-0 (20) and the final bam files were used for variant calling.

Somatic and germline variants detection

Germline variants were called using the GATK HaplotyperCaller joint v3.8 software (21) and then filtered using GATK variant quality score recalibration (VQSR). Variants that passed using the VQSR module with coverage ≥6X and supporting reads ≥0 were retained, while the variants showing genotype ‘0/0’ or ‘./.’ in the tumor sample were filtered out. For somatic variant calling, three different types of software were used: GATK HaplotypeCaller joint v3.8, Mutect v1.1.4 and MuTect2 v3.8 (22). In Mutect and MuTect2, all variants that were flagged as ‘PASS’ were kept. Other variants were retained with a coverage ≥5X and supporting ≥3 reads, if they were flagged among a subset of ‘alternative (23) allele in normal’, ‘triallelic site’, ‘possible contamination’, ‘clustered read position’ in Mutect, or ‘germline risk’, ‘alt allele in normal’, ‘t lod fstar’ (tumor does not meet likelihood threshold), ‘str contraction’ (site filtered due to contraction of short tandem repeat region), ‘triallelic site’ (site filtered because > two alternate alleles pass tumor likelihood threshold) in MuTect2. Germline variants were removed from both the Mutect and MuTect2 output data. In all three softwares, variant allele frequency in the tumor sample had to be 3 times higher compared with that in the normal sample, and >5% of the variant reads in the tumor sample was required. Functional annotations for both germline and somatic variants were added to each mutation using the ANNOVAR software v.2018Apr16 (17) and several publicly available databases, such as 1,000 Genome Project (24), the Exome Aggregation Consortium (25), the Genome Aggregation Database (26) and the compiled scores prediction system dbnsfp33a (27), dbSNP (28) and COSMIC (29).

Transition to transversion ratio calculation

The transition to transversion ratio for germline variants was calculated by dividing total transitions by total transversions. Transitions refer to variations that involve a change from purine to purine, while transversions refer to variations that involve a change from purine to pyrimidine or vice versa (30). The transition to transversion ratio between homologous strands of DNA is generally ~2, and for human, the ratio is ~2.1 (31).

Copy number alternation and structural variation analysis

Copy number variants (CNV), tumor purity and ploidy was analyzed using cnv_facets software (https://github.com/dariober/cnv_facets) based on FACETS v0.5.14 (32), and run with the ‘-cval 25 400’ command. Duplicated and deleted segments were retained and annotated using an ENCODE gene symbol in R v3.5.1 (33). iCallSV (https://github.com/rhshah/iCallSV), which applied Delly v0.7.5 prediction method (34) was used to detect structural variations and fusions using default parameters. Inter- and intra-chromosomal fusions >50 kb were retained and only in-frame deletions, duplications and fusions were selected for subsequent analysis.

Mutation significance analysis

The 1,000 Genome Project (24), the Exome Aggregation Consortium (25) and the Genome Aggregation Database (26) provided alternative allele frequency data in different populations that ANNOVAR software applied to the present variations. Annotations using a set of algorithm scoring system from dbNSFP v.33a, including FATHMM, LRT, Mutation Assessor, Mutation Taster, M-CAP, Polyphen2, PROVEAN and SIFT (35), were also applied to the variations. Then the common variants were filtered out, which are sites with a minor allele frequency ≥1%, to get the most informative ones. Mutations predicted to be deleterious by at least two algorithms of dbNSFP were considered to be potential pathogenic sites of these informative variants.

Identification of mutations in cancer associated genes

To identify genes with a high possibility of involvement in cancer development, the Cancer Gene Census (CGC) (https://cancer.sanger.ac.uk/census) (36) and Cancer Predisposition Genes (CPG) (37) databases were used. In total, 850 candidate genes from cancer-associated databases (723 genes from CGC and 114 genes from CPG with 87 overlapping genes), were investigated. The gene lists were downloaded directly from the two databases with no specific selection criteria.

Pathway analysis

Gene enrichment was performed using the R package clusterProfiler v3.10.1 software (38), which implements a hypergeometric model to test for gene set overrepresentation relative to a background gene set. Enrichment was analyzed using the functions: ‘Enriched Kyoto Encyclopedia of Genes and Genomes (KEGG)’, ‘gseKEGG’ (Gene Set Enrichment Analysis (GSEA) (39) of KEGG, and a function of clusterProfiler), P<0.05, OrgDb=org.Hs.eg.db (a database for Genome Wide Annotation for Human, referenced by clusterProfiler).

Integrated analysis of whole genome and transcriptome sequencing data

The differentially expressed genes were compared with mutated genes identified by whole genome sequencing. To achieve this, genes that were mutated and with differentially expressed transcripts were merged in R v3.5.1 (33).

Analysis of public GC RNA-Seq data

The RNA-Seq data from The Cancer Genome Atlas stomach adenocarcinoma (TCGA STAD) was downloaded from TCGAbiolinks online tool (https://github.com/BioinformaticsFMRP/TCGAbiolinks). To identify the most significantly differentially expressed genes in GC, the results from three RNA-Seq differential expression analysis tools, including limma v3.42.2 (40), DEseq2 v1.26.0 (41) and edgeR v3.28.1 (18), were combined and the overlapping genes were separated for further analysis. For all three tools, P<0.05 and the average log₂ fold change ± standard deviation were used as the cut-off values. Gene Ontology (42) enrichment analysis was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 functional annotation clustering tool to determine the potential functions of the differentially expressed genes. The KEGG pathway analysis was performed to determine the involvement of differentially expressed genes in different biological pathways. The-log10 (P) denotes the enrichment score, indicating the significance of the pathway associations.

Kaplan-Meier plotter (KM-Plotter) analysis

KM-plotter (https://kmplot.com/analysis/) is an online database and tool which can be used for the discovery and validation of survival biomarkers (43). It contains cancer gene expression data and survival information from Gene Expression Omnibus (44), European Genome-phenome Archive (EGA) (https://www.ebi.ac.uk/ega/home), and TCGA (https://www.cancer.gov/tcga), and uses each percentile of mRNA expression between the lower (25%) and upper quartiles (75%) of expression as a cut-off point to divide patients into high and low expression groups. The KM-plotter software, which included available transcriptome and survival data (both overall and disease-free survival times), was used to analyze a total of 1,440 patients with GC, based on default parameters.

Results <sec> <title>Tumor purity and coverage statistics

The proportion and the ploidy of tumor cells in the sample were ~22% and 1.74, respectively. Using whole genome sequencing, the total number of paired reads were 8.13 and 8.34 billion, the mean read coverage was 37.16 and 35.48, while the mean library insert size was 319 and 323 bp for normal and tumor tissue samples, respectively. For the consensus coding sequence region, ~90% of the bases had over 30X coverage. Using whole transcriptome sequencing, 1.04 and 1.11 billion clean reads were obtained with an average input read length of 253 and 261 bp for normal and tumor tissue, respectively.

Gene expression analysis

The RNA-Seq pipeline was used (Fig. S1A) for the whole transcriptome sequencing. Genes with P≤0.001 and absolute log₂ fold change value ≥1 were considered as significantly differentially expressed genes. In total, 766 down- and 765 upregulated genes were obtained (Fig. S1B). The RNA-Seq result was subsequently matched to the whole genome sequencing data for further analysis.

Germline mutation analysis

Germline mutation analysis revealed a total of 4,228,339 single nucleotide polymorphisms (SNPs) and Indels with 95.03% of the sites previously reported in the dbSNP database. The transition to transversion ratio was 2.05, indicating that the germline analysis had high confidence, as the empirical value for whole genome analysis is ~2.1 (31). Among the germline variants, 21,405 SNPs/Indels were located in the exonic regions, of which 10,136 were predicted to alter the protein, including 9,504 non-synonymous single nucleotide variations (SNVs), 112 frameshift deletions, 90 frameshift insertions, 173 non-frameshift deletions, 169 non-frameshift insertions, 78 stop-gain, and 10 stop-loss. More than 1% minor allele frequency variants from publicly available databases were removed, as those sites are common in the population. Of the candidate genes from the CGC and CPG databases, 13 SNPs and two Indels were identified in 15 genes (Table I). A deleterious score evaluation of the 15 variants revealed that mutations in DOCK8 (p.D63N), HLF (p.V59A), NKX2-1 (p.G322S), NRG1 (p.S17C), PRDM2 (p.S823T), WNK2 (p.C652R) were deleterious, while mutations in KMT2D (p.V401M), RAD51D (p.A181V) and ROS1 (p.E1605Q) were possibly damaging.

Somatic mutation analysis

A total of 3 softwares from the Broad Institute were used to identify the somatic variants and 114 variants in the exonic regions were found. This included 45 non-synonymous SNVs, 6 frameshift deletions, 11 frameshift insertions, 17 non-frameshift deletions, 3 non-frameshift insertions, 2 stop-gain and 30 synonymous SNVs. Similar to the germline mutation analysis, common variants were filtered out with publicly available datasets to select the most informative somatic variants. In total, 29 variants were obtained, including 13 mutations which were reported in the Catalogue of Somatic Mutations in Cancer database (Table SI). Of these, 3 protein-truncating variants (BPNT1, p.S144I; FRG1, p.N153D; and TAS2R31, p.L59F) were predicted to be deleterious and one variant (KRTAP5-3, p.C185S) was predicted to be possibly damaging. However, for transcriptional regulation, 3 genes (MUC2, MUC4 and SLC8A2) were found to be differentially expressed.

Large-scale variation analysis

To identify large-scale variations in the patient, the copy number alterations and complex structural events in the tumor sample were investigated. The structural variant discovery tool, DELLY, reported 701 structural events; however, only in-frame events or fusions over 50 kb between two genes were retained. Ultimately, 20 significant structural variations were reported (Table SII). Another sensitive tool, FACETS, detected 58 large segments, of which 62 were deletions and 112 were duplications found in the coding regions (Table SIII). Combining the results from the two tools, nine genes (CDKN1B, FBXW7, MUC4, CCND3, ETV6, FGFR2, FGFR3, TFEB and KMT2C) were reported in either the CGC or CPG databases. CDKN1B, FBXW7 and MUC4 showed duplications, and there were deletions in CCND3, ETV6, FGFR2, FGFR3 and TFEB. Although KMT2C rearranged with BAGE2, the latter was not reported in the 2 databases. The mRNA expression levels of the genes with structural variations was also investigated and 21 were found to be differentially expressed in the tumor sample (Fig. 1).

Integrated pathway analysis results

RNA-Seq and mutation data were analyzed to characterize the genomic alterations in known signaling pathways. The GO enrichment for somatic mutated genes revealed terms, which included 3 mucin family genes, MUC2, MUC4 and MUC6, and were enriched in ‘maintenance of gastrointestinal epithelium’, ‘O-glycan processing’, and digestive system process, while KRTAP5-3, TCHH, RPTN, POU3F1 were enriched in epidermal cell differentiation pathway (Table SIV). The KEGG enrichment analysis revealed SLC8A2 and SLC25A5 were enriched in calcium signaling and cGMP-PKG signaling pathways (Table SIV). From the RNA-Seq data, GSEA showed that the upregulated genes were enriched in ‘cGMP-PKG signaling’, ‘calcium signaling’, and ‘oxytocin signaling’ pathways, while the downregulated genes were enriched in ‘cell cycle’ pathway, ‘oxidative phosphorylation’ pathway, ‘transcriptional mis-regulation in cancer’ pathway and ‘GC pathway’ (Fig. 2).

Comparison with TCGA STAD GC features

To ensure the present study was representative and valuable, the genomic features of the patient was compared with data from TCGA STAD dataset, which was downloaded from TCGAbiolinks, and 329 (136 up- and 193 downregulated genes) differentially expressed genes were identified, which were also found in the RNA-Seq analysis. A total of three tools were used to identify the most significantly differentially expressed genes, and 136 up- and 193 downregulated genes were identified (Fig. S2A), and were subsequently investigated in the tumor tissue sample obtained from the patient with GC and compared with that in the adjacent normal tissue. In total, 27 genes were identified, including five that were up- and 22 that were downregulated (Fig. 3A). GO and KEGG pathway enrichment analysis revealed one (GO:MF) and three (KEGG) significant terms, respectively, including digestion and gastric acid secretion (Fig. 3B).

To determine the associated functional genomic alterations of these 27 genes, the previous studies that reported them were analyzed. In the present study, there were no somatic mutations in the 27 genes, however, 12 of the genes, including gastrokine 2 (GKN2), prosaposin-like 1 (PSAPL1), ethanolamine-phosphate phospho-lyase (ETNPPL), cytochrome P450 family 2 subfamily W member 1 (CYP2W1), SFRP4, collagen type XXII α 1 chain, prostate stem cell antigen, lipase F, gastric type, solute carrier family 15 member 1 (SLC15A1), ATPase H+/K+ transporting subunit β (ATP4B), aldehyde dehydrogenase 3 family member A1 (ALDH3A1) and cystatin SN (CST1) were mutated in germline. Notably, two (SFRP4 and SLC15A1) of the 12 germline mutated genes have been reported in previous studies. Germline mutations in SFRP4 (P320T and R340K) were reported to be strongly associated with human cancer types (45) while mutations in SLC15A1 were linked with bowel diseases (46).

The KM plotter database was subsequently used with the log-rank test to investigate the association between the expression level of the 27 genes and clinical outcomes (both disease-free and overall survival times). In the database, patients with GC are divided into high and low expression groups, with each percentile of mRNA expression between the lower (25%) and upper quartiles (75%) used as a cut-off point. A total of 10 genes (ALDH3A1, ETNPPL, activity regulated cytoskeleton associated protein (ARC), ATP4B, CST1, CYP2W1, GKN2, PSAPL1, SFRP4 and SLC15A1) with differential expression were significantly associated (P<0.05) with survival times. In all the 10 genes, those with high expression (red curve) were correlated with a poorer survival while genes with low expression (black curve) correlated with higher survival in patients with GC (Fig. S2B).

Discussion

In an attempt to improve the understanding of the pathobiology of GC in young patients and provide insights for personalized treatment strategies, the genome of a 26-year old patient with GC was performed using both whole genome and whole transcriptome sequencing. Germline and somatic mutations analyses of the whole genome sequencing data revealed 15 cancer related genes with germline mutations, 27 genes with 29 informative somatic mutations and nine genes with structural events that could underlie the progression of GC in the patient. Of the 27 genes containing somatic mutations, 26 have not been previously reported as driver genes in TCGA (42). Only MUC6, a immunohistochemical marker used to support the diagnosis of gastric-type EA (47), has been reported as a driver gene. In the mutation significance analysis, a set of scoring system was used to predict deleterious amino acid substitutions. This analysis revealed nine germline variants and four somatic mutations with potentially pathogenic sites. Compared with somatic mutations, more deleterious germline sites were detected and genes containing these sites were all found to be cancer-associated, suggesting that inheritance disorders might be a key risk factor for the development of GC in the patient.

To improve the understanding of the inter-relationships between genomic alterations and the transcriptome, the mutated genes at the transcriptional level were also investigated. It was found that genes with somatic mutations, such as MUC2, MUC4, SLC8A2, and with somatic structural mutations, such as CCND3, FGFR2 and FGFR3, were differentially expressed, suggesting that they could be promising precision therapy targets.

Mucin genes are known to maintain the gastrointestinal epithelium, and persistent mucosal inflammation increase the risk of developing GC (48). In addition, the fibroblast growth factor receptor (FGFR) pathway plays a key role in GC pathogenesis, and detection of FGFR2 copy number in the plasma circulating tumor DNA are potential predictive biomarkers to FGFR inhibition (49).

With respect to heterogeneity of GC, it was hypothesized that mutations in different genes that are involved in the same pathway could result in similar clinical phenotypes. Therefore, different pathways were investigated that could be dysregulated in the patient with GC. The calcium signaling pathway, cGMP-PKG signaling pathway and transcriptional mis-regulation were enriched at both the genomic and transcriptomic levels, suggesting that these pathways might play crucial roles in young patients with GC. Notably, these pathways were not found from the analysis of TCGA STAD dataset (42), suggesting that the pathogenesis of GC in young individuals, particularly in the patient in the present study may be different. Nevertheless, the genomic features of the young patient with GC was compared with TCGA STAD data and 27 differential expression genes were identified, particularly germline mutations in SFRP4 (P320T and R340K), which are strongly associated with human cancers (45). SFRP4 is a member of the SFRP family (50). SFRP4s contain a cysteine-rich domain homologous to the putative Wnt-binding site of Frizzled proteins (50) and act as soluble modulators of Wnt signaling, which is a well-known pathway that is involved in tumorigenesis of GC (51). This indicates that such germline mutations could be informative in designing patient specific therapies. Several germline mutations were found in the 27 genes; however, no somatic alterations or structural events were identified, suggesting that genetic factors may be important in the development of early GC.

Notably, the youngest patient in TCGA STAD dataset was 30-years-old and >99% (439/443) of the patients were >40 years old, thus, TCGA STAD data could have originated from elderly patients with GC. This supports the requirement to investigate the genome in younger patients with GC to identify possible therapeutic targets. The patient in the present study was diagnosed at age 26, which was much younger compared with patients in TCGA STAD dataset. TCGA-STAD RNA-seq data was reanalyzed and 331 dysregulated genes were found (138 were up- and 193 were downregulated), however, only 27 of these were found in the patient in the present study, suggesting that there was a considerable difference between GC in young and elderly patients. There may be indications that the pathogenesis of the patient was substantially different from typical patients with GC, as cancer-related pathways, which were significantly enriched at both genomic and transcriptomic levels in the patient were also not found in TCGA STAD dataset.

In conclusion, genes with germline (SFRP4), and somatic mutations (MUC2, MUC4, SLC8A2), and those with structural variations (CCND3, FGFR2 and FGFR3), which were differentially expressed in the patient could be promising precision therapy targets.

Supplementary Material

Supporting Data

Acknowledgements

Not applicable.

Funding

This study was supported in part by the Hefei National Laboratory for Physical Sciences at Microscale (grant no. HFNLM20180012).

Availability of data and materials

We have deposited the data to National Genomics Data Center with the project number PRJCA002949. The datasets are not publicly accessible due to local data protection laws. However, access to the data can be granted upon a reasonable request to the corresponding author.

Authors' contributions

QW and KH conceived the study and designed the experiments. LL and ZH provided the tissue samples and supplied the clinical and pathological information of the patient. QR, OEA and SW performed sample preparation, DNA isolation for sequencing and library construction. KH and WY performed the bioinformatics analysis. OEA, WY, KH and QW drafted and revised the manuscript and supplementary information. All authors approved the final version of the manuscript.

Ethics approval and consent to participate

This study was approved by the Ethics Committee of the First Affiliated Hospital of Anhui Medical University. The patient provided written informed consent for the use of the resected tissue, and the sample was obtained under Institutional Review Board approval.

Patient consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Abbreviations

gastric cancer

CGC

Cancer Gene Census

CPG

Cancer Predisposition Genes

Indels

insertions and deletions

SNP

single nucleotide polymorphism

SNV

single nucleotide variation

Gene Ontology

KEGG

Kyoto Encyclopedia of Genes and Genomes

TCGA STAD

The Cancer Genome Atlas stomach adenocarcinoma

References 1

De Vita

Di Martino

Fabozzi

Laterza

Ventriglia

Savastano

Petrillo

Gambardella

Sforza

Marano

Clinical management of advanced gastric cancer: The role of new molecular drugs

World J Gastroentero2014537145582014

10.3748/wjg.v20.i40.14537

Mori

Sugimachi

Ohiwa

Okamura

Tamura

Inokuchi

Early gastric carcinoma in Japanese patients under 30 years of age

Br J Surg722892911985

10.1002/bjs.1800720415

3986480

Lim

Lee

Kim

Alteration of E-cadherin-mediated adhesion protein is common, but microsatellite instability is uncommon in young age gastric cancers

Histopathology421281362003

10.1046/j.1365-2559.2003.01546.x

12558744

Smith

Stabile

Extreme aggressiveness and lethality of gastric adenocarcinoma in the very young

Arch Surg1445065102009

10.1001/archsurg.2009.77

19528381

Quijano Orvananos

Moreno Paquentin

Alvarez

Martinez Munive

Butron Perez

Gastric carcinoma in patients under 35 years

Rev Gastroenterol Mex6475771999(In Spanish)

10532131

Theuer

Kurosaki

Taylor

Anton-Culver

Unique features of gastric carcinoma in the young: A population-based analysis

Cancer8325331998

10.1002/(SICI)1097-0142(19980701)83:1<25::AID-CNCR4>3.0.CO;2-D

9655289

Wierinckx

Roche

Raverot

Legras-Lachuer

Croze

Nazaret

Rey

Auger

Jouanneau

Chanson

Integrated genomic profiling identifies loss of chromosome 11p impacting transcriptomic activity in aggressive pituitary PRL Tumors

Brain Pathol215335432011

21251114

Cancer Genome Atlas Research Network

Bass

Thorsson

Shmulevich

Reynolds

Miller

Bernard

Hinoue

Laird

Curtis

Comprehensive molecular characterization of gastric adenocarcinoma

Nature5132022092014

10.1038/nature13480

25079317

Chmielecki

Meyerson

DNA sequencing of cancer: What Have We Learned?

Annu Rev Med6563792014

10.1146/annurev-med-060712-200152

24274178

Garraway

Lander

Lessons from the Cancer Genome

Cell15317372013

10.1016/j.cell.2013.03.002

23540688

Vogelstein

Papadopoulos

Velculescu

Zhou

Diaz

Kinzler

Cancer genome landscapes

Science339154615582013

10.1126/science.1235122

23539594

Garraway

Genomics-driven oncology: Framework for an emerging paradigm

J Clin Oncol31180618142013

10.1200/JCO.2012.46.8934

23589557

Chen

Abstract 4495: Development and clinical application of an integrative genomic approach to personalized cancer therapy

Cancer Res762016doi: 10.1158/1538-7445.AM2016-4495

Edge

Compton

The American Joint Committee on Cancer: The 7th edition of the AJCC cancer staging manual and the future of TNM

Ann Surg Oncol17147114742010

10.1245/s10434-010-0985-4

20180029

Krueger

Trim Galore: A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisufite-Seq) libraries

2012

Dobin

Davis

Schlesinger

Drenkow

Zaleski

Jha

Batut

Chaisson

Gingeras

STAR: Ultrafast universal RNA-seq aligner

Bioinformatics2915212013

10.1093/bioinformatics/bts635

23104886

Anders

Pyl

Huber

HTSeq-a Python framework to work with high-throughput sequencing data

Bioinformatics311661692015

10.1093/bioinformatics/btu638

25260700

Robinson

McCarthy

Smyth

edgeR: A Bioconductor package for differential expression analysis of digital gene expression data

Bioinformatics261391402010

10.1093/bioinformatics/btp616

19910308

Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM

(Submitted on 16 Mar 2013 (v1), last revised 26 May, 2013 (this version, v2))

DePristo

Banks

Poplin

Garimella

Maguire

Hartl

Philippakis

del Angel

Rivas

A framework for variation discovery and genotyping using next-generation DNA sequencing data

Nat Genet434914982011

10.1038/ng.806

21478889

Poplin

Ruano-Rubio

DePristo

Fennell

Carneiro

Van der Auwera

Kling

Gauthier

Levy-Moonshine

Roazen

Scaling accurate genetic variant discovery to tens of thousands of samples

BioRxivJuly242017doi: https://doi.org/10.1101/201178

Cibulskis

Lawrence

Carter

Sivachenko

Jaffe

Sougnez

Gabriel

Meyerson

Lander

Getz

Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples

Nat Biotechnol312132192013

10.1038/nbt.2514

23396013

Colaprico

Silva

Olsen

Garofano

Cava

Garolini

Sabedot

Malta

Pagnotta

Castiglioni

TCGAbiolinks: An R/Bioconductor package for integrative analysis of TCGA data

Nucleic Acids Res44e712016

10.1093/nar/gkv1507

26704973

1000 Genomes Project Consortium

Auton

Brooks

Durbin

Garrison

Kang

Korbel

Marchini

McCarthy

McVean

Abecasis

A global reference for human genetic variation

Nature52668742015

10.1038/nature15393

26432245

Karczewski

Weisburd

Thomas

Solomonson

Ruderfer

Kavanagh

Hamamsy

Lek

Samocha

Cummings

The ExAC browser: Displaying reference data information from over 60 000 exomes

Nucleic Acids Res45D840D8452016

10.1093/nar/gkw971

27899611

Karczewski

The genome aggregation database (gnomAD)

2017

Liu

Boerwinkle

dbNSFP v3.0: A One-stop database of functional predictions and annotations for human nonsynonymous and splice-site SNVs

Hum Mutat372352412016

10.1002/humu.22932

26555599

Sherry

Ward

Kholodov

Baker

Phan

Smigielski

Sirotkin

dbSNP: The NCBI database of genetic variation

Nucleic Acids Res293083112001

10.1093/nar/29.1.308

11125122

Tate

Bamford

Jubb

Sondka

Beare

Bindal

Boutselakis

Cole

Creatore

Dawson

COSMIC: The catalogue of somatic mutations in cancer

Nucleic Acids Res47D941D9472019

10.1093/nar/gky1015

30371878

Kimura

A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide-sequences

J Mol Evol161111201980

10.1007/BF01731581

7463489

Mwenifumbo

Marra

Cancer genome-sequencing study design

Nat Rev Genet143213322013

10.1038/nrg3445

23594910

Shen

Seshan

FACETS: Allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing

Nucleic Acids Res44e1312016

10.1093/nar/gkw520

27270079

Team

R: A language and environment for statistical computing

2013

Rausch

Zichner

Schlattl

Stutz

Benes

Korbel

DELLY: Structural variant discovery by integrated paired-end and split-read analysis

Bioinformatics28i333i3392012

10.1093/bioinformatics/bts378

22962449

Dong

Wei

Jian

Gibbs

Boerwinkle

Wang

Liu

Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies

Hum Mol Genet24212521372015

10.1093/hmg/ddu733

25552646

Futreal

Coin

Marshall

Down

Hubbard

Wooster

Rahman

Stratton

A census of human cancer genes

Nat Rev Cancer41771832004

10.1038/nrc1299

14993899

Rahman

Realizing the promise of cancer predisposition genes

Nature5053023082014

10.1038/nature12981

24429628

Wang

Han

clusterProfiler: An R package for comparing biological themes among gene clusters

OMICS162842872012

10.1089/omi.2011.0118

22455463

Subramanian

Tamayo

Mootha

Mukherjee

Ebert

Gillette

Paulovich

Pomeroy

Golub

Lander

Mesirov

Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

Proc Natl Acad Sci USA10215545155502005

10.1073/pnas.0506580102

16199517

Ritchie

Phipson

Law

Shi

Smyth

limma powers differential expression analyses for RNA-sequencing and microarray studies

Nucleic Acids Res43e472015

10.1093/nar/gkv007

25605792

Love

Huber

Anders

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Genome Biol155502014

10.1186/s13059-014-0550-8

25516281

Bailey

Tokheim

Porta-Pardo

Sengupta

Bertrand

Weerasinghe

Colaprico

Wendl

Kim

Reardon

Comprehensive characterization of cancer driver genes and mutations

Cell174103410352018

10.1016/j.cell.2018.07.034

30096302

Szász

Lánczky

Nagy

Förster

Hark

Green

Boussioutas

Busuttil

Szabó

Győrffy

Cross-validation of survival associated biomarkers in gastric cancer using transcriptomic data of 1,065 patients

Oncotarget749322493332016

10.18632/oncotarget.10337

27384994

Taiaroa

Rawlinson

Featherstone

Direct RNA sequencing and early evolution of SARS-CoV-2

bioRxiv2020(Epub ahead of print)

Chen

Zhao

Wang

Yang

Kung

Multiple gene dysfunctions lead to high cancer-susceptibility: Evidences from a whole-exome sequencing study

Am J Cancer Res15625732011

21984973

Zucchelli

Torkvist

Bresso

Halfvarson

Hellquist

Anedda

Assadi

Lindgren

Svanfeldt

Janson

PepT1 oligopeptide transporter (SLC15A1) gene polymorphism in inflammatory bowel disease

Inflamm Bowel Dis15156215692009

10.1002/ibd.20963

19462432

Hodgson

Parra-Herran

Mirkovic

Immunohistochemical expression of HIK1083 and MUC6 in endometrial carcinomas

Histopathology755525582019

10.1111/his.13887

31021421

Huang

Qiu

Sun

Zhang

Yuan

Helicobacter pylori infection synergizes with three inflammation-related genetic variants in the GWASs to increase risk of gastric cancer in a Chinese population

PLoS One8e749762013

10.1371/journal.pone.0074976

24069371

Hierro

Alsina

Sanchez

Serra

Rodon

Tabernero

Targeting the fibroblast growth factor receptor 2 in gastric cancer: Promise or pitfall?

Ann Oncol28120712162017

10.1093/annonc/mdx081

28327938

Cruciat

Niehrs

Secreted and transmembrane wnt inhibitors and activators

Cold Spring Harb Perspect Biol5a0150812013

10.1101/cshperspect.a015081

23085770

Herr

Hausmann

Basler

WNT secretion and signalling in human disease

Trends Mol Med184834932012

10.1016/j.molmed.2012.06.008

22796206

Figure 1.

Differentially expressed genes with structural alternations. Structural variation and copy number alternation analysis identified 21 genes, which were differentially expressed. The angular axis shows log₂ fold change between tumor and adjacent normal tissues.

Figure 2.

Gene Set Enrichment Analysis revealed dysregulated pathways in RNA-seq. Upregulated genes are indicated by blue and the downregulated by red.

Figure 3.

Genomic features in the young patient in the present study compared with that in patients from TCGA STAD dataset. (A) Heatmap of the 27 genes, which were dysregulated in both TCGA STAD and in the present study. Counts per million (CPM) indicates the expression level of each gene and log (CPM+1), which ranged from −2 (blue) to 6 (red), was used for sample clustering and visualization. (B). Gene Ontology enrichment analysis and KEGG pathway analysis identified several significant pathways. *P<0.05 and **P<0.01. DEG, differentially expressed gene; KEGG, Kyoto Encyclopedia of Genes and Genomes; MF, molecular functions; CC, cellular components; BP, biological processes; TCGA, The Cancer Genome Atlas; STAD, stomach adenocarcinoma; hydrogen:potas … ATPase activity, hydrogen:potassium-exchanging ATPase activity.

Table I.

Functional assessment of the germline mutations and prediction of their effect.

Gene	Databse	Chromosome	Position	Reference allele	Alteration	Type	AA. Change	Cosmic82	Final prediction
AKAP9	CGC	7	91739445	C	T	Non-SNV	p.T3899I	N	U
DOCK8	CPG	9	286491	G	A	Non-SNV	p.D63N	Y	D
HIF1A	CGC	14	62207747	C	A	Non-SNV	p.T669N	N	U
HLF	CGC	17	53345172	T	C	Non-SNV	p.V59A	N	D
KMT2D	CGC	12	49446404	C	T	Non-SNV	p.V401M	N	P
MLLT3	CGC	9	20414344	CTG	–	Non-FS DEL	p.163_164del	Y	U
MN1	CGC	22	28194933	–	TGCTGC	Non-FS	p.Q533delins	N	U
					TGCTGC	INS	QQQQQ
NKX2-1	CGC	14	36986635	C	T	Non-SNV	p.G322S	N	D
NRG1	CGC	8	32505286	C	G	Non-SNV	p.S17C	N	D
PRDM2	CGC	1	14107360	T	A	Non-SNV	p.S823T	Y	D
PTPRC	CGC	1	198675962	A	G	Non-SNV	p.N101S	N	U
RAD51D	CPG	17	33428245	G	A	Non-SNV	p.A181V	N	P
RNF213	CGC	17	78346531	C	A	Non-SNV	p.P4250T	N	U
ROS1	CGC	6	117662652	C	G	Non-SNV	p.E1605Q	N	P
WNK2	CGC	9	96015284	T	C	Non-SNV	p.C652R	N	D

Non-SNV, non-synonymous single nucleotide variation; non-FS DEL, non-frameshift deletion; non-FS INS, non-frameshift insertion; CGC, Cancer Gene Census; CPG, Cancer Predisposition Genes; P, possibly damaging; D, deleterious; U, uncertain significance; Y, yes; N, no; AA, amino acid.