Contributed equally
Cholangiocarcinoma (CCA) is acknowledged as the second most commonly diagnosed primary liver tumor and is associated with a poor patient prognosis. The present study aimed to explore the biological functions, signaling pathways and potential prognostic biomarkers involved in CCA through transcriptomic analysis. Based on the transcriptomic dataset of CCA from The Cancer Genome Atlas (TCGA), differentially expressed protein-coding genes (DEGs) were identified. Biological function enrichment analysis, including Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, was applied. Through protein-protein interaction (PPI) network analysis, hub genes were identified and further verified using open-access datasets and qRT-PCR. Finally, a survival analysis was conducted. A total of 1,463 DEGs were distinguished, including 267 upregulated genes and 1,196 downregulated genes. For the GO analysis, the upregulated DEGs were enriched in ‘cadherin binding in cell-cell adhesion’, ‘extracellular matrix (ECM) organization’ and ‘cell-cell adherens junctions’. Correspondingly, the downregulated DEGs were enriched in the ‘oxidation-reduction process’, ‘extracellular exosomes’ and ‘blood microparticles’. In regards to the KEGG pathway analysis, the upregulated DEGs were enriched in ‘ECM-receptor interactions’, ‘focal adhesions’ and ‘small cell lung cancer’. The downregulated DEGs were enriched in ‘metabolic pathways’, ‘complement and coagulation cascades’ and ‘biosynthesis of antibiotics’. The PPI network suggested that
Cholangiocarcinoma (CCA) originates from intrahepatic or extrahepatic bile duct epithelial cells and is classified into intrahepatic CCA (iCCA), perihilar CCA (pCCA), and distal CCA (dCCA) according to the anatomic location (
The Cancer Genome Atlas (TCGA) is a publicly sponsored project with the purpose of classifying and identifying major carcinogenic genomic alterations among large cohorts of more than 30 human tumors. To perform an integrated analysis of cancer genome profiles, high-throughput technologies relying on the use of microarrays and next-generation sequencing methods were applied in TCGA (
In the present research, transcriptomic iCCA data from TCGA were utilized to identify differentially expressed protein-coding genes (DEGs) between iCCA and normal tissues. Then, we executed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis to study alterations in biological functions and signaling pathways of iCCA. PPI network construction was performed, followed by identification of hub genes. Moreover, we identified the differential expression of hub genes by analyzing transcriptomic CCA data from several open-access databases, including Gene Expression Omnibus (GEO) database and ArrayExpress Archive of Functional Genomics Data (ArrayExpress). Further, we performed quantitative polymerase chain reaction (qPCR) in the laboratory to verify these hub genes. Finally, we executed survival analysis of the identified hub genes. The objective of this study was to understand CCA carcinogenesis by exploring the genetic changes involved in disease progression and to identify potential biomarkers that may be helpful for predicting the prognosis of iCCA patients.
Data for CCA mRNA expression were downloaded from TCGA database (
GO term enrichment analysis was applied to analyze the biological significance of DEGs, which includes biological processes (BP), cellular components (CC) and molecular functions (MF), based on the GO online platform David (
Proteins usually perform biological functions synergistically. Strong relationships have been shown to exist between PPIs and the biological functions of gene/protein clusters (
CCA-related transcriptomic datasets were obtained from GEO (GSE76297 and GSE26566) and ArrayExpress (E-GEOD-32879 and E-GEOD-45001) (
To examine the classification effect of hub genes on cholangiocarcinoma and normal tissue, receiver operating characteristic (ROC) curves and area under the curve (AUC) were calculated by R package ‘pROC’ (
Tissue samples were collected as pairs, i.e., tumor tissue and adjacent normal tissue, from 10 patients with iCCA undergoing surgery at Peking Union Medical College Hospital. A total of 6 male and 4 female patients with mean age of 62 (range, 54–68) years were included. The collected tissue samples were stored in a refrigerator at −80°C. All patients were enrolled from November 2018 to April 2019. The study was approved by the Clinical Research Ethics Committee of Peking Union Medical College Hospital. Each patient provided a written informed signed consent. Total RNA was isolated from each sample with Trizol LS reagent (Invitrogen; Thermo Fisher Scientific, Inc.), and then used for cDNA synthesis using oligo(dT)primers and SuperScript™ III Reverse Transcriptase (Invitrogen; Thermo Fisher Scientific, Inc.). PCR Master Mix (2X) (Superarray) and Applied Biosystems QuantStudio5 Real-time PCR system (Thermo Fisher Scientific, Inc.) were utilized for RT-qPCR. The sequences of primers for selected hub genes and housekeeping gene (β-actin) are shown in
For hub genes validated by qRT-PCR, R package ‘ggpubr’ (
The transcriptomic dataset of CCA and the corresponding clinical cart were downloaded from the TCGA database. Patients with lesion of primary site of liver and intrahepatic bile ducts, i.e., patients with iCCA, were included for further analysis. Therefore, a total of 33 cases were acquired, including 19 female and 14 male patients. Forty-one samples were acquired in total, including 33 tumor tissue samples and 8 normal tissue samples. A total of 1,463 DEGs (log2FC >2, corrected P<0.0001) were acquired, including 267 significantly upregulated DEGs and 1,196 significantly downregulated DEGs. A heatmap and a volcano plot showing the expression levels of these genes are shown in
GO and KEGG pathway enrichment analyses were conducted to explore the functional characteristics of the DEGs. The GO analysis results revealed that the upregulated DEGs were significantly enriched in ‘extracellular matrix organization’, ‘cell-cell adhesion’, ‘cell adhesion’, ‘epithelial cell morphogenesis involved in placental branching and mitotic spindle assembly’ in terms of BP. Regarding MF, the upregulated DEGs were enriched in ‘cadherin binding involved in cell-cell adhesion’, ‘structural molecule activity’, ‘collagen binding’, ‘protein binding’ and ‘signal transducer activity’. Under CC, the upregulated DEGs were enriched in ‘cell-cell adherens junction’, ‘extracellular exosome’, ‘midbody, cell-cell junction’ and ‘cytoplasmic microtubule’. For the downregulated DEGs, significant enrichment was observed in the ‘oxidation-reduction process’, ‘xenobiotic metabolic process’, ‘metabolic process’, ‘steroid metabolic process’ and ‘platelet degranulation’ under BP. For MF, the downregulated DEGs were significantly enriched in ‘oxidoreductase activity’, ‘monooxygenase activity’, ‘iron ion binding’, ‘oxidoreductase activity acting on paired donors, with the incorporation of or reduction in molecular oxygen’ and ‘electron carrier activity’. For CC, the DEGs were enriched in ‘extracellular exosome’, ‘blood microparticle’, ‘organelle membrane’, ‘mitochondrial matrix’ and ‘extracellular region’. In addition, the KEGG analysis results showed that the upregulated DEGs were significantly enriched in ‘ECM-receptor interactions’, ‘focal adhesion’, ‘small cell lung cancer’, ‘pathways in cancer’ and ‘hypertrophic cardiomyopathy (HCM)’. Meanwhile, the downregulated DEGs were enriched in ‘metabolic pathways’, ‘complement and coagulation cascades’, ‘biosynthesis of antibiotics’, ‘retinol metabolism’ and ‘fatty acid degradation’. The enriched GO terms and KEGG pathways are shown in
PPI network analysis can be used to distinguish critical hub genes among a group of DEGs. Therefore, the STRING database was used to conduct the PPI network analysis. PPI networks for the upregulated genes were constructed by Cytoscape 3.6.1 (
Cytoscape 3.6.1 was used to perform a centrality analysis. The top 15 genes with the highest degree of connectivity were defined as hub genes. Under this criterion, 15 hub genes were obtained for the upregulated DEGs, including cyclin-dependent kinase 1 (
To further verify the differential expression of the critical hub genes in CCA, we evaluated the expression profiles of 30 hub genes in another 4 datasets. GSE76297 consists of 304 specimens in total, 183 of which were utilized in analysis, including 91 CCA tumor tissues and 92 CCA non-tumor tissues. GSE26566 consists of 169 specimens in total, 163 of which were utilized in analysis, including 104 CCA tissues and 59 surrounding liver tissues. E-GEOD-32879 consists of 37 specimens in total, 23 of which were utilized in analysis, including 16 iCCA tissues and 7 non-tumor tissues. E-GEOD-45001 consists of 10 pairs of iCCA tumor tissues and non-tumor tissues, which were all utilized in analysis. Consistent with our results, 21 out of 30 hub genes in TCGA were found to share similar differential expression among the other 4 datasets, including 8 upregulated hub genes and 13 downregulated hub genes (
ROC curves for hub genes were generated based on expression profile of TCGA dataset. AUC was >0.900 for all 21 selected hub genes (
For upregulated hub genes, the expression of
Since the survival analysis indicates that the overexpression of
Cholangiocarcinoma (CCA) is recognized as the second most commonly diagnosed primary liver tumor. Due to its strong genetic heterogeneity, the current understanding of the pathogenesis of CCA is not comprehensive. Concerning genetic changes involved in CCA initiation and progression, agreement in this field remains fragmented. The key drivers involved in CCA carcinogenesis still need to be defined (
Hub genes were identified based on the degree of connectivity. Fifteen upregulated hub genes and 15 downregulated hub genes were selected based on protein-protein interaction (PPI) network. Moreover, the expression profiles of the 30 hub genes were verified using datasets from GEO and Arrayexpress. A total of 21 hub genes showed stable differential expression among 5 datasets including TCGA. ROC curves revealed that all 21 hub genes presented a credible classification effect between tumor and normal tissue with AUC >0.900. In addition, the expression of
To further explore the relationships between hub genes and the outcomes of CCA patients, a survival analysis was conducted based on the clinical data and expression profiles of the identified hub genes in both TCGA and GSE89749. Four upregulated hub genes, including
Cyclin-dependent kinases (CDKs) are a family of protein kinases driving the major events of cell cycle control (
MK167 encodes Ki-67. It is a cell cycle-regulated phosphatase 1-binding protein universally used as a proliferation marker. Ki-67 is a major organizer required for assembly of the perichromosomal compartment in cells (
DNA topoisomerase IIα (TOP2A) is an isoform of DNA topoisomerase II (Topo II). Topo II is a crucial enzyme for cell division that generates torsional stress on double-stranded DNA by inducing transient breaks that are subsequently resealed (
Polycomb repressive complex 1 (PRC1) is required for adult stem cell functions and acts as both a tumor suppressor and oncogene (
Importantly, we noted that both CDK1 and PRC1 were involved in the same GO term, midbody. Midbody is a transient structure during cytokinesis and is involved in recruitment and organization of abscission machinery, which physically regulates the localization of two daughter cells (
Although we cannot exclude the possibility that the identified hub genes may be implicated in noncarcinogenic aspects of CCA, we attempted to ensure the credibility of the results by including as many datasets as possible. Except for TCGA, a total of 4 datasets were used to validate the differential expression of hub genes. In addition, the survival analysis of certain hub genes was based on 2 datasets. Moreover, we performed RT-qPCR to verify the selected hub genes based on 10 pairs of tissue samples.
In conclusion, we identified a number of hub genes and comprehensively revealed the biological functions and signaling pathways associated with CCA carcinogenesis through systematic bioinformatic analyses. Moreover, we identified
Not applicable.
This research was funded by the Beijing Municipal Science & Technology Commission Research Fund (no. Z171100000417004) and Beijing Natural Science Foundation (no. L172055).
Data for CCA mRNA expression were downloaded from TCGA database (
HL, JuL and YZ designed this study. HL, JuL, FX, KK, YS, WX, XW, JiL, HX, SD, YX, HZ, YZ, and JG contributed to data curation and analysis. HL and WX conducted the laboratory experiment. HL, YS and XW wrote the manuscript. JuL and YZ revised the manuscript. All the authors read and approved the final version of the manuscript for publication and agree to be accountable for all aspects of the research in ensuring that the accuracy or integrity of any part of the work are appropriately investigated and resolved.
The study was approved by the Clinical Research Ethics Committee of Peking Union Medical College Hospital. Each patient provided written signed informed consent.
Not applicable.
The authors declare that they have no competing interests.
cholangiocarcinoma
The Cancer Genome Atlas
differentially expressed protein-coding genes
Gene Ontology
Kyoto Encyclopedia of Genes and Genomes
protein-protein interaction
Gene Expression Omnibus
Heatmap and volcano plot showing significant DEGs between 33 iCCA tissues and 8 normal tissues in TCGA. (A) Rows represent genes, and columns represent samples. (B) The red spots represent significantly upregulated genes, and the green spots represent significantly downregulated genes. TCGA, The Cancer Genome Atlas; DEGs, differentially expressed protein-coding genes; CCA, cholangiocarcinoma; iCCA, intrahepatic CCA.
Functional enrichment analysis of DEGs. (A) GO cluster plot showing a chord dendrogram of the clustering of the expression spectrum of significantly upregulated genes. (B) GO cluster plot showing a circular dendrogram of the clustering of the expression spectrum of significantly downregulated genes. KEGG pathway enrichment dot plot of the (C) significantly upregulated genes and (D) downregulated genes. The y-axis represents KEGG-enriched terms. The x-axis represents the fold of enrichment. The size of the dot represents the number of genes under a specific term. The color of the dots represents the adjusted P-value. DEGs, differentially expressed protein-coding genes; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.
(A) PPI network of the significantly upregulated DEGs. The nodes represent the significantly upregulated DEGs. The edges represent the interaction of significantly upregulated DEGs. The triangles represent hub genes validated by qRT-PCR. Bar chart of the (B) upregulated hub genes and the (C) downregulated hub genes. The x-axis represents count of connectivity. The y-axis represents hub gene symbols. PPI, protein-protein interaction; DEGs, differentially expressed protein-coding genes.
Dynamic expression of significantly upregulated hub genes. (A-D) Expression of hub genes from the TCGA database. (E-H) Relative quantification of hub genes based on qRT-PCR results. *P<0.05, **P<0.01, **P<0.001.
Survival analysis of significantly upregulated hub genes.