A study exploring critical pathways in clear cell renal cell carcinoma

Renal cell carcinoma (RCC) is the most lethal type of cancer in the urinary system and often presents as a metastatic disease. Furthermore, there are no effective treatments for the disease. Several studies based on gene expression profiling have been performed with the aim of gaining insights into the pathogenesis of RCC; however, few studies have investigated RCC at the pathway level to search for the possible pathways involved in clear cell RCC (CCRCC). In this study, gene set enrichment analysis (GSEA) was conducted on microarray datasets from CCRCC tissue. DAVID functional enrichment analysis was performed based on the dysregulated genes that were identified in a meta-analysis performed on the microarray datasets from CCRCC tissue. In GSEA, 17 down- and 12 upregulated pathways coexisted in six datasets. The majority of the upregulated pathways were associated with the immune system. In addition, 32 dysregulated pathways were obtained from DAVID functional enrichment analysis, based on the abnormal genes identified by meta-analysis. This study demonstrated that cross-GSEA is a useful method for exploring the critical pathways involved CCRCC; however, an individual dataset with a small sample may introduce bias. A cross-GSEA based on certain well-designed datasets may be required to further the progress made in this study, following the analysis of its results.


Introduction
Renal cell carcinoma (RCC) is the most lethal type of cancer in the urinary system. It is a heterogeneous group of malignancies, and clear cell, papillary and chromophobe RCC are the major subtypes (1). In addition, clear cell RCC (CCRCC) is the most common type of RCC, constituting >80% of cases, and accounts for 2-3% of all adult malignant neoplasms. There were 58,240 new diagnoses of RCC or cancer of the renal pelvis and 13,040 mortalities due to the disease in 2010 in the United States. Furthermore, RCC accounted for 3% of all malignant tumors in adults and 85% of all primary malignant kidney tumors (2). RCC often presents as a metastatic disease and there are no effective treatments for it. This cancer exhibits resistance to chemotherapy and radiation and <10% of patients suffering from the metastatic disease survive five years subsequent to diagnosis. At present, there are no clinically validated adjuvant therapies for patients with CCRCC who are at a high risk of relapse following surgical resection (3). Overall, 40% of patients with RCC succumb to this disease (4). Certain molecular mechanisms have been revealed in RCC, such as von Hipple-Lindau (VHL) gene mutation. This appears to be responsible for ~60% of the cases of sporadic CCRCC (5). In hereditary papillary renal carcinoma, an oncogene, mesenchymal-epithelial transition factor (MET), has been revealed to be mutated, although the incidence of c-MET mutations is low in sporadic papillary RCC. However, to date, no theory has been able to explain all the cases of CCRCC, and there are no effective tumor markers that may be used for diagnostic, prognostic and treatment purposes in CCRCC. To develop novel therapeutic interventions for CCRCC an improved understanding of the pathogenesis is required, since traditional single-factor analysis for tumor pathogenesis is not effective. In order to obtain an enhanced understanding of the potential 'cross-talks' between dysregulated genes and proteins, there has been particular focus on undertaking a more 'global' analysis (6). Gene set enrichment analysis (GSEA), based on gene expression profiling, has been performed in CCRCC and has identified certain dysregulated gene sets in patients with G1 and G3 RCC (7).
At present, gene expression profiling between tumor tissue and normal tissue at the RNA level has been used in a previous A study exploring critical pathways in clear cell renal cell carcinoma study of RCC (6). A number of differentially expressed genes have been identified in these studies; however, few studies have been performed based on pathway analysis. While it is not difficult to obtain the gene expression profiles, the extraction of biological insight from such information remains a major challenge (8). Genome analysis provides a large quantity of information concerning molecules that are involved in disease pathogenesis and may be used to interpret diseases. Pathway analysis is a frequently used analytical tool. GSEA, which was introduced by Mootha et al (9), is well known and extensively used in gene set analysis. GSEA is a computational method that identifies pre-defined gene sets that exhibit significant differences in expression between samples from normal individuals and patients. Following its introduction, the methodology of GSEA was subsequently improved by Subramanian et al (8). This modified approach aimed to identify the significant differences in the expression of groups or pathways rather than individual genes. In view of the slight nature of the functions of individual genes, GSEA is more efficient than individual gene analysis for the study of complex diseases (10). Based on the study of a group of associated genes, investigators are likely to have an improved chance of determining the pivotal functional processes responsible for biological phenomena. GSEA in a single dataset is unlikely to identify the significant pathways and there is likely to be a poor concordance. In order to obtain improved data convergence and provide a systematic insight into the pathways altered during CCRCC pathogenesis, standardized microarray preprocessing, GSEA and DAVID functional enrichment analysis were performed in the present study.

Materials and methods
Selection and obtainment of datasets. The Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (http://www.ebi.ac.uk/arrayexpress/) were searched for gene expression profiling datasets associated with CCRCC, using the keywords 'renal cell carcinoma' up to March 25, 2013. The inclusion criteria for the datasets were as follows: i) The dataset was genome-wide; ii) samples included patients with CCRCC and normal controls; iii) complete microarray raw or normalized datasets were available. In addition, seven public gene expression datasets that concurred with our inclusion criteria were included in our study to assess the transcripts of CCRCC on a genome-wide basis. In datasets GSE15641 and GSE11151, there were other types of pathological samples besides CCRCC and normal tissue; in these instances, only the CCRCC and normal tissue samples were retained. In dataset GSE16449, samples from GSM413237 to GSM413270 had the same probe number of 33998; however, the other 35 samples had a different identical probe number and different data precessing, and were excluded. In dataset GSE12606, the samples comprised three CCRCC tissues, including autologous normal tissue and autologous metastatic tissue; the autologous metastatic samples were excluded. Two platforms, HG-U133A and HG-U133B, were applied in datasets GSE781 and GSE6344, however, only the expression profiles in HG-U133A were included in the present study. The basic information with regard to these datasets, such as the microarray platform, experimental design, probe numbers and sample size, are listed in Table I.
Standardized microarray preprocessing of the datasets. The datasets were processed with an improved software package based on version 2.4.0 of Bioconductor (18) and R version 2.10.1 (19). Dataset GSE16449 comprised normalized data and was retained. Each Affymetrix dataset was computed using the Robust Multichip Averaging (RMA) algorithm in the Affy package (20,21). Genes were excluded from further analysis if it was not possible to map them to any Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway (http://www.genome. jp/kegg/pathway.html). Interquartile ranges (IQRs) were used to describe the level of variability and the genes with an IQR value of <0.5 were excluded. The probe-set with the largest variability was retained when several probe-sets hit one gene.
GSEA for pathways. GSEA was performed separately in each dataset in the Category package (version 2.10.1) (19). Pathways were excluded if <10 genes were identified to be dysregulated. The t-statistic mean of the genes was computed in each pathway. Using a permutation test with 1,000 times, the significantly changed pathways were identified with a P-value of ≤0.05.

Meta-analysis and DAVID functional enrichment analysis.
A Student's t-test comparing the CCRCC and normal samples for every probe was performed independently in all of the six Affymetrix datasets using Excel 2007, and probes with a P-value <0.05 were considered to be significantly different. In all included datasets, with the exception of dataset GSE16449 with the Agilent platform, other datasets were checked with the Affymetrix platform. In order to keep the data uniform, we only do meta-analysis on the datasets with Affymetrix platform. Following the t-test, the χ 2 square value of each selected probe was calculated using the formula: χ 2 =∑(-2 log e p i ) (22). The probes with χ 2 values <0.05 were retained and used to obtain KEGG pathways from DAVID Bioinformatics Resources 6.7 (http://david.abcc.ncifcrf.gov/).

Results
Re-analysis of each dataset to obtain the dysregulated pathways. The seven datasets included 110 CCRCC and 73 normal samples. The common method of GSEA was implemented for all of the seven datasets, respectively. In dataset GSE12606, GSEA did not identify any dysregulated pathways. This result may have been due to the small sample size of GSE12606 (three cases and three controls). For the remaining six datasets, significantly abnormal pathways were identified using GSEA, based on the permutation P-values. The results are described in Table II.
Common pathways identified by GSEA. The results of the GSEA were compared and it was revealed that 17 downand 12 upregulated pathways coexisted in six datasets. The majority of the downregulated pathways were associated with carbohydrate, lipid and amino acid metabolism, while others were involved in organismal systems or cellular processes. The names of these pathways are listed in Table III. A comparison of the dysregulated pathways, including the upand downregulated pathways, revealed that certain pathways were upregulated in one or more datasets while being downregulated in another or others.
Dysregulated pathways identified by DAVID functional enrichment analysis. The χ 2 value for 3,149 probes <0.05 was calculated using the following formula: χ 2 =∑(-2 log e p i ). The IDs of these probes were all used in the DAVID enrichment analysis. A total of 1,023 probes (genes) led to further pathway analysis and 32 dysregulated pathways were identified. The abnormal pathways and the genes involved are listed in Table IV.

Discussion
CCRCC is a frequently occurring type of cancer that presents numerous challenges. Although a number of preceding studies have attempted to identify the pathogenesis and mechanisms behind this type of cancer, none of the existing theories are able to explain all the cases of CCRCC. Genome-wide expression chips are powerful tools that enable the comprehensive identification of abnormal gene families or pathways present in relevant disease states. Thus, biologically relevant inferences may be reproducible in different studies. For single-gene analysis, examining the same biological condition using different statistical methods and datasets may lead to significant discrepancies (23). Compared with gene expression obtained from different datasets, pathway analysis applied to different datasets yields consistent results and diminishes large discrepancies. Therefore, pathway analysis is able to highlight genes weakly connected to the phenotype that may be difficult to detect in classical univariate statistics.
In our study, GSEA was performed using seven independent publicly available gene expression datasets and the six of these datasets that utilized the Affymetrix platform underwent a meta-analysis and pathway analysis using DAVID functional enrichment analysis. A cross-study based on GSEA was performed to identify critical pathways and to obtain a deeper understand of the common mechanisms involved in CRCC. The results suggested that the majority of the dysregulated pathways were consistent in different studies. Many of the dysregulated pathways identified in the present study have already been indicated to be involved in CCRCC or other types of cancer. In the following paragraphs, a number of the dysregulated pathways and hypotheses regarding the impor- Table I. Datasets information included in this study.  tance of these pathways in CCRCC, based on the functional classification, are discussed. As shown in Table III, a number of common downregulated pathways were identified by GSEA, including the citrate cycle (TCA cycle) and its associated pathways. Previous studies have demonstrated that the metabolism of a variety of nutrients is dysregulated in certain types of RCC. VHL, MET, folliculin (FLCN), tuberous sclerosis 1 (TSC1), TSC2, fumarate hydratase (FH) and succinate dehydrogenase (SDH) are known as renal cancer genes, and all are involved in pathways that respond to metabolic stress or nutrient stimulation. Individuals who harbor mutations in any of these genes have an increased risk of developing RCC. It has been suggested that RCC may be regarded as a metabolic disease (24). When the current study focused on CCRCC, the TCA cycle pathway and a number of the pathways associated with it were identified as being downregulated. FH and SDH genes (SDHA, SDHC and SDHD) were also identified as having a low expression levels.

Number of samples First author
The FH and SDH genes encode mitochondrial tricarboxylic acid cycle enzymes that are essential in energy metabolism. With deficient FH and SDH expression, the process of the TCA cycle is inhibited and oxidative phosphorylation is hindered. Previous studies have shown that certain FH-deficient kidney tumor cell lines exhibit significantly impaired oxidative phosphorylation (25,26) and are glucose-dependent (25)(26)(27). It has been demonstrated that the glucose-mediated generation of cellular reactive oxygen species in an FH-deficient kidney cancer cell line results in the stabilization of hypoxia-inducible factor (HiF) 1-α (27).
Similar to FH, mutations in three of the four SDH genes (SDHB, SDHC and SDHD) have been shown to be involved in familial paraganglioma and familial pheochromocytoma (28,29). Deficient expression of SDH genes severely impairs oxidative phosphorylation and results in SDH-deficient kidney tumors showing glucose dependence. SDH-deficient kidney tumors are likely to be as sensitive to     glucose as FH-deficient tumors and targeting the vasculature or glucose transport in these tumors may provide a novel approach to disrupt the fundamental metabolic machinery of these aggressive cancers. Mutations of FH and SDH cause the inactivation of prolyl hydroxylase (PHD), resulting in the stabilization of HiFs and driving the transcriptional activation of genes supporting tumor growth, neovascularization, invasion and metastasis. In the present study, FH and SDH were revealed to have a low expression level in CCRCC. Therefore, we propose that the downregulation of oxidative phosphorylation, the TCA cycle pathway and its associated pathways may have a critical role in the generation and development of CCRCC. The abnormal metabolism of the upstream nutrients of the TCA cycle may therefore be attributed to the blocked TCA cycle. The tight junction pathway and its associated genes are also indicated to be important in CCRCC. Tight junctions are composed of occludin, claudins and junctional adhesion molecules. Together with adherens junctions and desmosomes, they form the apical junctional complex in epithelial and endothelial cellular sheets. Tight junctions are essential for the tight sealing of the cellular sheets, thus controlling paracellular ion flux and maintaining tissue homeostasis (30). Due to the fact that tight junction proteins are able to recruit signaling proteins (31), these junctions have been assumed to be involved in the regulation of proliferation, differentiation and other cellular functions. There is a generally accepted view that tumorigenesis is accompanied by a disruption of tight junctions. Furthermore, the disruption of tight junctions may be important in the loss of cohesion, invasiveness and lack of differentiation observed in cancer cells. Consistent with this idea, several previous studies have revealed that the expression of occludin and claudins are altered in a number of types of cancer. Occludin was shown to be frequently downregulated in gastrointestinal tumors (32) and a decreased expression of claudin-1 was observed in breast (33,34) and colon cancer (35). The expression pattern of claudins is tissue-specific and it has been demonstrated that claudin-3 (transcript of CLDN3) and/or claudin-4 (transcript of CLDN4 ) are expressed in normal renal tissue (36). To the best of our knowledge, there have been no studies investigating the differential expression of CLDN3 and CLDN4 in normal renal and RCC tissues. When CCRCC tissues were compared with normal renal tissues using GSEA, the tight junction pathway was revealed to be downregulated in CCRCC tissues in different studies. Furthermore, CLDN3, CLDN4, CLDN10, junctional adhesion molecule 2 (JAM2) and JAM3, whose transcripts contribute to the formation of tight junctions, showed decreased expression in CCRCC tissue. Following the review of previous studies, Morin (37) considered that it was possible that the loss of claudins and other tight junction proteins in cancer was responsible for the loss of cell adhesion, an important step in the progression of cancer to metastasis (37). Therefore, the destruction of tight junctions may be important in the development of CCRCC.
A further factor that was considered to be important in the pathogenesis of CRCC was the downregulation of retinol metabolism and its associated genes. It is generally recognized that retinol and its metabolites have significant effects in cell differentiation and apoptosis. In the present study, GSEA revealed that retinol metabolism was downregulated when Table IV  comparing CCRCC tissues with normal renal tissues, and a number of genes included in this pathway were observed to have a low level of expression, such as alcohol dehydrogenase 1A (ADH1A), ADH1B, ADH1C and cytochrome P450s (CYPs). The inhibition of retinol metabolism results in a lack of its metabolite and, in this case, the blockage of the differentiation and apoptosis of cells. A number of the metabolites are currently widely used in the treatment of several types of cancer. The most notable is the use of all-trans retinoic acid in the treatment of acute promyelocytic leukemia (38).
Furthermore, several studies have demonstrated that one of the metabolites of retinol, 13-cis-retinoic acid, showed improved efficacy in the treatment of RCC when combined with other antitumor drugs (39)(40)(41)(42)(43)(44). The results of the current study further indicated that the dysregulated metabolism of retinol may be important in CCRCC. In order to further reveal the role of downregulated retinol metabolism in CCRCC, additional studies focusing on these abnormal genes and the affected metabolites are required. The determination of whether retinol or its metabolites exert therapeutic effects in the treatment of CCRCC may be of benefit.
In addition to downregulated pathways, GSEA also identified a number of pathways that were commonly upregulated. Appearing as solid malignant neoplasms in the human body, RCC has been demonstrated to be immunogenic. A number of immune cells have been isolated from RCC, including natural killer (NK) cells, cytotoxic T cells with specificity for autologous tumor cells, helper T cells and dendritic cells that express interleukin (IL)-1 and IL-2 and function as antigen-presenting cells (45)(46)(47)(48). When RCC appears as an antigen in the human body, immune activity is induced, leading to a series of enhanced immunization activities. Nine pathways, which were associated with the immune system or immune system diseases and which co-existed in six independent studies, were identified to be upregulated when we compared CCRCC tissue with normal renal tissue using GSEA. Antigen processing and presentation, cytokine-cytokine receptor interaction and its associated pathways, including the chemokine signaling pathway, and the Jak-STAT signaling pathway were revealed to be upregulated in the present study.
When focusing on the chemokine signaling pathway, it was revealed that many genes involved in this pathway were overexpressed. Consistent with a number of previous studies (49,50), CXCR4, a gene with a transcript known as fusin, and its ligand gene CXCL12, whose transcript is known as stromal cell-derived factor-1α (SDF-1α), were demonstrated to be overexpressed in CCRCC in our study. CXCR4 is considered to be the major chemokine receptor expressed on cancer cells (51-53) and its expression is required for tumor metastasis to other organs; CXCR4 activation by CXCL12 induces the migration of neoplastic cells (54). The strong expression of CXCR4 has been indicated to correlate with the metastasis of RCC (55). The expression of CXCR4 and SDF-1α is negatively regulated by VHL (49), i.e. a deficiency in VHL expression results in the strong expression of CXCR4 and CXCL12. It has been proposed that blocking signaling through this chemokine receptor may inhibit micrometastases in RCC and result in improved outcomes (56). Therefore, the in-depth study of the adjustment mechanisms of CXCR4 expression may be beneficial in the development of novel treatments for metastatic RCC.
In mammals, the Jak-STAT pathway is the principal signaling mechanism for numerous cytokines and growth factors. These signals induce proliferation or cell fate determination and are crucial to the proper growth and development of mammalian tissues. Too strong or too weak activity of this signaling pathway leads to severe consequences. One of these cytokines, interferon-α (IFN-α) is widely used in the therapy of several neoplasms, including RCC. IFN-α exerts its effects by activating the Jak-STAT signaling pathway and subsequently inducing target gene transcription (57). Shang et al (58) demonstrated that the defective ability of Jak-STAT in RCC may result in IFN-α resistance (58). In the present study, the Jak-STAT signaling pathway was shown to be upregulated in CCRCC tissue. This was inconsistent with the fact that a substantial proportion of patients with RCC fail to respond to IFN-α treatment. The mechanisms of IFN resistance in RCC have not yet been fully elucidated. Further study for an in-depth understanding of this mechanism may be beneficial for the development of novel treatments.
NK cells are one type of infiltrating immune cell present in RCC and the NK cell-mediated cytotoxicity pathway was shown to exhibit strong activity in CCRCC in the present study. Enhancing CCRCC cell sensitivity to the cytotoxicity of NK cells may be an efficient treatment for CCRCC. Several studies concerning new treatments for RCC based on the toxic effects of NK cells have been performed (59,60).
As malignant neoplasms appear in the human body, CCRCC tissue shows as a self-antigen and induces an autoimmune response. Human leukocyte antigens (HLAs) are key components of the major histocompatibility complex (MHC), which is involved in immune cell signaling processes. Therefore, the overexpression of HLA-B, HLA-C and a number of other HLA alleles, as well as the upregulation of certain immune system disease pathways in CCRCC, may be attributed to the self-antigenicity of CCRCC.
The present study also compared the common dysregulated pathways identified by GSEA with the dysregulated pathways obtained from DAVID enrichment analysis. The TCA cycle pathway, numerous nutrient metabolic pathways and the tight junction pathway were shown to be dysregulated in DAVID functional enrichment analysis, in addition to GSEA, based on the abnormal genes identified by meta-analysis. However,, a number of the dysregulated pathways identified using DAVID functional enrichment analysis were not consistent with the cross-GSEA. In this discussion, only the abnormal pathways co-existing in the six datasets have been regarded as significant and discussed. When we reviewed the design of every dataset included in our study, however, certain datasets had small control samples and it was difficult to identify the significant genes. This indicated that a dataset with a small sample size may lead to bias. We also analyzed the dysregulated pathways co-existing in five of the six datasets. There were 17 down-and 12 upregulated pathways co-existing in five of the CCRCC datasets. When compared with the dysregulated pathways in the meta-analysis, nitrogen metabolism, DNA replication, the peroxisome proliferator-activated receptor (PPAR) signaling pathway, nucleotide excision repair, cell cycle, cell adhesion molecules (CAMs), the Toll-like receptor signaling pathway and Vibrio cholerae infection were reduplicated. This result further indicated that the poor design of the individual dataset may lead to large bias in a cross-GSEA.
Numerous treatments have been developed based on the current understanding of RCC; however, none of the existing therapies are universally valid. The treatment of metastatic RCC remains a significant challenge. More effectual studies are required to reveal the pathogenesis and progress of RCC. The current study demonstrated that a cross-GSEA, based on a number of well-designed datasets, is likely to improve the consistency of the understanding of RCC. Additional cross-GSEAs may further the progress made in this study.
In conclusion, cross-GSEA was used to identify the critical pathways involved in RCC using gene expression datasets of CCRCC. A number of pathways that were associated with RCC or CCRCC were identified as being significant, and the understanding of RCC has been improved. The downregulation of the TCA cycle pathway, retinol metabolism and the tight junction pathway may be critical in the occurrence of CCRCC. In addition, the upregulation of the chemokine signaling pathway and its associated pathways may contribute to the metastasis of CCRCC; however, this is the focus of immunotherapy. A further cross-GSEA based on a number of well-designed datasets may provide a valuable contribution to the progress made in the current study, following the analysis of its results.