Open Access

Identification of common key genes in breast, lung and prostate cancer and exploration of their heterogeneous expression

  • Authors:
    • Richa K. Makhijani
    • Shital A. Raut
    • Hemant J. Purohit

  • Published online on: November 30, 2017     https://doi.org/10.3892/ol.2017.7508
  • Pages: 1680-1690
  • Copyright: © Makhijani et al. This is an open access article distributed under the terms of Creative Commons Attribution License.

Abstract

Cancer is one of the leading causes of mortality worldwide, and in particular, breast cancer in women, prostate cancer in men, and lung cancer in both women and men. The present study aimed to identify a common set of genes which may serve as indicators of important molecular and cellular processes in breast, prostate and lung cancer. Six microarray gene expression profile datasets [GSE45827, GSE48984, GSE19804, GSE10072, GSE55945 and GSE26910 (two datasets for each cancer)] and one RNA‑Seq expression dataset (GSE62944 including all three cancer types), were downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) were identified in each individual cancer type using the LIMMA statistical package in R, and then a comparison of the resulting gene lists was performed to identify common DEGs across cancer types. This analysis was performed for microarray and RNA‑Seq datasets individually, revealing a set of 62 and 1,290 differentially expressed genes respectively, which may be associated with the three cancers. Out of these genes, 44 were common to both analyses, and hence termed key genes. Gene Ontology functional annotation, Kyoto Encyclopedia of Genes and Genomes pathway mapping and literature citations were used to confirm the role of the key genes in cancer. Finally, the heterogeneity of expression of the key genes was explored using the I2 statistic (meta package in R). The results demonstrated non‑heterogeneous expression of 6 out of the 44 key genes, whereas the remaining genes exhibited significant heterogeneity in expression across microarray samples. In conclusion, the identified DEGs may play important roles in the pathogenesis of breast, prostate and lung cancer and may be used as biomarkers for the development of novel diagnostic and therapeutic strategies.

Introduction

The highest rates of cancer-related mortality are associated with breast, prostate and lung cancer, as reported by the World Health Organization (1), the World Cancer Report (2) and Cancer Facts and Figures (3). A plethora of cancer microarray and RNA sequencing (RNA-Seq) studies are publicly available in databases, including the Gene Expression Omnibus (GEO) (4), ArrayExpress (5) and The Cancer Genome Atlas (TCGA; http://cancergenome.nih.gov/). Recently, simultaneous analysis and comparison of the results from microarray and RNA-Seq data has been explored (6–8). These studies have indicated that RNA-Seq has more benefits compared with microarray platforms, including a broader dynamic range and increased specificity and sensitivity; however, when samples belonging to the same biological conditions are analyzed on both platforms, strongly concordant and highly correlated gene expression profiles are obtained (6,7). Microarrays nevertheless remain a popular choice amongst researchers conducting transcriptional profiling experiments, because RNA-Seq technology is newer, more expensive, and requires extensive and complex data storage and analysis. The present study focused on microarray analysis, but additionally performed analysis on RNA-Seq data, so as to validate the significance of the results obtained.

Several studies in recent years have reported meta-analysis of such data, where the analyses are performed on integrated samples from multiple microarray datasets (9–13). The majority of the articles focusing on meta-analysis use one of the following strategies: assembling published differentially expressed gene (DEG) lists from experimental studies and then identifying the consistently reported DEGs (14–16); or integrating multiple datasets from different microarray platforms and then executing statistical tests to discover consistently expressed DEGs (9–13). However, inconsistencies in the results are observed due to technical limitations, such as variance in expression measurements and differences in laboratory protocols for different microarray platforms. One major inconsistency reported in meta-signature studies is the overrepresentation of genes common to various platforms, and the underrepresentation of genes that are not common to different platforms (11). In addition, meta-analysis that uses previously published DEG lists when raw data are unavailable has the limitation that it is difficult to assign a confidence to the combined P-values and fold change measurements for each gene (14).

To improve the understanding of cancer pathogenesis, and based on methods from the published literature, the present study applied differential gene expression analysis individually to six microarray datasets and one RNA-Seq dataset, representing three different cancer types: breast, lung and prostate. The aim of the present study was to discover a common set of genes that may demonstrate a significant expression pattern across these three cancer types. A common subset of DEGs was then identified by comparing the gene lists obtained from the microarray and RNA-Seq analyses. The resulting gene set was further analyzed by Gene Ontology (GO) functional annotation using GENECODIS (17) and DAVID (18), by gene summaries from Cancer Genetics Web (19) and OMIM (20), and by the number of literature citations using TARGETgene (21). Furthermore, a meta-analysis of the combined samples was performed to identify the heterogeneity in expression of the obtained DEGs across all six microarray datasets analyzed. This helped in observing the change in expression of the DEGs under different cancer conditions. An important implication is that some genes exhibit a consistent expression change irrespective of the cancer type, whereas others exhibit inconsistent expression changes. This may aid oncologists in understanding the behavior of genes in cancer in terms of their heterogeneous expression.

Materials and methods

Outline of data and preprocessing

Six cancer microarray datasets and one RNA-Seq dataset were downloaded from the GEO database (www.ncbi.nlm.nih.gov/geo) (22–28). The information extracted from each identified study is presented in Table I. The microarray analysis was restricted to datasets derived from two platforms, Affymetrix HGU-133A (GPL96) and Affymetrix HGU-133APlus2 (GPL570), which characterize probe sets with unique genes for Homo sapiens. The RNA-Seq dataset, GSE62944, comprises data from 24 cancer types from The Cancer Genome Atlas, and had already been processed using the Rsubread R package and its featureCounts() function in order to summarize the gene-level expression values as integer counts. In the present study, integer-based read counts were extracted for only the three cancer types of interest (breast, prostate and lung) out of the data matrix for 24 cancer types. The total number of samples analyzed was 454 (311 tumor samples/143 normal samples) and 2,333 (2,120 tumor samples/213 normal samples) for the microarray and RNA-Seq datasets, respectively. To ensure unbiased and consistent screening of the expression values from the different microarray datasets, the raw CEL files of the experiments were used. The Robust Multichip Average (RMA) technique, which includes quantile normalization, was the expression normalization technique used in the present study (29). This technique was applied to each raw microarray dataset individually in order to minimize inconsistencies due to normalization. This method of normalization was selected due to its good detection of differential change, stable variance on the log scale and reduced production of false positives. A comparison between different normalization methods has reported that RMA outperformed other methods in terms of specificity and sensitivity when dealing with fold change criteria in the detection of differential expression (30). The box plots of the RMA-normalized intensities were plotted (data not shown), demonstrating that the measurements were closely aligned around a central mean, and were thus comparable.
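The quantile normalization step mentioned above can be illustrated with a minimal Python sketch. It is not the RMA implementation used in the study (RMA also performs background correction and probe-set summarization, which are omitted here), and the intensity values are made up for illustration. Ties are broken by position rather than averaged, which is a simplification.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists of intensities."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of probes")
    # Sort each sample, then average across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered by intensity within this sample.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Each probe receives the cross-sample mean for its rank.
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical log-intensities: three samples, four probes each.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
for col in normalized:
    print([round(v, 3) for v in col])
```

After this step every sample shares the same set of values, so box plots of the samples line up, which is the property the box plots described above are meant to check.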

Table I.

Characteristics of the individual datasets used in the present study.

| Type of dataset | Type of cancer | Dataset identification number | Platform | Number of probes/genes | Number of samples (tumor/normal) |
|---|---|---|---|---|---|
| Microarray gene expression | Breast | GSE45827 | GPL570 | 54,675 | 174 (163/11) |
| | Breast | GSE48984 | GPL96 | 22,283 | 22 (13/9) |
| | Lung | GSE19804 | GPL570 | 54,675 | 120 (60/60) |
| | Lung | GSE10072 | GPL96 | 22,283 | 107 (57/50) |
| | Prostate | GSE55945 | GPL570 | 54,675 | 19 (12/7) |
| | Prostate | GSE26910 | GPL570 | 54,675 | 12 (6/6) |
| RNA-Seq gene expression | Breast | GSE62944 | GPL9052 | 23,368 | 1,230 (1,118/112) |
| | Lung squamous cell carcinoma | | | | 551 (501/50) |
| | Prostate adenocarcinoma | | | | 552 (501/51) |

Identification of potentially significant target genes

The Bioconductor Linear Models for Microarray Analysis (LIMMA) package was used (31) to calculate the differential expression of each gene in the microarray and RNA-Seq datasets included in the present study. LIMMA remains highly recommended for such analyses (32). In a previous study comparing eight microarray analysis methods [Welch's t-test, analysis of variance (ANOVA), Wilcoxon's test, significance analysis of microarrays (SAM), Randomized Variance Model (RVM), LIMMA, variance mixture (VarMixt) and structural model for variances (SMVar)], LIMMA performed the best in terms of statistical power, false-positive rate, execution time and ease of use (33). In LIMMA, a linear model is fitted to the expression data for each probe, and the coefficients obtained correspond to the columns of the design matrix. Instead of simple t-statistics, it provides the moderated t-statistic, the moderated F-statistic and the B-statistic (the log-odds of differential expression), by applying the empirical Bayes method and shrinking the standard errors towards a common value. Hence, LIMMA produces stable and reproducible results even with a small number of arrays. It also has the advantages of fast computation, simultaneous error rate control across multiple contrasts and genes, and effective prioritizing of results by applying a particular cutoff for fold change. For analysis of RNA-Seq data, LIMMA with voom was used (34). Two approaches incorporate the mean-variance relationship of the log counts into the differential expression analysis: limma-trend builds the mean-variance trend into limma's empirical Bayes procedure, whereas voom converts it into a precision weight for each individual normalized observation. This method performs well even when the sequencing depths differ between RNA samples.
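The shrinkage idea behind the moderated t-statistic can be sketched for a single probe in a two-group comparison. This is an illustration only, not limma itself: limma estimates the prior degrees of freedom and prior variance from all probes on the array, whereas here `prior_df` and `prior_var` are hypothetical values supplied by hand, and the expression values are invented.

```python
from math import sqrt
from statistics import mean, variance


def moderated_t(tumor, normal, prior_df, prior_var):
    """Ordinary and moderated t-statistics for one probe (two-group design)."""
    n1, n2 = len(tumor), len(normal)
    d = n1 + n2 - 2  # residual degrees of freedom
    pooled_var = ((n1 - 1) * variance(tumor) + (n2 - 1) * variance(normal)) / d
    # Posterior variance: weighted average of the prior and the observed variance.
    post_var = (prior_df * prior_var + d * pooled_var) / (prior_df + d)
    log_fc = mean(tumor) - mean(normal)
    scale = 1.0 / n1 + 1.0 / n2
    t_ordinary = log_fc / sqrt(pooled_var * scale)
    t_moderated = log_fc / sqrt(post_var * scale)
    return log_fc, t_ordinary, t_moderated


# Hypothetical log2 expression values for one probe.
tumor = [8.1, 8.0, 8.2]
normal = [7.0, 7.1, 6.9]

# prior_df and prior_var are assumed here; limma would estimate them from the data.
log_fc, t_ord, t_mod = moderated_t(tumor, normal, prior_df=4.0, prior_var=0.25)
print(round(log_fc, 3), round(t_ord, 2), round(t_mod, 2))
```

With only three arrays per group, the observed variance of this probe is very small and the ordinary t-statistic is inflated; borrowing information through the prior pulls the statistic towards a more conservative value, which is why the approach is described as stable for small numbers of arrays.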

Functional annotation of DEGs

To infer the biological functions and signals involving the DEGs, GO enrichment analysis was performed. The online tool GENECODIS (http://genecodis.cnb.csic.es) was used for this purpose (17); it also provides pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The DAVID functional annotation tool was used to elaborate on the annotation results (18).

Literature citations for the DEGs

To confirm that the list of DEGs obtained in the present study is associated with cancer, the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov) literature was searched to identify published reports relating these genes to cancer. The TARGETgene tool was used for this purpose (21). This tool identifies probable therapeutic targets in cancer by constructing a whole genome network using integration of heterogeneous data at the genomic and proteomic level. Upon the construction of the gene network, TARGETgene evaluates network-based parameters to detect potential therapeutic targets and displays the number of literature citations in all and individual cancer types for each gene, as reported in the NCBI database.

Meta-analysis of expression heterogeneity of DEGs

Meta-analysis can refer to either the analysis of collectively published lists of DEGs, or the integration of diverse microarray datasets to perform a novel combined differential expression analysis. The meta-analysis performed in the present study investigated the diversity in expression of DEGs in six microarray datasets, collectively, with the aim of discovering whether they display inconsistent expression changes in multiple studies, or whether they display consistent changes in all the analyzed studies. These are termed heterogeneous and non-heterogeneous behavior, respectively. Statistical heterogeneity implies genuine between-study variation, rather than within-study variance that may be due to chance alone. The Q and I2 statistics remain the most widely used measures of heterogeneity, for which computation modules are available in standard statistical software for meta-analysis, such as Stata and R (35). The I2 statistic is preferred among all measures of heterogeneity, as it is independent of sample size and scale, and has finite upper bounds and precise confidence intervals (36). For each gene in the DEG list, analysis of heterogeneity was performed across cancer types using the meta package in R (37). A 95% confidence interval was used, with 5 degrees of freedom (six datasets minus one). The metacont function estimates the heterogeneity statistic I2, along with Q, df and the P-value. The seven steps suggested by Ramasamy et al (38) for conducting a meta-analysis of microarray datasets were followed.
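As a rough illustration of how a heterogeneity statistic of this kind is obtained for one gene, the sketch below computes Cochran's Q from per-study tumor and normal summaries using inverse-variance weights on the mean difference. It is written in Python rather than R and is not the metacont implementation; the choice of unpooled variances and all summary values are assumptions made for the example.

```python
def cochran_q(studies):
    """Pooled mean difference, Cochran's Q and df for one gene.

    studies: list of (mean_tumor, sd_tumor, n_tumor, mean_normal, sd_normal, n_normal).
    """
    effects = []
    weights = []
    for mean_t, sd_t, n_t, mean_n, sd_n, n_n in studies:
        # Effect size: mean difference between tumor and normal samples.
        effects.append(mean_t - mean_n)
        # Inverse-variance weight (variance of the mean difference, unpooled).
        weights.append(1.0 / (sd_t ** 2 / n_t + sd_n ** 2 / n_n))
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    # Q: weighted squared deviations of study effects from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return pooled, q, len(studies) - 1


# Hypothetical summaries for six studies (df = 5, as in the present analysis).
consistent = [(8.0, 0.5, 20, 7.0, 0.5, 20)] * 6
mixed = [(8.0, 0.5, 20, 7.0, 0.5, 20)] * 3 + [(7.0, 0.5, 20, 8.0, 0.5, 20)] * 3

pooled_c, q_c, df_c = cochran_q(consistent)
pooled_m, q_m, df_m = cochran_q(mixed)
print(round(q_c, 3), df_c)
print(round(q_m, 3), df_m)
```

A gene whose expression change has the same direction and size in every study gives a Q close to zero, whereas a gene that is up-regulated in some studies and down-regulated in others gives a Q far above its degrees of freedom.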

Results

Extracting significant gene markers relative to breast, lung and prostate cancer

The LIMMA R package was used to elucidate potential gene targets by adjusting the P-values using Benjamini-Hochberg correction. Genes were termed significantly differentially expressed if the adjusted P-value was <0.05 and the fold change was >2. DEGs for each microarray dataset of lung, breast and prostate cancer, were obtained individually, with results illustrated in Table II. Since datasets belonged to two platforms, GPL570 and GPL96, the number of probes was not equal in all datasets. Probes in GPL96 are a subset of probes in GPL570. Therefore, while combining the DEGs within the same cancer type, a union (merging) of the two individual lists of DEGs was performed, to get a single list of DEGs. The main aim of the present study was to find a common subset of DEGs across the three cancer types. Hence, an intersection of the DEG lists was performed to find members of the joint subset of genes across the three cancer types. Up to this stage of the analysis, mapping of probe IDs with the corresponding gene symbols was not performed. Therefore, the number of DEGs represented the unique probe IDs. In total, 75 differentially expressed probe IDs were discovered in common between the three cancer types. Following the removal of probes with no available annotation and the removal of repeated gene symbols, a list of 62 unique gene symbols was obtained as a result of the microarray data analysis.
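The selection rule described above can be sketched as follows. The Benjamini-Hochberg adjustment is implemented directly; the fold change criterion is interpreted here as an absolute log2 fold change greater than 1, which is an assumption about how "fold change >2" was applied. The probe names and statistics are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results, alpha=0.05, min_abs_log2fc=1.0):
    """results: list of (probe_id, log2_fold_change, raw_p_value)."""
    adjusted = benjamini_hochberg([p for _, _, p in results])
    return [
        probe
        for (probe, log2fc, _), adj_p in zip(results, adjusted)
        if adj_p < alpha and abs(log2fc) > min_abs_log2fc
    ]


# Hypothetical per-probe results (names and numbers are illustrative only).
results = [
    ("probe_a", 1.8, 0.005),
    ("probe_b", -2.3, 0.01),
    ("probe_c", 0.4, 0.03),
    ("probe_d", 1.5, 0.2),
]
adjusted = benjamini_hochberg([p for _, _, p in results])
degs = select_degs(results)
print([round(p, 4) for p in adjusted])
print(degs)
```

In this toy input, `probe_c` passes the adjusted P-value threshold but not the fold change threshold, and `probe_d` shows the opposite pattern, so neither is retained.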

Table II.

Differential expression analysis results for each microarray dataset.

| Cancer | Breast | Breast | Lung | Lung | Prostate | Prostate |
|---|---|---|---|---|---|---|
| GEO dataset | GSE45827 | GSE48984 | GSE19804 | GSE10072 | GSE26910 | GSE55945 |
| Platform | GPL570 | GPL96 | GPL570 | GPL96 | GPL570 | GPL570 |
| Number of probes | 54,675 | 22,283 | 54,675 | 22,283 | 54,675 | 54,675 |
| Number of samples | 174 | 22 | 120 | 107 | 12 | 19 |
| Number of differentially expressed genes | 7,006 | 3,513 | 2,026 | 829 | 77 | 539 |
| Union of the two datasets (per cancer type) | 9,248 | | 2,215 | | 603 | |

A similar analysis was performed on the RNA-Seq data for the three individual cancer types. An integer-based raw gene count data matrix of breast, lung and prostate cancer samples was used with LIMMA and voom to explore the DEGs (34). The voom method estimates the mean variance relationship of the log counts, generates a precision weight for each observation and enters these into the LIMMA empirical Bayes analysis pipeline. Using this method, 1,290 genes were obtained in common across the three cancer types.
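The transformation that voom applies to raw counts before modeling can be sketched as a log2 counts-per-million calculation with small offsets, which keeps zero counts finite. Only this first step is shown; the estimation of the mean-variance trend and of the precision weights is omitted, and the counts below are invented. The library size is taken as the plain column sum, which is an assumption (normalization factors are ignored).

```python
from math import log2


def log_cpm(counts, lib_size):
    """log2 counts per million with offsets of 0.5 (count) and 1 (library size)."""
    return [log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]


# Hypothetical read counts for four genes in a single sample.
counts = [0, 10, 100, 889]
lib_size = sum(counts)  # assumed library size: total reads in the sample
values = log_cpm(counts, lib_size)
print([round(v, 3) for v in values])
```

Because each value is scaled by the library size of its own sample, samples sequenced to different depths are placed on a comparable scale before the linear model is fitted.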

To confirm the consistency of the results obtained, genes appearing in both the microarray and RNA-Seq analysis results were identified. Following removal of all the duplicate gene symbols, a list of 44 genes was generated. The overlap of DEGs across the three cancers obtained from microarray analysis, from RNA-Seq analysis and from the combined microarray and RNA-Seq analysis is illustrated in Fig. 1A-C respectively. The complete list of the genes identified by the combined microarray and RNA-Seq analysis, along with links to their description from the cancer genetics web (19) and OMIM database (20), is depicted in Table III.
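The list operations described in this section (union within a cancer type, intersection across cancer types, probe-to-symbol mapping, then overlap with the RNA-Seq result) can be sketched with Python sets. All identifiers below are placeholders; the probe IDs, gene symbols and annotation table are hypothetical and do not come from the study data.

```python
def common_across_cancers(per_cancer_lists):
    """Union the DEG lists within each cancer type, then intersect across types."""
    merged = {
        cancer: set().union(*deg_lists)
        for cancer, deg_lists in per_cancer_lists.items()
    }
    return set.intersection(*merged.values())


def probes_to_symbols(probes, annotation):
    """Map probe IDs to gene symbols, dropping unannotated probes and duplicates."""
    return {annotation[p] for p in probes if p in annotation}


# Hypothetical DEG probe lists: two microarray datasets per cancer type.
microarray_degs = {
    "breast": [{"p1", "p2", "p3", "p7"}, {"p2", "p4"}],
    "lung": [{"p1", "p2", "p4", "p7"}, {"p4", "p5"}],
    "prostate": [{"p2", "p4", "p6", "p7"}, {"p1"}],
}
# Hypothetical annotation: p1 and p2 map to the same gene, p7 has no annotation.
annotation = {"p1": "GENE_A", "p2": "GENE_A", "p4": "GENE_B"}
# Hypothetical genes common to the three cancers in the RNA-Seq analysis.
rnaseq_common = {"GENE_B", "GENE_C"}

common_probes = common_across_cancers(microarray_degs)
microarray_genes = probes_to_symbols(common_probes, annotation)
key_genes = microarray_genes & rnaseq_common
print(sorted(common_probes), sorted(microarray_genes), sorted(key_genes))
```

The same pattern explains why the counts shrink at each stage of the analysis: several probes can collapse to one gene symbol, unannotated probes are dropped, and only symbols confirmed on both platforms remain as key genes.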

Table III.

Gene symbols of the common differentially expressed genes in breast, lung and prostate cancer.

| Gene symbol | Link to gene summary |
|---|---|
| ACSS3 | https://www.omim.org/entry/614356?search=ACSS3&highlight=acss3 |
| ANGPT1 | http://www.cancer-genetics.org/ANGPT1.htm |
| AOX1 | https://www.omim.org/entry/602841?search=AOX1&highlight=aox1 |
| BIRC5 | http://www.cancer-genetics.org/BIRC5.htm |
| CAV1 | http://www.cancer-genetics.org/CAV1.htm |
| CAV2 | http://www.cancer-genetics.org/CAV2.htm |
| CCDC69 | http://www.genecards.org/cgi-bin/carddisp.pl?gene=CCDC69 |
| CCDC85A | http://www.genecards.org/cgi-bin/carddisp.pl?gene=CCDC85A&keywords=CCDC85A |
| CELF2 | https://www.omim.org/entry/602538?search=CELF2&highlight=celf2 |
| CFD | http://omim.org/entry/134350?search=CFD&highlight=cfd |
| CLU | http://www.cancerindex.org/geneweb/CLU.htm |
| DPT | https://www.omim.org/entry/125597?search=DPT&highlight=dpt |
| EFEMP1 | http://www.cancer-genetics.org/EFEMP1.htm |
| ERG | http://www.cancer-genetics.org/ERG.htm |
| EZH2 | https://www.omim.org/entry/601573?search=EZH2&highlight=ezh2 |
| FAM107A | http://omim.org/entry/608295?search=FAM107A&highlight=fam107a |
| FERMT2 | https://www.omim.org/entry/607746?search=FERMT2&highlight=fermt2 |
| FHL1 | http://omim.org/entry/300163?search=FHL1&highlight=fhl1 |
| FXYD6 | http://omim.org/entry/606683?search=FXYD6&highlight=fxyd6 |
| GLDN | https://www.omim.org/entry/608603?search=GLDN&highlight=gldn |
| GPM6A | http://omim.org/entry/601275?search=GPM6A&highlight=gpm6a |
| GPM6B | http://omim.org/entry/300051?search=GPM6B&highlight=gpm6b |
| HSPB8 | http://omim.org/entry/608014?search=HSPB8&highlight=hspb8 |
| ID4 | http://omim.org/entry/600581?search=ID4&highlight=id4 |
| INMT | https://www.omim.org/entry/604854?search=INMT&highlight=inmt |
| IQGAP3 | http://www.genecards.org/cgi-bin/carddisp.pl?gene=IQGAP3&keywords=IQGAP3 |
| ITIH5 | https://www.omim.org/entry/609783?search=ITIH5&highlight=itih5 |
| KCNAB1 | https://www.omim.org/entry/601141?search=KCNAB1&highlight=kcnab1 |
| KIF4A | http://omim.org/entry/300521?search=KIF4A&highlight=kif4a |
| MAMDC2 | https://www.omim.org/entry/612879?search=MAMDC2&highlight=mamdc2 |
| MCAM | http://www.cancer-genetics.org/MCAM.htm |
| MYH11 | http://www.cancer-genetics.org/MYH11.htm |
| MYL9 | http://www.cancer-genetics.org/PML.htm |
| MYLK | https://www.omim.org/entry/600922?search=MYLK&highlight=mylk |
| NTRK2 | http://www.cancer-genetics.org/NTRK2.htm |
| NUSAP1 | http://omim.org/entry/612818?search=NUSAP1&highlight=nusap1 |
| PCDH9 | http://omim.org/entry/603581?search=PCDH9&highlight=pcdh9 |
| PGM5 | https://www.omim.org/entry/600981?search=PGM5&highlight=pgm5 |
| PTRF | http://omim.org/entry/603198?search=PTRF&highlight=ptrf |
| SDPR | https://www.omim.org/entry/606728?search=SDPR&highlight=sdpr |
| STIL | https://www.omim.org/entry/181590?search=STIL&highlight=stil |
| SYNPO2 | http://www.genecards.org/cgi-bin/carddisp.pl?gene=SYNPO2&keywords=SYNPO2 |
| TCEAL2 | http://www.genecards.org/cgi-bin/carddisp.pl?gene=TCEAL2&keywords=TCEAL2 |
| TIMP3 | https://www.omim.org/entry/188826?search=TIMP3&highlight=timp3 |

Determination of functional annotation

The GENECODIS web software tool was used for functional annotation; it displays the biological processes, molecular functions and cellular components that may be significantly enriched in a given gene list (17). The software also lists the KEGG pathways that may be significantly enriched in the gene list. A significance threshold of P<0.05 was selected. The results are illustrated in Figs. 2–4. Terms involving two or more genes were retained in the graphs. The significantly enriched biological processes were multicellular organismal development, cell adhesion, axon guidance, cell differentiation, blood coagulation, muscle contraction, cell death, negative regulation of apoptotic process and anti-apoptosis (Fig. 2). The significantly enriched molecular functions included protein, actin, calmodulin and syntaxin binding (Fig. 3). The significantly enriched cellular components were the nucleus, cytoplasm, plasma membrane, cytosol, caveola, stress fiber, focal adhesion, extracellular matrix, extracellular region and cytoskeleton (Fig. 4). Enriched KEGG pathways are listed in Table IV.

Detailed GO enrichment was also obtained using the DAVID functional annotation tool (data not shown) (18). Several functional predictions were provided by DAVID, including the presence of BIRC5 in the cell survival pathway, TIMP3 in the p53 signaling pathway, CAV1 in the integrin signaling pathway, and CFD in the alternative complement pathway, as given by BIOCARTA. The Clusters of Orthologous Groups (COG) ontology predicted that KIF4A is involved in cell division and chromosome partitioning, and that MYL9 is involved in signal transduction mechanisms/cytoskeleton/cell division and chromosome partitioning. Significantly enriched biological processes were sensory perception, angiogenesis, cell cycle checkpoint, nuclear division, cytokinesis, apoptosis, cell death and cell adhesion. Cellular components included the extracellular region, cytosol, cell surface, cytoskeleton, nucleolus and cell fraction. Enriched KEGG pathways included pathways in cancer, transcriptional misregulation in cancer, focal adhesion, vascular smooth muscle contraction, the MAPK signaling pathway and the neurotrophin signaling pathway. In summary, the results from the functional annotation analysis demonstrate a significant association of the discovered DEGs with cancer pathogenesis.

Table IV.

Enriched KEGG pathways in differentially expressed genes as predicted by GENECODIS analysis.

| KEGG pathway | Class | Number of genes | P-value (adjusted) | Gene symbols |
|---|---|---|---|---|
| Regulation of actin cytoskeleton | Cellular processes; cell motility | 3 | 0.016092 | MYLK, IQGAP3, MYL9 |
| Vascular smooth muscle contraction | Organismal systems; circulatory system | 3 | 0.005475 | MYLK, MYH11, MYL9 |
| Focal adhesion | Cellular processes | 4 | 0.003144 | CAV2, MYLK, CAV1, MYL9 |
| Tight junction | Cellular processes | 2 | 0.039699 | MYH11, MYL9 |
| Bacterial invasion of epithelial cells | Human diseases; infectious diseases | 2 | 0.016007 | CAV2, CAV1 |
| Tryptophan metabolism | Metabolism; amino acid metabolism | 2 | 0.01113 | INMT, AOX1 |
| Viral myocarditis | Human diseases; cardiovascular diseases | 2 | 0.015622 | CAV1, MYH11 |

[i] KEGG, Kyoto Encyclopedia of Genes and Genomes.

Listing the literature citations

To explore the cancer-specific citations for these genes, and in particular the distribution of the number of relevant citations in individual and/or all cancer types addressed in the present study, the TARGETgene tool was used (21). The results demonstrated a high ranking in the NCBI literature for the candidate key genes. These rankings are reported in Table V. Notably, the number of citations in all cancers for these genes ranged from 1 to 326, with no gene having zero citations, suggesting that the key genes are relevant to cancer. When the number of citations in individual cancers was considered, several genes had no relevant citations. For example, NTRK2 has zero NCBI citations in prostate cancer, although several studies report a role for this gene in prostate cancer (39,40). Similarly, ID4 has zero citations in lung cancer, but has been reported to have a role in lung cancer (41). A summary of the roles of these key genes in cancer is provided by the Cancer Genetics Web database (19) and the OMIM database (20), and is listed in Table III.

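Table V.

TARGETgene results for differentially expressed gene ranking and their number of citations in all and individual cancer types.

| Rank | Gene symbol | Citation numbers for all cancers | Citation numbers for breast cancer | Citation numbers for prostate cancer | Citation numbers for lung cancer |
|---|---|---|---|---|---|
| 1 | MYLK | 4 | 3 | 1 | 0 |
| 2 | NTRK2 | 23 | 0 | 0 | 5 |
| 3 | CAV1 | 137 | 46 | 24 | 22 |
| 4 | MCAM | 22 | 3 | 6 | 1 |
| 5 | ANGPT1 | 35 | 3 | 0 | 3 |
| 6 | CAV2 | 24 | 6 | 4 | 2 |
| 7 | BIRC5 | 326 | 47 | 18 | 46 |
| 8 | EFEMP1 | 4 | 1 | 0 | 2 |
| 9 | EZH2 | 68 | 37 | 33 | 6 |
| 10 | HSPB8 | 14 | 3 | 1 | 2 |
| 11 | ERG | 35 | 0 | 67 | 2 |
| 12 | MYH11 | 16 | 1 | 1 | 0 |
| 13 | TIMP3 | 34 | 9 | 3 | 2 |
| 14 | MYL9 | 1 | 1 | 0 | 0 |
| 15 | SDPR | 2 | 0 | 0 | 0 |
| 16 | PGM5 | 1 | 0 | 0 | 0 |
| 17 | CLU | 48 | 11 | 18 | 8 |
| 18 | FHL1 | 5 | 1 | 1 | 0 |
| 19 | FXYD6 | 4 | 0 | 0 | 0 |
| 20 | KIF4A | 8 | 0 | 1 | 0 |
| 21 | KCNAB1 | 2 | 0 | 0 | 0 |
| 22 | GPM6A | 3 | 0 | 0 | 1 |
| 23 | CFD | 1 | 0 | 0 | 0 |
| 24 | FAM107A | 9 | 0 | 0 | 1 |
| 25 | PTRF | 3 | 1 | 1 | 0 |
| 26 | DPT | 3 | 0 | 0 | 0 |
| 27 | ID4 | 21 | 4 | 0 | 0 |
| 28 | FERMT2 | 4 | 1 | 0 | 2 |
| 29 | MAMDC2 | 4 | 0 | 0 | 0 |
| 30 | CCDC69 | 2 | 0 | 0 | 0 |
| 31 | IQGAP3 | 1 | 0 | 0 | 0 |
| 32 | PCDH9 | 3 | 1 | 0 | 0 |
| 33 | SYNPO2 | 7 | 0 | 3 | 0 |
| 34 | STIL | 21 | 0 | 0 | 1 |
| 35 | GLDN | 2 | 0 | 0 | 0 |
| 36 | CCDC85A | 1 | 0 | 0 | 0 |
| 37 | GPM6B | 4 | 0 | 0 | 0 |
| 38 | ITIH5 | 5 | 4 | 1 | 1 |
| 39 | AOX1 | 3 | 0 | 0 | 0 |
| 40 | NUSAP1 | 2 | 0 | 0 | 0 |
| 41 | ACSS3 | 1 | 0 | 0 | 0 |
| 42 | TCEAL2 | 1 | 0 | 0 | 0 |
| 43 | INMT | 3 | 0 | 0 | 1 |

Meta-analysis of the common set of DEGs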

The I2 statistic describes the percentage of variation across studies that is due to heterogeneity rather than chance, with a confidence interval constructed using the iterative Chi-squared distribution method. The I2 statistic provides a better measure of consistency between the trials in a meta-analysis (35). It is calculated as I2 = 100% × (Q − df)/Q, where Q denotes Cochran's heterogeneity statistic and df denotes the degrees of freedom. The I2 value lies between 0 and 100%, with all negative values set to zero. Heterogeneity is graded as low, moderate and high at I2 values of 25, 50 and 75%, respectively. For each DEG, heterogeneity analysis was performed using the meta package in R (37), by extracting RMA-normalized values from the six microarray datasets. However, these values could not be retrieved for all 44 genes, as some probes were not present in data derived from the GPL96 platform. Therefore, heterogeneity analysis was performed only for those DEGs for which probe ID measurements were available in all six datasets. The results of this analysis are listed in Table VI. In this analysis, the P-value does not adequately describe the extent of heterogeneity in the results of the trials, whereas the I2 value does. Low I2 values indicate little variability between studies, with I2=0 meaning no heterogeneity. This non-heterogeneous behavior was observed in 6 genes out of the list of DEGs, namely CLU, EFEMP1, ID4, MCAM/MIR6756, PPAP2B and DPT. The gene DPT was mapped by two different probe IDs, and therefore two different I2 values were obtained: one showed considerable heterogeneity, while the other showed no heterogeneity. The forest plots for some non-heterogeneous genes are illustrated in Figs. 5 and 6 as examples. These plots demonstrate that the mean difference of each individual study is very close to the mean of all the studies, which is depicted by the dashed vertical line. Similar forest plots were observed for all heterogeneous genes (data not shown).
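The I2 formula given above can be checked against a few rows of Table VI. The sketch below applies the formula, with negative values set to zero, to the Q values reported for four genes (df = 5) and prints the result next to the reported I2. Small differences are expected because the reported Q values are rounded to two decimals.

```python
def i_squared(q, df):
    """I2 (%) from Cochran's Q and its degrees of freedom; negatives set to zero."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# (gene, Q, reported I2 %) as read from Table VI; df = 5 for every gene.
rows = [
    ("ANGPT1", 129.21, 96.10),
    ("CLU", 2.97, 0.00),
    ("FXYD6", 6.86, 27.10),
    ("HSPB8", 14.47, 65.50),
]
computed = {gene: i_squared(q, 5) for gene, q, _ in rows}
for gene, q, reported in rows:
    print(gene, round(computed[gene], 2), reported)
```

For CLU, Q is smaller than its degrees of freedom, so the raw formula would give a negative value and I2 is reported as 0, which is the non-heterogeneous case discussed above.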

Table VI.

Meta-analysis of differentially expressed genes in the six microarray datasets.

| Gene symbol | Probe ID | I2 (%) | Q | df | P-value |
|---|---|---|---|---|---|
| ANGPT1 | 205608_s_at | 96.10 | 129.21 | 5 | <0.0001 |
| AOX1 | 205083_at | 86.20 | 36.33 | 5 | <0.0001 |
| BIRC5 | 202095_s_at | 97.00 | 167.98 | 5 | <0.0001 |
| CAV1 | 212097_at | 91.80 | 60.9 | 5 | <0.0001 |
| CAV2 | 203323_at | 90.10 | 50.61 | 5 | <0.0001 |
| CDKN1C | 213348_at | 92.60 | 67.89 | 5 | <0.0001 |
| CFD | 205382_s_at | 95.90 | 120.82 | 5 | <0.0001 |
| CLU | 208791_at | 0.00 | 2.97 | 5 | 0.7051 |
| DPT | 213068_at | 76.10 | 20.93 | 5 | 0.0008 |
| DPT | 207977_s_at | 0.00 | 4.25 | 5 | 0.5133 |
| EFEMP1 | 201843_s_at | 1.10 | 5.05 | 5 | 0.4094 |
| ERG | 213541_s_at | 96.20 | 131.24 | 5 | <0.0001 |
| EZH2 | 203358_s_at | 97.00 | 164.74 | 5 | <0.0001 |
| FAM107A | 209074_s_at | 99.00 | 507.18 | 5 | <0.0001 |
| FERMT2 | 209209_s_at | 89.10 | 46 | 5 | <0.0001 |
| FHL1 | 210299_s_at | 86.80 | 37.87 | 5 | <0.0001 |
| FXYD6 | 217897_at | 27.10 | 6.86 | 5 | 0.2311 |
| GPM6A | 209469_at | 97.90 | 235.98 | 5 | <0.0001 |
| GPM6B | 209168_at | 86.10 | 35.99 | 5 | <0.0001 |
| HSPB8 | 221667_s_at | 65.50 | 14.47 | 5 | 0.0129 |
| ID4 | 209292_at | 0.00 | 3.43 | 5 | 0.6338 |
| KCNAB1 | 210078_s_at | 64.50 | 14.1 | 5 | 0.015 |
| KIF4A | 218355_at | 95.80 | 119.67 | 5 | <0.0001 |
| LAPTM4B | 208767_s_at | 96.90 | 163.18 | 5 | <0.0001 |
| MCAM /// MIR6756 | 210869_s_at | 0.00 | 4.61 | 5 | 0.4657 |
| MYH11 | 201496_x_at | 91.50 | 58.49 | 5 | 0.001 |
| MYL9 | 201058_s_at | 73.80 | 19.12 | 5 | 0.0018 |
| MYLK | 202555_s_at | 90.00 | 49.86 | 5 | <0.0001 |
| NTRK2 | 221796_at | 88.60 | 43.8 | 5 | <0.0001 |
| NUSAP1 | 218039_at | 97 | 177.64 | 5 | <0.0001 |
| PCDH9 | 219737_s_at | 89.30 | 46.86 | 5 | <0.0001 |
| PPAP2B | 212226_s_at | 0.00 | 4.65 | 5 | 0.4606 |
| PTRF | 208789_at | 82.20 | 28.16 | 5 | <0.0001 |
| STIL | 205339_at | 95.20 | 103.46 | 5 | <0.0001 |
| TCEAL2 | 211276_at | 76.30 | 21.11 | 5 | 0.0008 |
| TIMP3 | 201147_s_at | 86.80 | 37.77 | 5 | <0.0001 |

Discussion

The present analysis was motivated by previous research studies (42–44), in which noteworthy genes were identified through bioinformatics analysis. The objective of the present study was to identify common genetic indicators/biomarkers in lung, breast and prostate cancer, and to confirm their relevance in cancer by exploring NCBI citations using TARGETgene and by functional annotations using GENECODIS and DAVID. A robust gene set involved in the three cancer types was obtained, as microarray and RNA-Seq data were analyzed in combination in the present study. The RNA-Seq analysis proposed more genes than the microarray analysis to be involved in the process of oncogenesis. Further analysis would be required to classify these additional genes so that normal physiology could be attained by targeting cancer biomarkers. The obtained gene set was further inspected for inter-experiment behavior to identify heterogeneity in expression. This is termed meta-analysis, as the normalized expression values from all available microarray data are combined. From this examination, it was evident that the behavior of these genes is subject to change in different types of cancer. The analysis of between-study variance demonstrated that some genes had no observed heterogeneity. These genes were CLU, EFEMP1, ID4, MCAM, PPAP2B and DPT, with I2 values of 0, 1.1, 0, 0, 0 and 0%, respectively. This indication of non-heterogeneous behavior across studies is of considerable importance from a biological perspective. Furthermore, some genes exhibited moderate heterogeneity, namely HSPB8, KCNAB1 and FXYD6, with I2 values of 65.50, 64.50 and 27.10%, respectively. The DPT gene exhibited both types of behavior, which suggests that further experimental validation is required. The remaining genes had I2 values >70%, suggesting considerable heterogeneity.

Thus, the present analysis demonstrated the mining of noteworthy gene markers by analysis of both microarray and RNA-Seq data and by identifying a common set of genes relevant in the three cancer conditions. By ensuring that the Affymetrix gene chip platforms used for all the microarray data were similar, technical variation between platforms was avoided. In addition, by applying the same method for normalizing expression and detecting differential genes to all datasets, the present investigation led to the discovery of a common subset of genes that displayed significantly variable expression between tumor and normal samples in the microarray data analysis. Further analysis of RNA-Seq data from the same cancer types to obtain overlapping results yielded a more robust gene list. The roles of these genes in carcinogenesis were further confirmed by the results from GENECODIS (17), DAVID (18), Cancer Genetics Web (19), OMIM (20) and literature citations (using TARGETgene) (21). Finally, statistical analysis of heterogeneity led to novel conclusions about their behavior in the three different cancer types. Further studies would be of interest, including investigation of whether deregulation of apoptotic pathways is one of the major roles of the genes discovered in the present study.

Acknowledgements

The authors would like to acknowledge CSIR-National Environmental Engineering Research Institute (Nagpur, India) for providing essential resources and constant support for the present research study. The authors would also like to thank Dr Dhananjay Raje for his guidance and feedback.

References

1 

World Health Organization: India-Cancer Country Profile. 2014.

2 

Stewart BW and Wild CP: World cancer report 2014. World Heal Organ. 1–2. 2014.

3 

American Cancer Society: Cancer Facts & Figures. Atlanta, USA: 2015.

4 

Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, et al: NCBI GEO: Archive for functional genomics data sets-Update. Nucleic Acids Res. 41(Database Issue): D991–D995. 2013.

5 

Kolesnikov N, Hastings E, Keays M, Melnichuk O, Tang YA, Williams E, Dylag M, Kurbatova N, Brandizi M, Burdett T, et al: ArrayExpress update-simplifying data submissions. Nucleic Acids Res. 43(Database Issue): D1113–D1116. 2015.

6 

Li J, Hou R, Niu X, Liu R, Wang Q, Wang C, Li X, Hao Z, Yin G and Zhang K: Comparison of microarray and RNA-Seq analysis of mRNA expression in dermal mesenchymal stem cells. Biotechnol Lett. 38:33–41. 2016.

7 

Fumagalli D, Blanchet-Cohen A, Brown D, Desmedt C, Gacquer D, Michiels S, Rothé F, Majjaj S, Salgado R, Larsimont D, et al: Transfer of clinically relevant gene expression signatures in breast cancer: From Affymetrix microarray to Illumina RNA-Sequencing technology. BMC Genomics. 15:1008. 2014.

8 

Zhao S, Fung-Leung WP, Bittner A, Ngo K and Liu X: Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells. PLoS One. 9:e78644. 2014.

9 

Yang X, Bentink S and Spang R: Detecting common gene expression patterns in multiple cancer outcome entities. Biomed Microdevices. 7:247–251. 2005.

10 

Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A and Chinnaiyan AM: Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci USA. 101:9309–9314. 2004.

11 

Xu L, Geman D and Winslow RL: Large-scale integration of cancer microarray data identifies a robust common cancer signature. BMC Bioinformatics. 8:275. 2007.

12 

Zhao P, Hu W, Wang H, Yu S, Li C, Bai J, Gui S and Zhang Y: Identification of differentially expressed genes in pituitary adenomas by integrating analysis of microarray data. Int J Endocrinol. 2015:164087. 2015.

13 

Yang Z, Chen Y, Fu Y, Yang Y, Zhang Y, Chen Y and Li D: Meta-analysis of differentially expressed genes in osteosarcoma based on gene expression data. BMC Med Genet. 15:80. 2014.

14 

Chan SK, Griffith OL, Tai IT and Jones SJ: Meta-analysis of colorectal cancer gene expression profiling studies identifies consistently reported candidate biomarkers. Cancer Epidemiol Biomarkers Prev. 17:543–552. 2008.

15 

Dopazo J: Functional profiling methods in cancer. Methods Mol Biol. 576:363–374. 2010.

16 

Griffith OL, Melck A, Jones SJ and Wiseman SM: Meta-analysis and meta-review of thyroid cancer gene expression profiling studies identifies important diagnostic biomarkers. J Clin Oncol. 24:5043–5051. 2006.

17 

Tabas-Madrid D, Nogales-Cadenas R and Pascual-Montano A: GeneCodis3: A non-redundant and modular enrichment analysis tool for functional genomics. Nucleic Acids Res. 40(Web Server Issue): W478–W483. 2012.

18 

Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC and Lempicki RA: DAVID: Database for annotation, visualization, and integrated discovery. Genome Biol. 4:P3. 2003.

19 

S.J. C: Home Page|Cancer Genetics Web.

20 

McKusick VA: McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University. National Center for Biotechnology Information, National Library of Medicine B: Home-OMIM-NCBI. 2004.

21 

Wu CC, D'Argenio D, Asgharzadeh S and Triche T: TARGETgene: A tool for identification of potential therapeutic targets in cancer. PLoS One. 7:e43305. 2012.

22 

Gruosso T, Mieulet V, Cardon M, Bourachot B, Kieffer Y, Devun F, Dubois T, Dutreix M, Vincent-Salomon A, Miller KM and Mechta-Grigoriou F: Chronic oxidative stress promotes H2AX protein degradation and enhances chemosensitivity in breast cancer patients. EMBO Mol Med. 8:527–549. 2016.

23 

Timmerman LA, Holton T, Yuneva M, Louie RJ, Padró M, Daemen A, Hu M, Chan DA, Ethier SP, van't Veer LJ, et al: Glutamine sensitivity analysis identifies the xct antiporter as a common triple-negative breast tumor therapeutic target. Cancer Cell. 24:450–465. 2013.

24 

Lu T, Tsai MH, Lee JM, Hsu CP, Chen PC, Lin CW, Shih JY, Yang PC, Hsiao CK, Lai LC and Chuang EY: Identification of a novel biomarker, SEMA5A, for non-small cell lung carcinoma in nonsmoking women. Cancer Epidemiol Biomarkers Prev. 19:2590–2597. 2010.

25 

Landi MT, Dracheva T, Rotunno M, Figueroa JD, Liu H, Dasgupta A, Mann FE, Fukuoka J, Hames M, Bergen AW, et al: Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PLoS One. 3:e1651. 2008.

26 

Arredouani MS, Lu B, Bhasin M, Eljanne M, Yue W, Mosquera JM, Bubley GJ, Li V, Rubin MA, Libermann TA and Sanda MG: Identification of the transcription factor single-minded Homologue 2 as a potential biomarker and immunotherapy target in prostate cancer. Clin Cancer Res. 15:5794–5802. 2009.

27 

Planche A, Bacac M, Provero P, Fusco C, Delorenzi M, Stehle JC and Stamenkovic I: Identification of prognostic molecular features in the reactive stroma of human breast and prostate cancer. PLoS One. 6:e18640. 2011.

28 

Rahman M, Jackson LK, Johnson WE, Li DY, Bild AH and Piccolo SR: Alternative preprocessing of RNA-Sequencing data in the cancer genome atlas leads to improved analysis results. Bioinformatics. 31:3666–3672. 2015.

29 

Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U and Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 4:249–264. 2003.

30 

Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B and Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31:e15. 2003.

31 

Smyth GK: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 3:2004.

32 

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W and Smyth GK: Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43:e47. 2015.

33 

Jeanmougin M, de Reynies A, Marisa L, Paccard C, Nuel G and Guedj M: Should we abandon the t-Test in the analysis of gene expression microarray data: A comparison of variance modeling strategies. PLoS One. 5:e12336. 2010.

34 

Law CW, Chen Y, Shi W and Smyth GK: Voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15:R29. 2014.

35 

Higgins JP, Thompson SG, Deeks JJ and Altman DG: Measuring inconsistency in meta-analyses. BMJ. 327:557–560. 2003.

36 

Pathak M, Dwivedi SN, Deo SVS, Sreenivas V and Thakur B: Which is the preferred measure of heterogeneity in meta-analysis and why? A revisit. Biostat Biometrics. 2017.

37 

Schwarzer G: Package ‘meta’. R News. 7:40–45. 2007.

38 

Ramasamy A, Mondry A, Holmes CC and Altman DG: Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Med. 5:e184. 2008.

39 

Yamada Y, Toyota M, Hirokawa Y, Suzuki H, Takagi A, Matsuzaki T, Sugimura Y, Yatani R, Shiraishi T and Watanabe M: Identification of differentially methylated CpG islands in prostate cancer. Int J Cancer. 112:840–845. 2004.

40 

Faltermeier CM, Drake JM, Clark PM, Smith BA, Zong Y, Volpe C, Mathis C, Morrissey C, Castor B, Huang J and Witte ON: Functional screen identifies kinases driving prostate cancer visceral and bone metastasis. Proc Natl Acad Sci USA. 113:E172–E181. 2016.

41 

Kamalian L, Gosney JR, Forootan SS, Foster CS, Bao ZZ, Beesley C and Ke Y: Increased expression of Id family proteins in small cell lung cancer and its prognostic significance. Clin Cancer Res. 14:2318–2325. 2008.

42 

Chen D and Yang H: Integrated analysis of differentially expressed genes in breast cancer pathogenesis. Oncol Lett. 9:2560–2566. 2015.

43 

Zhao Y, Fu D, Xu C, Yang J and Wang Z: Identification of genes associated with tongue cancer in patients with a history of tobacco and/or alcohol use. Oncol Lett. 13:629–638. 2017.

44 

Huang Y, Tao Y, Li X, Chang S, Jiang B, Li F and Wang Z: Bioinformatics analysis of key genes and latent pathway interactions based on the anaplastic thyroid carcinoma gene expression profile. Oncol Lett. 13:167–176. 2017.

February 2018
Volume 15 Issue 2

Print ISSN: 1792-1074
Online ISSN:1792-1082

Makhijani, R.K., Raut, S.A., & Purohit, H.J. (2018). Identification of common key genes in breast, lung and prostate cancer and exploration of their heterogeneous expression. Oncology Letters, 15, 1680-1690. https://doi.org/10.3892/ol.2017.7508