Identification of differential splicing genes in gliomas using exon expression profiling

Diffuse gliomas are the most common type of malignant primary brain tumor, and their initiation and/or progression are often associated with alternative splicing. They produce an enormous economic burden on society and greatly impair the quality of life of those affected. The aim of the current study was to explore the differentially expressed genes (DEGs) observed in glioblastoma (GBM) and oligodendroglioma (OD) at the splicing level, and to analyze their functions in order to identify the underlying molecular mechanisms of gliomas. The exon-level expression profile data GSE9385 was downloaded from the Gene Expression Omnibus database, and included 26 GBM samples, 22 OD samples and 6 control brain samples. The differentially expressed exon-level probes were analyzed using the microarray detection of alternative splicing algorithm combined with the splicing index method, and the corresponding DEGs were identified. Next, a Gene Ontology enrichment analysis of the DEGs was performed. Additionally, the protein-protein interaction (PPI) networks were constructed based on the depth-first search algorithm. A total of 300 DEGs were identified to be shared by GBM and OD, including 97 upregulated and 203 downregulated DEGs. Furthermore, screening with a defined threshold identified 6 genes that were highly expressed in GBM, including AFF2, CACNA2D3 and ARPP21, while the 6 highly expressed genes in OD notably included CNTN2. The TP53 and HIST1H3A genes were the hub nodes in the PPI network of DEGs from GBM, while CNTN2 was linked to the highest degree in the OD PPI network. The present study provides a comprehensive bioinformatics analysis of DEGs in GBM and OD, which may provide a basis for understanding the initiation and/or progression of glioma development.


Introduction
Diffuse gliomas are the most common type of intracranial malignant neoplasm, and account for >60% of all primary brain tumors (1). Based on the classification of nervous system tumors by the World Health Organization, diffuse gliomas are classified into seven principal categories: Diffuse astrocytoma (grade II), oligodendroglioma (OD; grade II), oligoastrocytoma (grade II), anaplastic astrocytoma (grade III), anaplastic oligodendroglioma (grade III), anaplastic oligoastrocytoma (grade III) and glioblastoma (GBM, grade IV) (1). Of these, GBM is the most common and aggressive type of primary brain tumor, accounting for 80% of malignant astrocytomas (2). GBM may develop rapidly without the diagnosis of a less malignant precursor lesion, and this is termed primary or de novo GBM. It may also develop slowly through progression from a pre-existing low-grade glioma, in which case it is termed a secondary GBM (3). Although GBMs are considered as primarily astrocytic gliomas (4), a subset of GBMs exhibit OD-like tumor cell differentiation (5). OD is a well-differentiated, slowly grown and diffusely infiltrated tumor observed in adults, and is typically located in the cerebral hemispheres (6). Despite advances in neurosurgery, chemotherapy and radiotherapy, glioma commonly has a poor prognosis (7). Therefore, it is critical that the genetic pathways underlying the development of this type of cancer are defined.
A previous study indicated that tumor-specific alternative splicing is important in the regulation of gene expression and corresponding protein functions during cancer development (8). Multiple alternative splicing transcripts have been identified as progression markers, including generalized splicing abnormalities and tumor-and stage-specific events (9)(10). A number of studies have documented that the initiation and/or progression of glial brain tumors is influenced by aberrant splice isoforms, including epidermal growth factor receptor, phosphatase and tensin homolog, tumor protein p53 (TP53), proliferation-related Ki-67 antigen, murine double-minute 2, mutS homolog 2, platelet-derived growth factor α and Kruppel-like transcription factor (11-15). However, the molecular mechanisms associated with the alternative splicing that may lead to the development and progression of GBM and OD remain to be clearly demonstrated.
In the current study, the gene expression profiles of GBM, OD and patient-matched normal brain tissues were downloaded from the Gene Expression Omnibus (GEO) database. The significant differentially expressed exon-level probes and their corresponding genes were identified using a combination of the splicing index (SI) method and the microarray detection of alternative splicing (MIDAS) algorithm. In addition, the screened differentially expressed genes (DEGs) were further analyzed with bioinformatics methods. The current study aims to improve the understanding of molecular mechanisms of GBM and OD and may clarify the processes involved in the development of gliomas.

Materials and methods
Aff ymetrix microarray analysis. The gene expression profile data GSE9385 (7) was obtained from the National Center for Biotechnology Information and GEO database (http://www.ncbi.nlm.nih.gov/geo/), which is based on the GeneChip Human Exon 1.0 ST Array (GPL5188) platform (Affymetrix, Santa Clara, CA, USA). A total of 54 specimens were available, including 26 GBM, 22 OD and 6 control brain samples.
Data preprocessing. The background correction and data normalization were performed by the robust multiarray average (RMA) algorithm based on the Affymetrix Power Tools (http://www.affymetrix.com/) program (16). Additionally, the probe sets were filtered according to the methods described by Gardina et al (17). To reduce the false positive rate, only the genes with a PLIER signal of >200 and corresponding detection above background (DABG) with a P-value of ≤0.05 were accepted. Probe sets with cross-hybridization type were also removed (18).
DEG analysis. The differences in exon-level expression can result from one of two factors, namely, differential splicing or differential gene expression (18). To detect differential splicing, the gene expression level was normalized. Following normalization, the exon-level expression value (I) was calculated for each exon to reflect the actual exon-level expression. For exon i in gene j, I i,j is denoted as follows: G is the average gene expression value of all specimens and G j is the gene expression value of specimen j. E i,j represents the expression values of exon i in gene j. Analysis using this formula is known as the SI method (18,19). The exon-and gene-level expression values were computed using the RMA algorithm.
Based on the SI method, the differentially expressed exon-level probes between GBM/OD, GBM/normal and OD/normal samples were identified with Student's t-test. Only exon-level probes with P-values <0.01 and differential regulation of RMA signals >2-fold were selected. Since the RMA-generated signals were reported as the value of log2 transformation, the geometric mean >1 represented an RMA signal that was upregulated >2-fold. Also, in order to improve the accuracy of the results, the MIDAS algorithm (http://www.affymetrix.com/) (17) was used to identify differentially expressed exon-level probes based on the analysis of variance. P<0.05 was selected as the cut-off criterion. Only the exons identified by the two methods were used. A hierarchi cal clustering of the screened DEGs was performed based on their expression values using the Hclust package of R software (20).
Gene ontology (GO) enrichment analysis. GO analysis is a commonly used approach for functional studies of large-scale genomics (21). The Database for Annotation, Visualization and Integrated Discovery (DAVID), a high-throughput and integrated data-mining environment, analyzes gene lists derived from high-throughput genomic experiments (22). The current study used the DAVID to identify which represented GO categories were significantly enriched in DEGs.

Protein-protein interaction (PPI) network construction.
The Search Tool for the Retrieval of Interacting Genes (STRING) (23) database was used to annotate functional interactions between differentially expressed proteins that regulate alternative splicing and other proteins by calculating their confidence score. Interactions with a score >0.8 were selected. In order to remove the irrelevant nodes, reduce the network size and restrict the search space into a realistic and meaningful one, the depth first search algorithm was employed to construct the PPI network with the path length <3.

Identification of DEGs.
Exon-level expression data were compared between GBM/OD samples, GBM/normal samples and OD/normal samples. Using the SI method, 1,343 differentially expressed exon-level probes with P-values <0.01 and differential regulation of RMA signals >2-fold were identified. Based on the MIDAS method, 5,290 differentially expressed exon-level probes with the P-value <0.05 were selected. A total of 982 DEGs were identified by both methods simultaneously.
The expression values of these 982 DEGs were hierarchically clustered by the Hclust package of R software (Fig. 1). Compared with normal brain tissue, 617 and 498 DEGs were identified in GBM and OD, respectively ( Fig. 2A). A total of 97 upregulated and 203 downregulated DEGs were identified to be present in both GBM and OD (Fig. 2B), and 236 DEGs were obtained between GBM and OD, 94 of which were upregulated and 142 downregulated in GBM ( Fig. 2A).
With the strict threshold of RMA signals that were upregulated at least 4-fold and 2-fold compared with normal brain tissue and OD respectively, a total of 6 highly expressed genes were identified in GBM, including AFF2, GNAL, ARPP21, CACNA2D3, HIST1H3A~HIST1H3J, and RGS7 (Table I).
Similarly, at the cut-off criteria of RMA signals upregulated at least 4-fold and 2-fold compared with normal brain tissue and GBM respectively, a total of 6 highly expressed genes were identified in OD, including CNTN2, ABCA6, MEGF11, DOCK5, MOXD1, and TRIM67. There were 7 genes downregulated at least 8-fold compared with controls in both GBM and OD. These included APBA2, MAP4, NUF2, INPP5F and TOP2A. These candidate genes were suggested as markers of alternative splicing in GBM and OD.

A B
GO enrichment analysis. To investigate the functional changes in the initiation and/or progression of glial brain tumors, the 300 overlapping DEGs in GBM and OD were mapped to the GO database. A P-value ≤0.01 and fold change >2 were used for the threshold. The significant GO terms of the 97 upregulated DEGs shared by GBM and OD included various processes, including cell adhesion, extracellular structure organization, collagen biosynthetic process, neuron development and the regulation of small GTPase-mediated signal transduction (Table II). For the 203 downregulated DEGs shared by GBM and OD, the significant GO terms included processes such as the regulation of small GTPase-mediated signal transduction, transmission of nerve impulses and positive regulation of apoptosis (Table II). The DEGs that were upregulated ≥2-fold in one subgroup compared with the other were also mapped to the GO database. A total of 94 upregulated DEGs in GBM were significantly enriched into 10 GO terms, including neuron differentiation, exocytosis and regulation of neurotransmitter secretion (Table III). Additionally, 42 upregulated DEGs in OD were significantly enriched into 7 GO terms, which included the transmission of nerve impulses, cell-cell signaling and synaptic transmission (Table IV).
PPI network construction. In order to construct the PPI network, the depth-first search algorithm was employed to obtain the PPI data from the STRING database. PPI networks of highly expressed genes in GBM (Fig. 3A) and OD (Fig. 3B) were constructed with the path length <3. In the PPI network of GBM, the genes HIST1H3A and TP53 contained the highest degrees. In addition, in the PPI network of OD, GNTN2 acted as hub nodes.

Discussion
Formation and malignant progression of diffuse gliomas are associated with alterations in a variety of genes that regulate the normal homeostasis of cell proliferation, differentiation and apoptosis (24). The connection between abnormal regulation of alternative splicing and tumor development has emerged as a novel aspect of cancer biology. The identification of alternatively spliced genes in GBM and OD may provide novel molecular markers for the diagnosis and treatment of the two subtypes of glioma.
In the present study, 300 overlapping DEGs were identified in GBM and OD, compared with normal control tissue. These included 97 upregulated and 203 downregulated DEGs. Notably, 117 of these 300 DEGs were associated with alternative splicing. With the strict threshold, 6 highly expressed genes were screened in GBM, including AFF2, GNAL, ARPP21, CACNA2D3 and RGS7, in addition to 6 highly expressed genes in OD, including CNTN2, ABCA6, MEGF11, DOCK5, MOXD1 and TRIM67. Finally, by constructing a PPI network of DEGs, it was demonstrated that TP53 and HIST1H3A were the hub nodes in the PPI network of GBM and CNTN2 was the hub node in OD. Table I. Identification of differentially expressed genes between GBM/OD samples, GBM/normal samples and OD/normal samples.

A B
AFF2/FMR2 is extended over >600 kb in Xq27.3-q28, composed of 22 exons with a complex pattern of alternative splicing (25). It has been demonstrated that AFF2 mutations are associated with breast tumors (26). Recently, an excess of non-synonymous missense variants in FMR2 has been reported in males with autism spectrum disorders (27), indicating the role of FMR2 in normal brain function. The silencing of the AFF2 gene can lead to Fragile XE syndrome (28). In agreement with a previous study, the current study indicated that the AFF2 gene with differentially spliced exons was highly expressed in GBM. Bensaid et al (29) reported that as an RNA-binding protein, the AFF2 protein serves an essential role in alternative splicing regulation via the interaction with the G-quartet RNA-forming structure. CACNA2D3 protein is an auxiliary member of the α-2/δ subunit family of the voltage-dependent calcium channel complex (30). The CACNA2D3 gene has been suggested as a putative tumor suppressor gene in lung cancer, renal cell cancer neuroblastoma and squamous cell esophageal cancer (31), and has been identified as an indicator of prognosis in gastric cancer (32). CACNA2D3 is highly expressed in neuroblasts and neuroblastomas with a favorable prognosis, while its expression is downregulated in those with a poor prognosis (33). Additionally, ARPP21 encodes a 21-kDa cAMP-regulated phosphoprotein termed regulator of calmodulin signaling, which is enriched in the brain and may serve as a candidate tumor suppressor gene (34). ARPP21 is frequently downregulated, as is miR-128-2, in human breast cancer (35,36). Notably, miR-128 expression may significantly reduce glioma cell proliferation in vitro and glioma xenograft growth in vivo (37). In addition, Donzelli et al (36) reported that mutant TP53 is able to bind the putative promoter of the miR128-2 host gene (ARPP2), which determines the concomitant induction of ARPP21 mRNA expression in lung cancer cells, and thus inhibits apoptosis (36). In the present study, TP53 acted as the hub node in the GBM network, indicating its important role in the initiation and/or progression of GBM. A previous study suggested that TP53 regulates the proliferation, differentiation and survival of stem cells, which further highlights the importance of TP53 in GBM suppression (38). Additionally, TP53 mutations have been noted in 5-15% of cases of OD (39). TP53 mutations are the most frequent type of gene-specific alteration identified in human cancers (40). Shiraishi et al (41) studied the different locations of TP53 mutations between anaplastic astrocytoma and GBM, and suggested that the TP53 mutation may contribute to tumorigenesis and also to the progression of malignancy in gliomas.  It has been suggested that CNTN2 (also knows as contactin-2 or axonal glycoprotein TAG-1) is involved in the cell adhesion process and serves a critical function in the early stages of hepatocellular carcinoma (42). Adair et al (43) have also demonstrated that the CNTN2 gene is expressed in a variety of tumor cell lines, including those from the brain, breast and lung, and particularly in an unusually high percentage of melanoma cells. In the present study, CNTN2, with a high expression level, was the hub node in the PPI network of OD, suggesting that it may serve an essential function in the pathogenesis of OD.
GO enrichment analysis indicated a close relationship between the two subsets of gliomas and cell adhesion, which is in agreement with previous studies (44,45). An essential step in tumor progression is tumor invasion, which is dependent on the preservation of a delicate balance between cell adhesion and cell detachment. Chen et al (42) have suggested that the cell adhesion pathway is important for cancer cell invasion and metastasis. Abnormality of cell adhesion pathways is considered as a characteristic of advanced cancer. Another study demonstrated that alterations in several classes of adhesion molecules were implicated in the progression of various forms of cancer, including GBM (46).
In conclusion, the current data provide a comprehensive bioinformatics analysis of the alternatively spliced genes that may be involved in the pathogenesis of GBM and OD. A total of 300 overlapping DEGs were identified between GBM and OD. In addition, compared with normal brain tissue and OD, 6 highly expressed genes were identified in GBM, while 6 were identified in OD compared with normal brain tissue and GBM. The present analysis provides a basis for the understanding of the molecular mechanism of GBM and OD. However, further experimental studies with larger sample sizes are required to confirm these observations.