Identification of genes associated with laryngeal squamous cell carcinoma samples based on bioinformatic analysis

The present study aimed to investigate the differentially expressed genes (DEGs) between laryngeal squamous cell carcinoma (LSCC) samples and non-neoplastic laryngeal squamous cell samples, and the underlying biological mechanism. Gene expression profile data of GSE51985 and GSE10288 were obtained from the Gene Expression Omnibus database. The DEGs between the LSCC and normal samples were identified using the rowtest function in the genefilter package. Hierarchical clustering for DEGs was performed to confirm the distinction between the identified DEGs, and Gene Ontology term and pathway enrichment analyses were performed to determine the underlying function of the DEGs. In addition, protein-protein interaction networks were established to investigate the interactive mechanism of the DEGs. A total of 1,288 upregulated genes and 317 downregulated genes were identified between the LSCC samples and non-neoplastic LSC samples in the GSE51985 dataset, and five upregulated and 26 downregulated genes were identified in the samples from the GSE10288 dataset. The DEGs were clearly distinguished between the LSCC sample and the non-neoplastic LSCC sample by hierarchical clustering. The upregulated genes were predominantly involved in the cell cycle, cell division or focal adhesion, and the 295 upregulated genes formed 374 protein interaction pairs in interaction network analysis. The results revealed that the genes involved in the cell cycle, in cell division or in focal adhesion were associated with the development and progression of LSCC.


Introduction
Head and neck cancer is the sixth most common type of cancer, with an annual incidence of 700,000 patients worldwide (1,2).
It is reported that 20-30% of cases of head and neck cancer are laryngeal tumors (3). Laryngeal squamous cell carcinoma (LSCC), originating in the squamous cells, is the most common type of laryngeal carcinoma, accounting for ~25% of all cases of head and neck squamous cell carcinoma (4), with high mortality rates and a poor prognosis. The five-year survival rates are suggested to be between 52 and 94%, depending on the tumor site, stage and tumor therapy (5,6).
LSCC is considered to be result from the interactions of several genetic and environmental factors (7), including smoking, alcohol consumption (8,9), air pollution and viral infection. Efforts have been made to identify the genes involved in this type of cancer in past few decades. It was demonstrated, by expression profile screening, that protein tyrosine phosphatase receptor type δ is a suppressor gene in LSCC (10). Alterations in the expression of astrocyte elevated gene 1 exerts a predictive value in the prognosis of LSCC (11). Recurrent alterations in the levels of DNA methylation of Fanconi anemia-associated genes including FANCA, BRCA1 and BRCA2 contribute to the development of LSCC (12). However, the pathogenesis of LSCC and associated biological process and pathways remain to be elucidated.
Therefore, the aim of the present study was to determine the pathogenesis of LSCC and to investigate the differences between LSCC and non-neoplastic tissue samples at the molecular level. Differentially expressed genes (DEGs) between LSCC and normal samples were identified, followed by hierarchical clustering and function and pathway enrichment. Furthermore, functional interaction network analysis of the DEGs was performed. The results of the present study may provide novel insights into the therapeutics and assist in improving the survival rate and prognosis of patients with LSCC.

Materials and methods
Microarray data. The gene expression profiles of GSE51985 (13) and GSE10288 (14) were obtained from the National Center of Biotechnology Information Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.gov/geo/). Larynx tissues with regional lymph node metastasis and corresponding adjacent non-neoplastic tissues samples from 10 patients (all males; age range, 52-74 years) who underwent surgery for primary LSCC at the Department of Head and Neck Surgery (Beijing Tongren Hospital, Beijing, China) were available for GSE51985, while GSE10288 contained 13 lesion tissue samples of LSCC from LSCC patients (12 males and 1 female; age range, 44-73 years; two repeats were obtained from each patient) and 10 non-matched normal larynx tissue samples (each had two repeats) from non-neoplastic larynx from the Arnaldo Vieira de Carvalho Hospital (São Paulo, Brazil).
Data preprocessing. The samples from GSE51985 were annotated using the Illumina HumanHT-12 V4.0 expression beadchip platform (Illumina Inc., San Diego, CA, USA). The expression values were subjected to quantile data normalization using Illumina's Genome Studio v1 software (Illumina Inc.), followed by log 2 transformation and gene annotation. The microarray detection platform, CAGE Lab-Head and Neck carcinoma cDNA microarray (Department of Biochemistry, Institute of Chemistry, University of São Paulo, Brazil), was used for the annotation of the samples from GSE10288. The probe data were initially normalized by a locally weighted scatterplot smoothing algorithm (http://connection. ebscohost.com/c/articles/28834113/optimized-lowess-normalization-parameter-selection-dna-microarray-data) (15) using GeneSpring software, version 10.0 (Agilent Technologies, Inc., Foster City, CA, USA) and then annotated for gene expression value as described above. In cases where one gene corresponds to multiple probe sets, the average was used as the gene expression value. Subsequently, the normalized values were used to calculate log 2 -transformed Cy5/Cy3 ratios (denoted log 2 -ratios) for each gene using GeneSpring software, version 10.0. DEG analysis. To investigate the differences between the LSCC samples and the non-neoplastic LSCC samples, the rowtest algorithm of the genefilter package in R/Bioconductor (www. bioconductor.org/packages/2.3/bioc/html/genefilter.html) (16) was used to identify the DEGs in the two sample groups. The DEGs were required to meet the criteria that |log 2 fold change (FC)|>1 and P<0.05. Subsequently, the DEGs obtained from the two microarray data were compared using hierarchical clustering in R, version 3.0.2 234 (www.r-project.org). Heatmaps, based on the gene expression values, were produced to verify the distinguished effect of the identified genes on the LSCC and non-neoplastic samples.
Function and pathway enrichment analysis of DEGs. Gene Ontology (GO) analysis is widely used for functional investigations of large-scale genomic or transcriptomic data (17), which characterizes genes or gene products to a biological process, molecular function and cellular component. Kyoto Encyclopedia of Genes and Genomes (KEGG; www. genome.ad.jp/kegg/kegg2.html pathway analysis is another technique to reveal the biological mechanisms of large numbers of genes derived from high-throughput genomic experiments (18). Database for annotation, visualization and integrated discovery (DAVID; david.abcc.ncifcrf.gov) is one of the most commonly used tools for GO enrichment and pathway analysis (19). As few genes have been sequenced in the GSE10288 profile, only the DEGs of the GSE51985 dataset were subjected to DAVID in the present study, to identify the differences in functions and pathways, with an enrichment significance false discovery rate (FDR) of <0.05. The enrichments of the upregulated and downregulated DGEs were performed separately.

Functional interaction network analysis of the upregulated DEGs.
To gain further insights into the functional coordination of the DEGs, the human protein reference database (HPRD; www.hprd.org) was used to examine the interacting pairs associated with the upregulated DEGs. The interacting pairs were visualized via Cytoscape (www.cytoscape. org) (20). Additionally, the significant pathways associated with these pairs were enriched via DAVID, with the threshold as FDR<0.05.

Results
DEG screening and comparison. Following normalization of the microarray data (Fig. 1), a rowtest algorithm was used to identify the DEGs between the LSCC samples and the non-neoplastic LSC samples. A total of 1,605 genes were identified as significantly differentially expressed in the samples from the GSE51985 dataset, among which 1,288 genes were upregulated and 317 genes were downregulated in the LSCC samples, compared with the non-neoplastic LSC samples. Similarly, 31 genes were identified as DEGs in the samples from the GSE10288 dataset, including five upregulated and 26 downregulated genes (Table I). Following comparisons of the DEGs in the GSE51985 and GSE10288 datasets, four genes were found to be differentially expressed in the two datasets. These genes were dynein, axonemal, heavy Chain 1 (DNAH1), ubiquitin C (UBC), early endosome antigen 1 (EEA1) and ubiquitin specific peptidase (EEA1), of which, DNAAH1 was downregulated in the LSCC sample from the two datasets, and the other three DEGs exhibited a different trend of expression in the two datasets, which may have been attributed to the different sources of the samples for the two datasets.

A B
LSCC was clustered into the non-neoplastic LSCC sample cluster, and no significant difference was observed in the DEGs between the cancer 4 sample and the corresponding control 4 sample. Therefore, the cancer 4 sample was excluded from the subsequent hierarchical clustering. As expected, the selected DEGs were well distinguished between the LSCC sample and the non-neoplastic LSCC sample (Fig. 2B).
GO enrichment analysis and pathway analysis. The results of the GO enrichment analysis are shown in Table II, which demonstrated that the upregulated genes were predominantly involved in the cell cycle and cell division processes (34 GO terms). The downregulated genes included only a few genes, which were involved in the oxidation reduction process. The pathway enrichment analysis revealed that no Functional network analysis of the upregulated DEGs. HPRD consists of 39,240 protein pairs and 10,200 proteins. The LSCC upregulated genes were mapped to HPRD, and 374 interacting pairs, including 294 upregulated DEGs, were identified and used for construction of the interaction network (Fig. 3). Among the 294 genes, 64 were enriched in several specific pathways (Table III), including the cell cycle (hsa04110), pathways in cancer (hsa05200), small cell lung cancer (has05222) and focal adhesion (hsa04510). Overall, there were 21 genes enriched in the focal adhesion pathway, including epidermal growth receptor (EGFR), caveolin 2 (CAV2), collagen type V, alpha 1 (COL5A1) and laminin alpha 1 (LAMA1).

Discussion
Gene expression levels in disease reveal the potential biological mechanism of the disease. The present study downloaded two datasets of gene expression profiles from GEO. A total of 1,605 genes were identified as significantly differentially expressed in samples from the GSE51985 dataset and 31 genes were identified as DEGs in samples from the GSE10288 dataset. Although identified in different samples, certain genes were revealed to be differentially expressed in the two profiles, including DNAH1. DNAH1, which codes the proteins of the axonemal dynein cluster, is a large subunit of dynactin. The DNAH1 mutation has been detected in exome-sequenced colorectal cancer and melanoma specimens (21). DNAH1 is involved in the significant differences in DNA copy number between adenocarcinoma and squamous cell carcinoma (22). Furthermore, DNAH1 is putatively involved in cell motility and migration (23). Cancer cells move within tissues during invasion and metastasis through their own motility (24), and the migratory mechanisms can respond to different conditions (25). Multiple genes associated with cell motility are reported to be deregulated in human cancer (26). DNAH1 was also downregulated in the LSCC samples used in the present study, therefore, it was suggested that DNAH1 may exert its effect in LSCC through its involvement in cell motility.
By analyzing two datasets of LSCC samples, the present study revealed that the DEGs and their function in the LSCC sample demonstrated similar characteristics with general types of cancer, particularly the upregulated genes, as they were significantly involved in cell cycle, likely to increase cell proliferation rate and lead to tumorigenesis. CDC7 and CDK1 were among the genes enriched in the cell cycle pathway. CDKs are threonine/serine protein kinases, the activities of which depend on the action and binding of cyclin partners (27). Tumor-associated cell cycle defects are usually mediated by alteration in the activity of CDK (28). As a key regulator of the cell cycle, CDK1 is a powerful therapeutic target for cancer inhibitors (29). In precursor lesions and esophageal adenocarcinoma, the expression of CDC72/CDK1 serves as a diagnostic and cancer progression marker (30).
In addition, the present study demonstrated that certain genes were also involved in the LSCC bifocal adhesion pathway, including EGFR. Focal adhesion kinase (FAK) is involved in cancer cell tumor formation and progression (31). Lymph node metastasis in esophageal SCC is associated with the overexpression of FAK (32), which is also observed in head and neck squamous cell carcinoma (33). The simultaneous  -08  CDC7, CDK1, E2F3, CDK6, CHEK1, CDC20, MCM2,  CHEK2, CDK4, CDC25C, MCM3, MCM4, CDK2,  TGFB2, CDC25B, CCNB1, CCNE1, CDKN1A inhibition of EGFR and the FAK pathway increases the apoptotic response of cancer cells (34). In addition, in colon and breast cancer cells, FAK survival signaling exerts its roles by combining with EGFR (35). EGFR-targeted therapy is used in the treatment of head and neck cancer via targeting the pathways involved in tumor growth, angiogenesis, metastasis and invasion (36), for example, the EGFR inhibitor, gefitinib, has been used in clinical practice in the treatment of head and neck squamous cell carcinoma (37). High levels of EGFR can indicate patients with laryngeal cancer with a poor prognosis (38). Therefore, the present study hypothesized that EGFR is involved in the development of LSCC via the FAK pathway. Furthermore, genes of the integrin family are also involved in LSCC via the FAK pathway, including integrin α1 (ITGA1) and integrin β3 (ITGB3). It is reported that the combination of ITGB3 with SDC4 may result in the activation of FAK (39). Integrin/FAK signaling can control tumor initiation, growth and progression into malignant squamous cell carcinoma (40).
In conclusion, with the assistance of high-throughput microarray data analysis, based on bioinformatics methods, the present study identified several DEGs, as well as their abnormal functions and pathways, in LSCC. The associations identified between the DEGs and their relative biological processes offer novel insights into the mechanism underlying LSCC.