Functional analysis of the mRNA profile of neutrophil gelatinase-associated lipocalin overexpression in esophageal squamous cell carcinoma using multiple bioinformatic tools

Neutrophil gelatinase-associated lipocalin (NGAL) is a member of the lipocalin superfamily; dysregulated expression of NGAL has been observed in several benign and malignant diseases. In the present study, differentially expressed genes, in comparison with those of control cells, in the mRNA expression profile of EC109 esophageal squamous cell carcinoma (ESCC) cells following NGAL overexpression were analyzed by multiple bioinformatic tools for a comprehensive understanding. A total of 29 gene ontology (GO) terms associated with immune function, chromatin structure and gene transcription were identified among the differentially expressed genes (DEGs) in NGAL overexpressing cells. In addition to the detected GO categories, the results from the functional annotation chart revealed that the differentially expressed genes were also associated with 101 functional annotation category terms. A total of 59 subpathways associated locally with the differentially expressed genes were identified by subpathway analysis, a markedly greater total that detected by traditional pathway enrichment analysis only. Promoter analysis indicated that the potential transcription factors Snail, deltaEF1, Mycn, Arnt, MNB1A, PBF, E74A, Ubx, SPI1 and GATA2 were unique to the downregulated DEG promoters, while bZIP910, ZNF42 and SOX9 were unique for the upregulated DEG promoters. In conclusion, the understanding of the role of NGAL overexpression in ESCC has been improved through the present bioinformatic analysis.


Introduction
Neutrophil gelatinase-associated lipocalin (NGAL), also termed lipocalin2, is a member of the lipocalin superfamily, which includes >20 members (1). NGAL is secreted extracellularly and forms a heterodimer with matrix metalloproteinase-9 (MMP-9) through disulfide bonds protecting against degradation (2). NGAL tightly binds to the bacterial siderophore, possibly serving as a potent bacteriostatic agent by sequestering iron as well as regulating innate immunity and inflammation (3). Overexpression of NGAL has also been observed in various types of human cancer, including breast, colorectal, pancreatic, ovarian, gastric, thyroid, ovarian, bladder and kidney cancer (4). Previous studies have shown that NGAL is upregulated in esophageal squamous cell carcinoma (ESCC) and is an independent prognostic factor; this upregulation was significantly correlated with cell differentiation and tumor invasion (5,6).
However, controversial results have been observed regarding the functional role of NGAL in various types of cancer cell. For example, NGAL was able to facilitate gastrointestinal mucosal regeneration by promoting cell motility and invasion and to reduce E-cadherin mediated cell-cell adhesion in colon cancer (7). NGAL was demonstrated to be highly expressed in human thyroid carcinomas, and NGAL knockdown inhibited cancer cell growth in soft agar and the formation of tumors in nude mice (8). Conversely, in pancreatic cancer cells, NGAL Functional analysis of the mRNA profile of neutrophil gelatinase-associated lipocalin overexpression in esophageal squamous cell carcinoma using multiple bioinformatic tools reduced adhesion/invasion partly through suppressing focal adhesion kinase activation and inhibited angiogenesis partly by blocking vascular endothelial growth factor production (9). In the present study, to examine the biological role of NGAL in ESCC, NGAL was overexpressed in the EC109 ESCC cell line. An mRNA microarray was performed using the Agilent whole genome oligo microarray to identify differentially expressed genes (DEGs) in NGAL overexpressing cells compared with control cells (10). Multiple bioinformatics analyses were performed on these DEGs in order to gain a comprehensive understanding of the role of NGAL overexpression in ESCC.

Materials and methods
Differentially expressed genes. The raw data were analyzed using normalization and log transformation (10). Differentially expressed genes were identified using a two-fold change threshold.

Gene ontology (GO) enrichment and functional annotation.
The Database for Annotation, Visualization and Integrated Discovery bioinformatics tool (DAVID; http://david.abcc. ncifcrf.gov/) was applied for GO enrichment, using category classes including Biological process, Cellular component and Molecular function. GO is one of the most useful methods for functional annotation and classification of genes. In addition, DAVID bioinformatics provides a functional annotation chart to identify over-represented biological terms from a particular gene list (11). Thus far, the functional annotation chart provides >40 category enrichments, including GO terms, sequence features, disease associations, protein functional domains, protein-protein interactions, pathways, homology, gene functional summaries and literature. The enriched terms from the functional annotation chart with P<0.05 were visualized by the Enrichment Map plugin for the Cytoscape network visualization software (12).
Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and subpathway analysis. The bioconductor SubpathwayMiner package was applied to the DEG-enriched KEGG pathways identified (12). In addition to traditional entire pathway enrichment, SubpathwayMiner is able to detect subpathways, local regions of entire pathways, which aids in gaining more detailed information regarding the relevant genes in localized areas of a specific pathway (13). SubpathwayMiner extracts multiple subpathways from an entire KEGG pathway by the k-clique method. The distance between any two nodes (a node indicates a gene in the pathway) in a subpathway is not larger than k; k was set as 4 in the present study.
Promoter sequence patterns and potential transcription factor analysis. The 2,000-bp promoter sequences of the 20 genes exhibiting the greatest down-and upregulation, respectively, were retrieved from the UCSC genome database (http://genome. ucsc.edu/). The sequence patterns over-represented or under-represented in these two promoter sequence sets were analyzed by the POCO program (http://ekhidna.biocenter. helsinki.fi/poxo/poco/poco). POCO identifies motifs that are over-represented in one dataset compared with a background set, but under-represented in another dataset compared with the same background set. For the parameters in the present study, the background organism was set as homo_sapiens_clean and the longest pattern length was set as 8. Subsequently, significant sequence patterns were screened in the JASPAR transcription factor database (http://jaspar.binf.ku.dk) to identify recognized transcription factors (similarity index >0.70) (14).

GO enrichment and functional annotation.
A total of >200 DEGs in the NGAL overexpressing cells were obtained using a two-fold change as the threshold, including 167 upregulated genes and 96 downregulated genes (Table I).
To determine the functional classification of the various gene clusters, GO annotation was conducted using DAVID, which constructs statistically significant functional profiles from a set of genes. A total of 75, 8 and 7 significantly enriched GO terms were identified for these DEGs in the Biological process, Cellular component and Molecular function categories, respectively (P<0.05; Fig. 1). Notably, two predominant Biological process term groups were identified. One group comprised 21 immune-associated terms, including response to stress, defense response and regulation of immune response. The other group consisted of 8 terms regarding chromatin structure and gene transcription, including nucleosome assembly, chromatin assembly and protein-DNA complex assembly. These  The DEGs were also clustered using the Functional annotation chart in DAVID and the enrichment was visualized by the Enrichment Map plugin for the Cytoscape software. In Fig. 2, a node signifies one functional category and node size corresponds to the number of enriched genes. The color depth corresponds to the significance (P-value) of the terms. Nodes from the same functional category are presented as the same shape. Edges between nodes were depicted when overlapping genes existed between these two nodes. The widths of the lines indicate the number of overlapping genes between the functional groups, which are bigger and the wider with greater numbers. In the 180 total Functional annotation chart enrichments identified, in addition to 101 terms from the three GO categories, 78 terms from the following annotation categories were included: 14 from INTERPRO, 2 from SMART, 30 from SP_PIR_KEYWORDS, 24 from UP_SEQ_FEATURE, 1 from COG_ONTOLOGY, 2 from PIR_SUPERFAMILY, 1 from OMIM_DISEASE and 4 from KEGG_PATHWAY. These results provided a wider overview of the biological  Pathway and subpathway enrichment. The DEGs were mapped to KEGG pathways to identify the cell signaling pathways influenced by the downstream effectors of NGAL. The DEGs were enriched in only four pathways (Table II). The local area of an entire pathway was able to be defined by multiple subpathways using the node distance k, which aids in understanding how the indicated genes affect the pathway locally. The DEGs were found to be significantly enriched in 60 subpathways corresponding to 27 entire pathways using the Figure 1. Gene ontology enrichment of the differentially expressed genes from the mRNA expression profile of neutrophil gelatinase-associated lipocalin overexpressing EC109 esophageal squamous cell carcinoma cells, when compared with control cells. The terms from the Biological process, Cellular component and Molecular function categories are signified by different patterned bar graphs. Immune-associated terms and histone-associated terms are indicated by triangles and squares, respectively.  SubpathwayMiner package (Table III). Of note, the mitogen-activated protein kinase (MAPK) signaling pathway (has: 04010) was not detected by the entire pathway enrichment, but was found to be significant in the subpathway analysis, with three subpathways derived from three local areas of this signaling pathway (Fig. 3). The subpathway path:04010_2 contained three DEGs:  Promoter sequence patterns and potential transcription factors in upregulated and downregulated genes. The spatial distribution and abundance of promoter cis-elements affects gene expression. The co-expression of upregulated and downregulated genes in NGAL overpressing ECO109 cells was considered to be regulated by specific transcription factors at the transcriptional level. POCO is a software program that is able to identify over-represented and under-represented regulatory patterns among promoter sequence sets of upregulated and downregulated genes. In the present study, a total of 52 significant sequence patterns were identified to be over-represented in the downregulated genes but comparatively under-represented in the upregulated genes, of which the top 20 patterns are presented in Table IV. Conversely, 75 patterns were observed to be over-represented in the upregulated genes and simultaneously under-represented in the downregulated genes; the top 20 patterns are shown in Table V. The identified patterns were 5-8 bp long, containing the four known nucleotides, A, C, G and T, while the rest of the places in a pattern, marked as N, may be any of these (which are variable). Subsequently, all significant patterns were screened with the JASPAR transcription factor database to identify potential transcription factors. A total of 11 patterns corresponding to 14 unique transcription factors were detected (Fig. 4). Of these potential transcription factors, Snail, deltaEF1, Mycn, Arnt, MNB1A, PBF, E74A, Ubx, SPI1 and GATA2 were unique for the downregulated DEG promoters, while bZIP910, ZNF42 and SOX9 were unique for the upregulated DEG promoters. These results indicated that these transcription factors may be associated with specific transcriptional regulation in the downregulated and upregulated DEGs. Although a number of sequence patterns did not correspond to known transcription factors, the possibility and importance in the regulation of DEGs subsequent to NGAL overexpression was not discounted.

Discussion
ESCC has one of the highest mortality rates of malignant tumors worldwide, particularly in Asia, with an overall five-year survival rate <20% (16). NGAL has been shown to be an important mediator of invasion and metastasis in ESCC (5,6,10). However, for a improved understanding of the role of NGAL in ESCC, a comprehensive analysis of the mRNA profile of NGAL overexpression ESCC cells was conducted in the present study, using multiple bioinformatic analyses. A total of 267 DEGs were observed in the NGAL overexpressing cells compared with control cells, using a two-fold change as the threshold. To understand the function of these DEGs, the DEGs were analyzed by GO enrichment using DAVID bioinformatics. Several GO terms associated with known NGAL functions were detected. For example, 21 immune-associated terms were identified, including response to stress, defense response and regulation of immune response. In the response to stimulus (GO:0050896) term, >43 genes were enriched. For example, one of the  enriched genes, RAD9, protects against genomic instability by activating DNA damage checkpoint and DNA damage repair pathways (17). Another enriched gene, DEFB1, is constitutively expressed in epithelial tissues, but may be upregulated upon receiving inflammatory or microbial stimuli (18). Recent studies have observed that NGAL is involved in the antibacterial iron-depletion strategy of the innate immune system. NGAL binds catecholate-type siderophores, such as enterobactin synthesized by E. coli, to arrest E. coli growth through inhibiting the iron-uptake ability (19). Several studies found NGAL to be critical in the antimicrobial molecular response in infections, including Salmonella (20,21), Chlamydia (22) and Mycobacterium tuberculosis (23). The GO enrichment analysis in the present study suggested that in addition to NGAL itself, NGAL downstream effectors exert a marked impact on cell immune function and in response to other stimuli, including stress and defense responses.
Of note, 17 histone-associated proteins were upregulated in response to NGAL overexpression. The association between NGAL and histone-associated proteins had not been reported previously, to the best of our knowledge. Therefore, investigating how NGAL influences chromatin structure and gene transcription was of interest. The results of the present study provided novel information regarding the role of NGAL in gene transcriptional regulation through chromatin organization and nucleosome assembly.
The functional annotation chart provided a markedly wider overview of the biological impact of NGAL overexpression in ESCC than traditional GO enrichment. The chart reported that five DEGs were found using the hsa04350:TGF-beta signaling pathway term, which were not identified by the KEGG pathway enrichment analysis. Alport syndrome, which contained COL4A4 and COL4A3, was the only enriched term from the OMIM_DISEASE category listed in the chart. Urine and plasma NGAL have been revealed to be novel biomarkers for diagnosis and outcome prediction in renal dysfunction conditions, including acute kidney injury, chronic kidney disease and renal ischemia-reperfusion injury (24)(25)(26). The correlation between kidney disease and NGAL interaction with downstream effectors was marked. A total of 67 genes were enriched in the SP_PIR_KEYWORDS signal term and 33 of these genes were contained in the Secreted term.
NGAL is a secreted protein, which forms a complex with MMP-9 to prevent its autodegradation, which is critical for extracellular matrix remodeling (2). Extracellular NGAL has been suggested to cause the secretion of other proteins, such as FN1, which regulate the acute inflammatory response, cell-matrix adhesion and the defense response (27). Four genes, C3, SAA1, CFB and FN1, were enriched in the SP_ PIR_KEYWORDS acute phase term. Of note, all four genes are defined as positive acute phase proteins, which are considered to exert the following general functions: Opsonization OCC, total number of patterns within the corresponding cluster sequences; OCC1, downregulated DEG promoter sequence set; OCC2, upregulated DEG promoter sequence set; PRO, total number of sequences with the pattern in the corresponding cluster; TOT, total number of sequences in the corresponding cluster; F-score, analysis of variance between the two input clusters and the background sequence set. DEG, differentially expressed gene. and trapping of microorganisms and associated microbial products; binding cellular remnants, such as nuclear fractions; scavenging free hemoglobin and radicals; and modulating the immune response of the host (28).
Although an entire pathway may not be identified to be statically significant, alterations in local gene expression levels may affect the local pathway significantly, which results in a marked impact on the biological outcome. Subpathway analysis is a powerful method to detect genes in the local area of the KEGG pathway. Li et al (29) constructed a drug-metabolic subpathway network and found the local region of the tyrosine metabolic pathway to be closely associated with the development of lung cancer. A total of 60 subpathways corresponding to 27 entire pathways were found in the present study. Several subpathway-derived entire pathways were identified using this method. For example, the MAPK signaling pathway and the TGF-beta signaling pathway were detected. These results suggested that although certain DEGs did not significantly affect an entire pathway, they did perturb the pathway locally. Other proteins in these pathways were not differentially expressed at the mRNA level, but this may exclude processes such as modification and complex formation, undergone by the DEGs.
DEGs were classified into upregulated and downregulated genes as determined by the respective expression levels. How these two group genes were co-regulated by distin-guishing sequence patterns and transcription factors was notable. The POCO software program identifies over-and under-represented regulatory patterns among the promoter sequence sets of upregulated and downregulated genes. Not all DEGs are considered to be modified at the transcriptional level; the DEGS may have been differentially expressed due to differences in mRNA stability. Thus, in the present study, the 20 genes exhibiting the greatest up-or downregulation in NGAL overexpressing ESCC cells were analyzed by POCO. Hundreds of significant sequence patterns and dozens of transcription factors were found to be over-and under-represented in the downregulation gene set and the upregulation gene set, respectively. This suggested that the change in signal transduction following NGAL overexpression resulted in specific transcription factors and/or certain sequence patterns exerting critical regulatory roles, to achieve co-regulation of the significantly down-or upregulated genes at the transcriptional level.
A number of these potential transcription factors have previously been reported to be associated with cancer invasion or metastasis. Snail and ZEB1 (deltaEF1) are predominantly involved in the repression of E-cadherin expression, resulting in epithelial to mesenchymal transition, which has been implicated as the critical event initiating cancer invasion and metastasis (30,31). Overexpression of Snail was shown to correlate positively with lymphovascular invasion and was associated with poorer overall survival in ESCC patients (32). Nuclear OCC, total number of patterns within the sequences of the corresponding cluster; OCC1, downregulated DEG promoter sequence set; OCC2, upregulated DEG promoter sequence set; PRO, total number of sequences with the pattern in the corresponding cluster; TOT, total number of sequences in the corresponding cluster; F-score, result of the analysis of variance between the two input clusters and the background sequence set. DEG, differentially expressed gene.
expression of ZEB1 was observed in >33% ESCC tumor cells, while ZEB1 was not detected in the normal adult esophageal epithelia (33). PBF was hypothesized to induce the translocation of PTTG to the cell nucleus, where it induces tumorigenesis via a number of different mechanisms (33). PBF is upregulated by estrogen and mediates estrogen-stimulated cell invasion in breast cancer cells (34). SPI1 co-operates with MYC regulating the transcription of microRNA-29b, which is important in the neutrophil differentiation of acute promyelocytic leukaemia cells (35). Notably, NGAL was first identified as a protein stored in specific granules of human neutrophils (36). A potential SPI1 binding site was identified in the promoter region of the NGAL gene by computer analysis (37). These results indicated that SPI1 may be the key molecule in biological functions mediated by NGAL. SOX9, a high-mobility group box transcription factor, is required for development, differentiation and lineage commitment. Cytoplasmic SOX9 may serve as a valuable prognostic marker in invasive ductal carcinoma and metastatic breast cancer. The significant correlation identified between SOX9 and breast tumor cell proliferation implies that SOX9 directly contributes to the poor clinical outcomes associated with invasive breast cancer (38). These results indicated that these transcription factors may be involved in the invasion or metastasis mediated by NGAL. Although numerous sequence patterns were not matched to known transcription factors, the specific base composition suggested that these patterns may be crucial in transcriptional regulation. These results indicated that these sequence patterns and transcription factors may respond to particular transcriptional regulation in downregulated and upregulated DEGs.
In conclusion, in the present study, a comprehensive understanding of the role of NGAL in ESCC following NGAL overexpression was obtained by multiple bioinformatic analyses, particularly through analyzing subpathway and sequence patterns for co-expression, which provided more information than traditional methods. These analytical methods may be used to search for novel functional genes and pathways associated with the relevant genes identified from high-throughput data.