Oncology Letters

ISSN: 1792-1074 (print), 1792-1082 (online)

Publisher: D.A. Spandidos

DOI: 10.3892/ol.2023.13824

Article ID: OL-25-6-13824

Section: Articles

AURKA, TOP2A and MELK are the key genes identified by WGCNA for the pathogenesis of lung adenocarcinoma

Yunqing Xu (1,*), Sen Wang (2,3,*), Bin Xu (1), Huiqing Lin (4), N. Zhan (5), Jiacai Ren (5), Wenling Song (1), Rong Han (1), Liping Cheng (1), Man Zhang (1) and Xiuyun Zhang (5)

1 Department of Oncology, People's Hospital of Huangpi District, Wuhan, Hubei 430000, P.R. China

2 Department of Forensic Medicine, Guangxi Medical University, Nanning, Guangxi 530021, P.R. China

3 School of Basic Medicine Sciences, Guangxi Medical University, Nanning, Guangxi 530021, P.R. China

4 Department of Thoracic Surgery, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P.R. China

5 Department of Pathology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, P.R. China

Correspondence to: Dr Xiuyun Zhang, Department of Pathology, Renmin Hospital of Wuhan University, 238 Jiefang Road, 99 Zhangzhidong Road, Wuchang, Wuhan, Hubei 430060, P.R. China, E-mail: zhangxiuyun0826@163.com

*Contributed equally

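Issue date: June 2023

Published online: April 19, 2023

Volume 25, Issue 6, Article no. 238

Received September 22, 2022; Accepted February 23, 2023

Copyright year: 2023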

This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

Several studies have identified genes strongly associated with the development of lung adenocarcinoma (LUAD) through the comprehensive analysis of single or multiple microarray datasets available in the Gene Expression Omnibus (GEO) database. However, the mechanisms of LUAD development remain largely unknown and have not yet been systematically studied; thus, further studies are required in this field. In the present study, weighted gene co-expression network analysis (WGCNA) was used to identify key genes potentially associated with a high risk of LUAD and to provide more reliable evidence concerning its pathogenesis. The GSE140797 dataset was downloaded from the high-throughput GEO database and first analyzed using the limma package in R to determine the differentially expressed genes. The dataset was then analyzed using the WGCNA package to identify co-expressed genes, and the module genes with the highest correlation with the clinical phenotype were identified. Subsequently, the pathogenic genes common to the two analyses were imported into the STRING database for protein-protein interaction network analysis. The hub genes were screened out using Cytoscape, and The Cancer Genome Atlas analysis, receiver operating characteristic analysis and survival analysis were then performed. Finally, the key genes were evaluated using reverse transcription-quantitative PCR (RT-qPCR) and western blot analysis. Bioinformatics analysis of the GSE140797 dataset revealed eight key genes: AURKA, BUB1, CCNB1, CDK1, MELK, NUSAP1, TOP2A and PBK. Of these, AURKA, TOP2A and MELK were confirmed in samples from patients with LUAD by RT-qPCR and western blot analysis, providing a basis for further research on the mechanisms of LUAD development and targeted therapy.

Key words: lung adenocarcinoma, weighted gene co-expression network analysis, key genes, protein-protein interaction network

National Natural Science Foundation Youth Project

82170106

The present study was supported by the seventh batch of young and middle-aged medical backbone talent training projects in Wuhan in 2019 [Wu Weitong (2019); Grant no. 87]; National Natural Science Foundation Youth Project (Grant no. 82170106).

Introduction

Lung cancer is considered one of the most lethal tumors, with a high incidence rate and the highest mortality rate worldwide; it remained the leading cause of cancer-related mortality in 2020 (1). According to the pathological type, lung cancer can be divided into small cell lung cancer and non-small cell lung cancer (NSCLC); NSCLC accounts for 80% of all cases, and lung adenocarcinoma (LUAD) accounts for the majority of NSCLC cases. The majority of patients with NSCLC, and patients with LUAD in particular, do not exhibit symptoms until the middle or late stages of the disease, since the etiology remains unclear and early symptoms are not evident. Despite several advances in the treatment of LUAD, the average overall survival of patients with LUAD is limited to <5 years (2). Therefore, there is an urgent need to identify novel key molecules for the development of novel therapeutic targets.

Several LUAD molecular markers have been identified in previous studies (3–7); however, a single gene cannot accurately represent the characteristics of LUAD due to its complex pathophysiology. Unlike differential expression analysis, which focuses on single genes, co-expression network analysis identifies modules of co-expressed genes in an unsupervised manner, providing new insight into the pathogenesis of diseases and opportunities for therapeutic intervention (8,9). It has been successfully applied to the study of various conditions, including chronic obstructive pulmonary disease and cancer, and has proven effective in identifying candidate biomarkers and therapeutic targets (9,10).

Several studies have identified genes that are closely associated with LUAD development through the comprehensive analysis of single or multiple microarray datasets available in the Gene Expression Omnibus (GEO) database. For example, Dong et al (11) identified aurora kinase A (AURKA) and DNA topoisomerase II alpha (TOP2A) as the two genes most closely associated with a high lymph node stage (N), which may be targets for the diagnosis and treatment of LUAD. Zhang et al (12) identified mitotic spindle-related features that may be used as independent prognostic indicators for patients with LUAD. Wang et al (13) observed that TOP2A may be one of the key protein-coding genes in LUAD, possibly serving as a biomarker and therapeutic target. Li et al (14) suggested that eight genes, namely TOP2A, marker of proliferation Ki-67 (MKI67), platelet and endothelial cell adhesion molecule 1 (PECAM1), CDK1, secreted phosphoprotein 1 (SPP1), checkpoint kinase 1 (CHEK1), cyclin B1 (CCNB1) and ribonucleotide reductase regulatory subunit M2 (RRM2), may be novel pivotal genes closely associated with the progression and prognosis of LUAD. Wang et al (15) revealed that CCNB1, BUB1 mitotic checkpoint serine/threonine kinase B (BUB1B), cell division cycle 20 (CDC20), TTK protein kinase (TTK) and mitotic arrest deficient 2 like 1 (MAD2L1) may be potential targets for the treatment of LUAD. Chen et al (16) demonstrated that 10 gene targets, including CDK1 and CDC20, were associated with a poor prognosis of patients with lung cancer. Fan et al (17) suggested that TOP2A, G protein-coupled receptor kinase 5 (GRK5), sirtuin 7 (SIRT7), minichromosome maintenance complex component 7 (MCM7), EGFR and collagen type I alpha 2 chain (COL1A2) may be used as predictors for the diagnosis of LUAD. Guo et al (18) proposed that TOP2A and UBE2C were independent prognostic factors for LUAD. Despite the abundance of studies on this topic, the mechanisms responsible for the development of LUAD remain unclear and have not yet been systematically studied; further studies are therefore required.

In the present study, the gene expression profile dataset GSE140797 was acquired from the GEO database; it contains gene expression data from 14 samples, comprising seven normal lung and seven LUAD tissues. Following normalization and preprocessing of the data, the differentially expressed genes (DEGs) between the two sample sets were identified. Concurrently, weighted gene co-expression network analysis (WGCNA) was performed to construct a gene co-expression network of LUAD and identify co-expression modules. Subsequently, eight cancer tissue and eight adjacent tissue samples were collected from patients with LUAD, and reverse transcription-quantitative PCR (RT-qPCR) and western blot analysis were performed to verify the results of the bioinformatics analysis, including the expression of the three key genes, AURKA, TOP2A and maternal embryonic leucine zipper kinase (MELK).

AURKA is a serine/threonine kinase whose activation is required for cell division through the regulation of mitosis. The ectopic overexpression of the AURKA gene results in the inactivation of the G2-phase DNA damage checkpoint and the mitotic spindle assembly checkpoint, as well as in tetraploidy and centrosome amplification, particularly in cells with defective p53-dependent DNA damage checkpoints upstream of AURKA. At the transcriptional level, the EGF-induced expression of AURKA is dependent on the interaction of nuclear EGFR and STAT5. Downstream of AURKA, certain of its substrates play critical tumor-suppressive roles, with p53 and large tumor suppressor kinase 2 being the most important. AURKA substrates have therefore received widespread attention as tumor suppressors (19).

TOP2A has been demonstrated to be associated with the progression of several cancer types, such as hepatocellular carcinoma (20), breast cancer (21), bladder cancer (22), ovarian cancer (23), cervical cancer (24), pancreatic cancer (25), stomach cancer (26) and NSCLC (27,28).

Increased expression of MELK has been observed in various cancer cells and tissues. MELK plays a critical role in the proliferation and self-renewal of progenitor and tumor stem cells and is overexpressed in LUAD, increasing the probability of tumorigenesis. MELK increases the proliferation of cervical, breast, colorectal and pancreatic cancer cell lines (29), and it is also involved in the development of hepatocellular carcinoma (30) and bladder cancer (31).

Materials and methods

Data source and preprocessing

Gene expression profile data and clinical information were obtained from the GEO database of the National Center for Biotechnology Information. Gene expression data from 14 samples in the GSE140797 dataset were analyzed, including seven normal lung tissue and seven LUAD tissue samples. The annotation information of the GPL13497 (Agilent-026652 Whole Human Genome Microarray 4×44Kv2) platform was used as a reference to convert the probes to the corresponding gene symbols, and the limma package (version 3.54.2) was used to normalize the data for further analysis.

DEG analysis

The samples were divided into normal control and LUAD groups, and the thresholds |log2 fold change (FC)|>1 and P<0.05 were applied to screen for genes with significant differences in expression.
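The screening rule above can be illustrated with a minimal sketch. Python is used here purely for illustration (the study itself used the limma package in R), and the gene names, fold changes and P-values below are hypothetical placeholders rather than values from GSE140797.

```python
# Hypothetical per-gene statistics: (gene symbol, log2 fold change, P-value).
# The values are invented for illustration and are not taken from GSE140797.
results = [
    ("GENE_A", 2.3, 0.001),
    ("GENE_B", -1.7, 0.020),
    ("GENE_C", 0.4, 0.003),   # significant, but the effect is too small
    ("GENE_D", 1.9, 0.200),   # large effect, but not significant
]

def screen_degs(rows, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes passing |log2FC| > lfc_cutoff and P < p_cutoff by direction."""
    up, down = [], []
    for gene, log2fc, p_value in rows:
        if abs(log2fc) > lfc_cutoff and p_value < p_cutoff:
            (up if log2fc > 0 else down).append(gene)
    return up, down

up_genes, down_genes = screen_degs(results)
print("upregulated:", up_genes)
print("downregulated:", down_genes)
```

Both conditions must hold at once, so a gene with a small P-value but a modest fold change (GENE_C) is excluded, as is a gene with a large but non-significant change (GENE_D).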

Data filtering

Co-expression networks were constructed using the WGCNA package in R. To obtain a valid co-expression network, the expression variance of each gene across all samples was calculated, and the genes with the highest variance were retained for the construction of the co-expression network. Cluster analysis of the samples was performed in order to detect and remove outliers.
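A variance-based gene filter of this kind might look as follows. This is a sketch under the assumption that the filter retains the most variable genes (the Results report a 25% cutoff); the expression values and gene names are hypothetical, and Python stands in for the R workflow used in the study.

```python
from statistics import variance

# Hypothetical expression matrix: gene -> expression values across samples.
expr = {
    "G1": [1.0, 1.1, 0.9, 1.0],
    "G2": [2.0, 6.0, 1.0, 7.0],
    "G3": [3.0, 3.2, 2.9, 3.1],
    "G4": [0.5, 4.5, 0.4, 4.8],
}

def top_variance_genes(matrix, fraction=0.25):
    """Return the genes whose sample variance falls in the top `fraction`."""
    ranked = sorted(matrix, key=lambda g: variance(matrix[g]), reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]

# A 50% cutoff is used here only because the toy matrix has four genes.
kept = top_variance_genes(expr, fraction=0.5)
print(kept)
```

Genes with nearly constant expression (G1 and G3 in the toy data) carry little co-expression signal, which is the usual motivation for removing them before network construction.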

Construction of gene co-expression network

A scale-free network was constructed by selecting an appropriate weighting coefficient (soft threshold, β) such that the connections between genes followed a scale-free distribution. By applying this coefficient, the correlation matrix between genes was converted into an adjacency matrix, which was then converted into a topological overlap matrix. A hierarchical clustering tree was then constructed from the weighted correlations between genes; different branches of the clustering tree represented different gene modules, which were labeled with different colors. Genes that exhibited similar expression patterns, based on their weighted correlation coefficients, were thereby grouped into the same module for further analysis.
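The conversion from correlations to an adjacency matrix and then to a topological overlap matrix can be sketched as below. The sketch assumes the unsigned adjacency a_ij = |cor(i, j)|^β and the standard unsigned topological overlap formula; the text does not state which network type was used, so both are assumptions. The three expression profiles are hypothetical, and a small β is used for readability, whereas the study reports β = 20.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency(profiles, beta):
    """Unsigned soft-threshold adjacency a_ij = |cor(i, j)| ** beta, zero diagonal."""
    n = len(profiles)
    return [[0.0 if i == j else abs(pearson(profiles[i], profiles[j])) ** beta
             for j in range(n)] for i in range(n)]

def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # The zero diagonal drops the u == i and u == j terms.
                shared = sum(adj[i][u] * adj[u][j] for u in range(n))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

# Hypothetical profiles for three genes across five samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
adj = adjacency(profiles, beta=6)
tom = topological_overlap(adj)
```

Raising the correlation to the power β suppresses weak correlations while preserving strong ones, and the topological overlap additionally rewards gene pairs that share neighbors. Hierarchical clustering is typically run on the dissimilarity 1 − TOM.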

Module and clinical feature correlation analysis

Pearson's correlation coefficients and P-values between each module and the clinical traits of the samples were calculated using WGCNA, and the modules most strongly correlated with the clinical trait were used in the subsequent analysis. For the genes in these modules, the correlation between gene expression and the phenotype [gene significance (GS)] and the correlation between gene expression and the module eigengene [module membership (MM)] were analyzed, and the genes were screened according to GS >0.8 and MM >0.8.
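The GS and MM screen can be sketched as follows, assuming that the module eigengene has already been computed (in WGCNA it is the first principal component of the module's expression matrix) and that the thresholds are applied to absolute correlations, which the text does not specify. The trait coding, eigengene and gene profiles below are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data for six samples: 0 = normal lung, 1 = LUAD.
trait = [0, 0, 0, 1, 1, 1]
# Assumed to be precomputed for the module of interest.
module_eigengene = [-0.4, -0.5, -0.3, 0.4, 0.5, 0.3]
module_genes = {
    "HUB": [1.0, 0.9, 1.1, 3.0, 3.2, 2.9],
    "NOISY": [2.0, 1.0, 3.0, 2.0, 1.0, 3.0],
}

def screen_module(genes, eigengene, phenotype, gs_cutoff=0.8, mm_cutoff=0.8):
    """Keep genes whose |GS| and |MM| both exceed the cutoffs."""
    selected = []
    for name, profile in genes.items():
        gs = pearson(profile, phenotype)   # gene significance
        mm = pearson(profile, eigengene)   # module membership
        if abs(gs) > gs_cutoff and abs(mm) > mm_cutoff:
            selected.append(name)
    return selected

selected = screen_module(module_genes, module_eigengene, trait)
print(selected)
```

Requiring both criteria retains genes that are central to the module and also track the phenotype, rather than genes that satisfy only one of the two.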

Functional enrichment analysis

The genes shared between the DEGs and the modules with the highest correlation in WGCNA were selected, and Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were performed on this gene set using the R package clusterProfiler (https://www.bioconductor.org/packages/release/bioc/html/clusterProfiler.html).
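Over-representation analyses of this kind are commonly based on a one-sided hypergeometric test. The sketch below illustrates that test in Python under this assumption; it is not the clusterProfiler implementation, and the gene counts are hypothetical.

```python
from math import comb

def enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """One-sided hypergeometric P-value, P(X >= n_overlap).

    n_background: genes in the background set
    n_in_term:    background genes annotated to the GO term or KEGG pathway
    n_selected:   genes in the query list
    n_overlap:    query genes annotated to the term
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_term, n_selected)
    favorable = sum(
        comb(n_in_term, x) * comb(n_background - n_in_term, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return favorable / total

# Hypothetical counts: 20 background genes, 5 in the term, 4 selected, 3 overlapping.
p_value = enrichment_p(20, 5, 4, 3)
print(round(p_value, 4))
```

In practice the P-values for all tested terms would also be adjusted for multiple testing, which this sketch omits.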

Construction of protein-protein interaction (PPI) networks

The STRING database (https://string-db.org/) was used to construct the PPI network of the intersecting genes. PPI pairs with a combined confidence score of ≥0.4 were visualized in the network. The top 10 hub genes in the PPI network were identified using cytoHubba, a plug-in for Cytoscape software (version 3.7.2; http://cytoscape.org/).
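Hub selection by degree of connectivity can be sketched as follows. The edge list is hypothetical, scores are assumed to be on a 0 to 1 scale, and plain degree is used as the ranking criterion; cytoHubba offers several ranking methods, and the text states only that the hub genes were chosen according to the degree of connectivity.

```python
from collections import Counter

# Hypothetical STRING-style edges: (protein A, protein B, combined score in [0, 1]).
edges = [
    ("P1", "P2", 0.95), ("P1", "P3", 0.80), ("P1", "P4", 0.45),
    ("P2", "P3", 0.70), ("P3", "P4", 0.30), ("P4", "P5", 0.20),
]

def hub_genes(edge_list, min_score=0.4, top_n=10):
    """Rank nodes by degree after discarding edges below `min_score`."""
    degree = Counter()
    for a, b, score in edge_list:
        if score >= min_score:
            degree[a] += 1
            degree[b] += 1
    # Sort by descending degree, then by name, so that ties are reproducible.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

top_two = hub_genes(edges, top_n=2)
print(top_two)
```

Nodes whose only interactions fall below the confidence threshold (P5 in the toy data) do not enter the ranking at all.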

Verification of the hub genes

The Gene Expression Profiling Interactive Analysis (GEPIA) database (http://gepia.cancer-pku.cn/) (32) is an online analysis tool based on The Cancer Genome Atlas (TCGA) LUAD data (33) and Genotype-Tissue Expression (GTEx) data. It was used to validate the top 10 hub genes selected from the PPI networks by differential expression analysis, receiver operating characteristic (ROC) curve analysis and survival analysis.

Collection and processing of clinical tissue samples

A total of eight fresh frozen clinical samples were obtained from patients with LUAD at Renmin Hospital of Wuhan University. The patients, three men and five women ranging in age from 51 to 80 years, were recruited between December 14 and December 28, 2020. The sex, age, recruitment date and disease stage of each patient were as follows: i) Male, 70 years old, December 14, 2020, stage IIB; ii) male, 63 years old, December 16, 2020, stage IA2; iii) female, 62 years old, December 16, 2020, stage IA; iv) female, 59 years old, December 16, 2020, A stage; v) male, 51 years old, December 17, 2020, A stage; vi) female, 80 years old, December 24, 2020, stage IA3; vii) female, 73 years old, December 25, 2020, stage IA; and viii) female, 73 years old, December 28, 2020, stage IA. The samples were obtained with patient consent and ethical approval (approval no. WDRY2022-K231) from Renmin Hospital of Wuhan University (Wuhan, China).

RT-qPCR

RNA was obtained from fresh frozen samples of LUAD and normal paracancerous lung tissue from eight patients with LUAD. RNA was extracted using TRIzol® reagent (cat. no. 15596026; Invitrogen; Thermo Fisher Scientific, Inc.) and reverse transcribed into cDNA using the PrimeScript RT Reagent kit (cat. no. RR037A; Takara Bio, Inc.) according to the manufacturer's instructions. Candidate primers for each gene were designed using the Premier 5 design program (PREMIER Biosoft). qPCR was performed with a TB Green-based quantitative PCR kit (cat. no. RR420A; Takara Bio, Inc.) using a CFX Connect™ PCR system (Bio-Rad Laboratories, Inc.). The following thermocycling conditions were applied: Pre-denaturation at 95°C for 1 min (1 cycle); 40 cycles of denaturation at 95°C for 5 sec and annealing at 58°C for 30 sec; and a melting curve stage from 65 to 95°C in increments of 0.5°C for 5 sec. The results were analyzed using the 2^−ΔΔCq method (34), and the primer pair sequences for each gene are listed in Table I.
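The 2^−ΔΔCq calculation can be illustrated with a short sketch. The Cq values are hypothetical, and the reference gene is left unnamed because it is not specified in this section.

```python
def relative_expression(cq_target_tumor, cq_ref_tumor, cq_target_normal, cq_ref_normal):
    """Fold change of a target gene in tumor vs. normal tissue by 2^-ddCq."""
    delta_tumor = cq_target_tumor - cq_ref_tumor      # dCq in the tumor sample
    delta_normal = cq_target_normal - cq_ref_normal   # dCq in the paired normal sample
    delta_delta = delta_tumor - delta_normal          # ddCq
    return 2 ** (-delta_delta)

# Hypothetical Cq values for one tumor/normal pair.
fold = relative_expression(
    cq_target_tumor=24.0, cq_ref_tumor=18.0,
    cq_target_normal=27.0, cq_ref_normal=18.5,
)
print(round(fold, 2))
```

A lower Cq means that more template was present, so a negative ΔΔCq corresponds to a fold change above 1, that is, higher expression in the tumor sample. The method assumes that the amplification efficiency is close to 100% for both the target and the reference gene.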

Western blot analysis

Western blot analysis of relative protein expression levels was performed as follows: LUAD and paracancerous lung tissues were lysed with RIPA buffer (cat. no. P0013B; Beyotime Institute of Biotechnology) to extract total proteins, and the protein concentrations were then determined using a BCA kit (cat. no. P0012S; Beyotime Institute of Biotechnology). The protein samples were denatured in a dry heater at 95°C and subsequently subjected to electrophoresis on 10% SDS gels (cat. no. P0012A; Beyotime Institute of Biotechnology), with 25 µg of protein loaded in each lane. Following electrophoresis, the separated proteins were transferred to polyvinylidene difluoride membranes (cat. no. FFP2; Beyotime Institute of Biotechnology) by the wet transfer method. The membranes were blocked for 1 h at room temperature to prevent non-specific binding and then incubated with the corresponding primary antibodies overnight at 4°C. The primary antibodies used were as follows: A rabbit anti-AURKA polyclonal antibody (cat. no. A15728), a rabbit anti-BUB1 mitotic checkpoint serine/threonine kinase (BUB1) polyclonal antibody (cat. no. A1929), a rabbit anti-CCNB1 polyclonal antibody (cat. no. A16800), a rabbit anti-CDK1 polyclonal antibody (cat. no. A0220), a rabbit anti-MELK monoclonal antibody (cat. no. A3530), a rabbit anti-nucleolar and spindle associated protein 1 (NUSAP1) polyclonal antibody (cat. no. A16000), a rabbit anti-TOP2A polyclonal antibody (cat. no. A16440) and a mouse monoclonal antibody for β-actin (cat. no. AC004) (all from ABclonal Biotech Co., Ltd. and all at 1:1,000).

The following day, the membranes were incubated for 1 h at room temperature with the corresponding secondary antibody: Goat Anti-Rabbit IgG H&L (HRP; cat. no. ab205719) or Goat Anti-Mouse IgG H&L (HRP; cat. no. ab205719) (both from Abcam and both at 1:10,000). This was followed by a brief incubation with ECL Western Blotting Detection Reagent (cat. no. P0018S; Beyotime Institute of Biotechnology) and a final exposure with an iBright imaging system (Thermo Fisher Scientific, Inc.). Densitometry was performed using ImageJ (version V1.8.0.112; National Institutes of Health).

Statistical analysis

Statistical calculations were performed using R (version 3.6) and the WGCNA package. The correlation coefficients between the clinical characteristics of LUAD tissue and the module eigengene (ME) of each co-expression module were calculated on the R language platform RStudio (version 8.9.173593; http://support-rstudio-com.netlify.app/products/rstudio/download/). WGCNA was used to identify genes with similar functions; for each gene pair, WGCNA weights the strength of association using a soft threshold, and a weighted co-expression network is formed on this basis. The data are expressed as the mean ± SE. Parametric data were analyzed using the paired Student's t-test and non-parametric data were analyzed using the Mann-Whitney U test. P<0.05 was considered to indicate a statistically significant difference.
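For paired tumor and adjacent-tissue measurements, the paired t-test reduces to a one-sample test on the within-patient differences. The sketch below computes the test statistic for eight hypothetical pairs and compares it with the tabulated two-sided 5% critical value for 7 degrees of freedom; it does not compute an exact P-value, and the expression values are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(tumor, normal):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    n_pairs = len(diffs)
    standard_error = stdev(diffs) / sqrt(n_pairs)
    return mean(diffs) / standard_error, n_pairs - 1

# Hypothetical relative expression values for eight tumor/normal pairs.
tumor = [3.1, 2.8, 3.6, 2.9, 3.3, 3.0, 2.7, 3.4]
normal = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0]

t_stat, df = paired_t(tumor, normal)
CRITICAL_T_DF7 = 2.365  # tabulated two-sided critical value, alpha = 0.05, 7 d.f.
significant = abs(t_stat) > CRITICAL_T_DF7
print(df, significant)
```

Working on the differences removes between-patient variability, which is why a paired design can detect a consistent change with as few as eight patients.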

Results

Data filtering

A co-expression network was constructed using the 5,435 genes in the top 25% by variance. Hierarchical clustering of the 14 lung tissue samples based on these 5,435 genes revealed no significant outliers. A total of 580 DEGs were identified in the dataset (Fig. 1), among which 254 genes were downregulated and 326 genes were upregulated.

Construction of the gene co-expression network module

According to the scale-free network fit, a value of 20 was selected as the soft threshold (β value) for this dataset, and a co-expression network was constructed (Fig. 2). Modules were identified using the dynamic tree cut method, which yielded 10 modules (Fig. 3A).

Correlation analysis of modules and clinical characteristics

Correlation analysis between each module and the clinical information of the samples revealed that the green module had the highest positive correlation and the blue module the highest negative correlation with LUAD (Fig. 3B).

Identification and analysis of pivotal genes

The key genes in the blue and green modules were screened according to the criteria of GS >0.8 and MM >0.8, yielding 845 and 285 key genes, respectively. Subsequently, GO functional enrichment analysis and KEGG enrichment analysis were performed on the 845 genes selected from the blue module and the 285 genes selected from the green module (Fig. 4A and B). In the green module, GO functional enrichment analysis revealed that the common pathogenic genes were mainly enriched in mitotic cell cycle phase transition, cell cycle phase transition and cytoplasmic division, whereas in the blue module, the common pathogenic genes were mainly enriched in blood vessel development, blood vessel morphogenesis and angiogenesis (Fig. 4C). KEGG pathway analysis mainly demonstrated enrichment in the cell cycle, p53 signaling pathway and Fanconi anemia pathway in the green module, and in proteoglycans in cancer, alcoholism and axon guidance in the blue module (Fig. 4D).

PPI network construction and analysis

The 845 genes from the blue module were intersected with the 580 DEGs to obtain 324 genes. Similarly, the 285 genes from the green module were intersected with the 580 DEGs to obtain 107 genes. PPI networks for the 324 and 107 genes were then established using Cytoscape software (Fig. 5), and 10 key genes were selected from each PPI network according to the degree of connectivity. BDKRB2, CCL19, CX3CR1, CXCL13, CXCL9, CXCR4, CXCR5, GNAI1, GNG11 and NMUR1 were selected from the blue module, and AURKA, BUB1, CCNB1, CDC45, CDK1, MELK, NUSAP1, PBK, TOP2A and TTK were selected from the green module.

Verification of the expression of the 20 selected genes in TCGA database

Subsequently, the expression profiles of 59 normal lung tissues and 515 LUAD tissues were acquired from TCGA database to verify the expression of the aforementioned 20 key genes. With the exception of CXCR4, the expression of all 20 genes differed significantly between normal lung and LUAD tissues (Fig. 6).

ROC curve analysis

Subsequently, ROC curve analysis was performed on the 19 genes verified in TCGA database. With the exception of BDKRB2, CCL19, CXCR5, CXCL9, GNAI1 and CX3CR1, the remaining 13 genes had an area under the curve (AUC) >0.9 (Fig. 7) and were carried forward to the following stages of the analysis.
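The AUC used as the selection criterion here can be read as the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample. A minimal sketch of that pairwise definition follows; the expression values are hypothetical, and the sketch assumes a gene that is upregulated in tumors.

```python
def auc(cases, controls):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical expression of one gene in tumor and normal samples.
tumor_expr = [5.1, 6.3, 4.8, 7.0]
normal_expr = [3.2, 4.9, 2.8, 3.9]

gene_auc = auc(tumor_expr, normal_expr)
print(gene_auc, gene_auc > 0.9)
```

For a gene that is downregulated in tumors, the same calculation yields a value below 0.5, and the roles of the two groups would be swapped before applying the >0.9 criterion.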

Survival analysis

Subsequently, survival analysis of the 13 genes was performed using GEPIA, and eight genes had P<0.05, namely AURKA, BUB1, CCNB1, CDK1, MELK, NUSAP1, PBK and TOP2A (Fig. 8). This indicates that these genes may be key genes associated with reduced survival and a poor prognosis in LUAD; they were therefore included in the following analysis.

Gene expression in human LUAD and normal paracancerous tissues

To validate the results of the bioinformatics analysis, the expression levels of the aforementioned eight genes were verified in human LUAD tissues and paired paracancerous lung tissues using RT-qPCR and western blot analysis. The relative mRNA expression levels of seven out of the eight genes, namely AURKA, BUB1, CCNB1, CDK1, MELK, NUSAP1 and TOP2A, were significantly higher in the LUAD tissues than in the adjacent normal lung tissues (Fig. 9). The protein levels of three of these seven overexpressed genes, namely AURKA, MELK and TOP2A, were significantly higher in the LUAD tissues than in the adjacent normal lung tissues (Fig. 10).

Discussion

Lung cancer is one of the most prevalent types of cancer and currently has the highest mortality rate. Among patients diagnosed with lung cancer, the 5-year survival rate is very low in the majority of countries, at only 10 to 20% (35). However, the molecular mechanisms underlying LUAD remain poorly understood. Without early diagnosis, the majority of patients are not treated promptly, resulting in a very poor prognosis. Therefore, there is an urgent need to identify efficient biomarkers for the early detection and treatment of lung cancer. Bioinformatics analysis has proven to be a very efficient method for screening early biomarkers and key genes for malignant and benign diseases (36–39). However, analyzing data in a scientifically sound and efficient manner remains a serious challenge. In the present study, a high-throughput gene expression dataset was analyzed, first by identifying the DEGs and then by using WGCNA to obtain the genes in the modules with the highest correlation with the clinical phenotype. Subsequently, PPI and correlation analyses were performed on the pathogenic genes common to the two analyses.

Several inhibitors with high specificity for AURKA, including MLN8237 and ENMD-2076, have been developed and have shown clinical efficacy (40). Moreover, cell cycle inhibition through regulation of the AURKA/polo-like kinase 1 (PLK1) pathway has been reported to induce apoptosis in LUAD (41), and AURKA is a potential biomarker for predicting the poor prognosis of smoking-related LUAD. Furthermore, the AURKA rs1047972 variant has been found to be significantly associated with EGFR mutation in patients with LUAD, particularly in women and non-smoking patients, and may contribute to the pathological development of LUAD (42–44). AURKA amplification or activation impairs the liver kinase B1 (LKB1)/AMPK signaling pathway, which contributes to the initiation and progression of NSCLC, suggesting that AURKA may be a potential therapeutic target in AURKA-driven LUAD (45).

Resistance to chemotherapy has emerged as a major challenge in cancer treatment. Resistance to radiation therapy in LUAD has been attributed to elevated levels of autophagy, and high levels of AURKA expression are associated with chemoresistance and proliferation in LUAD, indicating that AURKA is a critical target for reducing chemotherapy resistance. Resistance arising in response to chronic EGFR inhibition attenuates drug-induced apoptosis, and silencing AURKA reduces drug resistance in EGFR-mutant LUAD (46,47).

It has been reported that TOP2A expression levels are upregulated in both surgically resected lung cancer tissues and lung cancer cell lines. As previously demonstrated, the knockdown of TOP2A in human lung cancer cell lines inhibited cell proliferation, migration and invasion, while the inhibition of TOP2A reduced the expression levels of CCNB1 and CCNB2. High expression of TOP2A has been reported to significantly increase the risk of mortality in patients with NSCLC, a risk that is particularly pronounced in patients with LUAD, and its molecular mechanism is associated with activation of the PI3K/AKT and Wnt/β-catenin signaling pathways, which promote cell proliferation and survival. Etoposide, which targets TOP2A, has been approved for the treatment of small cell lung cancer, but there are currently no TOP2A-targeting drugs for LUAD (48,49). Through various bioinformatics approaches, TOP2A has been identified as an independent factor affecting the prognosis of patients with LUAD (50–53), and increased TOP2A expression has also been identified as a potential risk factor for pathological stage I LUAD (54). Ciclopirox olamine and quercetin have also been demonstrated to exert tumor-suppressive effects via TOP2A in LUAD (55,56).

MELK is highly expressed in LUAD, and the increased expression of MELK has been associated with a poor prognosis; MELK may therefore serve as a potential diagnostic marker and therapeutic target for LUAD. One possible molecular mechanism is that the kinase activity of MELK promotes lung adenocarcinogenesis by inhibiting the pro-apoptotic function of Bcl-GL. High levels of MELK expression have been associated with high-grade tumors, increased aggressiveness, a poorer patient prognosis and radioresistance, and increased expression of MELK is associated with TOP2A, CDK1 and AURKB (57). Various MELK inhibitors have been developed as potential cancer therapeutic agents; molecules including OTS and MELK-T1 have been shown to delay the proliferation of cancer cells in experimental animals (58).

It has been reported that TOP2A interacts directly with MELK, CDC20, CCNB2, UBE2T, KIAA0101 and TK1 in a PPI network (11). However, this does not systematically reflect the interaction pattern between key pathogenic genes in LUAD. In the present study, bioinformatics analysis of LUAD using WGCNA, followed by validation in human tissue samples, yielded three key genes, AURKA, MELK and TOP2A, whose co-expression may be important for early diagnosis and prognosis, as well as for further elucidation of the pathogenesis of LUAD.

Acknowledgements

Not applicable.

Availability of data and materials

The datasets used and/or analyzed during the present study are available from the corresponding author on reasonable request.

Authors' contributions

XZ and YX contributed significantly to the concept and design of the study. XZ and SW conducted bioinformatics experiments and obtained data. HL, RH and MZ conducted confirmatory experiments and obtained data. XZ, JR and LC analyzed the data. XZ, YX and MZ drafted the manuscript. YX, SW and RH critically revised the manuscript for important intellectual content. XZ and YX confirm the authenticity of all the raw data. BX, NZ and WS contributed to the collection and collation of clinical samples from patients with LUAD. All authors have read and approved the final manuscript and have agreed to take responsibility for all aspects of the work.

Ethics approval and consent to participate

The present study was approved (approval no. WDRY2022-K231) by Renmin Hospital of Wuhan University (Wuhan, China) and written informed consent was obtained from patients in all cases.

Patient consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

References

1. Sung, Ferlay, Siegel, Laversanne, Soerjomataram, Jemal and Bray: Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 71: 209–249, 2021. doi: 10.3322/caac.21660. PMID: 33538338.

2. Denisenko, Budkevich and Zhivotovsky: Cell death-based treatment of lung adenocarcinoma. Cell Death Dis 9: 117, 2018. doi: 10.1038/s41419-017-0063-y. PMID: 29371589.

3. Wang and Zheng: FAM72 serves as a biomarker of poor prognosis in human lung adenocarcinoma. Aging (Albany NY) 13: 8155–8176, 2021. doi: 10.18632/aging.202625. PMID: 33686947.

4. Wang, Tan, Liu, Lei, Qiao, Liu, Cheng, Wei and Peng: RNA-Seq profiling of circular RNA in human lung adenocarcinoma and squamous cell carcinoma. Mol Cancer 18: 134, 2019. doi: 10.1186/s12943-019-1061-8. PMID: 31484581.

5. Zhang, Wang, Zhao, Liu, Zhao, Gao, Zang and Jia: Identification of a novel prognosis-associated ceRNA network in lung adenocarcinoma via bioinformatics analysis. Biomed Eng Online 20: 117, 2021. doi: 10.1186/s12938-021-00952-x. PMID: 34819106.

6. Guo and Cai: TCN1 is a potential prognostic biomarker and correlates with immune infiltrates in lung adenocarcinoma. World J Surg Oncol 20: 83, 2022. doi: 10.1186/s12957-022-02556-8. PMID: 35287670.

Yuanhua

Pudong

Wei

Yuan

Delin

Yan

Geyu

TFAP2A Induced KRT16 as an Oncogene in Lung Adenocarcinoma via EMT

Int J Biol Sci15141914282019

10.7150/ijbs.34076

31337972

Liu

Reverse engineering of genome-wide gene regulatory networks from gene expression data

Curr Genomics163222015

10.2174/1389202915666141110210634

25937810

Langfelder

Horvath

WGCNA: An R package for weighted correlation network analysis

BMC Bioinformatics95592008

10.1186/1471-2105-9-559

19114008

Liu

Fang

Shen

Yao

Fang

Chen

Feng

Zeng

Identification of surrogate prognostic biomarkers for allergic asthma in nasal epithelial brushing samples by WGCNA

J Cell Biochemi120513751502019

10.1002/jcb.27790

30304558

Dong

Men

Yang

Identification of lung adenocarcinoma biomarkers based on bioinformatic analysis and human samples

Oncol Rep43143714502020

32323809

Zhang

Zhu

Zhao

Yan

Jiang

Zhao

Fan

Identification of a panel of mitotic spindle-related genes as a signature predicting survival in lung adenocarcinoma

J Cell Physiol235436143752020

10.1002/jcp.29312

31637715

Wang

Tang

Liu

Jiao

Liu

Identification of differentially expressed protein-coding genes in lung adenocarcinomas

Exp Ther Med19110311112020

32010276

Liu

Cui

Han

Comprehensive analysis of candidate diagnostic and prognostic biomarkers associated with lung adenocarcinoma

Med Sci Monit26e9220702020

32578582

Wang

Zhou

Chen

Zhou

Chu

Identification of key genes and biological pathways in lung adenocarcinoma via bioinformatics analysis

Mol Cell Biochem4769319392021

10.1007/s11010-020-03959-5

33130972

Chen

Tang

Han

Zuo

Cai

Song

Evaluation of clinical value and potential mechanism of MTFR2 in lung adenocarcinoma via bioinformatics

BMC Cancer216192021

10.1186/s12885-021-08378-3

34039308

Fan

Wang

Tang

Extracting predictors for lung adenocarcinoma based on Granger causality test and stepwise character selection

BMC Bioinformatics20(Suppl 7)S1972019

10.1186/s12859-019-2739-z

Guo

Sun

Guo

Song

Xue

Zhang

Wang

Qiu

Tan

Elevated TOP2A and UBE2C expressions correlate with poor prognosis in patients with surgically resected lung adenocarcinoma: A study based on immunohistochemical analysis and bioinformatics

J Cancer Res Clin Oncol1468218412020

10.1007/s00432-020-03147-4

32103339

Huang

Liu

Dong

Targeting AURKA in Cancer: Molecular mechanisms and opportunities for Cancer therapy

Mol Cancer20152021

10.1186/s12943-020-01305-3

33451333

Meng

Wei

Deng

Study on the expression of TOP2A in hepatocellular carcinoma and its relationship with patient prognosis

Cancer Cell Int22292022

10.1186/s12935-021-02439-0

35033076

Zhang

Yang

Wang

Zhou

Zhang

Xiao

Xue

TOP2A correlates with poor prognosis and affects radioresistance of medulloblastoma

Front Oncol129189592022

10.3389/fonc.2022.918959

35912241

Zhang

MiR-599 targeting TOP2A inhibits the malignancy of bladder cancer cells

Biochem Biophys Res Commun5701541612021

10.1016/j.bbrc.2021.06.069

34284141

Gao

Zhao

Ren

Chen

Yin

Yue

TOP2A promotes tumorigenesis of high-grade serous ovarian cancer by regulating the TGF-β/Smad pathway

J Cancer11418141922020

10.7150/jca.42736

32368301

Wang

Shen

Zou

Huang

Xia

Gao

Huang

TOP2A promotes cell migration, invasion and epithelial-mesenchymal transition in cervical cancer via activating the PI3K/AKT signaling

Cancer Manag Res12380738142020

10.2147/CMAR.S240577

32547216

Pei

Yin

Liu

TOP2A induces malignant character of pancreatic cancer through activating β-catenin signaling pathway

Biochim Biophys Acta Mol Basis Dis18641972072018

10.1016/j.bbadis.2017.10.019

29045811

Chen

Shi

E2F1-mediated up-regulation of TOP2A promotes viability, migration, and invasion, and inhibits apoptosis of gastric cancer cells

J Biosci47842022

10.1007/s12038-022-00322-2

36550695

Xue

Wang

Expression of the topoisomerase II alpha (TOP2A) gene in lung adenocarcinoma cells and the association with patient outcomes

Med Sci Monit26e9291202020

10.12659/MSM.929120

33361736

Chen

Guo

Song

Liu

SKA1/2/3 serves as a biomarker for poor prognosis in human lung adenocarcinoma

Transl Lung Cancer Res92182312020

10.21037/tlcr.2020.01.20

32420061

Gray

Jubb

Hogue

Dowd

Kljavin

Bai

Frantz

Zhang

Koeppen

Maternal embryonic leucine zipper kinase/murine protein serine-threonine kinase 38 is a promising therapeutic target for multiple cancers

Cancer Res65975197612005

10.1158/0008-5472.CAN-04-4531

16266996

Chen

Xie

Dong

Gao

Deng

Wang

MicroRNA-214-3p inhibits proliferation and cell cycle progression by targeting MELK in hepatocellular carcinoma and correlates cancer prognosis

Cancer Cell Int171022017

10.1186/s12935-017-0471-1

29151817

Chen

Zhou

Guo

Wang

Liu

Xiao

Wang

Inhibition of MELK produces potential anti-tumour effects in bladder cancer by inducing G1/S cell cycle arrest via the ATM/CHK2/p53 pathway

J Cell Mol Med24180418212020

10.1111/jcmm.14878

31821699

Tang

Kang

Gao

Zhang

GEPIA: A web server for cancer and normal gene expression profiling and interactive analyses

Nucleic Acids Res45W98W1022017

10.1093/nar/gkx247

28407145

Cancer Genome Atlas Research Network

Weinstein

Collisson

Mills

Shaw

Ozenberger

Ellrott

Shmulevich

Sander

Stuart

The Cancer Genome Atlas Pan-Cancer analysis project

Nat Genet45111311202013

10.1038/ng.2764

24071849

Livak

Schmittgen

Analysis of relative gene expression data using real-time quantitative PCR and the 2(−Delta Delta C(T)) method

Methods254024082001

10.1006/meth.2001.1262

11846609

Allemani

Matsuda

Di Carlo

Harewood

Matz

Nikšić

Bonaventure

Valkov

Johnson

Estève

Global surveillance of trends in cancer survival 2000–14 (CONCORD-3): Analysis of individual records for 37 513 025 patients diagnosed with one of 18 cancers from 322 population-based registries in 71 countries

Lancet391102310752018

10.1016/S0140-6736(17)33326-3

29395269

Zhao

Zhang

Wang

Zhang

Song

You

Identification of key biomarkers and immune infiltration in systemic lupus erythematosus by integrated bioinformatics analysis

J Transl Med19352021

10.1186/s12967-020-02698-x

33468161

Bao

Huang

Zhou

Zheng

Identification and validation of novel biomarkers for diagnosis and prognosis of hepatocellular carcinoma

Front Oncol105414792020

10.3389/fonc.2020.541479

33102213

Yang

Wang

Wei

Peng

Wang

Kong

Candidate biomarkers and molecular mechanism investigation for glioblastoma multiforme utilizing WGCNA

Biomed Res Int201842467032018

10.1155/2018/4246703

30356407

Vernocchi

Gili

Conte

Del Chierico

Conta

Miccheli

Botticelli

Paci

Caldarelli

Nuti

Network analysis of gut microbiome and metabolome to discover microbiota-linked biomarkers in patients affected by non-small cell lung cancer

Int J Mol Sci2187302020

10.3390/ijms21228730

33227982

Otto

Sicinski

Cell cycle proteins as promising targets in cancer therapy

Nat Rev Cancer17931152017

10.1038/nrc.2016.138

28127048

Zhang

Zhou

Wang

Yin

Ding

Zhang

Tanshinone IIA suppresses the progression of lung adenocarcinoma through regulating CCNA2-CDK2 complex and AURKA/PLK1 pathway

Sci Rep11236812021

10.1038/s41598-021-03166-2

34880385

Zhong

Shi

Wang

Liu

Wang

Silencing Aurora-A with siRNA inhibits cell proliferation in human lung adenocarcinoma cells

Int J Oncol49102810382016

10.3892/ijo.2016.3605

27571708

Zhang

Liu

Elevated mRNA Levels of AURKA, CDC20 and TPX2 are associated with poor prognosis of smoking related lung adenocarcinoma using bioinformatics analysis

Int J Med Sci15167616852018

10.7150/ijms.28728

30588191

Yang

Hsieh

Lee

Yen

Wang

Chiang

Liu

Tsao

Lee

Yang

Impact of aurora kinase a polymorphism and epithelial growth factor receptor mutations on the clinicopathological characteristics of lung adenocarcinoma

Int J Environ Res Public Health1773502020

10.3390/ijerph17197350

33050100

Zheng

Chi

Zhi

Zhang

Yue

Zhao

Gao

Guo

Aurora-A-mediated phosphorylation of LKB1 compromises LKB1/AMPK signaling axis to facilitate NSCLC growth and migration

Oncogene375025112018

10.1038/onc.2017.354

28967900

Shah

Bhatt

Rotow

Rohrberg

Olivas

Wang

Hemmati

Martins

Maynard

Kuhn

Aurora kinase A drives the evolution of resistance to third-generation EGFR inhibitors in lung cancer

Nat Med251111182019

10.1038/s41591-018-0264-7

30478424

Gao

Yan

Wang

Xia

Wang

Chang

The role of radiotherapy-related autophagy genes in the prognosis and immune infiltration in lung adenocarcinoma

Front Immunol139926262022

10.3389/fimmu.2022.992626

36311724

Grenda

Błach

Szczyrek

Krawczyk

Nicoś

Kuźnar Kamińska

Jakimiec

Balicka

Chmielewska

Batura-Gun

Promoter polymorphisms of TOP2A and ERCC1 genes as predictive factors for chemotherapy in non-small cell lung cancer patients

Cancer Med96056142020

10.1002/cam4.2743

31797573

Kou

Sun

Zhang

Wang

Yang

TOP2A promotes lung adenocarcinoma cells' malignant progression and predicts poor prognosis in lung adenocarcinoma

J Cancer11249625082020

10.7150/jca.41415

32201520

Zeng

Song

Huang

Stemness related genes revealed by network analysis associated with tumor immune microenvironment and the clinical outcome in lung adenocarcinoma

Front Genet115492132020

10.3389/fgene.2020.549213

33193623

Song

Tang

Identification of KIF4A and its effect on the progression of lung adenocarcinoma based on the bioinformatics analysis

Biosci Rep41BSR202039732021

10.1042/BSR20203973

33398330

Dai

Zhou

Wang

Identification of crucial genes associated with lung adenocarcinoma by bioinformatic analysis

Medicine (Baltimore)99e230522020

10.1097/MD.0000000000023052

33126397

Zhang

Pang

Feng

Zeng

Transcriptomic data exploration of consensus genes and molecular mechanisms between chronic obstructive pulmonary disease and lung adenocarcinoma

Sci Rep12132142022

10.1038/s41598-022-17552-x

35918384

Deng

Chen

Huang

Song

Feng

Chen

Zhou

Screening and validation of significant genes with poor prognosis in pathologic Stage-I lung adenocarcinoma

J Oncol202237940212022

10.1155/2022/3794021

35444699

Yin

Che

Jiang

Zhou

Liu

Yan

Ciclopirox olamine exerts tumor-suppressor effects via topoisomerase II alpha in lung adenocarcinoma

Front Oncol127919162022

10.3389/fonc.2022.791916

35251970

Zhang

Guo

A new risk model based on 7 quercetin-related target genes for predicting the prognosis of patients with lung adenocarcinoma

Front Genet138900792022

10.3389/fgene.2022.890079

35646063

Zhou

Yan

Zhu

Liu

Maternal embryonic leucine zipper kinase enhances gastric cancer progression via the FAK/Paxillin pathway

Mol Cancer131002014

10.1186/1476-4598-13-100

24885567

McDonald

Graves

Enigmatic MELK: The controversy surrounding its complex role in cancer

J Biol Chem295819582032020

10.1074/jbc.REV120.013433

32350113

Figure 1.

Normalization of gene expression data and differential expression analysis between the two groups of samples. (A) Standardization of data. The blue bars represent the data before normalization, and the red bars represent the data following normalization. (B) Principal component analysis of the two groups of sample data. (C) Differential gene expression between the two groups of samples. Red dots indicate upregulated genes and green dots indicate downregulated genes (|fold change|>2.0, adj-P<0.05). Gray dots indicate genes with no significant difference in expression.
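The thresholds stated for panel C amount to a three-way classification rule. The following is a minimal sketch in Python, assuming each gene is summarized by a log2 fold change and an adjusted P-value. The function name `classify_gene` and the example values are hypothetical and are not taken from the study, which performed this step with the Limma package in R.

```python
import math

FOLD_CHANGE_CUTOFF = 2.0  # |fold change| > 2.0, as stated in the legend
ADJ_P_CUTOFF = 0.05       # adj-P < 0.05, as stated in the legend


def classify_gene(log2_fc: float, adj_p: float) -> str:
    """Return 'up', 'down' or 'ns' (not significant) for one gene.

    Assumes the fold change is supplied on the log2 scale, so that
    |fold change| > 2.0 corresponds to |log2 FC| > 1.
    """
    log2_cutoff = math.log2(FOLD_CHANGE_CUTOFF)
    if adj_p < ADJ_P_CUTOFF and abs(log2_fc) > log2_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "ns"


# Hypothetical values for illustration only (not data from the study).
examples = {"geneA": (2.3, 0.001), "geneB": (-1.8, 0.01), "geneC": (0.4, 0.2)}
calls = {gene: classify_gene(fc, p) for gene, (fc, p) in examples.items()}
```

In this sketch, 'up', 'down' and 'ns' correspond to the red, green and gray dots, respectively. Both inequalities are strict, so a gene lying exactly on a threshold is treated as not significant.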

Figure 2.

WGCNA of the data. (A) Determination of the optimal soft thresholding power. (B) Construction of the co-expression matrix and module visualization. WGCNA, weighted gene co-expression network analysis.

Figure 3.

Module correlation analysis. (A) Correlation analysis between modules. The low correlation between modules indicates that the modules were well separated. (B) Correlation analysis between modules and disease. The green module was negatively correlated with disease (R²=0.91; P=5×10⁻⁶), while the blue module was positively correlated with disease (R²=0.95; P=2×10⁻⁷).

Figure 4.

Correlation analysis between genes and traits for the green and blue modules. (A) Module membership vs. gene significance plot for the green module. Hub genes with GS >0.8 and MM >0.8 were selected. (B) Module membership vs. gene significance plot for the blue module. Hub genes with GS >0.8 and MM >0.8 were selected. (C) The top 10 biological processes from the enrichment analysis of hub genes for the green and blue modules. (D) The top 10 pathways from the KEGG enrichment analysis of hub genes for the green and blue modules. GS, gene significance; MM, module membership; KEGG, Kyoto Encyclopedia of Genes and Genomes.
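The hub gene criterion in panels A and B can be expressed as a filter on two per-gene scores. The sketch below is illustrative only and assumes that GS and MM are already available for each gene. The function name `select_hub_genes` and the gene labels are hypothetical. The legend does not state whether absolute values were used, so the scores are compared as given.

```python
GS_CUTOFF = 0.8  # gene significance threshold from the legend
MM_CUTOFF = 0.8  # module membership threshold from the legend


def select_hub_genes(scores: dict[str, tuple[float, float]]) -> list[str]:
    """Return the genes whose (GS, MM) pair exceeds both cutoffs.

    `scores` maps a gene label to a (GS, MM) tuple. The values are
    compared as given; take absolute values beforehand if signed
    correlations should count in both directions (an assumption
    that the legend does not settle).
    """
    return sorted(
        gene
        for gene, (gs, mm) in scores.items()
        if gs > GS_CUTOFF and mm > MM_CUTOFF
    )


# Hypothetical scores for illustration only (not data from the study).
module_scores = {
    "gene1": (0.92, 0.88),  # passes both cutoffs
    "gene2": (0.85, 0.60),  # fails the MM cutoff
    "gene3": (0.40, 0.95),  # fails the GS cutoff
}
hub_genes = select_hub_genes(module_scores)
```

Requiring both scores to exceed the cutoff keeps only the genes that are central to their module and also strongly associated with the trait.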

Figure 5.

PPI analysis of all hub genes in the green and blue modules. (A) PPI analysis of hub genes for the green module. Red, yellow and green indicate the top three PPI subnetworks, respectively. (B) PPI analysis of hub genes for the blue module. Red, yellow and green indicate the top three PPI subnetworks, respectively. PPI, protein-protein interaction.

Figure 6.

Verification of hub gene expression using TCGA data. In total, 59 control samples and 504 cancer samples were obtained from TCGA database for verification. Ten genes were selected from each of the two modules, giving a total of 20 genes for verification. The expression changes of 19 of these genes were verified, and the remaining gene (CXCR4) was excluded. (A) The expression of AURKA, BUB1, CCNB1, CDC45, CDK1, MELK, NUSAP1, PBK, TOP2A and TTK in normal lung tissues and lung adenocarcinoma tissues. (B) The expression of BDKRB2, CCL19, CX3CR1, CXCL13, CXCL9, CXCR4, CXCR5, GNAI1, GNG11 and NMUR1 in normal lung tissues and lung adenocarcinoma tissues. In both panels, red indicates normal lung tissues and green indicates lung adenocarcinoma tissues. Normal lung tissues and lung adenocarcinoma tissues were compared using a t-test. *P<0.05, **P<0.01 and ****P<0.0001. TCGA, The Cancer Genome Atlas.

Figure 7.

ROC curve analysis of 20 genes. Genes with an AUC >0.9 were included in the subsequent analysis. A total of 13 genes (AURKA, CDC45, TTK, TOP2A, CCNB1, NUSAP1, MELK, PBK, BUB1, CDK1, CXCL13, GNG11 and NMUR1) were included, and seven genes were excluded. (A) ROC curves for AURKA, CDC45, TTK, TOP2A and CCNB1. (B) ROC curves for NUSAP1, MELK, PBK, BDKRB2 and BUB1. (C) ROC curves for CDK1, CCL19, CXCR5, CXCL13 and CXCL9. (D) ROC curves for CXCR4, GNG11, GNAI1, NMUR1 and CX3CR1. ROC, receiver operating characteristic; AUC, area under the curve.
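The AUC criterion can be illustrated with a short sketch. It relies on the standard interpretation of the AUC as the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. The function names and the example values are hypothetical, and the software used in the study for ROC analysis is not specified here. The sketch assumes higher expression in tumors; for a downregulated gene, the two groups would be swapped.

```python
AUC_CUTOFF = 0.9  # genes with AUC > 0.9 were retained, per the legend


def auc(tumor_values: list[float], normal_values: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (tumor, normal) pairs in which the tumor
    value is higher; ties contribute 0.5. This is equivalent to the
    Mann-Whitney U statistic divided by the number of pairs.
    """
    if not tumor_values or not normal_values:
        raise ValueError("both groups must contain at least one sample")
    wins = 0.0
    for t in tumor_values:
        for n in normal_values:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_values) * len(normal_values))


def passes_auc_filter(tumor_values: list[float], normal_values: list[float]) -> bool:
    """Apply the AUC > 0.9 inclusion rule to one gene."""
    return auc(tumor_values, normal_values) > AUC_CUTOFF


# Hypothetical expression values for illustration only.
well_separated = passes_auc_filter([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
overlapping = passes_auc_filter([5.0, 6.0, 7.0, 2.0], [1.0, 2.0, 3.0, 4.0])
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; rank-based implementations give the same value more efficiently.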

Figure 8.

Survival analysis was performed and survival curves were obtained using the GEPIA database. A total of eight genes had P<0.05 (AURKA, TOP2A, CCNB1, NUSAP1, MELK, PBK, BUB1 and CDK1). (A-H) Survival curves of CDK1, BUB1, CCNB1, TOP2A, PBK, NUSAP1, AURKA and MELK in TCGA database, presented in that order. GEPIA, Gene Expression Profiling Interactive Analysis; TCGA, The Cancer Genome Atlas.

Figure 9.

mRNA expression of eight genes screened using WGCNA in clinical tissue samples. Clinical samples were divided into tissues adjacent to lung adenocarcinoma and lung adenocarcinoma tissues. (A-H) In each graph, the left bar represents the adjacent tissues and the right bar represents the lung adenocarcinoma tissues. The mRNA expression levels of AURKA, BUB1, CCNB1, CDK1, MELK, NUSAP1 and TOP2A in lung adenocarcinoma tissues were higher than those in paired adjacent normal tissues. WGCNA, weighted gene co-expression network analysis.

Figure 10.

Protein expression of seven genes screened using WGCNA in clinical tissue samples. (A) Representative western blots of AURKA, BUB1, CCNB1, CDK1, MELK, NUSAP1 and TOP2A protein expression in adjacent tissues and lung adenocarcinoma tissues. (B) Quantitative analysis of the data in panel A. The protein levels of AURKA, TOP2A and MELK in lung adenocarcinoma tissues were higher than those in matched adjacent tissues. WGCNA, weighted gene co-expression network analysis.

Table I.

Oligonucleotide primers used in the present study.

Gene	Primer	Oligonucleotide primer sequence (5′-3′)
GAPDH	Sense	GGAAGCTTGTCATCAATGGAAATC
	Antisense	TGATGACCCTTTTGGCTCCC
CDK1	Sense	AAGGGTAGACACAAAACTACAGGTC
	Antisense	ATGTACTGACCAGGAGGGATAGA
TOP2A	Sense	CCTTCTATGGTGGATGGTTTGA
	Antisense	ATGGGCTGCAAGAGGTTTAGAT
MELK	Sense	GATGTTCCCAAGTGGCTCTCTC
	Antisense	TCCTCCATTGTTTGCCTGTTG
NUSAP1	Sense	CTGCTGCTGTTATTACCCCATTC
	Antisense	CTTTCTTCTCCTTTCGTTCTTGC
BUB1	Sense	GAAGAAATACCACAATGACCCAAG
	Antisense	TGGGTTTCAGTGAGGCGTGT
AURKA	Sense	TGCCCTGTCTTACTGTCATTCG
	Antisense	AAAGGAGGCTTCCCAACTAAAA
CCNB1	Sense	GCCTATTTTGGTTGATACTGCCTC
	Antisense	CTCCATCTTCTGCATCCACATC
PBK	Sense	TGACTGCTCCTGCCTTCATAAC
	Antisense	TAACACCATTCTCCTCCACAGC

GAPDH, glyceraldehyde-3-phosphate dehydrogenase; CDK1, cyclin dependent kinase 1; TOP2A, DNA topoisomerase II alpha; MELK, maternal embryonic leucine zipper kinase; NUSAP1, nucleolar and spindle associated protein 1; BUB1, BUB1 mitotic checkpoint serine/threonine kinase; AURKA, aurora kinase A; CCNB1, cyclin B1; PBK, PDZ binding kinase.