Contributed equally
The comprehensive analysis of single or multiple microarray datasets is currently available in Gene Expression Omnibus (GEO) databases, with several studies having identified genes strongly associated with the development of lung adenocarcinoma (LUAD). However, the mechanisms of LUAD development remain largely unknown and has not yet been systematically studied; thus, further studies are required in this field. In the present study, weighted gene co-expression network analysis (WGCNA) was used for the evaluation of key genes with potential high risk of LUAD, and to provide more reliable evidence concerning its pathogenesis. The GSE140797 dataset from the high-throughput GEO database was downloaded and was first analyzed using the Limma package in the R language in order to determine the differentially expressed genes. The dataset was then analyzed using the WGCNA package to analyze the co-expressed genes, and the modular genes with the highest correlation with the clinical phenotype were identified. Subsequently, the pathogenic genes shared in common between the result of the two analyses were imported into the STRING database for protein-protein interaction network analysis. The hub genes were screened out using Cytoscape, and then The Cancer Genome Atlas analysis, receiver operating characteristic analysis and survival analysis were subsequently performed. Finally, the key genes were evaluated using reverse transcription-quantitative PCR and western blot analysis. Bioinformatics analysis of the GSE140797 dataset revealed eight key genes:
Lung cancer is considered as one of the most lethal tumors, having the most increased incidence rate among tumors, with the highest mortality rate worldwide. Lung cancer remains the leading cause of cancer-related mortality, ranking first in percentage due to cancer in 2020 (
Several LUAD molecular markers have been identified in previous studies (
Currently, several studies have identified genes that are closely associated with LUAD development through comprehensive analysis of single or multiple microarray datasets in the currently available in the Gene Expression Omnibus (GEO) database. For example, Dong
In the present study, the gene expression profile dataset, GSE140797, was acquired from the GEO database, containing gene expression data from 14 samples, including seven normal lung and seven LUAD tissues for analysis. Following normalized data preprocessing, the differentially expressed genes (DEGs) between the two sample sets were analyzed. Concurrently, weighted gene co-expression network analysis (WGCNA) was performed to construct a gene co-expression network of LUAD and identify co-expression modules. Subsequently, eight cancer tissue and eight adjacent tissue samples were collected from patients with LUAD and reverse transcription-quantitative PCR (RT-qPCR) and western blot analysis were performed, in order to verify the WGCNA analysis, and the expression analysis of the three key genes, AURKA, TOP2A and maternal embryonic leucine zipper kinase (
AURKA is a cyclin whose activation is required for the process of cell division through the regulation of mitosis. The ectopic overexpression of the AURKA gene results in the inactivation of the G2-phase DNA damage checkpoint and the mitotic spindle assembly checkpoint, as well as tetraploid and centrosome expansion, particularly in cells with defective p53-dependent DNA damage checkpoints upstream of AURKA. At the transporter level, the EGF-induced expression of AURKA is dependent on the interaction of nuclear EGFR and STAT5. At the downstream end of AURKA, certain substrates of AURKA play critical inhibitory roles, with p53 and large tumor suppressor kinase 2 being the most important substrates of AURKA. AURKA substrates have received widespread attention as tumor suppressors (
TOP2A has been demonstrated to be related to the progression of several cancer types, such as hepatocellular carcinoma (
Increased expression of MELK has been observed in various cancer cells and tissues, playing a crucial and critical role in the proliferation and self-renewal of progenitor and tumor stem cells and is overexpressed in LUAD, increasing the probability of tumorigenesis. Among them, MELK increases the proliferation of cervical, breast, colorectal and pancreatic cancer cell lines (
GSE gene expression profile data and clinical information were obtained from the GEO database at the National Center for Bioinformatics. Gene expression data from 14 samples in the GSE140797 dataset were analyzed, including seven normal lung tissue and seven LUAD tissue samples. The annotation information of the GPL13497 (Agilent-026652 Whole Human Genome Microarray 4×44Kv2) platform was used as a reference to convert the probe to the corresponding gene symbol, and the Limma software (version 3.54.2) package was used to normalize the data for further analysis.
The samples were divided into the normal control and LUAD groups, and the conditions |log2FC|>1 and P<0.05 were set to screen for genes with significant differences in expression.
Co-expression networks were constructed using the WGCNA package in the R language. To obtain a valid co-expression network, the expression variance of each gene in all samples was calculated, and the genes with the same variance were considered for the construction of the co-expression network. Cluster analysis was performed, in order to detect and remove outliers.
Scale-free networks were constructed by selecting an appropriate weighting coefficient (soft threshold) to make the connections between genes adhere to the scale-free distribution of network connection requirements, and the correlation coefficient between genes was used to construct hierarchical clustering tree. Different branches of the clustering tree represented different gene modules, and different colors represented different gene modules. Subsequently, genes were categorized according to their expression patterns based on their weighted correlation coefficients. The genes that exhibited similar gene expression patterns were then grouped into a module, and then classified by gene expression pattern for further analysis. Lastly, by applying this coefficient, the correlation matrix was converted into an adjacency matrix, which was then converted into a topological overlap matrix.
The Pearson's correlation coefficients and P-values of the matrices composed of gene and sample and clinical correlations per module were calculated using WGCNA, and the Pearson's correlation coefficients were used to measure the correlation between different modules and clinical traits, and the module with the highest correlation coefficient was used in subsequent analysis. The correlation between gene expressed in the module and the phenotype [gene significance (GS)] and the correlation between gene expressed in the module and the module membership (MM) were analyzed, and the genes were screened according to GS >0.8 and MM >0.8.
The cross section of modules with the highest correlation between WGCNA and DEGs were selected, and Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were performed on this part of the gene set using the R package cluster profiler (
The STRING database (
The Gene Expression Profiling Interactive Analysis Database (
A total of eight fresh frozen clinical samples were obtained from lung adenocarcinoma patients in Renmin Hospital of Wuhan University. In addition, three male and five female patients, ranging in age from 51 to 80 years, were recruited between December 14 and December 28, 2020. The specific age, sex, and disease stage were i) male 70 years old, 2020.12.14, IIB stage; ii) male 63 years old, 2020.12.16, IA2 stage; iii) female 62 years old, 2020.12.16, IA stage; iv) female 59 years old, 2020.12.16, A stage; v) male 51 years old, 2020.12.17, A stage; vi) female 80 years old, 2020.12.24, IA3 stage; vii) Female 73 years old, 2020.12.25, IA stage; viii) female 73 years old, 2020.12.28, IA stage). The samples were obtained with patient consent and ethical approval (approval no.WDRY2022-K231) from Renmin Hospital of Wuhan University (Wuhan, China).
RNA was obtained from frozen fresh samples of lung cancer and normal paracancerous lung tissue from eight lung adenocarcinoma patients. RNA extraction was conducted using TRIzol® reagent (cat. no. 15596026, Invitrogen; Thermo Fisher Scientific, Inc.) and reverse transcribed into cDNA using the PrimeScript RT Reagent kit according to the manufacturer's instructions (cat. no. RR037A; Takara Bio, Inc.). Candidate primers for each gene were designed using Premier 5 design program (PREMIER Biosoft). PCR reaction was performed with the quantitative TB Green-based PCR kit (cat. no. RR420A; Takara Bio, Inc.) using a CFX Connect PCR machine (CFX Connect TM; Bio-Rad Laboratories, Inc.). The following conditions were applied: Pre-denaturation stage: 95°C, 1 min for 1 cycle; amplification stage: denaturation at 95°C, 5 sec and annealing at 58°C, 30 sec, 40 cycles; melting curve stage: 65°C to 95°C, increment 0.5°C for 5 second. The results were analyzed using the 2−ΔΔCq method (
Western blot analysis of relative protein expression levels was performed as described as follows: Lung adenocarcinoma and parapulmonary carcinoma were lysed with RIPA (cat. no. P0013B; Beyotime Institute of Biotechnology) buffer to extract total proteins, and the protein concentrations were then detected using a BCA kit (cat. no. P0012S; Beyotime Institute of Biotechnology). The protein samples were denatured in a dry heater at 95°C and subsequently subjected to electrophoresis; 10% SDS gel (cat. no. P0012A; Beyotime Institute of Biotechnology) was used for electrophoresis and 25 µg of protein was loaded in each strip Following electrophoresis, the separated proteins were transferred to polyvinylidene difluoride membranes (cat. no. FFP2; Beyotime Institute of Biotechnology) by the wet transfer membrane method. Non-specific proteins on the membrane were blocked for 1 h at room temperature and then incubated with primary monoclonal antibodies corresponding to the proteins overnight at 4°C. The antibodies used are as follows: A rabbit anti-AURKA polyclonal antibody (cat. no. A15728), a rabbit anti-BUB1 mitotic checkpoint serine/threonine kinase (BUB1) polyclonal antibody (cat. no. A1929), a rabbit anti-CCNB1 polyclonal antibody (cat. no. A16800), a rabbit anti-CDK1 polyclonal antibody (cat. no. A0220), a rabbit anti-MELK monoclonal antibody (cat. no. A3530), a rabbit anti-nucleolar and spindle associated protein 1 (NUSAP1) polyclonal antibody (cat. no. A16000), a rabbit anti-TOP2A polyclonal antibody (cat. no. A16440) and a mouse monoclonal antibody for β-actin (cat. no. AC004) (all from ABclonal Biotech Co., Ltd. and all at 1:1,000).
The following day, the membranes were incubated for 1 h at room temperature using the corresponding secondary antibody; Goat Anti-Rabbit IgG H&L (HRP; cat. no. ab205719)and Goat Anti-Mouse IgG H&L (HRP; cat. no. ab205719 all from Abcam and all at 1:10,000. This was followed by a brief incubation with ECL Western Blotting Detection Reagent (cat. no. P0018S; Beyotime Institute of Biotechnology) and a final exposure with an iBright imaging system (Thermo Fisher Scientific, Inc.). Density measurement was by ImageJ (version V1.8.0.112; National Institutes of Health).
For the statistical calculations, the R (version 3.6) and WGCNA packages were used. The calculation of the correlation coefficient between the relevant clinical characteristics of LUAD tissue and the ME of each co-expression module used in this article was based on the R language platform Rstudio (version 8.9.173593;
A co-expression network was constructed by including 5,435 genes with 25% of the maximum variation in the present study. No significant outliers were observed by building hierarchical clustering trees for 5,435 genes from 14 lung tissue samples. A total number of 580 DEGs were identified in the dataset (
According to the non-scale network distribution fitting, a value of 20 was selected as the soft threshold (β value) for this dataset and a co-expression network was constructed (
By applying the correlation analysis of each module using sample clinical information, the green module presented with the highest positive correlation, and the blue module the highest degree of negative correlation with LUAD (
According to the criteria of GS >0.8 and MM >0.8 to screen the key genes in the blue module and the green module for the following research stage, 845 and 285 key genes were selected from the blue and green modules, respectively. Subsequently, GO function enrichment analysis and KEGG enrichment analysis were performed on the 845 genes selected from blue module and the 285 genes selected from green module (
The 845 genes from the blue module and 580 differentially expressed genes were intersected, in order to obtain 324 genes. Similarly, the 285 genes from the green module and 580 differential genes were intersected to obtain 107 genes. The two PPI networks for the aforementioned 324 and 107 genes were then respectively established using Cytoscape software (
Subsequently, the expression profiles of 59 normal lung tissues and 515 LUAD tissues were acquired from TCGA database to verify the expression of the aforementioned 20 key genes. With the exception of the expression of
Subsequently, ROC curve analysis was performed on the 19 genes verified in TCGA database, and it was observed that apart from
Subsequently, survival analysis using the 13 genes was performed by GEPIA and it was determined that the P-value of eight genes was <0.05, including
To validate the results of bioinformatics analysis, the expression levels of the aforementioned eight genes were verified in human LUAD tissues and paired lung paracancerous tissues using RT-qPCR and western blot analysis. The relative mRNA expression levels of seven out of eight genes, namely
Lung cancer is one of the most prevalent types of cancer and currently presents with the highest mortality rate. Among patients recently diagnosed with lung cancer, the 5-year survival rate following diagnosis has been observed to be extremely reduced in the majority of countries, with a survival rate of only 1/10 to 1/5 (
Several inhibitors with high specificity for
Chemotherapy resistance research has emerged as a major challenge in cancer treatment. Currently, resistance to radiation therapy in LUAD has been attributed to elevated levels of autophagy and thus resistance, and
It has been reported that
It has been reported that
Not applicable.
The datasets used and/or analyzed during the present study are available from the corresponding author on reasonable request.
XZ and YX contributed significantly to the concept and design of the study. XZ and SW conducted bioinformatics experiments and obtained data. HL, RH and MZ conducted confirmatory experiments and obtained data. XZ and JR and LC analyzed the data. XZ, YX and MZ drafted the manuscript. YX, SW and RH critically modify the important intellectual content of the study. XZ and YX confirm the authenticity of all the raw data. BX and NZ and WS contributed to the collection and collation of clinical samples from lung adenocarcinoma patients. All authors have read and approved the final manuscript and have agreed to take responsibility for all aspects of the work.
The present study was approved (approval no. WDRY2022-K231) by Renmin Hospital of Wuhan University (Wuhan, China) and written informed consent was obtained from patients in all cases.
Not applicable.
The authors declare that they have no competing interests.
Normalization of gene expression and gene differential expression of data between two groups of samples. (A) Standardization of data. The blue bars represent the data before normalization, and the red bars represent the data following normalization. (B) Principal component analysis of two groups of sample data. (C) Differential expression of data between the two groups of samples. Red dots indicate upregulated genes and green dots indicate downregulated genes (|fold change|>2.0, adj-P<0.05). Gray dots indicate genes with no significant difference in expression.
WGCNA analysis of the data. (A) Determination of the optimal soft thresholding power. (B) Construction of co-expression matrix and module visualization. WGCNA weighted gene co-expression network analysis. WGCNA, weighted gene co-expression network analysis.
Module correlation analysis. (A) Correlation analysis between modules. The poor correlation between modules indicates that the module division is successful. (B) Correlation analysis between modules and diseases. The green module negatively correlated with disease (R2=0.91; P=5×10−6), while the blue module positively correlated with disease (R2 =0.95; P=2×10−7).
Correlation analysis between genes and traits for the green and blue modules. (A) Module membership vs. gene significance map of genes for green module. Hub genes with GS >0.8 and MM >0.8 were selected. (B) Module membership vs. gene significance map of genes for the blue module. Hub genes with GS >0.8 and MM >0.8 were selected. (C) The top 10 biological processes of hub gene enrichment analysis for the green module and blue modules. (D) Hub gene KEGG enrichment analysis of the top 10 pathways for the green and blue modules. GS, gene significance; MM, module membership; KEGG, Kyoto Encyclopedia of Genes and Genomes.
PPI analysis of all Hub genes in the green and blue modules. (A) PPI analysis of Hub gene for the green module. Red, yellow and green are the top three protein-protein interaction subnetworks respectively. (B) PPI analysis of hub genes for the blue module. Red, yellow and green are the top three PPI subnetworks respectively. PPI, protein-protein interaction.
In total, 59 control samples and 504 cancer samples were derived from TCGA database for verification. From the 10 genes selected from the two modules, a total of 20 genes were verified. A total of 19 of these changes were verified, and the other one
ROC curve analysis of 20 genes. AUCs >0.9 were included in the subsequent analysis. A total of 13 genes (
Survival analysis was performed and survival curves were obtained using GEPIA database. According to the P-values, there were eight genes with P<0.05 (
mRNA expression of eight genes screened using WGCNA in clinical tissue samples. Clinical samples are divided into lung adenocarcinoma adjacent tissues and lung adenocarcinoma tissue samples. (A-H) In each graph, the left bar represents the lung adenocarcinoma adjacent tissues, and the right bar the lung adenocarcinoma tissues. The mRNA expression levels of
Protein expression of seven genes screened using WGCNA in clinical tissue samples. (A) Representative western blots of AURKA, BUB1, CCNB1, CDK1, MELK, NUSAP1 and TOP2A protein expression in adjacent lung adenocarcinoma tissues and lung adenocarcinoma tissue samples. (B) The quantitative analysis of the data in panel A, in which the protein levels of AURKA, TOP2A and MELK in lung adenocarcinoma tissues were higher than those in matched adjacent lung adenocarcinoma tissues. WGCNA, weighted gene co-expression network analysis.
Oligonucleotide primers used in the present study.
Gene | Oligonucleotide primer sequence (5′-3′) | |
---|---|---|
Sense | GGAAGCTTGTCATCAATGGAAATC | |
Antisense | TGATGACCCTTTTGGCTCCC | |
Sense | AAGGGTAGACACAAAACTACAGGTC | |
Antisense | ATGTACTGACCAGGAGGGATAGA | |
Sense | CCTTCTATGGTGGATGGTTTGA | |
Antisense | ATGGGCTGCAAGAGGTTTAGAT | |
Sense | GATGTTCCCAAGTGGCTCTCTC | |
Antisense | TCCTCCATTGTTTGCCTGTTG | |
Sense | CTGCTGCTGTTATTACCCCATTC | |
Antisense | CTTTCTTCTCCTTTCGTTCTTGC | |
Sense | GAAGAAATACCACAATGACCCAAG | |
Antisense | TGGGTTTCAGTGAGGCGTGT | |
Sense | TGCCCTGTCTTACTGTCATTCG | |
Antisense | AAAGGAGGCTTCCCAACTAAAA | |
Sense | GCCTATTTTGGTTGATACTGCCTC | |
Antisense | CTCCATCTTCTGCATCCACATC | |
Sense | TGACTGCTCCTGCCTTCATAAC | |
Antisense | TAACACCATTCTCCTCCACAGC |
CDK1, cyclin dependent kinase 1; TOP2A, DNA topoisomerase II alpha; MELK, maternal embryonic leucine zipper kinase; NUSAP1, nucleolar and spindle associated protein 1; BUB1, BUB1 mitotic checkpoint serine/threonine kinase B; AURKA, aurora kinase A; CCNB1, cyclin B1; PBK, PDZ binding kinase.