Systematic prediction of target genes and pathways in cervical cancer from microRNA expression data

Cervical cancer (CC) is a leading cause of cancer-associated mortality in women; thus, the present study aimed to investigated potential target genes and pathways in patients with CC by utilizing an ensemble method and pathway enrichment analysis. The ensemble method integrated a correlation method [Pearson's correlation coefficient (PCC)], a causal inference method (IDA) and a regression method [least absolute shrinkage and selection operator (Lasso)] using the Borda count election algorithm, forming the PCC, IDA and Lasso (PIL) method. Subsequently, the PIL method was validated to be a feasible approach to predict microRNA (miRNA) targets by comparing predicted miRNA targets against those from a confirmed database. Finally, Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis was conducted for target genes in the 1,000 most frequently predicted miRNA-mRNA interactions to determine target pathways. A total of 10 target genes were obtained that were predicted >5 times, including secreted frizzled-related protein 4, maternally expressed 3 and NIPA like domain containing 4. Additionally, a total of 17 target pathways were identified, of which cytokine-cytokine receptor interaction (P=8.91×10-7) was the most significantly associated with CC of all pathways. In conclusion, the present study predicted target genes and pathways for patients with CC based on miRNA expression data, the PIL method and pathway analysis. The results of the present study may provide an insight into the pathological mechanisms underlying CC, and provide potential biomarkers for the diagnosis and treatment of this tumor type. However, these biomarkers have yet to be validated; these validations will be performed in future studies.


Introduction
Cervical cancer (CC) is a leading cause of cancer-associated mortality in women globally, with 500,000 new cases and 250,000 incidences of mortality annually in 2012 (1).Previous studies have reported that human papillomavirus infection is a high risk factor for CC; however, it is insufficient to initiate of malignancy alone (2), and genetic alterations are essential for the progression from precancerous disorder to invasive cancer (3).Thus, it is necessary to understand the pathogenic progresses that drive CC to further prevent its development by dissecting the components involved in the pathogenic process (4).
Clinically, early-stage and locally advanced CC may be treated with standard radiotherapy and chemotherapy, or the two treatments combined; however, patients with metastatic cancer types and those with persistent or recurrent disease following platinum-based chemoradiotherapy have limited options (5,6).Furthermore, the clinical outcomes vary substantially and are difficult to predict, owing to the lack of effective outcome prediction models, which make it difficult to apply individualized treatment protocols to patients with CC (7).With the development of gene expression-associated analysis methods, target-gene treatments may be applied to largely solve this problem and potentially improve patient survival (8).Therefore, the identification of target genes to aid the prediction of CC prognosis is a necessary task.
MicroRNAs (miRNAs/miRs) are a family of small non-coding RNA molecules (~22 nucleotides in length) that regulate gene expression by promoting mRNA degradation and repressing translation (9).miRNAs modulate the expression of target mRNAs post-transcriptionally by base pairing to complementary sequences in the 3'-and 5'-untranslated regions, and occasionally the open-reading frames of mRNAs (10,11).However, miRNA expression signatures have been revealed to be promising potential biomarkers for the classification or outcome prediction of a wide array of human cancer types (12), including lung cancer (13).miRNAs are involved in numerous cancer-associated processes, including proliferation, metabolism, differentiation, apoptosis, cellular signaling and cancer development and progression (14).Hence, the investigation of miRNA functions allows for the elucidation of the complex pathological mechanisms underlying malignant tumor types,

Systematic prediction of target genes and pathways in cervical cancer from microRNA expression data
and aids the design of drugs for the treatment of malignant tumors.However, to date, the prediction of miRNAs targets in CC has rarely been investigated.Therefore, in the present study, target miRNAs involved in CC were predicted utilizing an ensemble method proposed by Le et al (15).The ensemble method integrated a correlation method [Pearson's correlation coefficient (PCC)], a causal inference method (IDA), and a regression method [least absolute shrinkage and selection operator (Lasso)], which formed the PCC (16), IDA (17,18) and Lasso (19) (PIL) method, based on the Borda count election method.The PIL approach may solve the inconsistencies in results that result from individual methods as it includes complementary results (20).Although there is not a full understanding regarding miRNA target prediction, the ensemble method may aid the identification of a number of confirmed interactions that existing individual methods fail to discover (15).Overall, using the PIL method, more reliable results can be obtained compared with existing individual methods (15).
To validate the activity of the predicted miRNA targets in patients with CC, effective methods must be utilized.Previous studies have proposed the use of a number of different methods (21,22), including a semi-supervised method (21).The semi-supervised method was mainly dependent on a support vector machine model, which used the experimentally confirmed database miRTarBase (23) as the control set and Tarbase (24) as the test set.Owing to the positive classification performances, miRTarBase and Tarbase were utilized in the present study, in addition to the other two commonly used databases, miRecords (25) and miRWalk (26), owing to the sparseness of the number of confirmed interactions, for validation of the miRNA targets.
Following the identification of miRNA targets using the PIL method and validating them by matching them with confirmed databases [miRTarBase (23), TarBase (24), miRecords (25), and miRWalk (26)], Kyoto Encyclopedia of Genes and Genomes (KEGG) (27) pathway enrichment analysis was conducted for target genes in the 1,000 most frequently predicted miRNA-mRNA interactions to determine target pathways associated with miRNA targets in CC.These targets may be potential biomarkers for CC treatment, revealing the pathological mechanism underlying this cancer.

Collecting expression data.
In the present study, miRNA and mRNA expression data from patients with CC were downloaded from The Cancer Genome Atlas (TCGA; http://cancergenome. nih.gov/).TCGA is a comprehensive and coordinated effort to accelerate the current understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing (28).Owing to the differing quantities and identities of miRNAs and mRNAs between samples, only samples with common intersections were included as study objects.A total of 309 samples were obtained.
To control the quality of miRNA and mRNA in these samples, standard pretreatments were performed.In the first step, miRNAs or mRNAs with an expression value of zero were removed.Secondly, the expression values were normalized and converted into log 2 forms, identifying 889 miRNAs and 20,104 mRNAs.Thirdly, PCC, which evaluates the probability of two gene pairs co-expressing (16), was used to calculate the strength of the correlation between miRNAs and mRNAs.Finally, if the absolute value of PCC for an interaction, denoted as δ, met a threshold of δ≥0.70, the correlations were selected as effective expression data.This resulted in the identification of 53 miRNAs and 216 mRNAs for further examination.
Predicting miRNA targets using the PIL method.The PIL method is an ensemble method that integrates three methods (PCC, IDA and Lasso) based on the Borda count election method.This method mainly comprised three steps: Firstly, for each miRNA, each of the individual methods (PCC, IDA and Lasso) were utilized to produce rankings for each miRNA or to determine the predicted targets of the miRNA, and the 1,000 most frequently predicted performers in identifying miRNA targets were selected.The second step was the application of the Borda rank election method to the rankings for each miRNA to produce a single ranking list of elected mRNAs with respect to the miRNA.Finally, the highest-ranked genes from the list were extracted as the final output, as the potential target genes for the given miRNA.
The Borda rank election method is an efficient method to combine orderly appraising results from several separate evaluating methods (29).Its specific process is as follows: With an election consisting of a set (V) of voters, each identified candidate is assigned a preferential order, a strict, complete and transitive order on a set of candidates (C).Subsequently, each of the candidates is given ||C||-n points for each voter which ranked them in nth place (for example, ||C||-1 points for first, ||C||-2 for second, and so forth until the candidate voter ranked last receives no points).Finally, the average point score of the candidate across all voters was computed, and defined as the z-score.The higher the z-score, the more significantly associated with CC the prediction results were.Ranking the predicted miRNA targets according to their z-scores, the 1,000 most frequently predicted ranked target genes for CC were determined.
Validating predicted miRNA targets.Validating computation results is difficult, as the number of experimentally confirmed targets of miRNAs is limited and there is no complete ground-truth for evaluating and comparing different computational methods (30).In the present study, four databases, miRTarBase v4.5 (23), TarBase v6.0 (24), miRecords v2013 (25) and miRWalk v2.0 (26), were combined to validate the prediction of miRNA targets obtained from the PIL method.miRTarbase provides the most up-to-date, comprehensive information regarding experimentally validated miRNA-mRNA target interactions (31).TarBase is the first resource to provide experimentally verified miRNA target interactions by surveying pertinent literature (32).miRecords accumulates experimentally validated miRNA targets and computationally predicted miRNA targets (25).miRWalk is a publicly available comprehensive resource, hosting predicted and experimentally validated miRNA target interaction pairs (26).There were 37,372 miRNA-mRNA interactions with 576 miRNAs, 20,095 miRNA-mRNA interactions with 228 miRNAs, 21,590 miRNA-mRNA interactions with 195 miRNAs, and 1,710 miRNA-mRNA interactions with 226 miRNAs in the miRTarBase, TarBase, miRecords and miRWalk databases, respectively.Following the removal of the duplicated miRNA-mRNA interactions, a total of 62,858 interactions were retained for validation, termed background interactions.If a miRNA target interaction was involved in background interactions, the predicted miRNA target was validated.
Pathway enrichment analysis.For the purpose of investigating functional biological processes associated with target genes enriched in the 1,000 most frequently predicted miRNA-mRNA interactions for CC, the Database for Annotation, Visualization, and Integrated Discovery (DAVID) for KEGG pathway enrichment analysis were performed (27).KEGG pathways with P<0.05 were selected based on an Expression Analysis Systematic Explored (EASE) test applied in DAVID.EASE analysis of the regulated genes indicated the molecular functions and biological processes unique to each category (33).In functional and pathway enrichment analysis, the threshold of the minimum number of genes corresponded to ≥2, which was considered to be significant for a category: Where n = a' + b + c + d was the number of background genes; a' was the gene number of one gene set in the gene lists; a' + b was the number of genes in the gene list including at least one gene set; a' + c was the gene number of one gene list in the background genes; and a' was replaced with a = a '-1 in EASE.

Results
Predicted miRNA targets.In the present study, based on the CC expression data from TCGA database, a total of 53 miRNAs and 216 mRNAs were obtained for further analysis subsequent to pretreatments.By merging three methods on the basis of the Borda count election algorithm, the PIL method was formed and was utilized to predict miRNA targets.The miRNA targets were identified based on miRNA-mRNA interactions, and the mRNAs or target genes were additionally identified.During this process, each miRNA-mRNA interaction was assigned a z-score, and all interactions were ranked in a descending order of z-scores.The higher the z-score, the more significantly associated with CC the prediction results.Owing to the large number of miRNA targets, the 1,000 most frequently predicted ranked interactions were selected, as they may be more strongly associated with CC than other interactions.
Table I displays the 50 most frequently predicted mRNA-miRNA interactions, and Fig. 1 represents the network of these interactions.It was revealed that secreted frizzled-related protein 4 (SFRP4; z-score=219.0)was the most frequently predicted mRNA, and its corresponding miRNA was miR-199a-1.The next four most frequent interactions were chromosome 9 open reading frame 3 (z-score=218.5)with miR-24-1, iodothyronine deiodinase 3 (z-score=218.2) with miR-1247, maternally expressed 3 (MEG3; z-score=217.8)with miR-431 and kelch-like family member 3 (z-score=216.9)with miR-874.Notably, among the 50 most frequently predicted interactions, MEG3 was regulated by three miRNAs (miR-431, miR-125b-1 and miR-127) simultaneously.miR-199a-1 may regulate two mRNAs (SFRP4 and insulin like growth factor 2) at the same time.Hence, a miRNA may regulate a plurality of genes, and multiple miRNAs may alter the expression of a single gene.
For the specific 1,000 miRNA targets, genes which were regulated by a greater number of miRNAs or which were predicted more frequently, may have a stronger association with CC compared with those only predicted once.This may offer another way to evaluate the importance of one gene in certain tumor.Therefore, the predicted frequency for mRNAs were computed, and the targets which were predicted >5 times amongst the 1,000 miRNA-mRNA interactions were listed (Table II).SFRP4 and MEG3 were predicted 8 times.A total of 7 miRNAs co-regulated NIPA like domain containing 4, whilst guanylate binding protein family member 6, interferon gamma inducible protein 6, family with sequence similarity 83 member C, serpin family B member 5, adhesion G protein-coupled receptor F4, calponin 1 and leiomodin 1 were all regulated by 6 miRNAs simultaneously.
Validation of miRNA targets.To validate the prediction of miRNA targets identified by the PIL method, the miRTarBase, TarBase, miRecords and miRWalk databases were used.By removing duplicated interactions, 62,858 interactions were obtained, which were denoted as background interactions.Selecting intersections between background interactions and all predicted miRNA-mRNA interactions, 105 intersected interactions were detected.The results indicated the feasibility and stability of the PIL method.

Discussion
A number of differing computational methods have been proposed to identify miRNA targets from expression data, including PCC (16), IDA (17,18) and Lasso (19).PCC is commonly used to measure the strength of associations between a pair of variables (16).PCC is used to rank data in descending order of absolute PCC values, and may result in negative miRNA-mRNA correlations being highly ranked, as miRNAs mainly downregulate mRNAs (15).Additionally, the practicability of PCC would be substantially reduced if the correlations were non-linear (34).IDA, a causal inference method, evaluates the causal effect between two variables (17,18).Le et al found that miRNA-mRNA causal regulatory associations revealed by IDA overlapped substantially with the results of follow-up gene-knockdown experiments (35).Lasso, a regression method, minimizes the usual sum of squared errors, with a limit on the sum of the absolute values of the coefficients (19).Similar to the PCC method, the miRNA-mRNA pairs with limitations are ranked highly to favor downregulation.Therefore, the Borda count election method was used to integrate the aforementioned three methods together, giving the PIL method, and validated by identifying intersections between predicted miRNA-mRNA interactions and background interactions.
Using the PIL method, predicted miRNA targets for patients with CC were ranked according to their z-scores and the 1,000 highest ranked interactions were obtained.For specific interactions, target genes that were predicted multiple times were identified.Notably, SFRP4 was the most frequently predicted gene during the prediction process.SFRP4 is a member of the SFRP family that contains a cysteine-rich domain homologous to the putative Wnt-binding site of Frizzled proteins (36); it serves notable functions in tumor progress through antagonizing Wnt signaling (37).Hypermethylation of the SFRP4 promoter was associated with CC and may have utility for the molecular screening of cervical neoplasia (38).Brebi et al (39) revealed that SFRP4 may be used as a potential biomarker for CC diagnosis.Therefore, SFRP4 was identified to be closely associated with CC.
For the purpose of investigating functional gene sets involved with miRNA-mRNA targets, pathway enrichment analysis was conducted.A total of 17 target pathways were identified for genes in the 1,000 miRNA-mRNA interactions most notably associated with CC based on KEGG pathway enrichment analysis, of which cytokine-cytokine receptor interactions (P=8.91x10 - ) was the most significantly associated with CC.Cytokines are soluble extracellular proteins, usually secreted in response to an activating stimulus, which induce responses through binding to specific receptors on the surface of target cells (40).Cytokines can be grouped by structure into different families, as can and their receptors.It had been reported that the cytokine-cytokine receptor interaction gene set may induce cancer (41) and was upregulated in cancer cachexia (42); these changes should result in the development of markers for early diagnosis and a better understanding of the conditions of a tumor.Mak et al (43) revealed that the marked upregulation of genes involved in cytokine-cytokine receptor interactions were consistently detected in tumor cell lines.Signaling pathway impact analysis implicated that this pathway was commonly altered in triple-negative breast cancer (44).Therefore, it can be inferred that cytokine-cytokine receptor interactions additionally serve notable functions in the progression of CC, and, to the best of our knowledge, the present study is the first to reveal the correlation between cytokine-cytokine receptor interactions and CC.
In conclusion, the present study predicted target genes and pathways for patients with CC based on miRNA expression data, the PIL method and pathway analysis.The results of the present study may provide insights into the pathological mechanism underlying CC, and provide potential biomarkers for the diagnosis and treatment of this tumor type.However, these biomarkers are yet to be validated, and the relevant validations should be performed in future studies.

Figure 1 .
Figure 1.Network of the 50 most frequently predicted miRNA-mRNA interactions.Nodes were miRNAs (red) and mRNAs (blue), and edges were the interactions between any two of them.miRNA/miR, microRNA.

Table I .
All 50 most frequently predicted mRNA-miRNA interactions.