Aberrant expression of long noncoding RNAs in chronic thromboembolic pulmonary hypertension

Chronic thromboembolic pulmonary hypertension (CTEPH) is one of the primary causes of severe pulmonary hypertension. In order to identify long noncoding RNAs (lncRNAs) that may be involved in the development of CTEPH, comprehensive lncRNA and messenger RNA (mRNA) profiling of endothelial tissues from the pulmonary arteries of CTEPH patients was conducted with microarray analysis. Differential expression of 185 lncRNAs was observed in the CTEPH tissues compared with healthy control tissues. Further analysis identified 464 regulated enhancer-like lncRNAs and overlapping, antisense or nearby mRNA pairs. Coexpression networks were subsequently constructed and investigated. The expression levels of the lncRNAs, NR_036693, NR_027783, NR_033766 and NR_001284, were significantly altered. Gene ontology and pathway analysis demonstrated the potential role of lncRNAs in the regulation of central process, including inflammatory response, response to endogenous stimulus and antigen processing and presentation. The use of bioinformatics may help to uncover and analyze large quantities of data identified by microarray analyses, through rigorous experimental planning, statistical analysis and the collection of more comprehensive data regarding CTEPH. The results of the present study provided evidence which may be helpful in future studies on the diagnosis and management of CTEPH.


Introduction
Pulmonary endarterectomy (PEA) can prevent mortality due to right ventricle failure (1), which occurs in chronic thromboembolic pulmonary hypertension (CTEPH), a type-4 pulmonary hypertension (2). A number of gene expression studies conducted in postmortem lung tissue samples from patients with CTEPH have indicated that aberrant processing of the messenger RNA (mRNA) transcriptome in CTEPH may provide a mechanistic convergence between the diverse genetic heritability of this disease and the disruption of fundamental signaling pathways, resulting in the common CTEPH phenotype. Studies conducted in pulmonary endothelial cells have identified differentially expressed genes, which may be involved in the pathogenesis of CTEPH (3,4). However, gene expression regulation is a complex process, which involves an interplay between DNA sequence variation, chromatin and epigenetic modifications, protein transcription factors and regulatory noncoding RNAs.
A previous study by our group examining the role of the transcriptome in pulmonary artery endomembrane samples demonstrated that the abnormal expression of mRNA transcripts may represent a point of convergence in the otherwise heterogeneous genomics, underlying the development of CTEPH (5). However, the regulatory RNAs inducing the aberrant mRNA expression levels observed in CTEPH have not been concurrently assessed. To the best of our knowledge, only a single study conducted in the tissues of CTEPH patients has demonstrated that miRNA-759 may influence the susceptibility to the development of CTEPH (6). Long noncoding RNAs (lncRNAs), a novel class of regulatory RNAs, have been shown to be involved in certain fundamental events in gene regulation. However, the role of these molecules in the pathogenesis of CTEPH remains unclear (7).
lncRNAs are noncoding RNA molecules that are longer than 200 nucleotides. lncRNAs were originally considered to be 'transcriptional noise'; however, their involvement in important mechanisms controlling the gene expression regulation has been demonstrated. These mechanisms include targeting transcription factors, initiating chromatin remodeling, directing methylation complexes and blocking proximate transcription (8). Aberrant regulation of lncRNAs has been shown to be associated with a number of diseases, including certain forms of cancer (9,10). Numerous lncRNAs have been identified through large-scale analyses of full-length cDNA sequences in humans, mice and flies. lncRNA molecules have been shown to play an important role in the control of imprinting, cell differentiation, immune response, pathogenesis of various human diseases, tumorigenesis and other biological processes (11)(12)(13)(14)(15). However, the expression and biological function of lncRNAs in CTEPH remains to be elucidated.
The aim of the present study was to determine whether the dysregulation of the lncRNA expression is involved in the molecular pathogenesis of CTEPH. The lncRNAs expression profiles of five CTEPH patients were compared with healthy control individuals (normal tissues). In addition, an assessment of the transcriptional differences in all known protein-coding mRNAs was conducted.

Materials and methods
Patient samples. In total, five patients diagnosed with CTEPH (male, 2; female, 3; mean age, 38.2 years; age range, 17-52 years) who had been referred to Beijing Chaoyang Hospital (Capital Medical University, Beijing, China) were recruited to the study. The study was approved by the relevant ethics committee of Beijing Chaoyang University. All the patients provided informed written consent prior to participation in the study. Pulmonary angiography and right heart catheterization were used in the diagnosis of CTEPH and determination of cardiopulmonary hemodynamics (16). Mean pulmonary artery pressure >25 mmHg at rest or >30 mmHg during exercise was considered to indicate the presence of pulmonary hypertension. Pulmonary vascular resistance (PVR) and the six-minute walk test (6-MWT) were hemodynamic variables applied to assess the cardiopulmonary function and prognosis of the CTEPH patients. At inclusion, all the patients received oral anticoagulants for a minimum of 6 months and underwent PEA in accordance with the guidelines of the Beijing Chao-Yang Hospital (Beijing, China). In addition, healthy control samples were obtained from five lung transplant donors. The control subjects and patients were matched according to age and gender. Written informed consent was obtained from the healthy controls or their families.

RNA extraction.
To prepare the samples for microarray profiling, total RNA was isolated from the CTEPH patient and normal tissue samples using TRIzol™ reagent (Invitrogen Life Technologies, Inc., Burlington, ON, Canada) and purified using an RNeasy Mini kit (Qiagen, Hilden, German), including a DNase digestion treatment. RNA concentrations were determined by measuring the sample absorbance at 260 nm with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Inc., Waltham, MA, USA). A260/A280 ratio values of 1.8-2.1 were set as the quality control standard.
Microarray profiling. cDNA was generated via reverse transcription of RNA obtained from pulmonary artery endothelium samples using a reverse transcription kit (miScript II RT kit; Qiagen). cDNA obtained from pulmonary artery endothelium samples of the CTEPH patients or normal controls was hybridized to GeneChip ® Human Gene 2.0 ST arrays (Affymetrix, Inc., Santa Clara, CA, USA), according to the manufacturer's instructions. Affymetrix Expression Console™ software (version 1.2.1; Affymetrix, Inc.) was used for microarray analysis. The raw data (CEL files) were normalized at the transcript level using the robust multiarray average method (17), followed by median summarization of the transcript expression levels. Subsequently, gene-level data were filtered to include only the probe sets derived from the 'core' metaprobe list, representing the reference sequence (RefSeq) genes.
Significant differential gene analysis. The random variance model (RVM) t-test was used to filter differentially expressed genes in the control and CTEPH groups. This test was selected since it can effectively raise the degrees of freedom when investigating small samples. Following significance analysis of microarrays and false discovery rate (FDR) analysis, the differentially expressed genes were identified according to the predetermined P-value threshold using BRB-ArrayTools (version 4.3.0 Beta 1; National Cancer Institute, Bethesda, MD, USA). P<0.05 was considered to indicate a statistically significant difference (18)(19)(20). The differentially expressed genes were subjected to unsupervised hierarchical clustering (Cluster 3.0; Stanford University, Stanford, CA, USA) and TreeView version 3.0 analysis (Stanford University).
Coexpression network. Gene coexpression networks were constructed based on the normalized signal intensity of the specific expression of genes, in order to identify the interactions among genes (21). For each pair of genes, the Pearson correlation coefficient was calculated in order to identify pairs with a significant correlation, thus enabling construction of the network (22).
When conducting a network analysis, the simplest and most important measure of gene centrality within a network is degree centrality. Degree centrality is defined as the number of links a particular node has to other nodes in the network (23). Furthermore, k-cores were introduced using graph theory in order to simplify the graph topology analysis and investigate various network properties. A k-core of a network consists of a subnetwork where all the nodes are connected to at least k other genes. A k-core of a protein-protein interaction network usually contains cohesive groups of proteins with similar functions (23,24).
Network structure analysis aims to locate core regulatory factors (genes). Within a network, core regulatory factors connect the majority of nearby genes and have the highest degree centralities. When evaluating different networks, core regulatory factors are determined by the degree differences between the CTEPH and normal tissue samples (25), since they show the highest degree differences. The network was constructed using Cytoscape software version 2.8.3 (Cytoscape Consortium, San Diego, CA, USA).
Gene ontology (GO) analysis. Based on the gene ontology database (http://www.geneontology.org/; accessed on January 15, 2014), the significance level of GO terms for the CTEPH-associated differentially expressed genes was analyzed by two-side Fisher's exact test and χ 2 test using the Database for Annotation, Visualization and Integrated Discovery (DAVID) software version 6.7 (http://david.abcc.ncifcrf.gov/home.jsp; accessed on January 11, 2014; National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA) (26). Differentially expressed genes were analyzed independently according to whether they were upregulated or downregulated. P-values were calculated for the differentially expressed genes in all GO categories. P<0.01 and FDR<0.01 were considered to indicate statistically significant results.
Pathway analysis. Based on the Kyoto Encyclopedia of Genes and Genomes database (http://www.genome.jp/kegg/; accessed on January 12, 2014; Kanehisa Laboratories, Kyoto, Japan), the significance levels of CTEPH-associated differentially expressed gene pathways were analyzed using Pathway-Express version 1.0 (Intelligent Systems and Bioinformatics Laboratory, Detroit, MI, USA) (27,28). The occurrence of significant differences from the expected values was assessed using a two-sided binomial distribution. The number of differentially expressed genes corresponding to each pathway category was counted and compared with the number of genes expected for each pathway category. All the signaling pathways were analyzed using γ P<0.05, provided by the impact analysis, as the threshold indicating a statistically significant difference.

Results
Overview of lncRNA profiles. Based on the lncRNAs expression profiles, a number of differentially expressed lncRNAs were identified between the CTEPH and healthy control samples. The expression profiles of lncRNAs in the paired samples were determined by calculating the log fold change of CTEPH/control samples. Due to the limited sample size, FDR and P-values were calculated from normalized expression levels. Hundreds of differentially expressed human lncRNAs were identified using the RefSeq (www.ncbi.nlm.nih.gov/refseq), Ensembl (www.ensembl.org), lncRNAdb (www.lncrnadb.org), Broad Institute, Human Body Map lincRNAs (www.broadinstitute. org/genome_bio/human_lincrnas/) and transcripts of uncertain coding potential catalog (http://www.broadinstitute.org/ genome_bio/human_lincrnas/?q=TUCP_transcripts_catalog) databases in the CTEPH patients and healthy controls.
Using the microarray data, the expression levels of lncRNAs in the five CTEPH tissue samples were compared with the matched normal tissue samples. In total, 185 lncRNAs were identified in which the expression levels were found  Table II. Collection of the top ten deregulated lncRNAs detected using microarray analysis in ten CTEPH and control samples. to be significantly different between the two groups. The characteristics of the five CTEPH patients and five normal controls are shown in Table I. The procedure used to obtain a sequence may vary depending on the database used for lncRNA classification. While tens of thousands lncRNAs were investigated in normal and diseased tissues, only a few hundreds lncRNAs were found to be significantly upregulated or downregulated. Thus, the upregulation or downregulation of lncRNAs was used to distinguish the CTEPH patient tissues from healthy tissues (Fig. 1). Compared with normal tissues, NR_033766 was the most evidently downregulated lncRNA, whereas TCONS_l2_00010131-XLOC_l2_005462 was upregulated to the greatest degree (Table II). Therefore, downregulated lncRNAs were shown to be more prevalent compared with upregulated lncRNAs in the CTEPH group.
Overview of mRNA profiles. In total, ≤30,654 coding transcripts were detected in the ten samples that were examined. Using the RVM t-test method, 880 genes were found to be upregulated and 734 genes were found to be downregulated in the CTEPH samples, compared with the healthy controls. The results supported the hypothesis that CTEPH is a metabolic disease as the mRNA expression level of oxidized low-density lipoprotein receptor 1 showed the greatest upregulation, while the mRNA expression level of chordin-like 1 showed the greatest downregulation. Therefore, upregulation of mRNA expression levels was more prevalent compared with downregulation in the CTEPH group.

Analysis of nearby lncRNAs and mRNAs.
Previous studies have used chromatin-state maps to identify 3,019 lncRNAs with a clear evolutionary conservation, which are associated with distinct and diverse biological processes (such as cell proliferation), RNA binding complexes, immune surveillance, embryonic stem cell pluripotency, neuronal processes, morphogenesis, gametogenesis and muscle development (29,30). Among the 185 differentially expressed lncRNAs identified in the present study, 74 lncRNAs were shown to have differentially expressed mRNAs that were overlapping, antisense or nearby. Further analysis resulted in the identification of nine pairs of differentially expressed lncRNAs overlapping with mRNAs, nine pairs of lncRNAs and antisense mRNAs, 340 lncRNAs located upstream of mRNAs (distance, <300 kb) and 106 lncRNAs located downstream of mRNAs (distance, <300 kb) in each comparison between CTEPH and normal control tissues. Among the 464 lncRNA-mRNA pairs, the expression regulation of 442 lncRNAs and nearby coding genes was in the same direction (up or down), whereas the expression of 22 pairs was regulated in opposite directions.
Construction of the coding-noncoding gene coexpression network. The correlation analysis among differentially expressed lncRNAs and mRNAs was used to construct a coding-noncoding (CNC) gene coexpression network. LncRNAs and mRNAs having Pearson correlation coefficients ≥0.97 were selected and the network was constructed using the Cytoscape software. Within this coexpression network, the CTEPH-CNC network node consisted of 129 lncRNAs and 275 mRNAs, whereas the normal control-CNC network node consisted of 134 lncRNAs and 294 mRNAs (Fig. 2). A total of 832 network nodes in the two networks formed 3,239 coexpression pairs of lncRNAs and mRNAs, with positively correlated expression in the majority of pairs. Investigation of the CNC network indicated that one mRNA may correlate with numerous lncRNAs and vice versa. In order to identify the most significant RNA molecules in CTEPH, the degree of certain RNAs (k-core) in each network was normalized and the difference in connectivity (diffK), representing the differences between the two networks, was calculated. NR_036693 and NR_027783 were found to be the most differentially expressed lncRNAs, whilst the expression of the arginine vasopressin receptor 1A gene showed the greatest significant difference in the genes examined. The CNC network presented here, may implicate the inter-regulation of lncRNAs and mRNAs in the development of CTEPH.
Coexpressed coding gene function analysis. An important part of research into lncRNAs involves inferring the possible functions of nearby protein-coding genes (29,31). In the current study, the differentially expressed mRNAs of the two CNC networks were combined. Subsequently, the DAVID functional annotation chart (32,33) and pathway analysis results were used to perform functional enrichment analysis of the differentially regulated protein-coding gene and lncRNA pairs. The annotation terms showing the greatest significant differences (with the lowest P-values) were found to be immune response, inflammatory response, defense response and response to wounding (Table III). In addition, the most important signaling pathways relevant to CTEPH were found to be antigen processing and presentation, cytokine-cytokine receptor interaction and leukocyte transendothelial migration (Fig. 3). Therefore, lncRNAs are hypothesized to modulate the host response through their effect on nearby protein-coding genes.

Discussion
To the best of our knowledge, no previous studies have investigated the lncRNA expression profiles in CTEPH or the association of lncRNA expression with clinical characteristics and outcomes in CTEPH patients. CTEPH is a polygenic disorder, resulting from genetic alterations. The description of thousands of genomic sequences, along with technological  (35)(36)(37)38). In addition, lncRNAs, such as HOTAIR, are involved in the development and progression of tumors, such as breast cancer (39).
In the current study, differentially expressed lncRNAs and nearby coding gene pairs were described. The silencing or reduced expression of particular lncRNAs has been previously demonstrated to result in a concomitant reduction in the expression of nearby protein-coding genes, including numerous proteins which are known to govern the regulation of cellular differentiation. In addition, the lncRNAs and nearby coding genes may share upstream regulation or local transcriptional effects (29,31,(40)(41)(42). Certain lncRNAs have been reported to increase gene expression. For instance, Evf-2 ncRNA forms a complex with the homeodomain-containing protein Dlx2, which leads to transcriptional enhancement (43). In addition, heat-shock RNA-1 ncRNA binding to heat-shock transcription factor 1 has been found to result in the induction of heat-shock proteins (44). Furthermore, an isoform of the ncRNA steroid receptor RNA activator is known to be associated with steroid receptor responsiveness (45). Finally, Ørom et al (46) recently identified that noncoding RNA-activators 1-7 can enhance the expression of nearby genes. Thus, analyzing the genes nearby to lncRNAs may assist in understanding the involvement of lncRNAs in CTEPH. The majority of lncRNAs have a distinct spatial and temporal specificity in the process of organismal differentiation and development (47). A previous study, which investigated 1,300 lncRNAs in mice, demonstrated that in particular areas of the brain, there are different expression patterns of lncRNAs (48). These lncRNA expression signatures have been detected in prostate carcinoma and hepatic tumors (49). Thus, differential expression patterns of lncRNAs may be present in the pulmonary artery tissues of CTEPH patients, and lncRNAs that are differentially expressed may result in alterations in cellular function, which may be associated with the pathogenesis of CTEPH. In the present study, 464 pairs of differentially expressed enhancer-like lncRNAs and mRNAs, of which 95.3% (442/464) were regulated in the same direction (up or down). Therefore, the present study hypothesized that a number of these lncRNAs enhance the activation of nearby genes.
Molecular networks are useful in the investigation of biological processes and can be constructed using results obtained from co-immunoprecipitation experiments (50) or from algorithmic predictions based on gene function correlation and expression profiles (51). Network models based on algorithmic predictions from high throughput gene expression tests may be used to construct images of the networks regulating gene expression and metabolic pathways in the groups analyzed. The networks intrinsic to the CTEPH phenotype are hypothesized to be involved in the normal functioning of the pulmonary artery endothelium. Based on the information obtained regarding the expression of lncRNAs and mRNAs, Pearson correlation coefficients were calculated. Pairs found to have a significant correlation were selected and used to construct a network. These results demonstrated an association between lncRNAs and mRNAs, indicating that lncRNAs may regulate specific mRNAs, or vice versa. mRNAs are likely to be directly involved in the pathogenesis of CTEPH, while lncRNAs function through the epigenetic modification of mRNAs.
Based on previous studies and the results of the computer analysis conducted in the present study, four lncRNAs with the largest diffK were further investigated. lncRNA NR_036693 is a 5,255 bp transcript variant 6 of Homo sapiens C-type lectin domain family 2, member D. This gene encodes a member of the natural killer cell receptor C-type lectin family. The natural killer cell has been found to be involved in vascular remodeling, which may lead to pulmonary arterial hypertension (52,53). NR_027783 is a 1,199 bp transcript variant 2 from Homo sapiens spermidine/spermine N1-acetyltransferase 1 (SAT1). The protein encoded by this gene is a member of the acetyltransferase family and a rate-limiting enzyme in polyamine metabolism. Numerous studies have demonstrated that the polyamine regulatory pathway is a pharmacological target in pulmonary arterial hypertension (54,55). NR_033766 is a 6,384 bp transcript variant 7 from Homo sapiens forkhead box P2 (FOXP2). The FOXP2 gene is involved in the normal development of the areas of the brain controlling speech and language during embryogenesis. In addition, FOXP2 may be associated with a number of biological pathways and cascades, which also influence the development of language. Mutations in this gene result in speech-language disorder 1 (SPCH1), also termed as autosomal dominant speech and language disorder with orofacial dyspraxia. NR_001284 is a 2,783 bp pseudogene, Homo sapiens tenascin XA, the biological function of which remains unclear.
Identification of the putative functions of genes associated with lncRNAs may improve the understanding of the functional role of these molecules (30,32). Peng et al (56) performed a functional enrichment analysis on protein-coding genes nearby to differentially expressed lncRNAs in SARS-CoV infected mice. The authors identified that the most significant functional group consisted of annotation terms associated with gene expression, including transcription regulation, nuclear and DNA-binding transcription factor activity, as well as the regulation of RNA metabolism. In the present study, GO functional enrichment analysis of differentially expressed mRNAs with their coexpressed differentially expressed lncRNA partners, demonstrated that these genes were functionally associated with immune response, inflammatory response, response of wounding and response to endogenous stimulus. Furthermore, pathway analysis revealed that antigen processing and presentation, cytokine-cytokine receptor interaction and leukocyte transendothelial migration may be involved in the development of CTEPH. Although the function of lncRNAs in CTEPH still requires further investigation, the present study hypothesized that the formation of CTEPH may be caused by certain lncRNAs.
To the best of our knowledge, this is the first study describing the expression profiles of human lncRNAs in CTEPH by microarray. The expression levels of a number of lncRNAs were found to be aberrant in tissue samples from CTEPH patients, compared with the healthy control tissues. These deregulated lncRNAs may function as activators or suppressors of genes involved in the development and progression of the disease. Further investigation is required to determine whether these lncRNAs may serve as novel therapeutic targets and diagnostic biomarkers in CTEPH.