Comparison of gene expression profiles and related pathways in chronic thromboembolic pulmonary hypertension

Chronic thromboembolic pulmonary hypertension (CTEPH) is one of the main causes of severe pulmonary hypertension. However, despite treatment (pulmonary endarterectomy), in approximately 15–20% of patients, pulmonary vascular resistance and pulmonary arterial pressure continue to increase. To date, little is known about the changes that occur in gene expression in CTEPH. The identification of genes associated with CTEPH may provide insight into the pathogenesis of CTEPH and may aid in diagnosis and treatment. In this study, we analyzed the gene expresion profiles of pulmonary artery endothelial cells from 5 patients with CTEPH and 5 healthy controls using oligonucleotide microarrays. Bioinformatics analyses using the Gene Ontology (GO) and KEGG databases were carried out to identify the genes and pathways specifically associated with CTEPH. Signal transduction networks were established to identify the core genes regulating the progression of CTEPH. A number of genes were found to be differentially expressed in the pulmonary artery endothelial cells from patients with CTEPH. In total, 412 GO terms and 113 pathways were found to be associated with our list of genes. All differential gene interactions in the Signal-Net network were analyzed. JAK3, GNA15, MAPK13, ARRB2 and F2R were the most significantly altered. Bioinformatics analysis may help gather and analyze large amounts of data in microarrays by means of rigorous experimental planning, scientific statistical analysis and the collection of complete data. In this study, a novel differential gene expression pattern was constructed. However, further studies are required to identify novel targets for the diagnosis and treatment of CTEPH.


Introduction
Chronic thromboembolic pulmonary hypertension (CTEPH) is one of the main causes of severe pulmonary hypertension. CTEPH is characterized by the presence of unresolved throm-boemboli associated with fibrous stenosis in the proximal pulmonary arteries. Diagnosis is usually made in the advanced stages of the disease when pulmonary vascular resistance (PVR) has increased by 5-to 10-fold. This increase in PVR resistance results in pulmonary hypertension and progressive right heart failure. It may be caused by a single or recurrent pulmonary embolism and/or the local formation of thrombi. The proximal location of pulmonary artery obliteration is the main feature observed in patients with CTEPH that differs from pulmonary arterial hypertension (PAH) (1).
Depending on the localization and extent of proximal thrombotic material, a pulmonary endarterectomy (PEA) may be necessary (2). Approximately half of the pulmonary blood flow dynamics can return to normal levels following PEA. However, in approximately 15-20% of patients, PVR and pulmonary arterial pressure (PAP) continue to rise, thus increasing the mortality rate by up to 4-5% (3). It has been suggested that the reason for the development of the persistent occlusion of the pulmonary artery is a misguided thrombus resolution triggered by infection (4), inflammation (5), autoimmunity, malignancy (6) and/or endothelial dysfunction due to a high presence of phospholipid antibodies and lupus anticoagulants (7,8) rather than prothrombotic factors.
A number of factors, such as C-reactive protein (CRP) (9), endothelin-1 and von Willebrand factor (10) may participate in the pathophysiology of pulmonary hypertension. However, there are hundreds of implicated genomic loci with heterogeneous functions. As a result, there is difficulty in understanding the mechanisms by which this diverse genetic susceptibility translates to a common clinical phenotype. The advent of genome-wide technologies, such as gene expression microarray, has made it possible to achieve a comprehensive view of the alterations in gene expression occurring in CTEPH. In the present study, we used bioinformatics to analyze the differences in gene expression between CTEPH and normal tissue. The different Gene Ontology (GO) terms and pathways revealed the most important mechanisms and candidate genes involved in the development of CTEPH; our data may aid in the development of more individualized treatment regimens according to the genetic characteristics of individual patients. patients provided written consent to participate in this study. All patients were examined using lung ventilation and perfusion scans, right-heart catheterization and pulmonary angiography to confirm the diagnosis. Patients with CTEPH were defined as those having a mean pulmonary arterial pressure (mPAP) of ≥25 mmHg with normal wedge pressure (≤12 mmHg) who had dyspnea on exertion during a period of >6 months on effective anticoagulation. In addition, lung perfusion scans were performed to demonstrate a segmental or larger defect concomitant with a normal ventilation scan (11,12). Finally, chronic thromboembolic findings were confirmed on pulmonary angiography (13). Five healthy controls (donors for lung transplants) were also included.
Microarray analysis. For Affymetrix microarray profiling, total RNA of CTEPH and normal tissues was isolated using TRIzol reagent (Invitrogen, Burlington, ON, Canada) and purified using the RNeasy Mini kit (Qiagen, Hilden, Germany), including a DNase digestion treatment. RNA concentrations were determined by absorbance (A) at 260 nm and quality control standards were A260/A280 = 1.8-2.1, using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA).
cDNA of pulmonary artery endothelial cells from patients with CTEPH or the normal controls was hybridized to Human Gene 2.0 ST GeneChip ® arrays (Affymetrix, Inc., Santa Clara, CA, USA) according to the manufacturer's instructions. Affymetrix ® Expression Console Software (version 1.2.1) was used for microarray analysis. Raw data (CEL files) were normalized at the transcript level using the robust multiaverage method (RMA workflow). The median summarization of transcript expressions was calculated. Gene-level data was then filtered to include only those probe sets that are in the 'core' metaprobe list, which represent RefSeq genes.
Analysis of differential gene expression. The RVM t-test was applied to filter the differentially expressed genes for the control and experimental group, as the RVM t-test can raise degrees of freedom effectively in the cases of small samples. After the analysis of differentially expressed genes and false discovery rate (FDR) analysis, we selected the differentially expressed genes according to the P-value threshold. A value of P<0.05 was considered to indicate a statistically significant difference, as previously described (14)(15)(16). The data of differentially expressed genes were subjected to unsupervised hierarchical clustering (Cluster 3.0) and TreeView analysis (Stanford University, Stanford, CA, USA). GO analysis. GO analysis was applied to determine the main functions of the differential expression genes according to the GO database, which is the key functional classification of NCBI, which organizes genes into hierarchical categories and identifies the gene regulatory network on the basis of biological process and molecular function (17,18).
Specifically, a two-sided Fisher's exact test and χ 2 test were used to classify the GO categories, and the FDR (19) was calculated to correct the P-value; the smaller the FDR, the smaller the error in judging the P-value. The FDR was defined as: where N k refers to the number of Fisher's test P-values less than χ 2 test P-values. We computed P-values for the GO terms of all the differentially expressed genes. Enrichment analysis provides a measure of the significance of the function: as the enrichment increases, the corresponding function is more specific, which helps us to find those GO terms with a more concrete functional description in the experiment. Within the significance category, the enrichment Re was given by Re = (n f /n)/(N f /N), where n f is the number of flagged genes within the particular category, n is the total number of genes within the same category, N f is the number of flagged genes in the entire microarray, and N is the total number of genes in the microarray, as previously described (20).
GO-map. GO-map analysis is the interaction network of the signi ficant GO terms of the differentially expressed genes, and was carried out to integrate the associations between these GO terms by outlining the interactions of related GO terms and summarizing the functional interactions of the differentially expressed genes in diseases (18,20).
Pathway analysis. Pathway analysis was used to identify the common pathways associated with the differentially expressed genes according to the KEGG, BioCarta and Reatome databases. We used the Fisher's exact test and χ 2 test to identify the significant pathways, and the threshold of significance was defined by a P-value and FDR. The enrichment Re was calculated with the equation described above, as previously described (21)(22)(23).
Path-Net. Path-Net is the interaction network of the most common pathways associated with the differentially expressed genes, and was built according to the interaction among pathways of the KEGG database to determine the interactions among the significant pathways directly and systemically. It identified the common pathways associated with the differentially expressed genes, as well as the mechanisms behind the activation of a certain pathway (22).
Signal-Net analysis. Using java that allows users to build and analyze molecular networks, network maps were constructed. For instance, if there is confirmative evidence that two genes interact with each other, an interaction edge is assigned between the two genes. The considered evidence is the source of the interaction database from KEGG. Networks are stored and presented as graphs, where nodes are mainly genes (protein, compound, etc.) and edges represent relation types between the nodes, e.g., activation or phosphorylation. The graph nature of Networks peaked our interest to investigate them with powerful tools implemented in R. In order to investigate the global network, we computationally identified the most important nodes. To this end, we determined the connectivity (also known as the degree) defined as the sum of connection strengths with the other network genes: In gene networks, the connectivity measures how well a gene correlates with all other network genes. For a gene in the network, the number of source genes of a gene is called the          indegree of the gene and the number of target genes of a gene is its outdegree. The character of genes is described by betweenness centrality measures reflecting the importance of a node in a graph relative to other nodes. For a graph G:(V,E) with n vertices, the relative betweenness centrality C' B (v) is defined by: where σ st is the number of shortest paths from s to t, and σ st (v) is the number of shortest paths from s to t that pass through a vertex v (24-28).
Data analysis. Numerical data are presented as the means ± standard deviation (SD). Differences between means were analyzed using the Student's t-test. All statistical analyses were performed using SPSS 13.0 software (SPSS, Inc., Chicago, IL, USA).

Results
Clinical characteristics of the two sample groups. The characteristics of the 5 consecutive patients with CTEPH (3 male, 2 female) enrolled in the study are presented in Table I. In addition, tissue from healthy volunteers was obtained from donors of lung transplants and matched to the patients with CTEPH. All patients with CTEPH did not have lung cancer and underwent anti-vitamin K treatment using warfarin. The median D-dimer level of the patients with CTEPH was 0.499 µg/ml (range, 0-1.32 µg/ml).
CTEPH-related differential gene expression profiles. Genomewide transcriptional profiling of tumors has demonstrated that extensive gene expression occurs after the formation of CTEPH. To investigate the possible changes in gene expression, microarray analysis was used to analyze the gene expression profiles in the patients with CTEPH and normal tissue groups and 1,614 genes with statistically significant changes in expression were identified. Of these, 880 genes were upregulated in the CTEPH samples and 734 were downregulated. Ten genes that were the most significantly upregulated or downregulated according to their P-values are listed (Tables II and III). Hierarchical clustering revealed systematic variations in the expression of genes between the two groups (Fig. 1). The results demonstrated that these differential probes could clearly separate the two groups from the whole samples.
GO analysis of differentially expressed genes in CTEPH and GO map. Significant progress in data mining has provided a wide range of bioinformatics analysis options. For example, GO, which has been proven to be extremely useful for the mining of functional and biological significance from very large datasets (17,18), can produce a controlled vocabulary used for dynamic maintenance and interoperability between genome databases. GO analysis of the differentially expressed genes in the two groups was performed. A total of 235 GO items associated with upregulated genes and 177 GO items associated with downregulated genes in the two groups were obtained (Table IV). For the GO items listed in the table, a GO map was constructed to further define the results of GO analysis (Fig. 2). In the GO map, items regarding defense response were the most common between the two groups, suggesting that the formation of thrombi was mainly caused by tissue response to various stimuli, such as inflammation and immune response. Furthermore, items regarding cell proliferation, signal transduction and cytokine production were also very common (high enrichment score). All these items indicated that apart from the traditional knowledge of the role of thrombosis in the development of CTEPH, other mechanisms, such as signaling pathway and cytokines may also play an important role in its pathogenesis.

Pathway analysis of differentially expressed genes in CTEPH and Path-Net.
To determine the involvment of signal transduction pathways in CTEPH, related pathways were analyzed according to the functions and interactions of the differentially expressed genes. By using pathway analysis with the threshold of significance defined on the basis of P<0.05, a great number of significant pathways was found (Tables V and VI). The high enriched pathways targeted by overexpressed genes were involved in cytokine-cytokine receptor interaction, leishmaniasis and cell adhesion molecules (CAMs). By contrast, significant pathways corresponding to underexpressed genes appeared to be responsible for focal adhesion, neuroactive ligand-receptor interaction and arrhythmogenic right ventricular cardiomyopathy (ARVC). However, for the pathways listed which seemed to not be relevant to CTEPH, Path-Net was used to analyze the different pathways to identify the most important ones (Fig. 3). In Path-Net, the MAPK signaling pathway and apoptosis had the largest enrichment degree, which suggests that they are not the most significant variation pathways but participate in pathway regulation. Identifying these pathways

Signal transduction networks in CTEPH.
According to the literature and experimental records in the databases, 440 (276+164) genes appearing in previous 113 (62+51) pathways were collected and a diagram of the gene interaction network was drawn up based on these genes (Fig. 4). The total number of genes in the network was 232, and the particular associations between them are listed in Table VII. In the network, cycle nodes represent genes, and the edges between two nodes represent the interactions between genes, which were quantified by betweenness centrality. Betweenness centrality within the network which contains both the direct regulation by degree and the signal transmitting between the genes represents the size of the cycle node. The higher the betweenness centrality, the more common the gene within the network. The clustering coefficient can be used to estimate the complexity of interactions among genes that neighbor the core gene with the exception of core gene participation. The lower the clustering coefficient, the more independent of the core gene is the interaction among genes in the neighborhood of the core gene. Janus kinase 3 (JAK3), guanine nucleotide binding protein (G protein), alpha 15 (Gq class) (GNA15), mitogen-activated protein kinase 13 (MAPK13), arrestin, beta 2 (ARRB2) and coagulation factor II (thrombin) receptor (F2R) were the five main central genes identified by betweenness centrality.

Discussion
Several clinical and therapeutic factors have been reported as significant to the occurrence of CTEPH. The pathophysiology of CTEPH remains incompletely understood. In most cases it is associated with a history of acute venous thromboembolism (29); however, in a small percentage of patients, thrombi do not resolve after an acute event and the reasons for this are unclear. The current knowledge is based on a triad of enhanced thrombosis (7,30), disturbed thrombolysis (31-33) and inflammation (34). A number of novel prognostic factors, such as cytological features, standard karyotyping, fluorescence in situ hybridization, centromeric probes, single nucleotide polymorphism and gene expression profiling have been investigated.
Using the advanced and inexpensive technique of microarray, new genes which may affect the development of CTEPH can be identified. In this study, to investigate variations in gene expression profiles and signaling pathways in CTEPH, 10 samples were divided into a normal (control) and CTEPH group to identify CTEPH-related differentially expressed genes. The expression of the upregulated genes was higher in the CTEPH group compared with the control group. Gene chips have become a useful tool for studying the development and progression of tumors owing to the high-throughout, but it remains difficult to predict patients with CTEPH, mainly due to the high number of variations in CTEPH and the great challenge in interpreting numerous complex data produced by the microarray (35) and determining the main responsible genes. The present study made use of bioinformatics the method to analyze functions and pathways of the differentially expressed genes, and further clarified their biological significance, and defined the key genes that affect the development of CTEPH.
More than 1,600 genes were differentially upregulated or downregulated in CTEPH in this study. The first upregulated differentially expressed gene in CTEPH, oxidized low density lipoprotein (lectin-like) receptor 1 (OLR1), has been studied in several cardiovascular diseases, such as atherosclerosis for its polymorphisms (36), and recently Wynants et al found that it is highly expressed CTEPH (5). The second most significantly upregulated gene, intereukin (IL)8, has been found to be associated with hemodynamic instability following PEA in patients with CTEPH (37). The role of the third most upregulated gene, secreted phosphoprotein 1 (SPP1), which encodes osteopontin, in CTEPH is poorly understood. Most of the genes that were strongly downregulated showed close associations with tumors, chemokines and lipids, including the gene for CXCL14, which is a type of chemokine ligand involved in the regulation of tumors (38,39). Heparanase 2 (HPSE2) encodes a specific enzyme and is associated with urofacial syndrome (40). None of these genes were found to be associated with CTEPH. As the sample number we used in this study was limited, we tried to investigate the microarray results using another method.  -LgP, negative logarithm of the P-value. Figure 4. Signal transduction networks of CTEPH-related genes. Circles represent genes, red circles represent upregulated genes, and blue circles represent downregulated genes. Arrows represent the activation of (a); straight line represents combinations; dotted line represents indirect effects; a, represents activation; ex, represents gene expression; b, represents binding; c, represents compound; ind, represents indirect effects; inh, represents inhibition; u, represents ubiquination, s, represents state change; detailed annotation listed in Table VII.      Growth hormone receptor  0  3  2  1  Down  ICAM1  Intercellular adhesion molecule 1  0  3  3  0  Up  IL6R  Interleukin 6 receptor  0  3  2  1  Up  IL7R  Interleukin 7 receptor  0  3  2  1  Up  AMPH  Amphiphysin  0  3  3  0  Down  COL6A6  Collagen, type VI, alpha 6  0  3  0  3  Down  IL10RB  Interleukin 10 receptor, beta  0  3  2  1  Up  THBS1  Thrombospondin 1  0  3  0  3  Up  THBS2  Thrombospondin 2  0  3  0  3  Up  THBS4  Thrombospondin 4  0  3  0  3  Down  FGF10  Fibroblast growth factor 10  0  3  0  3  Down  TGFB1  Transforming growth factor, beta 1  0  3  3  0  Up  FGF20  Fibroblast growth factor 20  0  3  0  3  Up  IL11RA  Interleukin 11 receptor, alpha  0  3  2  1  Down  HCST  Hematopoietic cell signal transducer  0  3  0  3  Up  RELN  Reelin  0  3  0  3  Down  COL1A1  Collagen, type I, alpha 1  0  3  0  3  Up  CNTFR  Ciliary neurotrophic factor receptor  0  3  2  1  Down  DOCK2  Dedicator of cytokinesis 2  0  3  3  0  Up  IL12RB1  Interleukin 12 receptor, beta 1  0  3  2  1  Up  ICOS  Inducible T-cell co-stimulator  0  3  0  3  Up  MYLK  Myosin light chain kinase  0  3  3  0  Down  IL21R  Interleukin 21 receptor  0  3   GO is widely recognized as the premier tool for the organization and functional annotation of molecular aspects (41). GO analysis was used to interpret each GO of the differentially expressed genes and analyze it statistically. By using the criteria of P<0.05, significant GO items and genes involved were identified. Guo et al used GO analysis to analyze miRNA microarrays and found that miR-15b and miR-16 may be indispensable for apoptosis by targeting Bcl-2 (42). GO terms regarding inflammatory response play an important role in CTEPH; a number of studies have reported CRP as a predictor of adverse outcome in pulmonary arterial hypertension (43), and IL-6-mediated systemic inflammatory cascades may also be involved in the regulation of peripheral vascular tone following pulmonary thromboendarterectomy (PTE) (44). Quarck et al reported that a proliferative phenotype of pulmonary arterial smooth muscle cells and endothelial cells contributed to proximal vascular remodeling in CTEPH (45). A number of studies have proven that patients with CTEPH can generate a pronounced inflammatory response with the release of pro-inflammatory and anti-inflammatory cytokines (44,46). Therefore, we hypothesized that the functions of other items listed may play a role in CTEPH, which has not been elucidated yet. GO analysis is a classical method used to annotate gene function but is still inexact in some fields. Pathway analysis can reveal the distinct biological process and identify significant pathways that differentially expressed genes participate in; based on this, we can have a comprehensive understanding as to the interactions of genes, functions that they participate in and associations between up-and down-stream pathways, and identify genes involved in these significant pathways. The concordance of the MAPK signaling pathway, cytokinecytokine receptor interaction and apoptosis with the GO terms confirmed their critical role in CTEPH. Wei et al demonstrated that JNK is a critical molecule in 5-HT-induced pulmonary artery smooth muscle cell (PASMC) proliferation and migration and may act at an important point for crosstalk of the MAPK and phosphoinositide 3-kinase (PI3K) pathways (47); however, to our knowledge, there is no study available on the role of the MAPK signaling pathway in CTEPH. Numerous studies have proven that focal adhesion and cytokine participate in the process of vascular remodeling, which is an important characteristic of CTEPH (48,49). A previous study on the role of CRP in proximal pulmonary endothelial cells and smooth muscle cells also demonstrated the effects of cell adhesion molecules on CTEPH (5). Since CTEPH is still a rare disease Insulin-like growth factor binding protein 3 0 1 0 1 Up worldwide, information on the signaling pathways associated with its development is limited. We hypothesized that the other seemingly irrelevant pathways may play a role in CTEPH. However, this requires further investigation. Pathway analysis revealed equally important roles and functions as GO analysis.
In the investigation of genes involved in significant GO terms and pathways, 232 genes in common were found that may affect the development of CTEPH. JAK3 is an enzyme in the janus kinase family and has been implicated in cell signaling processes important in cancer and immune-inflammatory diseases (50). Recently, it was found to improve myocardial vascular permeability (51); however, its role in CTEPH remains unelucidated. The functions of GNA15 and ARRB2 have been less frequently reported. MAPK13 mainly participates in cholangiocarcinoma and increases cell migration; it may play a similar role in CTEPH (52). F2R, also known as PAR-1, affects vascular remodeling in the small intestine (53). Vascular cell adhesion molecule 1 (VCAM1) has been reported to modulate blood vessel endothelial cell-leukocyte interactions and increase the strength of cell adhesion (54). Although their functions have not been fully investigated, a number of genes may play a role in the development of CTEPH. Based on these data, further studies on the expression of these genes and protein functions are required using a greater sample number and using methods, such as reverse transcriptase-polymerase chain reaction and western blot analysis; moreover, the regulatory functions of the identified genes and proteins also requires investigation. Further studies may help to improve the clinical diagnosis and treatment of patients with CTEPH.
In conlcusion, the results presented in our study suggest that differences in gene expression exist between CTEPH and normal samples. These genes encode proteins involved in different GO items and signaling pathways, the disruption of which can affect the development of CTEPH. Several genes, such as JAK3, GNA15, MAPK13, F2R and VCAM1 provide potential candidates for distinguishing between CTEPH from healthy individuals in the future. This distinction may aid in the diagnosis and prevention of CTEPH, based on the different features.