Phylogenetic and structural analysis of the phospholipase A2 gene family in vertebrates

The phospholipase A2 (PLA2) family is the most complex gene family of phospholipases and plays a crucial role in a number of physiological activities. However, the phylogenetic background of the PLA2 gene family and the amino acid residues of PLA2G7 that are under positive selection remain undetermined. In this study, we downloaded 49 PLA2 gene sequences from different species, including the human, house mouse, Norway rat, pig, dog, chicken, cattle, African clawed frog, Sumatran orangutan and zebrafish. Phylogenetic relationships were determined using the neighbor-joining (NJ), minimum evolution (ME) and maximum parsimony (MP) methods, as well as the Bayesian information criterion. The results were then presented as phylogenetic trees. Positive selection sites were detected using site, branch and branch-site models. These methods led us to the following conclusions: i) closer lineages were observed between PLA2G16 and PLA2G6, PLA2G7 and PLA2G4, PLA2G3 and PLA2G12, as well as among PLA2G10, PLA2G5 and PLA2G15; ii) PLA2G5 appeared to be the origin of the PLA2 family, and PLA2G7 was one of the most evolutionarily distant PLA2 proteins; iii) 16 positive selection sites were detected and were marked in the PLA2G7 protein sequence as 327D, 257Q, 276G, 34s, 66G, 67C, 319S, 28N, 50S, 54T, 58R, 75T, 88Q, 92R, 179H and 191K.


Introduction
The phospholipase gene family encodes enzymes that hydrolyze phospholipids into fatty acids and other small molecules. This gene family is classified into four major classes, namely phospholipase A (PLA), PLB, PLC and PLD (1), based on the type of reaction by which they catalyze the hydrolysis of phospholipids. The majority of the encoded enzymes play crucial roles in lipid metabolism (2), cell proliferation (3), muscle contraction (4)(5)(6) and the inflammation process (5). The PLA class includes two subfamilies, namely PLA1 and PLA2. PLA1 cleaves the SN-1 acyl chain and is a major component of snake venom (7,8), whereas PLA2 cleaves the SN-2 acyl chain and releases arachidonic acid, which mediates anti-inflammatory and inflammatory responses (9). The PLA2 gene family is divided into nine groups based on their function: PLA2G3, PLA2G4, PLA2G5, PLA2G6, PLA2G7, PLA2G10, PLA2G12, PLA2G15 and PLA2G16 (10). The enzymes encoded by these genes are important in platelet activity (11,12) and B-cell activity (13,14). The dysfunction of one or more of these genes leads to stroke (14) and other neurological diseases (15)(16)(17). However, the functions of these genes have not yet been fully elucidated.
A number of studies have focused on the association between the PLA2 gene family and various physiological and pathological conditions. The results of a clinical trial demonstrated that high levels of sPLA2 mass in the circulation are not associated with a high risk of cardiovascular disease (18), which is not consistent with some earlier basic medical studies (11). Apart from potential flaws in study design, the differences in the results obtained may be attributed to genetic alterations reflecting a greater sensitivity to cardiovascular risk. In addition, certain studies have performed phylogenetic analyses (19). However, the available information is still insufficient, partly due to methodological limitations and the criteria used for including gene data.
In the present study, we analyzed the functions and phylogenetic background of the PLA2 gene family. We aimed, firstly, to determine the phylogenetic background of the PLA2 gene family in vertebrates using the neighbor-joining (NJ), minimum evolution (ME) and maximum parsimony (MP) methods, as well as the Bayesian information criterion, and secondly, to detect the positive selection sites of the PLA2 gene family so that the structure and biological activity of the gene can be defined by site-directed mutagenesis, which may provide possible therapeutic targets. The data presented in this study provide insight into the phylogenetic relationships and functional differentiation of the phospholipase and PLA2 gene families.

Materials and methods
Data collection. We searched and downloaded natural and intact amino acid and gene sequences of the phospholipase and PLA2 families from the NCBI database (http://www.ncbi.nlm.nih.gov/gene/). These sequences included human, house mouse, Norway rat, pig, dog, chicken, cattle, African clawed frog, Sumatran orangutan and zebrafish sequences.
Sequence alignment. We used the EBI web tool, MUSCLE (20), to align the sequences of the phospholipase and PLA2 family proteins. Rearranged gene sequences were generated according to the new amino acid alignment. The results of the amino acid alignment were converted into an aligned CDS FASTA file using the EMBL web tool, PAL2NAL (21) (http://www.bork.embl.de/pal2nal/), which builds multiple codon alignments from the matching amino acid alignment. The file format was converted using MEGA 4.0 software (22).
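The codon-alignment step can be illustrated with a minimal sketch. It is not the PAL2NAL implementation; it only shows the underlying idea of threading an ungapped coding sequence onto one row of a gapped protein alignment. The function name thread_codons and the toy sequences are hypothetical and are not PLA2 data.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map an ungapped CDS onto one gapped row of a protein alignment.

    Each residue consumes the next codon of the CDS; each protein gap
    becomes a three-character codon gap.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    # Accept a CDS with or without a trailing stop codon.
    if len(cds) not in (3 * residues, 3 * residues + 3):
        raise ValueError("CDS length does not match the protein length")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy example with made-up sequences (Met, gap, Lys, Thr).
print(thread_codons("M-KT", "ATGAAAACC"))  # ATG---AAAACC
```

Because gaps are always inserted in multiples of three, the reading frame is preserved, which is what codon-based models such as those in CODEML require.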
Phylogenetic analysis. The full alignment of sequences was used for the phylogenetic analysis. The Akaike Information Criterion in PAUP* version 4.0 (23) was applied to select the most appropriate model of amino acid substitution for the initial tree-building analyses. ML optimizations and distance methods were evaluated using the PhyML program in PAUP* version 4.0 (24). The best-fit evolutionary model, GTR+I+G, was determined for the PLA2 gene family using Modeltest version 3.7 (25). Phylogenetic trees were reconstructed from the DNA alignment by the Bayesian method using MrBayes version 3.1.2 software (26,27) according to the best-fit model. The parameters for tree generation were as follows: 2×10⁶ generations were run for the PLA2 gene family, with sampling every 1,000 generations and four chains (three cold, one heated); the first 250,000 generations (250 trees) were discarded as burn-in from every run for the two families (phospholipase and PLA2). Analyses with the NJ, ME and MP methods were performed using MEGA 4.0 software (22).
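The sampling arithmetic behind these settings can be checked with a short sketch. The helper name mcmc_sample_counts is hypothetical, and the count assumes one sampled tree per sampling interval in each run, ignoring any tree written at generation 0.

```python
def mcmc_sample_counts(generations: int, sample_every: int,
                       burnin_generations: int) -> tuple[int, int, int]:
    """Return (sampled trees, discarded trees, retained trees) for one run."""
    sampled = generations // sample_every
    discarded = burnin_generations // sample_every
    return sampled, discarded, sampled - discarded


# Settings stated in the text: 2x10^6 generations, sampling every 1,000
# generations, first 250,000 generations discarded.
print(mcmc_sample_counts(2_000_000, 1_000, 250_000))  # (2000, 250, 1750)
```

Under these assumptions, 250 of the 2,000 sampled trees per run are discarded, which matches the 250 trees quoted above, and 1,750 trees per run remain for summarizing the topology.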
Estimation of positive selection sites. Selective pressures on the PLA2 genes were detected using CODEML in the PAML package version 4.4 (28). Three codon-based likelihood methods were run, namely the branch, site and branch-site models. P<0.05 was used to determine whether or not the alternative hypothesis was significant. In these analyses, ML estimates of the selection pressure were based on the ratio dN/dS (ω), where dN and dS are the non-synonymous and synonymous substitution rates, respectively. As ω varies across codons, the probability of each codon being under positive selection was estimated. Positive selection (ω>1) may act only in very short episodes or on only a few sites during the evolution of duplicated genes (29). All codon alignments were generated with the PAL2NAL web tool. The parameter estimates (ω) and likelihood scores were calculated for three pairs of models: M0 (one ratio) vs. M3 (discrete); M1a (nearly neutral) vs. M2a (positive selection); and M7 (β) vs. M8 (β + ω). The likelihood ratio test (LRT) was used to compare the fit to the data of two nested models, assuming that twice the log-likelihood difference between the two models (2∆lnL) follows a χ² distribution with a number of degrees of freedom equal to the difference in the number of free parameters (30). The naive empirical Bayes (NEB) and Bayes empirical Bayes (BEB) approaches implemented in PAML4 were used to identify sites under positive selection or relaxed purifying selection in the foreground group with significant LRTs. Each branch group was also labeled as a foreground group. The flow of positive selection site detection is presented in Fig. 1.
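The likelihood ratio test described above can be sketched in a few lines. This is an illustration only, not the code used in the study: the function names chi2_sf and likelihood_ratio_test are hypothetical, the log-likelihood values in the example are invented, and the χ² tail probability is computed from the standard closed-form series for integer degrees of freedom so that only the Python standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^(r-1/2) / (1*3*...*(2r-1))
    total = 0.0
    term = math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(math.sqrt(x / 2.0)) + 2.0 * phi * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2*delta lnL, p-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Invented log-likelihoods for a nested pair such as M1a vs. M2a (df = 2).
stat, p = likelihood_ratio_test(-1000.0, -995.0, 2)
print(stat, f"{p:.4f}")  # 10.0 0.0067
```

In this invented case the alternative model would be accepted at the P<0.05 threshold; the sites themselves would still have to be identified from the posterior probabilities reported by CODEML.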

Phylogenetic analysis of PLA2 gene family in vertebrates.
A total of 49 sequences from 10 species were used to reconstruct a phylogenetic tree for the PLA2 gene family using the NJ, ME and MP methods, as well as the Bayesian information criterion with bootstrap value detection. The details of the included data are presented in Table I. A total of 25 nodes (56.81% in total) showed bootstrap values ≥95% and 34 nodes (77.27% in total) had bootstrap values ≥80% in the Bayes building tree (Fig. 2D). In each subgroup, mammal data, including data from the Sumatran orangutan, pig, Norway rat, human, house mouse, dog and cattle, clustered together. The data from the African clawed frog, chicken and zebrafish were much more ancestral than those from mammals, indicating that the taxonomy of host organisms reflects the phylogenetic background of the PLA2 gene family. The vertebrate PLA2 gene family was sorted into nine lineages according to the type of reaction for catalyzing phospholipids. PLA2G7 seems to be the most distant lineage in this gene family, indicating that a large number of structural changes have accumulated in it. Furthermore, all the groups were divided into two major clades; clade 1 included PLA2G16, PLA2G6, PLA2G10, PLA2G5 and PLA2G15, whereas clade 2 included PLA2G7, PLA2G4, PLA2G3 and PLA2G12. Closer lineages were observed between PLA2G16 and PLA2G6, PLA2G7 and PLA2G4, PLA2G3 and PLA2G12, as well as among PLA2G10, PLA2G5 and PLA2G15. Moreover, the phylogenetic relationships obtained by the NJ, ME and MP methods were different (Fig. 2A-C).
Figure 2. (A) Phylogenetic tree produced using the NJ method; (B) phylogenetic tree produced using the ME method; (C) phylogenetic tree produced using the MP method. Genes with a crimson disc belong to the PLA2G16 group; genes with an orange disc belong to the PLA2G6 group; genes with a grey disc belong to the PLA2G10 group; genes with a dark blue disc belong to the PLA2G10 group; genes with a yellow disc belong to the PLA2G15 group; genes with a red disc belong to the PLA2G7 group; genes with a green disc belong to the PLA2G4 group; genes with a purple disc belong to the PLA2G3 group; genes with a light blue disc belong to the PLA2G12 group. (D) Phylogenetic tree of the PLA2 gene in vertebrates produced using the Bayesian method. Genes with a crimson branch belong to the PLA2G16 group; genes with an orange branch belong to the PLA2G6 group; genes with a grey branch belong to the PLA2G10 group; genes with a dark blue branch belong to the PLA2G10 group; genes with a yellow branch belong to the PLA2G15 group; genes with a red branch belong to the PLA2G7 group; genes with a green branch belong to the PLA2G4 group; genes with a purple branch belong to the PLA2G3 group; genes with a light blue branch belong to the PLA2G12 group; genes in pink belong to mammals; genes in dark yellow belong to birds; genes in dark green belong to amphibians; genes in sky blue belong to fish. NJ, neighbor-joining; ME, minimum evolution; MP, maximum parsimony.
(Table II). Additional calculations were performed to confirm and supplement the results. The branch model was used to detect positive selection on branches. The free-ratio model fitted the data significantly better than the one-ratio model (2∆lnL=694.2, p=1.306E-93, df=185), indicating heterogeneous selection among branches.
Two-ratio models were applied to the 12 selected branches; the results revealed that two models (Td and Tf) were significantly different (Pd=3.978E-08, Pf=0.017) at ω>1. Subsequently, branch-site models were used to search for amino acid sites that underwent positive selection in the statistically significant foreground branches Td and Tf (Table III).
Using I-TASSER (32-34) (http://zhanglab.ccmb.med.umich.edu/I-TASSER/), four positive selection sites, 327D, 257Q, 276G and 34s, were located in α-helices; three positive selection sites, 66G, 67C and 319S, were located in β-sheets; and nine positive selection sites, 28N, 50S, 54T, 58R, 75T, 88Q, 92R, 179H and 191K, were located in random coils. All details of the positive selection sites are presented in Table V. A planar structure of all positive selection sites is presented in Fig. 3. The positive selection sites detected by the site models are presented in three dimensions in Fig. 4A and B, and those detected by the branch and branch-site models are presented in three dimensions in Fig. 4C-E.
Distribution of positive selection sites. The functional areas on the AA sequence of PLA2G7_Homo were predicted by PredictProtein. The positive selection site 276G was located in the serine active site, 75T was located in the protein kinase C phosphorylation site and 191K was located near the casein kinase II phosphorylation site.
Selection analysis by branch-site models was performed using codeml implemented in PAML. BS, branch-site; np, number of free parameters; lnL, log-likelihood; LRT, likelihood ratio test; df, degrees of freedom; 2∆lnL, twice the log-likelihood difference of the models compared; BEB, Bayes empirical Bayes approach.

Discussion
Available natural and complete sequences of the phospholipase gene family in humans and the PLA2 gene family of vertebrates from the NCBI database were included in the present study. The phospholipase and PLA2 gene families showed different phylogenetic backgrounds and relationships according to the method used for determination (the NJ, ME and MP methods and the Bayesian information criterion). This difference may be attributed to the weaknesses of these methods. The NJ method produces only one final topology with branch length estimates, and the observed differences between sequences do not accurately reflect the evolutionary distances (35). The construction of an ME tree is time consuming, and examining all topologies is difficult (36). The MP method lacks statistical consistency and does not guarantee the production of a true tree with high probability, even given sufficient data (37). Bayesian analysis, which is widely accepted as the most valuable method in phylogenetic analysis and the estimation of positive selection sites, was also employed (26).
The PLA2 family is the most complex gene family of phospholipases (38,39). The majority of PLA2 genes encode secreted enzymes with physiological functions that include catalyzing platelet activity (40), controlling lipid metabolism (2) and mediating inflammation (5). The dysfunction of these genes may lead to stroke (14). The enzyme encoded by PLA2G7, Lp-PLA2, has attracted considerable attention due to its crucial function in platelet aggregation in cardiovascular and cerebrovascular diseases (11). Lp-PLA2 is a new biological marker for detecting vasculitis (41). In contrast to the multiple clinical trials and diagnostic estimations of Lp-PLA2 mass and activity in the circulation (18), data on the phylogenetic background of the PLA2 gene family and the positive selection of amino acid residues on PLA2G7 genes are limited (42)(43)(44).
According to the PLA2 phylogenetic tree built using the Bayesian information criterion, PLA2G7 is one of the most evolutionarily distant members of the PLA2 proteins, an indication of a fast-evolving lineage with numerous structural changes. Moreover, lineage-specific expansion and divergence events were not observed from low-order to high-order vertebrates. The first duplication of the PLA2G7 group led to the emergence of lineages in the Norway rat and the house mouse, and the remaining mammals shared a duplication with birds, fish and amphibians. Thus, at least two duplications are present in mammals. Moreover, the PLA2G4 family presented the closest lineage to the PLA2G7 family, indicating that PLA2G4 may be another gene that mediates platelet aggregation.
In the present study, we identified specific amino acid residues of PLA2G7 that are targets of positive selection. According to the site model results, eight positive selection sites, 28N, 34K, 50S, 54T, 58R, 75T, 88Q and 92R, were found, and eight amino acid sites, 276G, 191K, 327D, 319S, 66G, 67C, 179G and 257Q, were found by the branch and branch-site models. No positive selection site was shared among the site, branch and branch-site models.
Functional structures, namely the protein kinase C phosphorylation site (45), the casein kinase II phosphorylation site (46) and the serine active site (47), were widely scattered along the PLA2G7 peptide chain. The serine active site is a conserved region centered on a serine residue and has the function of catalyzing fatty acid transfer between phosphatidylcholine and cholesterol. According to the Bayesian analysis, a positive selection site, 276G, was located in the serine active site, suggesting that it has a similar function. It has been previously demonstrated that Lp-PLA2 mediates atherosclerosis by promoting platelet aggregation and adherence to vessels (48), and a previous study (49) suggests that, apart from promoting platelet aggregation, Lp-PLA2 may also alter cholesterol metabolism in atherosclerosis. Protein kinase C can modify the function of a protein by increasing or decreasing the protein's activity, stabilizing it or marking it for destruction. The positive selection site 75T, located in the protein kinase C phosphorylation site, may therefore play a role in altering Lp-PLA2 activity. Casein kinase II is a protein kinase that phosphorylates many different proteins and is relevant to changes in macrophage gene expression during atherosclerosis (50). We found that 191K was located near the casein kinase II phosphorylation site, indicating that Lp-PLA2 may also increase macrophage gene expression in atherosclerosis. However, these sites require further experimental validation.
In conclusion, the PLA2 gene family is the most complex gene family among the phospholipases. A number of studies, including clinical trials, have focused on the diagnostic estimation of the mass and activity of PLA2-encoded enzymes in the circulation (18,51,52); however, phylogenetic analyses of the PLA2 gene family and of the positively selected amino acid residues of PLA2G7 are limited. The present study focused on the phospholipase and PLA2 gene families, employing phylogenetic analysis using the NJ, ME and MP methods, as well as the Bayesian information criterion. Positive selection sites were detected for the PLA2 family using site, branch and branch-site models. A total of 49 sequences from 10 different species were selected for the analysis. Phylogenetic analysis of the PLA2 gene family in vertebrates suggests that PLA2G5 is the origin of this gene family, and that PLA2G7 is one of the most evolutionarily distant PLA2 proteins. Eight positive selection sites were detected using the site model, and a further eight were detected using the branch and branch-site models.