Missense mutations of MLH1 and MSH2 genes detected in patients with gastrointestinal cancer are associated with exonic splicing enhancers and silencers

The MLH1 and MSH2 genes in DNA mismatch repair are important in the pathogenesis of gastrointestinal cancer. Recent studies of normal and alternative splicing suggest that the deleterious effects of missense mutations may in fact be splicing-related when they are located in exonic splicing enhancers (ESEs) or exonic splicing silencers (ESSs). In this study, we used ESE-finder and FAS-ESS software to analyze the potential ESE/ESS motifs of the 114 missense mutations detected in the two genes in East Asian gastrointestinal cancer patients. In addition, we used the Sorting Intolerant from Tolerant (SIFT) tool to functionally analyze these mutations. Across all the mutations, the number of ESE losses (68) was 51.1% higher than the number of ESE gains (45), and the number of ESS gains (27) was 107.7% higher than the number of ESS losses (13). In total, 56 (49.1%) mutations possessed a potential exonic splicing regulator (ESR) error. Eighty-one mutations (71.1%) were predicted to be deleterious, with a low tolerance index, by the SIFT tool. Among these, 38 (33.3%) mutations were predicted to be functionally deleterious and to possess at least one potential ESR error, while 18 (15.8%) mutations were predicted to be functionally deleterious and to exhibit at least two potential ESR errors. The latter may be more likely to affect exon splicing. Our results indicated that there is a strong correlation between missense mutations in MLH1 and MSH2 genes detected in East Asian gastrointestinal cancer patients and ESR motifs. In order to correctly understand the molecular nature of mutations, splicing patterns should be compared between wild-type and mutant samples.


Introduction
The incidence and mortality rates of gastric cancer and colorectal cancer are among the highest of all malignant tumors in East Asia. Germline mutations of mismatch repair (MMR) genes are responsible for the majority of hereditary nonpolyposis colorectal cancer (HNPCC) cases. The MMR genes MSH2 (OMIM No. 609309) and MLH1 (OMIM No. 120436) are considered to be the two major genes implicated in HNPCC (1,2). Carriers of germline mutations of MSH2 and MLH1 genes have a 4-fold greater risk of gastric cancer compared with normal individuals, as well as a high risk of colorectal cancer. These two genes are therefore associated with gastrointestinal cancer susceptibility.
Missense mutations are among the most common types of mutations underlying inherited human diseases. The deleterious effects of missense mutations are usually attributed to their effects on the structure or function of a protein.
This assumption may be misleading, as mutations that affect sequences important for splicing modulation are likely to have a profound effect on the translated product. It has become increasingly clear that exonic point mutations located outside the splice sites may affect pre-mRNA splicing and thereby cause disease (3,4). Correct pre-mRNA splicing not only requires that the splice site sequences are present at the exon-intron borders, but is also critically dependent on additional intronic and exonic regulatory sequences (5). Those present in exons and with the capacity of enhancing splicing are called exonic splicing enhancers (ESEs), and those with the capacity of inhibiting splicing are called exonic splicing silencers (ESSs). Collectively, these classes of elements are called exonic splicing regulators (ESRs). Consequently, mutations located in ESE or ESS elements may affect splicing. The significance and prevalence of this phenomenon may have been considerably underestimated, as the majority of studies of disease-related genes are limited to the analysis of genomic DNA.
The majority of enhancer sequences within exons have been found to bind members of the serine/arginine-rich (SR) protein family, while many silencing elements are bound by members of the heterogeneous nuclear ribonucleoprotein (hnRNP) family (6). ESEs are discrete, degenerate motifs of 6-8 nts located inside exons (7,8). The study of normal splicing suggests that the majority of exons contain at least one functional ESE site (7,9). ESE-bound SR proteins promote exon definition by directly recruiting and stabilizing the splicing machinery through protein-protein interactions (10), and/or by antagonizing the function of nearby silencer elements (11). The cores of ESSs are considered to be relatively short (6-10 nts). ESS-bound hnRNPs are proposed to mediate silencing through direct antagonism of the splicing machinery or by direct competition for overlapping enhancer binding sites. The intrinsic strength by which the splice sites are recognized by the spliceosome, as well as the antagonistic dynamics of proteins binding ESEs and ESSs, control a large proportion of exon recognition and alternative splicing. Therefore, exonic splicing regulatory sequences are now increasingly recognized as a major target and a common mechanism for disease-causing mutations leading to exon skipping in functionally diverse genes.
In this study, we used ESE-finder (12,13) and FAS-ESS (14) software to analyze the missense mutations of MSH2 and MLH1 genes detected in East Asian gastrointestinal cancer patients, and to assess whether these mutations hit the predicted ESE/ESS motifs and affected gene splicing.

Subjects.
A total of 114 missense mutations, 52 of MSH2 and 62 of MLH1, detected in gastrointestinal cancer patients, were serially collected for this study from published East Asian literature (Table I). The majority of the investigated mutations were exclusively reported in East Asia (China, Japan and Korea), and some of the mutations were detected in different ethnicities. The study was approved by the Ethics Committee of Nanjing University, Nanjing, China.
Potential ESE motif analysis. To identify the ESE motifs that were recognized by individual SR proteins, a PCR-based systematic evolution of ligands by exponential enrichment (SELEX) was used. During this approach, a natural splicing enhancer in a minigene was replaced by short, random sequences derived from an oligonucleotide library. The generated pool of minigenes was transfected into cultured cells, and spliced mRNAs were amplified by RT-PCR and sequenced (7). On the basis of the frequencies of the individual nucleotides at each position, a score matrix for each nucleotide in each position was calculated. This score matrix may be used to predict SR protein-specific ESEs (ESE-finder) (12,13).
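The step from nucleotide frequencies to a score matrix can be illustrated with a generic log-odds calculation. The following is a minimal sketch only: the function name `score_matrix`, the uniform background, the pseudocount and the four example sequences are all assumptions made for illustration, and the exact formula used to derive the ESE-finder matrices is not given in this text.

```python
import math
from collections import Counter

BASES = "ACGT"

def score_matrix(aligned_seqs, background=None, pseudocount=0.5):
    """Build a log-odds score matrix from equal-length aligned sequences.

    Illustrative only: uniform background and pseudocount are assumptions,
    not the published ESE-finder procedure.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    n = len(aligned_seqs)
    matrix = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_seqs)
        column = {}
        for b in BASES:
            # Smoothed frequency of base b at this position
            freq = (counts[b] + pseudocount) / (n + pseudocount * len(BASES))
            column[b] = math.log2(freq / background[b])
        matrix.append(column)
    return matrix

# Hypothetical selected sequences, not real SELEX output
winners = ["GAAGAA", "GAAGAC", "GACGAA", "CAAGAA"]
matrix = score_matrix(winners)
```

Bases that are enriched at a position receive positive scores and depleted bases receive negative scores, which is what allows a candidate motif to be scored by summing one entry per position.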
We analyzed wild-type or mutant exon sequences from MLH1 and MSH2 genes in Table I with ESE-finder software using SR protein score matrices and threshold values, essentially as described previously (ESE-finder: http://rulai.cshl.edu/tools/ESE/) (12). Sequence motifs for the same or different SR proteins may overlap. We considered only the wild-type or mutant sequence motifs with scores greater than or equal to the value of the threshold for the corresponding SR protein. The threshold values were as follows: SF2/ASF (IgM-BRCA1) heptamer motif, 1.867; SC35 octamer motif, 2.383; SRp40 heptamer motif, 2.670 and SRp55 hexamer motif, 2.676.
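The comparison of wild-type and mutant sequences against a score matrix and threshold can be sketched as follows. This is an illustration under stated assumptions, not the ESE-finder implementation: `TOY_MATRIX` is a hypothetical 3-nt matrix, `TOY_THRESHOLD` is a hypothetical cut-off, and the sequences are invented. Only the rule that a motif counts when its score is greater than or equal to the threshold is taken from the text.

```python
def score_windows(seq, matrix):
    """Return {start: score} for every window of len(matrix) in seq."""
    width = len(matrix)
    return {
        i: sum(matrix[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    }

def motif_changes(wild, mutant, matrix, threshold):
    """Window starts whose motif is lost or gained by a substitution."""
    wt = {i for i, s in score_windows(wild, matrix).items() if s >= threshold}
    mt = {i for i, s in score_windows(mutant, matrix).items() if s >= threshold}
    return {"lost": sorted(wt - mt), "gained": sorted(mt - wt)}

# Hypothetical matrix and threshold, NOT taken from ESE-finder
TOY_MATRIX = [
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": 0.5, "T": -1.0},
]
TOY_THRESHOLD = 2.0

wild = "TTGAATT"
mutant = "TTGCATT"  # single substitution A>C at 0-based index 3
changes = motif_changes(wild, mutant, TOY_MATRIX, TOY_THRESHOLD)
```

In this toy case the window starting at index 2 scores above the threshold in the wild-type sequence and below it in the mutant, so it is reported as a lost motif. In practice the same comparison would be repeated for each SR protein matrix with its own threshold.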
Potential ESS motif analysis. To systematically identify ESS motifs, an in vivo splicing reporter system was developed to screen a library of random decanucleotides. The resulting library was transfected into cultured human 293 cells, and stably transfected cells were combined and sorted for GFP-expressing cells by fluorescence activated cell sorting (FACS) analysis. The fluorescence-activated screen for exonic splicing silencers (FAS-ESS, or FAS for short) yielded 176 ESS hexamers (FAS-hex2 set) (14).
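Because the FAS-hex2 set is a list of hexamers, screening a substitution amounts to checking the six hexamers that overlap the mutated position before and after the change. The sketch below assumes exact string matching; the two hexamers in `TOY_ESS` are hypothetical placeholders rather than entries confirmed to belong to the FAS-hex2 set, and the sequences are invented.

```python
def hexamers_over(seq, pos, k=6):
    """All k-mers in seq that overlap 0-based position pos, keyed by start."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return {i: seq[i:i + k] for i in range(first, last + 1)}

def ess_changes(wild, mutant, pos, ess_set):
    """Return (lost, gained) ESS hexamers for a substitution at pos."""
    wt = hexamers_over(wild, pos)
    mt = hexamers_over(mutant, pos)
    lost = [wt[i] for i in sorted(wt)
            if wt[i] in ess_set and mt[i] not in ess_set]
    gained = [mt[i] for i in sorted(mt)
              if mt[i] in ess_set and wt[i] not in ess_set]
    return lost, gained

# Hypothetical hexamers for illustration; the real FAS-hex2 set has 176 entries
TOY_ESS = {"TTAGGT", "TAGGTA"}

wild = "ACGTTCGGTACC"
mutant = "ACGTTAGGTACC"  # C>A at 0-based position 5
lost, gained = ess_changes(wild, mutant, 5, TOY_ESS)
```

Here a single substitution creates two overlapping silencer hexamers, which illustrates how the number of motif scores gained or lost can exceed the number of mutations.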
SIFT analysis. The Sorting Intolerant from Tolerant (SIFT) tool (accessible at http://sift.jcvi.org/) was applied to detect deleterious missense mutations (45,46). SIFT compiles a dataset of functionally linked protein sequences by searching the protein database using a PSI-BLAST algorithm. Subsequently, it builds an alignment of the homologous sequences with the query sequence, scans all positions in the alignment and calculates the probabilities for each amino acid at each position. Substitutions with a normalized probability (tolerance index or SIFT score) of <0.05 are predicted to be deleterious or intolerant, while those with a score ≥0.05 are predicted to be tolerant (45). In this study, the reference sequence (RefSeq) ID or GenInfo Identifier (GI) number and the substitutions were provided as inputs to the SIFT blink program (46). A total of 52 missense mutations in the MSH2 gene (GI: 4557761) and 62 in the MLH1 gene (GI: 463989) were analyzed for identification of deleterious variants.
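The decision rule applied to SIFT scores can be written out explicitly. This is a minimal sketch of the threshold only: the function name `sift_call` and the three example scores are hypothetical, and no SIFT computation is performed.

```python
SIFT_CUTOFF = 0.05  # threshold stated in the text

def sift_call(score):
    """Classify a SIFT tolerance index: <0.05 deleterious, >=0.05 tolerant."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores lie between 0 and 1")
    return "deleterious" if score < SIFT_CUTOFF else "tolerant"

# Hypothetical scores for illustration only
scores = {"variant_1": 0.00, "variant_2": 0.05, "variant_3": 0.31}
calls = {name: sift_call(s) for name, s in scores.items()}
```

Note that a score of exactly 0.05 falls on the tolerant side of the boundary.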

Potential ESE/ESS analysis of the mutations in MLH1 and MSH2 genes. We analyzed wild-type or mutant exon sequences from MLH1 and MSH2 genes in Table I using SR protein score matrices and threshold values, essentially as described. Potential ESE motifs found in the mutations in the two genes are listed in Table I. We analyzed wild-type or mutant exon sequences from MLH1 and MSH2 genes in Table I with FAS-ESS using the FAS-hex2 set (176 ESS hexamers), essentially as described previously. Potential ESS motifs found in the mutations in the two genes are listed in Table I (Fig. 2). Of the 114 mutations assessed, 9 (7.9%) mutations resulted in 13 ESS motif scores being eliminated. However, 17 (14.9%) mutations created 27 ESS motif scores.
Eliminating the potential ESE motif and creating the potential ESS motif have the same effect on exon exclusion. We named these mutations potential ESR error mutations. In total, 56 (49.1%) mutations possessed a potential ESR error (Table II).
Table I. Pathogenic missense mutations analyzed in the splicing assay.
Deleterious missense mutations predicted by the SIFT server. Eighty-one missense mutations (71.1%) were predicted to be deleterious with a tolerance index <0.05; the lower the tolerance score, the greater the functional consequence an amino acid residue substitution is expected to have (Table I). Additionally, 32 (28.1%) mutations were predicted to be deleterious with potential ESE motif losses, while 13 (11.4%) mutations were predicted to be deleterious with potential ESS motif gains. In total, 38 (33.3%) mutations were predicted to be deleterious with potential ESR errors (Table II).
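The way the two predictions are combined can be sketched as a simple tally. The record layout, the names `Mutation` and `summarize`, and the four example records are hypothetical. Only the definitions are taken from the text: a potential ESR error is an ESE loss or an ESS gain, and a SIFT score <0.05 is deleterious.

```python
from dataclasses import dataclass

@dataclass
class Mutation:
    name: str
    ese_lost: int
    ese_gained: int
    ess_lost: int
    ess_gained: int
    sift_score: float

    @property
    def esr_error(self):
        # ESE loss or ESS gain, both expected to favor exon exclusion
        return self.ese_lost > 0 or self.ess_gained > 0

    @property
    def deleterious(self):
        return self.sift_score < 0.05

def summarize(mutations):
    """Count ESR-error mutations and those also called deleterious."""
    n = len(mutations)
    esr = sum(m.esr_error for m in mutations)
    both = sum(m.esr_error and m.deleterious for m in mutations)
    return {
        "total": n,
        "esr_error": esr,
        "esr_error_and_deleterious": both,
        "esr_error_pct": round(100 * esr / n, 1),
    }

# Hypothetical records, not data from Table I
toy = [
    Mutation("m1", 2, 0, 0, 0, 0.01),  # ESE loss, deleterious
    Mutation("m2", 0, 1, 0, 0, 0.00),  # ESE gain only, deleterious
    Mutation("m3", 0, 0, 0, 1, 0.20),  # ESS gain, tolerant
    Mutation("m4", 0, 0, 1, 0, 0.40),  # ESS loss only, tolerant
]
summary = summarize(toy)
```

ESE gains and ESS losses are recorded but do not count as ESR errors under this definition, so a mutation such as `m2` contributes to the deleterious total only.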

Discussion
The incidence and mortality rates of gastrointestinal cancer are among the highest of all malignant tumors in East Asia. The MMR genes MLH1 and MSH2 are known to play an important role in the pathogenesis of gastrointestinal cancer. At present, the majority of databases contain annotation data that are primarily or exclusively derived from genomic DNA analysis, and the effect of a mutation on the mRNA or on the encoded protein is usually predicted from the primary sequence, rather than by experimentally determining the mRNA expression and splicing patterns. Therefore, the majority of reported disease-associated alleles of these genes are small insertions, deletions or splice-site mutations that result in protein truncation. Thus, only a small number of amino acid substitutions in either gene have been described as deleterious missense mutations, yet a very large number of different unclassified variant alleles are routinely encountered in clinical and research laboratories. It is therefore necessary to functionally define these unclassified variants as deleterious alleles, low-penetrance alleles or benign polymorphisms. In this study, we selected 114 missense mutations of MSH2 and MLH1 genes detected in East Asian gastrointestinal cancer patients in published studies. The ethnic groups included Chinese, Japanese and Korean individuals. These missense mutations contribute to certain forms of cancer susceptibility in East Asian populations, but it was unclear whether they were the definite pathogenic mutations in gastrointestinal cancer.
The splicing consequences of unclassified variants found in the MLH1 or the MSH2 genes may be studied directly at the patient RNA level. However, the number of variants that may be tested for splicing alterations using patient RNA is limited by the difficulty of obtaining blood samples suitable for RNA extraction. Bioinformatic tools such as ESE-finder and FAS-ESS may enable prediction of the splicing defects caused by the mutations. These tools have already been used successfully to predict ESEs/ESSs and their disruption in a variety of genes, including ACF, BRCA1, BRCA2, FBN1, IGF1, PDHA1, SMN1, SMN2, TNFRSF5, CFTR, MLH1, MSH2, TP53, MCAD and others (3,7,47-52). Auclair et al conducted a systematic RNA screening of a series of 60 western patients who carried unrelated exonic or intronic mutations in the MLH1 or MSH2 genes (53). In that study, prediction of ESEs by ESE-finder was found to identify aberrant splicing with a sensitivity of 80% and a specificity of 42%.
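For reference, the sensitivity and specificity quoted above follow the standard definitions below. The mapping of terms is an assumption made here for illustration: TP denotes variants with aberrant splicing that were flagged by the ESE prediction, FN denotes variants with aberrant splicing that were not flagged, TN denotes variants with normal splicing that were not flagged, and FP denotes variants with normal splicing that were flagged.

```latex
\[
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}
\]
```

Under this reading, a specificity of 42% implies that 58% of the variants with normal splicing were nevertheless flagged by the prediction, which is why computational predictions alone are insufficient to classify a variant.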
Under the null hypothesis that there is no correlation between ESEs and mutations, the number of ESE motif scores eliminated and the number created should be equal. However, in the present study, the number of ESE losses (68) was 51.1% higher than the number of ESE gains (45). This suggested that the mutations located in the potential ESE motifs were more likely to eliminate the ESE motif score, and that they affected gene splicing. Similarly, under the null hypothesis that there is no correlation between ESSs and mutations, the number of ESS motif scores eliminated and the number created should be equal. However, in the present study, the number of ESS gains (27) was 107.7% higher than the number of ESS losses (13). This suggested that the mutations were more likely to create the ESS motif score and that they affected gene splicing, indicating that there is a strong association between missense mutations in MLH1 and MSH2 genes and ESE/ESS motifs. Some of the mutations are therefore likely to be splicing-related deleterious alleles.
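The percentages above, and one possible way of testing the stated null hypothesis, can be sketched as follows. The exact two-sided binomial test is an assumption introduced here for illustration and is not an analysis reported in this study. It also treats motif scores as independent events, which is a simplification because a single mutation may change several overlapping motifs.

```python
from math import comb

def percent_excess(larger, smaller):
    """Percentage by which `larger` exceeds `smaller`."""
    return round(100 * (larger - smaller) / smaller, 1)

def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under the null probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))

ese_excess = percent_excess(68, 45)  # ESE losses vs ESE gains
ess_excess = percent_excess(27, 13)  # ESS gains vs ESS losses

# Null hypothesis: a changed motif score is equally likely to be a loss or a gain
p_ese = two_sided_binomial_p(68, 68 + 45)
p_ess = two_sided_binomial_p(27, 27 + 13)

print(ese_excess, ess_excess)
print(p_ese, p_ess)
```

The first two values reproduce the 51.1% and 107.7% figures given in the text. The p-values are only as meaningful as the independence assumption behind them.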
As an upper limit for the estimate of the proportion of ESR-related mutations, we suggest that 56 (49.1%) mutations, which have lost ESE or gained ESS motifs, were deleterious because they disturbed functional splicing enhancers or created functional splicing silencers, respectively. This approach was likely to overestimate the proportion of ESR-related pathogenic mutations, since not all ESR motifs are true functional ESRs, and not all nucleotide substitutions in functional ESRs disturb their function. According to previous studies, no extensive functional analysis was available for these mutations. We therefore used the SIFT tool to functionally analyze the missense mutations. SIFT is a program that predicts the effect of amino acid substitutions on protein function, on the basis of sequence conservation during evolution and the nature of the amino acids substituted in a gene of interest. In total, 81 missense mutations (71.1%) were predicted to be deleterious with a tolerance index <0.05. Among them, 38 (33.3%) mutations were predicted to be deleterious and have at least one potential ESR error. Some of these may be pathogenic through exon exclusion.
Eliminating the potential ESE motif and creating the potential ESS motif have the same effect on exon exclusion. One mutation may eliminate one or more potential ESE motifs. The greater the number of potential ESE motifs eliminated, the more likely the mutation was to affect the ESE motifs. Similarly, one mutation may create one or more potential ESS motifs. The greater the number of potential ESS motifs created, the more likely the mutation was to affect the ESS motifs. In total, 25 (21.9%) mutations eliminated at least two potential ESE motifs, or created at least two potential ESS motifs, or eliminated one or more potential ESE motifs and created one or more potential ESS motifs. These may be more likely to affect exon splicing. Among these, 18  2059C>T, c.2263A>G in MLH1, were predicted to be deleterious in the SIFT analysis. These were the mutations that most likely affected exon splicing, and were denoted as ESR-relevant mutations. We proposed that some of these disrupted functional ESEs or created functional ESSs, leading to the creation of a misspliced message predicted to encode a truncated, non-functional protein. However, these data did not allow us to determine which of the SR protein/hnRNP motifs were functional. Although it is unlikely that every motif could be recognized simultaneously, due to the overlap between them, it is possible that each motif was important in a different cell type, depending on the relative expression levels of SR proteins/hnRNPs. Several putative ESR sequences have been found in exons where they have been sought systematically, raising the possibility of functional redundancy. This may diminish the potential exon-skipping effect of a mutation in any one ESR. However, even in cases where 3-10 putative ESE sequences occur within a single exon, a single ESE-disrupting base substitution may lead to efficient exon skipping. Fackenthal (56). 
This suggests that, at least in certain cases, individual ESRs may be critical for splicing even when other ESRs are present in the same exon. However, the splice mutations of MLH1 and MSH2 have been underestimated. The strong correlation between missense mutations with splicing enhancer/silencer motifs found in this study also suggested that splicing-related mutations in the two genes may be relatively common. The computer predictions do not always correlate with in vivo splicing defects. The predictable ESR error mutations require experimental analysis for validation in a further study.
In conclusion, our results indicated that there is a strong correlation between missense mutations in MLH1 and MSH2 genes detected in East Asian gastrointestinal cancer patients and ESR motifs. In total, 38 (33.3%) mutations were predicted to be functionally deleterious and to possess at least one potential ESR error, while 18 (15.8%) mutations were predicted to be functionally deleterious with at least two potential ESR errors. The latter may be more likely to affect exon splicing. To truly understand the molecular nature of mutations, splicing patterns should be compared between wild-type and mutant samples.