Novel insulin receptor substrate 1 and 2 variants in breast and colorectal cancer

The insulin/insulin-like growth factor pathway is involved in breast and colorectal cancer (CRC) development. In the present study, we analyzed the coding region and short intron-exon borders of the insulin receptor substrate 1 and 2 (IRS-1 and IRS-2) genes in 12 cell lines derived from breast cancer (BC), 14 cell lines derived from CRC and 33 primary CRCs. In BC, we identified 3 nucleotide variants in IRS-1, 1 of which (p.Arg267Cys) was novel and had pathogenic potential as predicted by in silico analysis, and 6 variants in IRS-2. Twenty-one variants in IRS-1 and 18 in IRS-2 were identified in the CRC samples. These included 11 novel IRS-1 variants detected exclusively in CRCs: 5 missense variants (p.Pro559Leu, p.Gln655His, p.Asp1014Gly, p.Asp1181His and p.Pro1203Ser) with pathogenic potential as predicted by in silico analysis, 2 frameshifts predicted to generate a truncated protein, 1 splice-site mutation and 3 silent variants. In the CRC samples we also identified 7 novel IRS-2 variants: 4 missense variants, 2 of which (p.Asp782Asn and p.Gly1230Ser) had pathogenic potential as predicted by in silico analysis, 2 in-frame insertion mutations and 1 silent variant. Most of the novel IRS-1 and IRS-2 variants may be involved in the modulation of IRS-1 or IRS-2 functions and could be relevant to breast and colorectal tumorigenesis.


Introduction
Insulin, insulin-like growth factor 1 and 2 (IGF-1 and IGF-2) and IGF binding protein (IGFBP) are involved in cell growth and survival and are thought to be implicated in colorectal cancer (CRC). The insulin receptor substrates (IRS) are cytoplasmic signaling adaptor proteins that function as intermediates of the insulin receptor (IR) and IGF-IR (1). In addition, IRS proteins signal downstream of integrin, cytokine and steroid hormone receptors (2,3). By mediating the activities of these receptors, the IRS proteins play a central role in maintaining diverse cellular functions, such as metabolism, motility, survival and proliferation. Four IRS proteins have been described. Considering that IRS-3 is expressed only in rodents (4) and IRS-4 shows limited tissue expression (brain, kidney, thymus and liver) (5), most studies have focused on IRS-1 and IRS-2, both of which are widely expressed. Tyrosine-phosphorylated IRS-1/-2 bind proteins containing Src homology 2 (SH2) domains, such as the p85 regulatory subunit of PI3K, the phosphotyrosine phosphatase SHP-2, the Src-like kinase Fyn, Grb-2, NCK, CRK, SHB and others (6). These activate downstream effector cascades, such as the mitogen-activated protein kinase (MAPK) and the PI3K pathways, which promote biological responses (6). Irs1-/- mice display glucose intolerance, but do not develop overt diabetes (7). Irs2-/- mice have been shown to develop diabetes as a consequence of decreased β-cell function and insulin resistance (8). Therefore, IRS-1 and IRS-2 possess both similar and distinct properties.
Polymorphisms of IRS-1 (G972R) and IRS-2 (G1057D) have been independently associated with CRC risk (9). Moreover, IRS-1 G972R significantly modifies the risk of developing ovarian cancer in BRCA1 and BRCA2 mutation carriers (10). Our previous results suggest that IRS-1 may influence adenoma formation, CRC progression and liver metastasis (11). Expression of IRS-1 can be directly activated by β-catenin, likely in part via β-catenin/TCF binding to TCF consensus binding elements located in the first intron and downstream of the IRS-1 transcriptional start site (12). Moreover, one study showed that partial or absolute IRS-1 deficiency reduces the tumor load in Apc Min/+ mice (13). IRS-2 was reported to be amplified in 3 out of 146 primary CRCs (14). Therefore, IRS-1 and IRS-2 are most likely implicated in CRC and breast cancer (BC). For these reasons, we analyzed human primary CRC tumors and cell lines for genetic variants in the coding regions of the IRS-1 and IRS-2 genes. IRS-1 and IRS-2 coding regions were also analyzed in BC cell lines. (15,16) were studied for 2 IRS-1 genetic alterations, c.119delG and c.1791delG. Collection and analysis of samples were approved by the G. d'Annunzio University Ethics Committee.
DNA extraction. DNA from the cell lines (5x10^6 cells) was isolated with the QIAamp DNA Blood Mini kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's protocol. Frozen CRC samples had been snap-frozen in optimal cutting temperature (OCT) medium. Multiple cryosections from each OCT block were collected onto glass slides and fixed with 70% ethanol. Sections were microdissected, and gDNA was extracted with the QIAamp DNA Tissue Mini kit (Qiagen GmbH) according to the manufacturer's protocol, using three 15-µm sections for each tumor. For frozen samples and formalin-fixed CRCs, an area with at least 50% neoplastic cells and an area including normal muscularis propria and/or CRC-unaffected mucosa were identified on H&E-stained slides and used to guide manual microdissection for DNA extraction. Serial 15-µm-thick sections were prepared for DNA extraction. Selected areas were dissected from de-waxed step-sections by gentle scraping. Scraped tissue was digested by incubation overnight at 56°C in 100 ml of buffer containing Tris (50 mM, pH 8.5), EDTA (1 mM), Tween-20 (0.5%) and proteinase K (20 mg/ml). The extracted DNA was purified with the QIAamp DNA Mini kit following the manufacturer's instructions.
Mutational analysis. The coding region and short intron-exon borders of IRS-1 were investigated by automated Sanger sequencing in 12 BC and 14 CRC cell lines using an ABI PRISM® 310 genetic analyzer (Applied Biosystems, Foster City, CA, USA). In 33 primary CRCs, the entire IRS-1 coding sequence, including intron-exon boundaries, was analyzed by DHPLC using the Wave® nucleic acid fragment analysis system (Transgenomic, Inc., San Jose, CA, USA), followed by direct sequencing of the positive samples. The entire IRS-2 coding sequence was analyzed by direct sequencing in all BC and CRC samples. In the controls, the entire IRS-2 coding sequence was analyzed by the single-strand conformation polymorphism (SSCP) technique and sequencing. Primers and polymerase chain reaction (PCR) conditions are detailed in Tables I and II. To exclude PCR artifacts, all mutations were confirmed on both DNA strands and in duplicate experiments on separately extracted DNA. Variant nomenclature followed the Human Genome Variation Society guidelines (http://www.hgvs.org/mutnomen). The cDNA NM_005544.2 and protein NP_005535.1 sequences were used as the IRS-1 reference sequences, and the cDNA NM_003749.2 and protein NP_003740.2 sequences as the IRS-2 reference sequences. Nucleotide +1 corresponds to the A of the ATG translation initiation codon. MSI analysis of 33 primary CRCs was performed as previously described (15). In silico analysis to assess the likely pathogenicity of the variants was performed using PolyPhen (http://genetics.bwh.harvard.edu/pph/) and SIFT (http://sift.jcvi.org/www/SIFT_seq_submit2.html). SIFT scores were classified as intolerant (0.00-0.05), potentially intolerant (0.051-0.10), borderline (0.101-0.20), or tolerant (0.201-1.00) according to the classification proposed by Ng and Henikoff (18) and Xi et al (19).
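The four SIFT score classes described above can be expressed as a small lookup. The following Python sketch is illustrative only and is not the code used in the study; the function name `classify_sift` is hypothetical, and treating the class boundaries as continuous (for example, assigning a score of 0.0505 to "potentially intolerant") is an assumption, since the published ranges leave small gaps between classes.

```python
def classify_sift(score: float) -> str:
    """Map a SIFT score (0.00-1.00) to the four classes given in the text.

    Assumption: boundaries are treated as continuous, so any score above
    0.05 and up to 0.10 is 'potentially intolerant', and so on.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores must lie between 0 and 1")
    if score <= 0.05:
        return "intolerant"
    if score <= 0.10:
        return "potentially intolerant"
    if score <= 0.20:
        return "borderline"
    return "tolerant"


if __name__ == "__main__":
    # Hypothetical scores chosen only to exercise each class.
    for s in (0.00, 0.07, 0.15, 0.60):
        print(f"{s:.2f} -> {classify_sift(s)}")
```

Lower scores indicate substitutions that are less likely to be tolerated, so only the first class corresponds to a variant "predicted to affect protein function" in the usual SIFT reading.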
Phylogenetic conservation. Full-length orthologous protein sequences from a range of animal species were extracted from GenBank. We confirmed these as orthologs based on database annotation of identity and/or predicted function, as well as on the requirement that the sequence be the top hit in a BLAST of the human sequence against the genome database for each organism. Human protein sequences were aligned to the following vertebrate orthologs: IRS-
Statistical analysis. A two-tailed chi-square test was used to calculate all reported P-values using GraphPad v4 software (GraphPad Software, Inc., San Diego, CA, USA).
Tables III and IV summarize the frequencies of 3 allelic variants identified in IRS-1. Two common variants, c.702G>C and c.2678G>C, were also detected in the general population (Table IV) (20-28). The novel amino acid substitution, p.Arg267Cys, identified in MDA-MB-365, occurs in the well-conserved PTB domain, is probably damaging according to in silico analysis performed with PolyPhen, and is predicted to affect protein function by SIFT. The p.Arg267Cys substitution has never been described in the general population or in type 2 diabetes patients (20-28) and is not described in public databases. The complete coding region and intron/exon boundaries of IRS-2 were investigated by automated sequencing in 11 BC cell lines. Table IV summarizes the frequencies of 6 allelic variants identified in IRS-2 (c.2169C>T, c.2448T>C, c.2487C>T, p.Gly879Ser, p.Gly882Ala and p.Gly1057Asp), which were also detected in the general population (17,28-30) and are described in public databases.

IRS-1 and IRS-2 in CRC.
The coding region and short intron-exon borders of IRS-1 were investigated by automated sequencing in 14 CRC cell lines and by DHPLC and automated sequencing in 33 primary CRCs. Tables V and VII list the 21 allelic variants identified in IRS-1. The coding region and short intron-exon borders of IRS-2 were investigated by automated sequencing in 12 CRC cell lines and 33 primary CRCs. Tables VI and VII summarize the 18 distinct allelic variants identified in IRS-2. Some of the detected IRS-1 and IRS-2 variants are common polymorphisms also found in the general population (Table VII) (17,20-30) and are described in public databases.
The novel variants identified in CRC included 11 for IRS-1 (5 amino acid substitutions, 2 frameshifts, 1 splice mutation and 3 silent variants) and 7 for IRS-2 (2 insertions, 4 amino acid substitutions and 1 silent variant) (Tables V and VI). Several of these variants (9 for IRS-1: p.Gln655His, p.Asp1014Gly, p.Asp1181His, p.Pro1203Ser, p.Gly40fs, IVS1+4C>T, c.2766G>A, c.3168C>T and c.3618C>T; 4 for IRS-2: p.Pro710Ser, p.Asp782Asn, p.Val798Ile and p.Gly1230Ser) were identified in CRC cell lines. One germline IRS-1 variant (p.Pro559Leu) and a somatic frameshift mutation (p.Gly597fs) were identified in primary CRC cases. These variants were not detected in the control subjects (0/47) and are not described in public databases, even though IRS-1 has been extensively analyzed as a candidate gene for type 2 diabetes (20-27). Two in-frame insertion mutations in IRS-2, one germline (p.Ala701_Val702insAla) and the other tumor-associated (p.Asn28_His29insAsn), were identified in CRC cases. These genetic variants were detected neither in the control subjects of this study nor in previous mutational analyses (17,28-30) and are not described in public databases. Overall, nearly 17% of the CRCs tested (cell lines and primary CRC cases) had unique missense, deletion or insertion mutations in IRS-1 and/or IRS-2. These variants are widely dispersed in the coding regions of IRS-1 and IRS-2, but most of the missense variants are predicted to substitute evolutionarily conserved amino acids (Tables III, V and VI).
Microsatellite instability analysis. MSI status was assessed in 33 primary CRCs (15). Two of these showed an MSI-H phenotype. According to publicly available data [http://www.sanger.ac.uk/genetics/CGP/CellLines/ and references (31-33)], 5 of the 14 CRC cell lines analyzed (DLD1, HCT15, HCT116, LoVo and MIP101) are MSI-H. The IRS-1 nucleotide deletions identified in LoVo (c.119delG) and in an MSI-H primary CRC (c.1791delG) occurred in the context of coding mononucleotide repeats (5 and 8 G repeats, respectively). Therefore, coding IRS-1 repeats could be a target of defective mismatch repair (MMR) in CRC. To test this hypothesis, we analyzed 60 additional CRCs with an MSI-H phenotype for G deletions in the 5 G (c.119delG) and 8 G (c.1791delG) repeats of IRS-1. No mutations were identified in the 5 G repeat, while 5 additional deletions occurred in the 8 G repeat (Table V). Overall, deletions in the 8 G repeat of IRS-1 were detected in 9.0% (6/67) of the tested CRCs with an MSI-H phenotype.
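As an illustration of the reasoning above, coding mononucleotide runs such as the 5 G and 8 G repeats can be located with a simple pattern search, and the reported frequency follows from the counts in the text. The Python sketch below is not the pipeline used in the study: the function name `find_homopolymer_runs` is hypothetical, and the toy sequence is invented for demonstration and is not the IRS-1 reference sequence.

```python
import re


def find_homopolymer_runs(seq: str, base: str = "G", min_len: int = 5):
    """Return (1-based start, run length) for each run of `base` that is
    at least `min_len` nucleotides long."""
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}")
    return [(m.start() + 1, len(m.group())) for m in pattern.finditer(seq.upper())]


def percent(count: int, total: int) -> float:
    """Frequency as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    # Invented toy sequence containing one 5 G run and one 8 G run.
    toy = "AT" + "G" * 5 + "CATT" + "G" * 8 + "CA"
    print(find_homopolymer_runs(toy))   # runs of G with length >= 5
    # Counts taken from the text: 6 deletions among 67 MSI-H CRCs.
    print(percent(6, 67))
```

Longer mononucleotide runs are generally more prone to slippage under defective MMR, which is consistent with deletions recurring in the 8 G repeat but not in the 5 G repeat.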
In silico analysis of missense variants. PolyPhen (available at http://genetics.bwh.harvard.edu/pph/) was used to predict possible impacts of amino acid substitutions on protein structure and function. Of the 5 novel amino acid substitutions in IRS-1, 2 (p.Pro559Leu and p.Pro1203Ser) were scored as probably damaging and 3 (p.Gln655His, p.Asp1014Gly and p.Asp1181His) as possibly damaging. Of the 4 novel amino acid substitutions in IRS-2, 2 (p.Asp782Asn and p.Gly1230Ser) were scored as possibly damaging and 2 (p.Pro710Ser and p.Val798Ile) as benign. In an attempt to evaluate the functional relevance of the novel IRS-1 and IRS-2 amino acid substitutions, we employed the SIFT tool. Support for the functional significance of the genetic alterations identified in the present study was derived from the analysis of the extent of evolutionary conservation of the altered residues in 11 orthologous IRS-1 and 4 orthologous IRS-2 proteins. The computational analysis carried out at http://www.ebi.ac.uk/Tools/msa/clustalo/ revealed that 5 out of 5 IRS-1 amino acid substitutions occurred at amino acid residues which were evolutionarily conserved in birds and mammals (Table V). Of the IRS-2 amino acid substitutions, 3 out of 4 were conserved in mammals and 1 was not conserved (Table VI).

Discussion
Constitutive activation of IRS-1 has been found in various solid tumors, including BC (34). In vivo overexpression of IRS-1 and IRS-2 in the mammary gland of murine models was found to cause mammary tumorigenesis and metastasis (35), suggesting that IRS-1 and IRS-2 behave as oncogenes in vivo. The Gly972Arg IRS-1 polymorphism has been associated with increased BC risk for BRCA1 class II mutation carriers (10). In the present study, mutational analysis of IRS in BC and CRC identified several variants with pathogenic potential. In the BC cell line MDA-MB-365, we identified a novel variant of IRS-1, p.Arg267Cys. This mutation is located in the well-conserved PTB domain, shows pathogenic potential by in silico analysis and was observed neither in our controls (1/24 vs. 0/94, P=0.046) nor in public databases. Although in silico analysis predicted a pathogenic potential for p.Arg267Cys, further in vitro and in vivo studies are necessary to assess the functional effect of this mutation. The genetic variants of IRS-2 that we identified in BC cell lines were also detected in the general population, suggesting that these are common polymorphisms. It was shown that partial or absolute IRS-1 deficiency in mice carrying the Apc Min/+ mutation reduces intestinal tumorigenesis (13) and that IRS-1 is a direct β-catenin target gene (12). These data suggest that IRS-1 might be a regulator of the initiation of neoplastic transformation by β-catenin. Moreover, the G972R IRS-1 polymorphism has been significantly associated with CRC risk (9). We recently showed that IRS-1 is modulated according to CRC differentiation and we suggested a role for IRS-1 in CRC progression and metastasis (11). Therefore, IRS-1 protein may coordinate signaling pathways involved in CRC development and progression. We identified 11 novel genetic alterations of IRS-1 in CRCs. These mutations were not observed in our controls and were not present in public databases.
Two frameshift mutations, c.119delG and c.1791delG, predicted to generate a truncated IRS-1 protein were identified in the LoVo cell line and in a primary CRC, respectively, both showing an MSI-H phenotype. The mutations, both in heterozygosity, occurred in the context of 5 and 8 G repeats, respectively. The frequency of these 2 frameshifts was assessed in 67 CRCs with an MSI-H phenotype. The frameshift in the 8 G repeat (c.1791delG) recurred in 6/67 (9.0%) cases, while the frameshift in the 5 G repeat (c.119delG) was detected in 1/67 cases (1.5%). Therefore, the 8 G mononucleotide repeat of IRS-1 is an MSI target in MSI-H CRCs. The functional effect of this recurring mutation is not known. It is possible that the truncation activates the oncogenic potential of IRS-1 (36), or alternatively that the corresponding allele is inactivated, which may also contribute to the tumor biology. In this regard, we previously found that in mucinous and undifferentiated CRCs, IRS-1 expression was low or absent (11). Moreover, it was previously shown that degradation of IRS-1 in lung cancer cells generated PI3K hyperactivity (37). The novel nucleotide variants identified in the CRC cell lines (p.Gln655His, p.Asp1014Gly, p.Asp1181His, p.Pro1203Ser, IVS1+4C>T, c.2766G>A, c.3168C>T, c.3618C>T) could be germline, somatic or acquired in culture, and thus their role is difficult to assess based on the available data, although the missense variants were determined to be putatively pathogenic by in silico analysis. A novel missense variant (p.Pro559Leu) identified in a CRC patient was in heterozygosity in both colorectal mucosa and primary CRC, and therefore occurred in the germline. Overall, considering that we identified 11 novel IRS-1 variants in 21/94 alleles and none in the controls (0/94; P<0.0001), mutations in this gene appear to occur at a considerable frequency in CRC.
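The allele-count comparisons reported here can be checked with a standard 2x2 chi-square test. The Python sketch below is a minimal illustration, not the GraphPad procedure used in the study: it computes the uncorrected Pearson statistic with one degree of freedom, and whether a continuity correction was applied in the original analysis is not stated, so exact P-values may differ. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, two-tailed P-value) with 1 degree of freedom.
    Assumption: no Yates continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Novel IRS-1 variants: 21 of 94 CRC alleles vs. 0 of 94 control alleles.
    stat, p = chi_square_2x2(21, 94 - 21, 0, 94)
    print(f"CRC vs controls: chi2 = {stat:.2f}, P = {p:.2g}")
    # Novel IRS-1 variant in BC: 1 of 24 alleles vs. 0 of 94 control alleles.
    stat, p = chi_square_2x2(1, 24 - 1, 0, 94)
    print(f"BC vs controls:  chi2 = {stat:.2f}, P = {p:.3f}")
```

With expected cell counts this small, an exact test would often be preferred; the sketch only mirrors the chi-square approach named in the statistical analysis.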
Overall, the CRCs (cell lines and primary CRC cases) were enriched in IRS-1 nucleotide variants compared to the BC cell lines. The frequencies of novel IRS-1 variants differed significantly between the two groups studied (1/24 in BC vs. 21/94 in CRCs, P=0.021), suggesting an association between IRS-1 variants and CRC.
Several studies have been published concerning the role of IRS-2 in CRC. The G1057D IRS-2 polymorphism has been significantly associated with CRC risk (9). In a previous study (38), we showed that IRS-2 was significantly expressed in the intestinal epithelium, where it localizes at the top of the crypt and is directly controlled by the caudal-related homeobox protein (CDX2). IRS-2 RNA increases with spontaneous differentiation in both HT29 and Caco-2 cells and is downregulated in tumors of Apc Min/+ mice and FAP patients, which serve as models for β-catenin-dependent intestinal tumorigenesis (38). Moreover, the IRS-2 gene was reported to be amplified in 3/146 CRCs (14). We detected novel IRS-2 variants associated with the CRC cell lines (p.Pro710Ser, p.Asp782Asn, p.Val798Ile, p.Gly1230Ser), but we did not establish whether these are germline, somatic or acquired in culture. However, the p.Asp782Asn and p.Gly1230Ser IRS-2 missense variants showed a putative pathogenic role by in silico analysis. One novel germline variant (p.Ala701_Val702insAla) was identified in heterozygosity both in the colorectal mucosa and in the primary CRC of one patient. We also detected a tumor-associated mutation (p.Asn28_His29insAsn) in a primary CRC, but not in the matched mucosa. In summary, we showed that IRS-1 and IRS-2 variants occur at a considerable frequency in CRC and BC. The novel mutations identified in the present study are predicted to affect protein function and thus may be involved in the modulation of functions relevant to breast and colorectal tumorigenesis. Further studies with in vitro and in vivo BC and CRC models are necessary to clarify the role of these mutations in tumor biology.