Molecular basis of antibody binding to mucin glycopeptides in lung cancer

Abstract. Glycopeptides bearing Tn epitopes are emerging targets for cancer diagnosis and immunotherapy. In this study, we analyzed membrane proteins containing O-glycosylated tandem repeat (TR) sequences in lung cancer patients of different types and stages, using gene microarray data in the public domain. The expression of Tn and glycopeptide epitopes on the surface of lung cancer cell lines was studied with monoclonal IgG antibodies 14A, 16A, and B72.3. The binding of mAbs to synthetic glycopeptides was studied by surface plasmon resonance. Nine mucin mRNAs were found to be expressed in lung cancer patients, but at levels similar to those in healthy individuals. At the protein level, a glycopeptide epitope on the cancer cell surface is preferentially recognized by mAb 16A, as compared with peptide-alone (14A) or sugar-alone (B72.3) epitopes. 14A and 16A favor clustered TR containing more than three TR sequences, with a 10-fold lower Kd than for two consecutive TR. B72.3 preferentially recognized clustered sialyl-Tn displayed on MUC1 but not on other O-glycoproteins, with 100-fold stronger binding when MUC1 is transfected as a sugar carrier, while the total sugar epitopes remain unchanged. These findings indicate that clusters of both TR backbones and sugars are essential for mAb binding to mucin glycopeptides. Three rules of antibody binding to mucin glycopeptides at the molecular level are presented here: first, the peptide backbone of a glycopeptide is preferentially recognized by B cells through mutations in the complementarity determining regions (CDRs) of the B cell receptor, and the sugar-binding specificity is acquired through mutations in the framework of the heavy chain; secondly, consecutive tandem repeats (TR) of peptides and glycopeptides are preferentially recognized by B cells, which favor clustered TR containing more than three TR sequences; thirdly, certain sugar-specific B cells recognize and accommodate clustered Tn and sialyl-Tn displayed on the surface of a mucin but not other membrane proteins.

Introduction
Lung cancer is the leading cause of cancer deaths. The majority (80%) of lung cancers are non-small cell lung cancer, 60% of which are resistant to chemotherapy. Small-molecule targeted therapies have been developed for lung cancers carrying epidermal growth factor receptor mutations, but their efficacy has been limited by drug resistance. A breakthrough in lung cancer therapy is immunotherapy targeting molecules which suppress the immune surveillance against cancer, as exemplified by blocking antibodies to PD1, a molecule expressed by tumor-killing lymphocytes which suppresses lymphocyte activation (1). Anti-PD1 antibody has been approved in the United States for treatment of melanoma. It has also shown clear efficacy in non-small cell lung cancer. Another blocking antibody, anti-PDL1, specific for PD1 ligand 1, a molecule expressed by tumors that suppresses immune activation through binding to PD1, has also shown efficacy in treating non-small cell lung cancer (2). Drug targets like PD1 and PDL1 are highly sought after because they are not limited by the drug resistance observed in small-molecule targeted therapy. Mucins are cancer-associated proteins which promote tumor growth through binding to signaling molecules in the apoptosis pathway (3,4), such as the binding of MUC1 protein to the BH3 domain of BAX protein.
Mucins also bind to galectins expressed on the surface of tumor-killing lymphocytes and trigger their apoptosis to subvert immune surveillance (5-8). The expression of mucins in lung cancer cell lines and tissue sections has been studied by staining with a few monoclonal antibodies (9-12); however, the big picture of mucins in lung cancer patients is still lacking.
Mucins are highly expressed by healthy epithelial cells, and MUC1 peptide vaccines based on mucin protein backbones have not led to significant objective responses in cancer treatment (13-16). Abnormally glycosylated mucins in malignant cells are a current research focus because of the unique post-translational modification of mucin backbones by carbohydrates. In this study, by analyzing lung cancer microarray data available in the public domain and by computational prediction of glycopeptide TR sequences, we identified TR glycopeptides bearing Tn and sialyl-Tn. Using MUC1 as a model, we report three principles by which monoclonal IgG antibodies recognize glycopeptide antigens at the molecular level.

Materials and methods
mRNA array data source for different types of lung carcinomas. mRNA array data for all lung cancer types were acquired from www.genome.wi.mit.edu/MPR/lung (17). All data were evaluated with the R package Simpleaffy, using relative log expression (RLE) and normalized unscaled standard error (NUSE) boxplots as previously described (18). There were 216 cases in total, but four were discarded because of abnormal performance in the quality control process. The 212 remaining arrays comprise the following cases: 150 adenocarcinoma, 20 bronchial carcinoid, 5 small cell lung cancer, 21 squamous lung cancer, and 16 healthy individuals.
mRNA array data source for different stages of lung adenocarcinomas. mRNA array data were acquired from caArray (https://array.nci.nih.gov/caarray/project/details.action?project.experiment.publicIdentifier=jacob-00182). The array experiment was performed by multiple laboratories in North America and yielded 442 arrays on lung adenocarcinomas (19). Based on RLE and NUSE plot evaluation by Bhattacharjee et al (19), array data from the Dana-Farber Cancer Institute (CAN/CF) are systematically different from those of the other sites. We therefore discarded the data from CAN/CF, along with a few arrays with incomplete information regarding cancer stage. In the end, we collected 358 array datasets (132 for T1 stage, 188 for T2 stage, 26 for T3 stage, and 12 for T4 stage in the stage category; 241 for N0 stage, 64 for N1 stage, and 53 for N2 stage in the metastasis stage category).
Analysis of the mRNA quantitation. Array data were processed by Robust Multiarray Average normalization (20).
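As an illustration of what this normalization involves, the following is a minimal sketch of the quantile-normalization step that forms part of Robust Multiarray Average processing. It is not the code used in this study: the function name `quantile_normalize` and the toy intensities are hypothetical, the background-correction and median-polish summarization steps of RMA are omitted, and ties are broken by probe order for simplicity.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Give every array the same empirical intensity distribution.

    `arrays` is a list of equal-length lists of (log-scale) intensities,
    one inner list per microarray. Ties are broken by probe order, which
    is a simplification of what production implementations do.
    """
    n_probes = len(arrays[0])
    # Sort each array, then average across arrays at every rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(s[r] for s in sorted_arrays) for r in range(n_probes)]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            # Each probe receives the cross-array mean for its rank.
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: two arrays, three probes each.
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
result = quantile_normalize(toy)
```

After this step each array contains the same set of values, assigned according to the within-array rank of each probe, so that arrays from different patients can be compared on a common scale.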
Data collection of membrane proteins with repeating sequences. The XML R package (21, http://www.omegahat.org/RSXML/) was used to collect sequence and annotation information for human membrane proteins with repeating sequences from the UniProt database (http://www.uniprot.org/).
Prediction of glycopeptidome sequences by computational analysis. All calculations were performed with programs written in R (http://www.R-project.org/). The programs were designed to read the peptide sequence of each mucin as input and to output the number of all possible GalNAc (Tn) and NeuAcα2,6GalNAc (sialyl-Tn) glycosylation patterns.
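The counting described here can be sketched as follows. This is an illustrative Python sketch, not the R programs used in the study. It assumes that every serine and threonine is a potential O-glycosylation site and that each site is independently unmodified, Tn-modified, or sialyl-Tn-modified; the names `glycoforms` and `count_glycoforms` are hypothetical, and the count includes the unglycosylated peptide.

```python
from itertools import product

# Glycan states assumed for each serine/threonine: none, Tn, or sialyl-Tn.
STATES = ("", "(GalNAc)", "(NeuAcα2,6GalNAc)")


def glycoforms(peptide, states=STATES):
    """Yield every glycosylation pattern of `peptide` over its S/T sites."""
    sites = [i for i, aa in enumerate(peptide) if aa in "ST"]
    for combo in product(states, repeat=len(sites)):
        tags = dict(zip(sites, combo))
        # Append the glycan tag (possibly empty) after each residue.
        yield "".join(aa + tags.get(i, "") for i, aa in enumerate(peptide))


def count_glycoforms(peptide, n_states=len(STATES)):
    """Number of patterns, including the unglycosylated peptide."""
    return n_states ** sum(aa in "ST" for aa in peptide)


epitope = "RPAPGSTAPPAHG"  # MUC1 epitope used in this study
forms = list(glycoforms(epitope))
```

Because the number of patterns grows exponentially with the number of sites, even a modest number of serine and threonine residues in a TR region produces a very large glycopeptide repertoire under these assumptions.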
ELISA measurement of Ab binding to glycopeptides. The biotinylated (glyco)peptide RPAPGS(GalNAc)TAPPAHG-dPEG™11-Biotin (1 µg/ml) was bound to streptavidin-coated plates (2 µg/ml) and incubated with 16A monoclonal Ab (mAb) for 2 h. Binding of 16A was visualized by a secondary Ab (goat anti-mouse IgG) followed by colorimetric detection. One percent BSA was used as a blank for determining the cutoff value. To measure the inhibitory effects of competing ligands, ligands were mixed with the 16A mAb at 0-500 µM for 1 h before incubation with plate-bound glycopeptide RPAPGS(GalNAc)TAPPAHG-dPEG.
Surface plasmon resonance (SPR) measurement of Ab binding affinity. SPR measurement of Ab affinity toward consecutive TR peptides was performed as previously described (22). Interactions of peptides with immobilized 14A and 16A mAbs were determined by using a Biacore T-200 (GE Healthcare, Pittsburgh, PA, USA). The 14A and 16A mAbs were immobilized on a research-grade CM5 sensor chip (GE Healthcare) until 5000 RU was reached. Immobilizations were carried out at protein concentrations of 50 µg/ml in 10 mM acetate, pH 5.0, and 10 mM acetate, pH 5.5, for 14A and 16A, respectively, using an amine coupling kit supplied by the manufacturer. In all cases, analyses were carried out at 25°C in 10 mM Hepes, pH 7.4, containing 150 mM NaCl and 0.005% surfactant P20 at a flow rate of 40 µl/min. The surface was regenerated with 4 M MgCl2 and then washed with the running buffer. Data were analyzed with BIAevaluation software (GE Healthcare).
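For orientation, the relationship between SPR sensorgrams and the dissociation constant can be sketched with the simple 1:1 Langmuir interaction model. The fitting model used for the present data is not stated here, so the 1:1 model is an assumption, and the rate constants below are hypothetical values chosen only for illustration; they are not measurements from this study.

```python
import math


def equilibrium_response(conc, kd_eq, rmax):
    """Steady-state response (RU) at analyte concentration `conc` (M)."""
    return rmax * conc / (conc + kd_eq)


def association_response(t, conc, ka, kd, rmax):
    """Response (RU) at time t (s) during injection, 1:1 Langmuir model."""
    kobs = ka * conc + kd              # observed rate constant, 1/s
    req = rmax * ka * conc / kobs      # plateau response at this concentration
    return req * (1.0 - math.exp(-kobs * t))


# Hypothetical rate constants, for illustration only.
ka = 1.0e5     # association rate constant, 1/(M*s)
kd = 1.0e-3    # dissociation rate constant, 1/s
KD = kd / ka   # equilibrium dissociation constant, M
```

In this model a lower Kd corresponds to tighter binding, and an analyte injected at a concentration equal to Kd occupies half of the available binding sites at equilibrium.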

Results
Expression of mucin mRNA in four subtypes of lung cancers. By analyzing healthy controls and cancer patients, we found nine mucins (MUC1, MUC2, MUC3A, MUC4, MUC5AC, MUC5B, MUC6, MUC7, and MUC8) expressed in lung cancer patients (Fig. 1). CD24 and MAGEA3, two well-known lung cancer biomarkers, were used as controls for evaluating the mRNA expression. Fig. 1 shows the expression of cancer-associated mucins in healthy controls, lung adenocarcinoma, bronchial carcinoid cancer, small cell lung cancer, and squamous cell lung cancer. Notably, all mucins found in lung cancer are also found in healthy controls.
Expression of mucin mRNA in all stages of lung adenocarcinoma. Fig. 2 shows the expression of the cancer-associated mucins in different stages of lung adenocarcinoma. No significant differences in expression were identified among the various stages.
Mucin glycopeptide TR sequences predicted by R program. We used an R program to predict the possible glycopeptide TR sequences of mucins that may bear Tn and sialyl-Tn antigens; a large number of extremely diverse glycopeptide sequences were predicted for each mucin TR alone. The glycopeptide sequences predicted for the MUC1 TR were published previously (22). TRs of MUC2, MUC3A, MUC3B, MUC5B, MUC6, MUC16, and MUC17 yielded more than 10,000 sequences when bearing one sugar (Tn) alone (data not shown). When the disaccharide (sialyl-Tn) was included in this analysis, more than 100,000 structures were found for these mucin TRs (data not shown).
A glycopeptide epitope on lung cancer cell surface is preferably recognized by mAb 16A. To examine whether MUC1 is expressed at the protein level, we stained 7 patient-derived lung adenocarcinoma cell lines with the 14A and 16A monoclonal antibodies (22). The 14A antibody binds only to a peptide backbone of MUC1, while the 16A antibody preferentially binds to a glycopeptide of MUC1 (22). The results (Fig. 3) showed that 6 out of 7 human lung adenocarcinoma cell lines could be stained by the 14A and 16A antibodies, and 16A showed stronger binding in every cell line studied. This suggests that a glycosylated MUC1 epitope, RPAPGS(GalNAc)TAPPAHG, is better recognized than its non-glycosylated backbone, RPAPGSTAPPAHG.
We also examined the binding of mAb B72.3, which is specific for clustered Tn and sialyl-Tn antigens independent of the peptide backbone sequence (23,24). Binding of B72.3 to human lung adenocarcinoma cells was observed in 4 of 7 cell lines, suggesting the expression of clustered Tn antigens.
16A mAb binds to both peptide and sugar parts of a glycopeptide. We previously generated a mAb, 16A, that preferentially binds to a MUC1 peptide modified by a Tn residue (22). We also generated a mAb, 14A, which binds only to the MUC1 peptide backbone. However, the exact contributions of the peptide and sugar parts to the binding of the 16A antibody were not clear. To further understand how the 16A mAb binds to the MUC1 glycopeptide, we used peptide, glycopeptide, and sugar structures to inhibit its binding to RPAPGS(GalNAc)TAPPAHG. The peptide RPAPGSTAPPAHG inhibits 16A binding to RPAPGS(GalNAc)TAPPAHG at a half maximal inhibitory concentration (IC50) of 5.79 µM, while the glycopeptide RPAPGS(GalNAc)TAPPAHG inhibits at an IC50 of 2.89 µM (Fig. 4).
In clear contrast, free GalNAc inhibits the binding of 16A to RPAPGS(GalNAc)TAPPAHG at a much higher IC50 of 11.13 mM. We also tested polymeric GalNAc attached to a BSA molecule (each BSA carries 23 GalNAc residues); it inhibits the binding to RPAPGS(GalNAc)TAPPAHG at an IC50 of 69.25 µM.
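The relative potencies of the four inhibitors can be compared by simple arithmetic on the reported IC50 values, as in the following illustrative sketch. The per-residue figure assumes that the 69.25 µM value refers to the concentration of the BSA conjugate rather than of GalNAc residues, which is not stated explicitly; the helper name `fold_weaker` is hypothetical.

```python
# IC50 values reported in the text, converted to micromolar.
ic50_um = {
    "peptide RPAPGSTAPPAHG": 5.79,
    "glycopeptide RPAPGS(GalNAc)TAPPAHG": 2.89,
    "free GalNAc": 11.13 * 1000,  # 11.13 mM
    "GalNAc-BSA": 69.25,          # assumed to be per BSA conjugate
}

GALNAC_PER_BSA = 23  # residues per BSA molecule, as stated in the text


def fold_weaker(inhibitor, reference="glycopeptide RPAPGS(GalNAc)TAPPAHG"):
    """How many times higher the inhibitor's IC50 is than the reference's."""
    return ic50_um[inhibitor] / ic50_um[reference]


# IC50 of the conjugate expressed per GalNAc residue (assumption above).
galnac_bsa_per_residue_um = ic50_um["GalNAc-BSA"] * GALNAC_PER_BSA

for name in ic50_um:
    print(f"{name}: {fold_weaker(name):.1f}-fold vs glycopeptide")
```

On these numbers the non-glycosylated peptide is about twofold weaker than the glycopeptide as an inhibitor, whereas free GalNAc is roughly 3,850-fold weaker, consistent with the peptide part contributing most of the binding energy.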
Not surprisingly, neither GalNAc nor polymeric GalNAc inhibited the binding of 14A mAb to glycopeptide RPAPGS(GalNAc)TAPPAHG, indicating that only the peptide part is recognized by 14A mAb (data not shown).
14A and 16A mAbs bind to MUC1-106aa polyvalent vaccine with 10-fold higher affinity than consecutive TR sequence. Because IgG molecules bind to bivalent antigen epitopes with higher affinity, we designed consecutive TR sequences, 2014C, 2015C, and 2016C (Table I), and measured the dissociation constants by SPR analysis. Both 2014C and 2016C showed much higher affinity (20- and 40-fold, respectively) binding to 16A and 14A, as compared with the RPAPGSTAPPAHG single TR sequence alone (Table II). Of note, 2015C did not show stronger binding to 16A or 14A, suggesting that the underlined PA sequence RPAPGSTAPPAHG must be recognized by each arm of the IgG molecule.
Figure 3. A glycopeptide epitope on lung cancer cell surface is preferably recognized by mAb 16A. Lung adenocarcinoma cell lines NCI-H1395, HCC4019, H838, H1573, H1703, H2030, and H3255 were studied by flow cytometry staining. Monoclonal antibodies 14A, which binds to the MUC1 peptide part only (RPAPGSTAPPAHG); 16A, which binds to the MUC1 glycopeptide RPAPGS(GalNAc)TAPPAHG; and B72.3, which binds to sugars only (clustered Tn antigen), were used as primary antibodies. Goat anti-mouse IgG (allophycocyanin-conjugated) and mouse IgG isotype control were from Southern Biotech (Birmingham, AL, USA).
Trivalent (14) and pentavalent (13) TR sequences for MUC1 were previously designed as cancer vaccines, and IgG responses have been reported in vaccinated patients. MUC1-106aa showed extremely high affinity for the 14A and 16A mAbs, with Kd values of 0.653 and 1.54 nM, respectively. This strongly suggests that two non-consecutive bivalent TR epitopes are recognized preferentially over two consecutive bivalent TR sequences (Table II).
Transfection of COSMC-deficient cells with MUC1 gene caused 100-fold higher binding to B72.3, a sialyl-Tn-specific mAb. To investigate whether clustered TR sequences also play a role for Abs that bind to Tn and sialyl-Tn antigens, we studied B72.3, a mAb that binds to clustered Tn epitopes (23,24). We overexpressed the human MUC1 gene in Ag104 cells, a mouse fibrosarcoma cell line with a known mutation in the COSMC gene (25). Transfection of the human MUC1 gene caused 100-fold (two-log) stronger binding to B72.3 mAb, but the binding to Sambucus nigra lectin, which is specific for α-2,6-linked sialic acid (26), remained unchanged, indicating that the increased binding to B72.3 mAb is not caused by increased synthesis of total sialyl-Tn epitopes (Fig. 5). Therefore, MUC1 serves as the preferred backbone to display the sugar epitopes for Ab recognition, 100-fold more efficiently than other membrane proteins.

Discussion
By analyzing mucin expression in lung cancer at the transcriptome level, we found expression of nine mucins in both lung cancer and healthy controls (Fig. 1). There is no rationale for using non-glycosylated mucin peptides as cancer vaccines to induce CD8 T cell responses, as no difference exists in the processing of MUC1 proteins in the MHC class I pathway by tumor cells versus healthy cells. The value of mucins as diagnostic markers for lung cancer can only be based on their post-translational modification, such as the well-known abnormal glycosylation exemplified by Tn, sialyl-Tn, and CA19-9. Future proteomics studies must be focused on the abnormal glycosylation of mucins. A recent breakthrough in this field is the glycopeptidome analysis of a CHO cell line mutant deficient in Core-1 O-glycan elongation, by an electron transfer dissociation (ETD) fragmentation method (27-29). We attempted to apply this technology to a Jurkat cell line transfected with the human MUC1 gene; however, this approach failed because mucins are resistant to trypsin digestion, which is critical for generating glycopeptide fragments that can be read by mass spectrometry. The use of other proteases may allow us to cleave mucins into short glycopeptides amenable to mass spectrometry analysis.
Analysis of mRNA expression in published databases clearly suggests that MUC1 should be a prioritized mucin for cancer immunological research because it is expressed at high levels in all stages of lung cancer. In light of the extremely diverse structures of the MUC1 glycopeptidome, we designed experiments to answer three major questions of antibody recognition: i) Is the sugar or peptide part of a glycopeptide preferentially recognized by an antibody? When designing MUC1 glycopeptide vaccines for cancer therapy, a main observation in mouse models is that the majority (90%) of Abs induced by glycopeptide vaccination can be absorbed by non-glycosylated peptide backbones (30). Clearly the peptide part of a glycopeptide is more immunogenic than the sugar part. The majority of Abs may be induced toward the non-glycosylated peptide backbone. Alternatively, the majority of the Abs may bind to both the peptide part and the sugar part, with the peptide binding contributing most of the affinity.
The inhibition of 16A Ab binding to the glycopeptide by free sugar (GalNAc) clearly shows that the sugar part of the glycopeptide directly binds to the 16A mAb (Fig. 4). This is different from the previously reported mAbs SM3 and C595 (31-33). SM3 binds to an amino acid repeat region (sequence PDTR) of MUC1, which contains a glycosylated threonine residue. Although X-ray crystallography and NMR studies reveal that glycosylation is not required for binding, the GalNAc O-glycosylation induces conformational changes in the peptide that enhance its interactions with the Ab (31-33). Similarly, mAb C595, raised against another peptide epitope in MUC1 (sequence RPAP), has enhanced affinity because of conformational changes induced by Tn glycosylation; this affinity is attributed to the stabilization of a left-handed polyproline II helix by di- or tri-glycosylation of the peptide (33). Other mAbs that are specific for the sugar part only, and not the peptide backbone, have been reported. For example, B72.3 mAb (23,24) binds to the disaccharide sialyl-Tn antigen but is not dependent on the peptide backbone. Similarly, MSL128 mAb binds to clustered Tn antigens independent of the sequence of the peptide backbone (34). Unfortunately, very low titers of sialyl-Tn-specific Abs were induced in patients vaccinated with the Theratope vaccine, which consists of clustered sialyl-Tn sugars conjugated to KLH (35). The median titer against sialyl-Tn was only 1:300. Sugars are known to be poorly immunogenic. Furthermore, the Tn and sialyl-Tn antigens are self-epitopes that cause deletion of the B cells specific to them at high affinity during B cell development in the bone marrow.
Thus, our findings suggest that B cells preferentially recognize the peptide backbone of a glycopeptide. In our preliminary analysis, we found that 16A mAb binds to the peptide backbone through mutations in the complementarity determining regions (CDRs) of the B cell receptor, while the sugar-binding specificity is further acquired through mutations in the framework of the heavy chain (unpublished data).
ii) Is the consecutive or non-consecutive TR sequence in TR clusters preferentially recognized by B cells? IgGs are bivalent, and IgMs are decavalent. The greater an immunoglobulin's valency (number of antigen-binding sites), the greater the amount of antigen it can bind. Similarly, antigens can demonstrate multivalency because they can bind to more than one Ab. Multimeric interactions between an Ab and an antigen help to stabilize the complex. Mucins that contain clustered TR sequences are ideal backbones for triggering Ab responses. The heavy glycosylation may be a mechanism by which mucins evade the induction of self-reactive Abs, which may cause severe autoimmune diseases. Under cancer conditions, abnormally glycosylated or non-glycosylated TR sequences are exposed, which may trigger strong Ab responses. However, Ab titers toward mucin peptides are not detected in cancer patients by ultra-dense peptide arrays (36). Using a 100 amino acid MUC1 peptide containing five consecutive TR sequences to monitor Ab responses, circulating anti-MUC1 IgG Abs were reported as a favorable prognostic factor for pancreatic cancer (13), and a MUC1 vaccine containing 3 consecutive TR sequences has been tested in breast cancer patients (14). Our data (Table II) clearly show that the clustered TR sequences of MUC1 are 300 times more efficient in binding to mAbs 14A and 16A as compared with a single TR. Clustered TR sequences of MUC1 are 10 times more efficient in binding as compared with consecutive TR sequences (2014C and 2016C).
Thus, our data suggest that B cells preferentially recognize consecutive TR sequences, but favor clustered TR sequences containing more than three TR sequences even more. Future attempts at immune-epitope discovery should be focused on designing clustered TR epitopes that contain more than 3 TR sequences.
iii) Is a mucin backbone required for optimal binding of Tn and sialyl-Tn antigens? Previous studies of vaccinating patients with the sialyl-Tn Theratope vaccine were based on the assumption that clustered sialyl-Tn epitopes mimic the sialyl-Tn epitopes expressed on the surface of a cancer cell. In our study, through comparison of Ag104 cells and Ag104 cells transfected with a human MUC1 gene, we found that the expression of sialyl-Tn antigen alone is not sufficient for optimal binding to B72.3, a mAb originally considered to be independent of the peptide backbone. Thus, the conformation of the MUC1 backbone, but not of other membrane proteins, is clearly essential for optimal B cell recognition of Tn and sialyl-Tn sugars. Future immuno-monitoring must be based on rational design of sugar epitopes displayed on proper protein backbones such as MUC1.
Herein, we report three principles by which monoclonal antibodies bind glycopeptide antigens: i) a glycopeptide epitope on the surface of lung adenocarcinoma cell lines is preferentially recognized, as compared with its peptide-alone or sugar-alone counterpart; ii) clustered epitopes with more than 3 TR sequences are recognized preferentially over consecutive TR sequences; iii) sugar epitopes displayed on MUC1 are preferentially recognized as compared with those on other O-glycosylated protein carriers. Our findings may indicate common rules for monoclonal antibody binding to mucin glycopeptides on the cancer cell surface, and provide critical clues for designing cancer vaccines and biosimilars targeting mucins.