A novel system for predicting the toxicity of irinotecan based on statistical pattern recognition with UGT1A genotypes

To predict severe irinotecan toxicity precisely, we evaluated the association of UGT1A variants, haplotypes and combinations of UGT1A genotypes with severe irinotecan toxicity. UGT1A1*6 (211G>A), UGT1A1*28 (TA6>TA7), UGT1A1*60 (−3279T>G), UGT1A7 (387T>G), UGT1A7 (622T>C), and UGT1A9*1b (−118T9>T10, also named *22) were genotyped in 123 patients with metastatic colorectal cancer who had received irinotecan-based chemotherapy. Among the 123 patients, 73 were enrolled in either of two phase II studies of the FOLFIRI (leucovorin, 5-fluorouracil and irinotecan) regimen; these patients constituted the training population, which was used to construct the prediction system. The other 50 patients constituted the validation population; these 50 patients either had participated in a phase II study of irinotecan/5′-deoxy-5-fluorouridine or were among consecutive patients who received FOLFIRI therapy. The prediction system used sequential forward floating selection, a method of statistical pattern recognition, applied to UGT1A genotypes, gender and age. Several UGT1A genotypes [UGT1A1*6, UGT1A7 (387T>G), UGT1A7 (622T>C) and UGT1A9*1b] were associated with irinotecan toxicity. Among the haplotypes, haplotype-I (UGT1A1: −3279T, TA6, 211G; UGT1A7: 387T, 622T; UGT1A9: T10) and haplotype-II (UGT1A1: −3279T, TA6, 211A; UGT1A7: 387G, 622C; UGT1A9: T9) were also associated with irinotecan toxicity. Furthermore, our new system for predicting the risk of irinotecan toxicity was 83.9% accurate with the training population and 72.1% accurate with the validation population. Our novel prediction system using statistical pattern recognition depends on genotypes in UGT1A, age and gender; moreover, it showed high predictive performance even though the treatment regimens differed between the training and validation patients.

Determining the relative contributions of UGT1A variants other than UGT1A1*28 and UGT1A1*6 is important to the development of any system designed to predict irinotecan toxicity for individual patients, because patients without UGT1A1*28 or *6 do experience severe irinotecan toxicity. Several studies have examined associations between irinotecan toxicity and UGT1A haplotypes in addition to each genotype of UGT1A (17-19). However, determining the haplotype or diplotype for each patient is difficult; moreover, most haplotypes and diplotypes are too rare to constitute a group large enough for meaningful statistical analysis. In addition, gender and age of patients each reportedly have an impact on irinotecan toxicity (20-22). Hence, these factors should also be taken into consideration when developing a system designed to predict irinotecan toxicity.
The aim of this study was to evaluate whether combinations of UGT1A genotypes, rather than haplotypes, together with patient characteristics might be useful in predicting the risk of toxicity in patients with metastatic colorectal cancer (mCRC) treated with irinotecan-containing regimens. Here, we investigated the genotypes of 123 patients at six loci: UGT1A1*6 (211G>A, rs4148323), UGT1A1*28 (TA6>TA7, rs8175347), UGT1A1*60 (−3279T>G, rs4124874), UGT1A7 (387T>G, rs17868323), UGT1A7 (622T>C, rs11692021), and UGT1A9*1b (−118T9>T10, rs35426722, also called UGT1A9*22) (23). Next, we evaluated the contribution of each UGT1A genotype, haplotype, and diplotype to the risk of irinotecan toxicity. Furthermore, we developed a new system for predicting the risk that a patient will experience irinotecan toxicity; this system uses a sequential forward floating selection (SFFS) algorithm based on statistical pattern recognition to select combinations of UGT1A genotypes, gender and age. SFFS is a sequential search method in which the number of features included or eliminated changes dynamically at each step of an individual analysis (24). This is the first study to assess how combinations of genotypes at six polymorphic sites in UGT1A and clinical features, constructed by SFFS, relate to the risk of irinotecan toxicity.

Materials and methods
Patients. In this study, 123 mCRC patients were examined for association between UGT1A genotypes and irinotecan toxicity (Table I). This study was performed as an ancillary investigation; data were collected from three prospective studies [FLIGHT1 (5), FLIGHT2 (5) and FRUTIRI (6)] and from consecutive patients who received FOLFIRI at the Department of Digestive Surgery and Surgical Oncology, Yamaguchi University Graduate School of Medicine, Japan. Each participant received irinotecan at the dose of 150 mg/m2, which has been approved in Japan.
FLIGHT1 (UMIN000002388) and FLIGHT2 (UMIN000002476) were phase II studies of first-line and second-line chemotherapy, respectively, for mCRC. Study designs and key eligibility and exclusion criteria have been described in detail (5,25,26). Briefly, each regimen consisted of irinotecan on day 1 + 400 mg/m2 fluorouracil bolus followed by 2,400 mg/m2 fluorouracil continuous infusion over 46 h + 200 mg/m2 leucovorin on day 1, every 2 weeks. Of all patients from the FLIGHT1 and FLIGHT2 studies, 38 and 35, respectively, participated in this ancillary investigation; these 73 patients constituted the training population. FLIGHT1 or FLIGHT2 patients homozygous for UGT1A1*28 were excluded from the training population because these patients received a lower starting dose of irinotecan (100 mg/m2) (5).
The validation population comprised 50 patients from two different study groups: 22 patients who participated in FRUTIRI (UMIN000005011), a phase II study of a combination therapy comprising irinotecan and 5'-deoxy-5-fluorouridine (5'-dFUR) (6), and 28 consecutive patients who underwent second-line FOLFIRI treatment between October 2008 and July 2012 in the Department of Digestive Surgery and Surgical Oncology, Yamaguchi University Graduate School of Medicine, Japan. The detailed treatment regimen tested in FRUTIRI was described previously (6). Briefly, irinotecan was administered every two weeks, and 400 mg 5'-dFUR was administered orally twice a day on five consecutive days each week, followed by a 2-day washout. The 28 consecutive patients undergoing FOLFIRI treatment followed the protocol used in FLIGHT2 (26). In the validation population, no patients homozygous for UGT1A1*28 were found in the FRUTIRI study (n=28). Additionally, patients heterozygous for UGT1A1*28 (n=6) were excluded from the FRUTIRI study because these patients received a lower starting dose of irinotecan (70 mg/m2). Among the 28 consecutive patients who received second-line FOLFIRI therapy, those homozygous for UGT1A1*6 or *28 and those compound heterozygous for UGT1A1*6 and UGT1A1*28 were excluded from this ancillary study. The training (n=73) and validation (n=50) populations did not differ significantly with regard to the distribution of any clinical feature or genotype listed in Table I, except for the distributions of the UGT1A7 (387T>G) and UGT1A9*1b alleles (data not shown).
In this study, patients who exhibited hematologic toxicity greater than grade 3 during the entire course of therapy were defined as experiencing irinotecan toxicity. The study protocols were approved by the Institutional Review Board at Yamaguchi University Graduate School of Medicine, and were carried out in accordance with the Declaration of Helsinki on experimentation on human subjects. Each patient gave written, informed consent for participation in this study.
Genotyping of UGT1A and haplotype construction. A conventional sodium iodide (NaI) method was used to extract genomic DNA from peripheral blood samples (27). The number of TA repeats in the UGT1A1 promoter region was determined by fragment size analysis followed by direct sequencing as described previously (4). The TaqMan technique with a hydrolysis probe was used to determine the UGT1A1*6 genotype as described previously (28); similarly, hydrolysis probes were used to determine the genotype at UGT1A1*60. A direct sequencing method was used to determine the genotypes at UGT1A7 (387T>G and 622T>C) and UGT1A9*1b.
Each nucleotide variant was evaluated to determine whether it was in Hardy-Weinberg equilibrium; Haploview 4.2 software was used to perform the linkage disequilibrium (LD) and case-control haplotype analyses (29). Lewontin's coefficient D' and the correlation coefficient r2 were calculated as measures of LD.
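As an illustration of the two LD measures and the equilibrium check named above, the following minimal Python sketch uses the textbook definitions for a pair of biallelic loci. It is not the Haploview implementation: the function names, the input format (haplotype and allele frequencies, genotype counts) and the choice of a 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium are assumptions made only for this example.

```python
import math


def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    (hypothetical input format chosen for this sketch)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # The maximum attainable |D| depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def hwe_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With these definitions, two loci whose alleles always travel together give |D'| = 1 and r2 = 1, whereas independent loci give 0 for both measures.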
Construction of toxicity prediction system by genotype combinations. To predict severe toxicities of irinotecan, the age, the gender and a comprehensive 6-site UGT1A genotype were determined for each of the 73 patients in the training population. SFFS, a method of statistical pattern recognition, was then used to determine the optimal genotype combinations for predicting the risk of irinotecan toxicity. SFFS identified the genotype combinations with the 'maximum number of cases' and 'maximum prediction rate' to maximize overall diagnostic accuracy (24). Briefly, the SFFS algorithm used in this study was as follows: i) Suppose that at stage $k$ we have a set of subsets $X_1, \dots, X_k$ of sizes 1 to $k$, respectively. ii) Let the corresponding values of the feature selection criterion be $J_1$ to $J_k$, where $J_i = J(X_i)$ for the feature selection criterion $J(\cdot)$. iii) Let the total set of features be $X$. Then at the $k$th stage of the SFFS procedure follow these steps: Step 1, select the feature $x_j$ from $X - X_k$ that increases the value of $J$ to the greatest degree and add it to the current set: $X_{k+1} = X_k + x_j$. Step 2, find the feature $x_r$ in the current set $X_{k+1}$ that reduces the value of $J$ the least; if this feature is the same as $x_j$, then set $J_{k+1} = J(X_{k+1})$, increment $k$ and go to Step 1; otherwise remove it from the set to form $X'_k = X_{k+1} - x_r$.
Table I. Characteristics of the patients.
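The forward and conditional backward steps described above can be sketched in Python as follows. This is a generic, textbook-style SFFS written for illustration, not the implementation used in this study: the function `sffs`, the criterion `toy_criterion` and its scores are hypothetical, and the backward step is expressed as "remove the least significant feature only if the reduced subset beats the best subset of that size found so far", which leaves the set unchanged whenever the least significant feature is the one just added.

```python
def sffs(features, criterion, target_size):
    """Sequential forward floating selection (generic sketch).

    features: iterable of hashable feature names
    criterion: function mapping a frozenset of features to a score J (higher is better)
    target_size: stop once the current subset reaches this size
    Returns a dict {subset size: (best J found, best subset)}.
    """
    features = list(features)
    if not 1 <= target_size <= len(features):
        raise ValueError("target_size must be between 1 and the number of features")
    current = frozenset()
    best = {}
    while len(current) < target_size:
        # Step 1 (inclusion): add the feature that increases J the most.
        candidates = [f for f in features if f not in current]
        x_j = max(candidates, key=lambda f: criterion(current | {f}))
        current = current | {x_j}
        j_now = criterion(current)
        k = len(current)
        if k not in best or j_now > best[k][0]:
            best[k] = (j_now, current)
        # Step 2 (conditional exclusion): drop the least significant feature
        # while doing so yields a better subset of the smaller size.
        while len(current) > 2:
            members = [f for f in features if f in current]
            x_r = max(members, key=lambda f: criterion(current - {f}))
            reduced = current - {x_r}
            j_reduced = criterion(reduced)
            if j_reduced > best[len(reduced)][0]:
                current = reduced
                best[len(reduced)] = (j_reduced, reduced)
            else:
                break
    return best


def toy_criterion(subset):
    """Hypothetical criterion: additive scores plus two pairwise interactions."""
    weights = {"a": 30, "b": 20, "c": 19, "d": 5}
    score = sum(weights[f] for f in subset)
    if {"b", "c"} <= subset:
        score += 25  # b and c are informative together
    if {"a", "b"} <= subset:
        score -= 10  # a and b are partly redundant
    return score
```

In this toy setting a purely greedy forward search would keep the pair {a, c}; calling `sffs("abcd", toy_criterion, 3)` lets the floating step revisit size 2 and replace that pair with {b, c}, which is the behaviour that distinguishes SFFS from plain forward selection.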
Statistical analysis. Fisher's exact test was used to assess the relationship between toxicity and each UGT1A variant. The Cochran-Armitage trend test was used to examine the linearity of the relationship between UGT1A genotypes and irinotecan toxicity. SPSS Statistics 17.0 software (IBM, Tokyo, Japan) and R version 2.13.0 software were used to perform the calculations (30). P<0.05 was considered statistically significant.
[Figure legend, beginning not recovered: ... were used with sequential forward floating selection (SFFS) for statistical pattern recognition as described in Materials and methods. Homozygosity for alleles associated with irinotecan toxicity, heterozygosity, and homozygosity for alleles not associated with irinotecan toxicity are indicated by red, blue and green cells, respectively. **The unspecified categories (regardless of genotypes, gender or age).]
Performances of the toxicity prediction system by genotype combination. To construct a system for predicting the risk of severe irinotecan toxicity, genetic data from the 73 patients that constituted the training population were analyzed exhaustively; specifically, SFFS was used to assess gender, age and the individual genotypes at six polymorphic UGT1A sites (Fig. 2). In addition to the three possible genotypes (wild-type homozygous, heterozygous, variant homozygous), a fourth option for each site (designated 'unspecified genotype') was included in the algorithm. Similarly, patient gender (male, female, regardless of gender) and age (≤60, >60 years old, regardless of age) were assessed. The cutoff value for age (60 years) was determined by the Youden index obtained from receiver operating characteristic (ROC) curve analysis with the training population. Among the possible combinations (4^6 × 3^2 − 1 = 36,863), the following were excluded: combinations matched by no patient, combinations matched by a single patient, and combinations with positive or negative predictive values <80%. In order to optimize the combinations, categorization according to predictive value and exclusion of redundant combinations in each category were performed. As a result, 8 combinations (P-I to P-VIII, Fig. 1A) appeared to predict an increased risk of toxicity, and 10 combinations (N-I to N-X, Fig. 1B) appeared to predict a lack of toxicity. The system for predicting irinotecan toxicity based on combinations of 8 factors (6 genotypes, gender and age) was generated using data from all 73 patients in the training population. The system could be applied to 84.9 and 86.0% of the patients in the training and validation populations, respectively (Table V). This prediction system showed 83.9% accuracy (positive predictive value, 86.4%; negative predictive value, 82.5%) for the training population (n=62) and 72.1% accuracy (positive predictive value, 70.0%; negative predictive value, 72.7%) for the validation population (n=43). When patients who matched none of the combinations were included, the performance of the system was 71.2% accuracy (sensitivity, 55.9%; specificity, 84.6%) in the training population (n=73) and 62.0% accuracy (sensitivity, 41.2%; specificity, 72.7%) in the validation population (n=50). Odds ratios of positive prediction for irinotecan toxicity for this prediction system were 8.0 (95% CI, 1.5-42.5) and 16
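The size of the search space quoted in this section can be reproduced with a short Python sketch. It assumes that the subtracted combination is the one in which every factor is left unspecified, which the text does not state explicitly; the option labels, the constant names and the `matches` helper are likewise hypothetical and serve only to show how a combination with unspecified factors would be compared against a patient.

```python
from itertools import product

# None stands for the 'unspecified' option of each factor (assumed encoding).
GENOTYPE_OPTIONS = ("wild-type homozygous", "heterozygous", "variant homozygous", None)
GENDER_OPTIONS = ("male", "female", None)
AGE_OPTIONS = ("<=60", ">60", None)
N_SITES = 6  # six polymorphic UGT1A sites


def enumerate_combinations():
    """Yield every 8-factor combination except the fully unspecified one."""
    factor_options = [GENOTYPE_OPTIONS] * N_SITES + [GENDER_OPTIONS, AGE_OPTIONS]
    for combo in product(*factor_options):
        if all(value is None for value in combo):
            continue  # a combination with no specified factor matches everyone
        yield combo


def matches(combo, patient):
    """True if the patient agrees with every specified factor of the combination."""
    return all(c is None or c == p for c, p in zip(combo, patient))


def count_combinations():
    return sum(1 for _ in enumerate_combinations())
```

Here `count_combinations()` walks all 4^6 × 3^2 option tuples and skips the single fully unspecified one. Filtering by the number of matching patients and by predictive value, as described in the text, would then be applied to each remaining combination using `matches`.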

Discussion
The novel system for predicting severe irinotecan toxicity described here was based on genotypes at 6 polymorphic sites in UGT1A and 2 basic clinical features; notably, it showed high predictive performance even though the treatment regimens differed between the training and validation patients (Tables V and VI). The odds ratio of positive prediction for severe irinotecan toxicity was higher for this prediction system than for any haplotype or any single genotype (Table VI). Among the patients to whom the system could be applied, its accuracy fell from 83.9% in the training population to 72.1% in the validation population. With regard to positive prediction, the inconsistency in accuracy between the training and validation populations was seen when the combinations included the UGT1A9*1b site and patient age (P-II, VI and VII in Fig. 2). The frequencies of the UGT1A9*1b genotype differed between the training and validation populations; moreover, the UGT1A9*1b alleles were not in Hardy-Weinberg equilibrium in the validation population (data not shown). The cutoff value for patient age (60 years old) was determined by a ROC curve generated with data from the training population; however, previous studies used a cutoff age of 65 years (20,21). Indeed, one patient who did not experience toxicity, but was predicted by this system to experience it, was aged 63 years. Some genotypic combinations decreased the performance of negative prediction for severe irinotecan toxicity in the validation population relative to the training population (N-II, IV, and V in Fig. 2). Specifically, 36.4% (n=4/11) of patients in the training population with a combined genotype that included heterozygosity for UGT1A1*28 and UGT1A1*6 (-/-) experienced severe irinotecan toxicity, but 66.7% (n=4/6) of the patients in the validation population with the same genotype combination (UGT1A1*6, -/- and UGT1A1*28, -/+) showed severe toxicity.
Of the 73 patients in the training population and the 50 in the validation population, 11 (15.1%) and 7 (14.0%), respectively, matched none of the combinations in our prediction system. Interestingly, the incidence of severe toxicity among patients who matched none of the combinations identified by this prediction system was 72.7% (training population) and 14.3% (validation population) (Table VI). Therefore, irinotecan toxicity among patients who do not have any combination of UGT1A variants identified by this novel prediction system might be due to factors other than UGT1A polymorphisms.
Here, as in previous studies, each identified UGT1A haplotype was useful for precisely predicting the presence or absence of severe irinotecan toxicity (14,18,38-40). Consistent with our study, Cecchin et al reported that a haplotype comprising UGT1A1*28 (-), UGT1A1*60 (-), UGT1A7 (387T and 622T), and UGT1A9*1b (+) was a predictor of severe hematologic toxicity during the entire course of therapy (18). However, determining the haplotypes of an individual patient is difficult in clinical practice. Therefore, the genotypes at each of the 6 sites (rather than the haplotype or diplotype) could be used for clinical assessments.
Our prediction system depends not only on UGT1A genotypes but also on patient gender and age. Previous studies showed that patient gender and age were related to the risk of irinotecan toxicity (20-22). In the training population, patient age was associated with severe irinotecan toxicity, but patient gender was not (Table IV). Interestingly, when patient age, patient gender or both were excluded from the factors used by the prediction system, the number of patients that matched the prediction system decreased, although the system maintained high positive and negative predictive values (data not shown). The SFFS algorithm could be modified to include other factors (e.g., mutations in the tumor, patients' clinical characteristics, additional genetic variants, etc.) to improve the prediction performance. Such modifications may result in a system that could meaningfully predict clinical outcomes, including tumor response. Recent advances in technology for sequencing whole genomes of individuals may lead to substantial increases in information that might be useful for personalized therapy. However, such complicated information cannot be efficiently or fully utilized in the currently available formats. SFFS could be used to construct a system that utilizes huge data sets such as whole-genome sequences. Our strategy for developing SFFS-based systems for clinical use could serve as a powerful tool for advancing personalized therapy, although additional prospective study of this prediction system is needed.