Evaluation of parameters in mixed male DNA profiles for the Identifiler® multiplex system

The analysis of complex DNA mixtures is challenging in forensic DNA testing, and accurate and sensitive methods for profiling these samples are urgently required. In this study, we developed 11 groups of mixed male DNA samples (n=297) and validated them by D-value analysis [>95% of D-values ≤0.1 with an average peak height (APH) of the active alleles ≤2,500 rfu]. A least-squares linear fit revealed a strong linear correlation between peak height (PH) and peak area (PA) (P<2e-16). The Kruskal-Wallis rank sum test revealed significant differences in the heterozygote balance ratio (Hb) among 16 short tandem repeat (STR) loci (P=0.0063) and among 9 mixed gradients (P=0.02257). Locally weighted regression of Hb on APH showed an inflection point at APH = 1,250 rfu, and 92.74% of Hb values were >0.6 when the APH was ≥1,250 rfu. Variation in the Hb distribution among STR loci suggested that these loci differ in forensic efficiency. Allelic drop-out (ADO) correlated with the APH and the mixed gradient: all ADOs occurred in profiles with an APH of <1,000 rfu, and the number of ADOs increased as the APH of the mixed DNA profiles decreased. These results strongly suggest that calibration parameters should be introduced to correct the deviation in APH at each STR locus during the analysis of mixed DNA samples.


Introduction
In forensic analysis, mixed DNA samples contain genetic material from more than one donor, and complex DNA mixtures involve 3 or more individuals (1,2). In criminal investigations, specimens of blood, semen, other secreted fluids, excretions and epithelial cells are often mixed, particularly in cases of vaginal rape, anal rape and oral sodomy. Complex DNA mixtures, such as mixed semen collected in gang rape cases or mixed blood samples in homicide cases, are the most challenging to analyze (3-7). Current methods for analyzing mixed DNA samples often yield low detection rates that are of limited use in criminal investigations, and some results obtained with these methods do not meet the legal standards of the relevant court systems (2). Therefore, accurate and sensitive methods for profiling mixed DNA samples are urgently required in forensic DNA testing.
Mixed DNA evidence from semen and vaginal secretions is commonly submitted for laboratory analysis in sexual assault cases. In 2006, the DNA Commission of the International Society of Forensic Genetics (ISFG) published recommendations on unrestricted and restricted combinatorial methods for interpreting mixed male-female DNA profiles (2). In general, unrestricted combinatorial approaches assign probabilities to the allelic bands at a short tandem repeat (STR) locus based on allele peaks separated by size, disregarding stochastic effects that can cause substantial imbalance between the two alleles of a heterozygous locus (2). As a result, quantitative information such as peak height (PH) and peak area (PA) is not considered in the calculation of likelihood ratios (LRs), which limits the accuracy of these methods.
For mixed male-female DNA, forensic sex-typing is generally conducted with commercial STR kits that include the sex-typing marker amelogenin (AMEL), amplified with the primers described by Sullivan et al (38). Male DNA produces characteristic X and Y peaks that are easily distinguished from the single X peak of female DNA, although anomalous results have been reported due to abnormalities such as primer binding-site mutations and chromosomal deletions (39). In a male-female mixed profile, the X peak is much higher than the Y peak, indicating heterozygote imbalance (i.e., Hb <0.6) (2). A number of studies have applied this approach to estimate the mixture proportion (Mx) of male-female mixtures (8-11,14,37). Notably, complex mixtures of male DNA, such as mixed semen, all present X and Y peaks that are balanced (Hb >0.6), which prevents the estimation of Mx between DNA components using the AMEL locus. Therefore, establishing a more reliable and accurate method for interpreting mixed male DNA requires careful parameter selection and evaluation.
In the current study, the data distribution of each parameter was examined and statistically analyzed specifically for mixed male DNA. Using this experimental model, we determined a relative fluorescence intensity (rfu) range that provides the optimal distribution of D-value, Hb and allelic drop-out (ADO) according to average PHs (APHs). Mixed male DNA profiles should first meet these rfu levels to ensure accurate and reliable interpretation. It should then be determined whether calibration parameters are required to correct the APH deviation during mixed DNA analysis.

Materials and methods
DNA sample collection. Forty anti-coagulated blood samples collected from unrelated healthy males (5 ml each) were supplied by the Blood Center of Hebei Province, Shijiazhuang, China.
Experimental design. For our study purposes, DNA was extracted from the 40 whole blood samples and quantified with the Applied Biosystems (Foster City, CA, USA) 7500 real-time PCR system (Life Technologies Inc., Carlsbad, CA, USA). Single DNA samples with concentration differences of <0.5 ng/µl were paired to construct a simulated two-person mixed male DNA experimental model. Gradient ratios between the 2 individual DNA samples were achieved by adjusting the dilution and volume of each sample added to the model. In addition, the concentrations of the simulated mixed DNA stock solutions were all adjusted to the desired levels within the working concentration range of 0.5-1.25 ng/µl, to meet the DNA template requirements of the testing kit.
DNA quantification. DNA quantification was performed using the Quantifiler® Human DNA Quantification kit (Life Technologies Inc.), which contains a DNA standard solution (200 ng/µl), Quantifiler Human Primer Mix and Quantifiler PCR Reaction Mix. Human Primer Mix (10.5 µl/sample) and PCR Reaction Mix (12.5 µl/sample) were combined and dispensed into reaction wells (23 µl each), followed by the addition of 2 µl of sample or standard to each well, giving a 25-µl PCR reaction system. DNA quantification was repeated 3 times for each sample, and the mean result served as the final DNA concentration.
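As an illustration of how a gradient mixture can be assembled at a fixed working concentration, the sketch below computes the stock volumes needed for a minor:major ratio of 1:r. It is a minimal sketch, assuming that the ratio refers to DNA mass and that both stocks have already been quantified; the function `mix_volumes` and the example values (stocks of 2.0 and 2.4 ng/µl, a 1.0 ng/µl target in 20 µl) are hypothetical and are not taken from the study.

```python
def mix_volumes(c_minor, c_major, ratio_major, target_conc, final_vol):
    """Return (v_minor, v_major, v_water) in µl for a 1:ratio_major mixture.

    Hypothetical helper: assumes the ratio is by DNA mass and that
    concentrations are in ng/µl and volumes in µl.
    """
    total_ng = target_conc * final_vol       # DNA mass required in the final mix
    ng_minor = total_ng / (1 + ratio_major)  # minor donor contributes 1 part
    ng_major = total_ng - ng_minor           # major donor contributes ratio_major parts
    v_minor = ng_minor / c_minor
    v_major = ng_major / c_major
    v_water = final_vol - v_minor - v_major  # top up with nuclease-free water
    if v_water < 0:
        raise ValueError("stocks too dilute for the requested target concentration")
    return v_minor, v_major, v_water


# Example: 1:3 mixture from stocks at 2.0 and 2.4 ng/µl, 1.0 ng/µl target in 20 µl
print([round(v, 2) for v in mix_volumes(2.0, 2.4, 3, 1.0, 20.0)])
```

For the example call, the volumes are 2.5 µl of the minor stock, 6.25 µl of the major stock and 11.25 µl of water.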
Identifiler® PCR and electrophoresis. Identifiler® PCR amplification was performed in a 25-µl PCR system containing 10.5 µl of PCR reaction mix, 5.5 µl of Identifiler® Primer Set, 0.5 µl of Gold® DNA polymerase, 9.0 µl of nuclease-free water and 1 µl of DNA template, under the following conditions: pre-denaturation at 95°C for 11 min; 28 cycles of denaturation at 94°C for 1 min, annealing at 59°C for 1 min and extension at 72°C for 1 min; and a final extension at 60°C for 60 min. The PCR products were then analyzed in a 10-µl electrophoresis system containing 0.25 µl of GeneScan™ 500 LIZ® Size Standard, 9.25 µl of Hi-Di™ formamide and 0.5 µl of PCR product or the AmpFlSTR® Identifiler® allelic ladder. Capillary electrophoresis was performed on an ABI 3130xl Genetic Analyzer (Applied Biosystems).
Parameters of mixed DNA profiles
APH/average PA (APA) of the active alleles. In a mixed DNA profile obtained by STR analysis, the height (y-axis value) of an allele peak is termed the PH, and the area bounded by the x-axis and the peak outline is termed the PA. Both PH and PA are expressed in relative fluorescence units (rfu). The APH or APA is defined as the mean of the PHs or PAs across all loci (excluding drop-out) in a DNA profile. Parameters such as Mx, Hb and stutter ratio fluctuate with changes in APH or APA, so APH or APA often serves as a quantitative parameter for evaluating distribution patterns in mixed DNA analysis.
Analysis of Hb. Hb is calculated as the ratio of the PHs (fluorescence intensities φ, in rfu) of the lower and higher peaks at the same locus, Hb = φa/φb, where φa ≤ φb. Provided that the DNA template amount is >500 pg and the DNA is not degraded, Hb >0.6, defined as heterozygote balance, usually indicates that both alleles originate from a single heterozygous individual. Hb <0.6 is defined as heterozygote imbalance, indicating that the alleles are from different individuals.
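The APH and Hb definitions translate directly into code. The sketch below is minimal and assumes that peak heights are stored per locus and that the APH averages over individual detected peaks (the text does not state whether per-peak or per-locus means are used). The function names and example values are hypothetical.

```python
from statistics import mean


def average_peak_height(peak_heights):
    """APH: mean PH (rfu) over all detected allele peaks in one profile.

    `peak_heights` maps locus -> list of PHs. Dropped-out alleles are simply
    absent from the lists, so they are excluded from the mean.
    """
    values = [ph for peaks in peak_heights.values() for ph in peaks]
    return mean(values)


def heterozygote_balance(phi_1, phi_2):
    """Hb = lower PH / higher PH for the two alleles of one heterozygote."""
    lo, hi = sorted((phi_1, phi_2))
    return lo / hi


def is_balanced(hb, threshold=0.6):
    """Hb > 0.6 is read as heterozygote balance (threshold from the text)."""
    return hb > threshold


profile = {"D8S1179": [1200, 1000], "TH01": [900, 1100], "TPOX": [500]}
print(average_peak_height(profile))                 # 940
print(round(heterozygote_balance(1200, 1000), 3))   # 0.833
```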
Because Hb cannot be estimated at a locus with ADO, Hb is estimated only at loci without drop-out or allele sharing. In a mixed DNA profile, Hb is therefore calculated only for loci with the genotype combinations AB:CD or AB:CC. Two Hb values (those of AB and CD) are obtained for type AB:CD and one Hb value (that of AB) for type AB:CC, together with the same number of corresponding APH values.
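The locus selection rules can be expressed as a small filter over donor genotypes and observed peaks. This is a minimal sketch, assuming that the two donors' genotypes are known (as they are in a simulated mixture) and that a missing peak means drop-out. The function `locus_hb_values` and the allele labels are hypothetical.

```python
def locus_hb_values(genotype_1, genotype_2, peaks):
    """Hb values at one locus of a two-person mixture, following the rules above.

    genotype_1, genotype_2: allele pairs of the two donors, e.g. ("12", "14").
    peaks: allele -> PH (rfu) of the detected peaks; a missing allele = drop-out.
    Only AB:CD (two Hb values) and AB:CC (one Hb value) loci yield results.
    """
    expected = set(genotype_1) | set(genotype_2)
    if any(allele not in peaks for allele in expected):
        return []                       # drop-out at this locus: skip it
    if set(genotype_1) & set(genotype_2):
        return []                       # allele sharing: skip the locus
    hbs = []
    for a, b in (genotype_1, genotype_2):
        if a != b:                      # only heterozygous donors give an Hb
            lo, hi = sorted((peaks[a], peaks[b]))
            hbs.append(lo / hi)
    return hbs


print(locus_hb_values(("A", "B"), ("C", "D"), {"A": 800, "B": 1000, "C": 300, "D": 240}))
# [0.8, 0.8]
print(locus_hb_values(("A", "B"), ("C", "C"), {"A": 900, "B": 1000, "C": 400}))
# [0.9]
```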
Analysis of ADO. When the amount of DNA from a contributor is low, the fluorescence intensity of an allele may be too low to be distinguished from the background. This results in the loss of the allelic peak and a false homozygote. ADO can affect a single allele, both alleles of a locus, or the alleles of a low-copy-number component of a mixture. ADO is primarily caused by very low PH values, and the total number of drop-out alleles in a DNA profile correlates with the APH of the sample. Here, the APH is defined as the mean PH of the loci without drop-out in a DNA profile.
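Counting drop-out alleles requires knowing which alleles should be present. The sketch below is minimal and assumes that the donor genotypes are known and that a peak counts as detected only at or above an analytical threshold. The 50 rfu default and the function `count_dropouts` are hypothetical, since the text does not report the threshold used. Each missing distinct allele counts once, even if both donors carry it.

```python
def count_dropouts(genotypes, peaks, threshold=50):
    """Count allelic drop-outs in a mixed profile.

    genotypes: locus -> list of donor genotypes, e.g. {"TH01": [("6", "9"), ("7", "7")]}
    peaks: locus -> {allele: PH in rfu} as called from the electropherogram
    threshold: hypothetical analytical threshold (rfu); weaker peaks count as lost
    """
    n_dropout = 0
    for locus, donor_genotypes in genotypes.items():
        expected = {allele for g in donor_genotypes for allele in g}
        observed = {a for a, ph in peaks.get(locus, {}).items() if ph >= threshold}
        n_dropout += len(expected - observed)  # expected alleles with no usable peak
    return n_dropout


genotypes = {"TH01": [("6", "9"), ("7", "7")], "vWA": [("16", "17"), ("18", "19")]}
peaks = {"TH01": {"6": 900, "9": 850, "7": 40}, "vWA": {"16": 700, "17": 650, "18": 120}}
print(count_dropouts(genotypes, peaks))  # 2
```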
Statistical analysis. Locally weighted polynomial regression is a non-parametric regression method. It does not assume a particular data distribution. Instead, it describes the relationship between variables according to the shape of the data. This method is more robust than the conventional least-squares regression model.
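To show what a locally weighted fit computes, the sketch below implements a basic LOWESS step: each point gets its own weighted least-squares line fitted to its nearest neighbours, with tricube weights. It is a simplified variant without the robustness iterations of full LOWESS, and the `frac` default is an assumption. The span used in the study is not reported. In R, comparable fits are available through `stats::loess()` or ggplot2's `geom_smooth(method = "loess")`.

```python
import math


def lowess(x, y, frac=0.5):
    """Locally weighted linear regression (LOWESS without robustness iterations).

    For each x0 in x, a straight line is fitted by weighted least squares, with
    tricube weights that fall to zero at the distance of the k-th nearest point
    (k = ceil(frac * n)), and the fitted value at x0 is returned.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12            # bandwidth; guard against h == 0
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)                          # > 0, since x0 itself has weight 1
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # flat local fit if x has no spread
        fitted.append(ym + slope * (x0 - xm))
    return fitted
```

With APH as `x` and Hb as `y`, the fitted values can be plotted against APH to trace a smooth curve of the kind described for Fig. 5A.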
Kernel density estimation is a non-parametric method that estimates a density curve directly from the data distribution. The Kruskal-Wallis rank sum test is also non-parametric and does not assume normally distributed data. These methods provide more reliable results than analysis of variance when the data are not normally distributed.
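The Kruskal-Wallis test is available as `scipy.stats.kruskal` in Python and `kruskal.test()` in R. The standard-library sketch below shows what it computes: mid-ranks for ties, the H statistic with tie correction, and a p-value from the chi-square approximation with k - 1 degrees of freedom. The function names are illustrative. The chi-square tail uses a plain series for the regularized incomplete gamma function, which is adequate for a demonstration but not tuned for extreme values.

```python
import math


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution via the regularized gamma series."""
    a, z = df / 2, x / 2
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-15 * total:             # series for the lower incomplete gamma
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups):
    """Kruskal-Wallis H test with tie correction; returns (H, p)."""
    data = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(data)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and data[j + 1][0] == data[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1      # 1-based mid-rank for a run of ties
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, g), r in zip(data, ranks):
        rank_sums[g] += r
    h = 12 / (n * (n + 1)) * sum(rs ** 2 / len(grp) for rs, grp in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)        # tie correction
    return h, chi2_sf(h, len(groups) - 1)


print(kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

For the example, H = 7.2 with 2 degrees of freedom, so p = exp(-3.6), about 0.027.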
All graphics were made with the ggplot2 package (version 0.9.3) in R (version 3.0.1).

Establishment of the experimental model
A total of 9 hybrid gradients were designed for each of the 11 groups of mixed male DNA samples (Table I), in triplicate for each gradient, resulting in a total of 297 samples (Table II).
DNA quantity of mixed male DNA. The DNA quantity of the 297 simulated mixed male DNA samples was determined by quantifying the selected samples on the ABI 7500 real-time PCR system. Each sample was quantified 3 times, and the mean value was taken as the DNA concentration (Table III).
Because the DNA template concentration recommended for the Identifiler kit used in this study is 0.5-1.25 ng/µl, the 99 mixed male DNA working solutions (11 groups of mixed DNA at 9 hybrid gradients) were diluted accordingly: each 2-µl aliquot was diluted 10- or 15-fold with nuclease-free water (Ambion). The 9948 and 2800M DNA standards, with concentrations of <0.5 ng/µl, were left undiluted. The DNA template volume was 2 µl for the NAN11 mixed DNA samples (n=27), which correspond to the male-male DNA standards, and 1 µl for the other groups, including the single DNA samples used in the mixed male DNA PCR system (Table IV).
D-value analysis of the experimental model. The correlation between the APH and D-value of the mixed DNA samples is presented in Fig. 1B. The D-value distribution showed a similar tendency across the 9 mixed gradients. D-values >0.2 were found only at gradients 1:2, 1:5, 1:6 and 1:9, whereas most D-values were ≤0.1 with an APH ≤2,500 rfu. These results indicate only a minor error between the measured and theoretical Mx values at each locus, suggesting that the mixed male DNA experimental models meet the requirements for scientifically sound mixed DNA analysis.
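The text does not spell out how the D-value is computed. One plausible reading, assumed here, is the absolute difference between the Mx measured from peak heights at a locus and the theoretical Mx of the designed gradient, which is 1/(1 + r) for a 1:r mixture. The sketch below follows that assumption for loci without allele sharing, where each peak can be assigned to one donor. The function names and peak heights are hypothetical.

```python
def mixture_proportion(minor_peaks, all_peaks):
    """Mx at one locus: share of the total PH contributed by the minor donor's alleles."""
    return sum(minor_peaks) / sum(all_peaks)


def d_value(minor_peaks, all_peaks, ratio_major):
    """Assumed D-value: |measured Mx - theoretical Mx| for a 1:ratio_major mixture."""
    theoretical = 1 / (1 + ratio_major)
    return abs(mixture_proportion(minor_peaks, all_peaks) - theoretical)


# AB:CD locus in a 1:3 mixture: minor alleles A, B; major alleles C, D (rfu)
minor = [300, 260]
major = [760, 740]
print(round(d_value(minor, minor + major, 3), 3))  # 0.022
```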
Correlation between PH and PA. The correlation between PH and PA in the mixed DNA profiles is shown in Fig. 2A (R²=0.9588, P<2e-16) and indicates a strong linear correlation between PH and PA. Among the 16 STR loci analyzed (Fig. 2B), D19S433, D3S1358, D5S818 and D8S1179 showed a relatively weaker linear correlation between PH and PA, whereas the other 12 loci showed a strong linear correlation. These results are broadly consistent with the conclusion of Tvedebrink et al (40) that PH and PA are strongly linearly correlated. Because linear correlations between PH and PA were detected at all 16 loci, both PH and PA may be used for the quantitative analysis of mixed DNA samples without apparent differences in efficacy.
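The R² reported for the PH-PA relationship comes from an ordinary least-squares line. In R this would typically be done with `lm()`. The standard-library sketch below shows the arithmetic for a fit of PA on PH. The function `linear_fit` and the inputs are hypothetical, and the P-value computation is omitted.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = ym - b * xm                    # intercept
    r_squared = sxy ** 2 / (sxx * syy) # for a simple linear fit, R² = r²
    return a, b, r_squared


# Hypothetical PH (x) and PA (y) values in rfu
print(linear_fit([1, 2, 3], [1, 3, 2]))  # (1.0, 0.5, 0.25)
```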

Correlation between APH and Hb.
After loci with drop-out were excluded, 2,535 Hb values and the corresponding APH values were calculated from the 297 mixed male DNA profiles. Both Hb and APH showed skewed distributions (Fig. 3).
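The Hb percentages reported in this section, such as the share of Hb >0.6 among pairs with APH ≥1,250 rfu, are conditional proportions over these (APH, Hb) pairs. The sketch below is minimal and uses hypothetical pairs. `percent_hb_above` is an illustrative name and is not part of the study's analysis.

```python
def percent_hb_above(pairs, hb_cut, aph_min=None):
    """Percentage of Hb values > hb_cut, optionally only for pairs with APH >= aph_min.

    pairs: iterable of (aph, hb) tuples, one per Hb estimate.
    Returns NaN if no pair passes the APH filter.
    """
    selected = [hb for aph, hb in pairs if aph_min is None or aph >= aph_min]
    if not selected:
        return float("nan")
    return 100 * sum(hb > hb_cut for hb in selected) / len(selected)


pairs = [(900, 0.55), (1100, 0.82), (1300, 0.91), (1600, 0.58), (2100, 0.88)]
print(percent_hb_above(pairs, 0.6))                          # 60.0
print(round(percent_hb_above(pairs, 0.6, aph_min=1250), 2))  # 66.67
```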
Tables V and VI show the percentages of Hb >0.6 and >0.7 at the 16 STR loci and the 9 mixed gradients. Fig. 4 illustrates the distribution of Hb at each locus and mixed gradient, with red dotted lines indicating Hb = 0.7 and Hb = 0.9. Allele sharing at loci D3S1358 and AMEL precluded Hb estimation. The statistical analysis of Hb values was therefore carried out for only 14 STR loci. The median Hb was higher at loci TPOX, TH01 and D21S11, where most Hb values were ≥0.9, and lowest at locus D5S818. The Hb distribution fluctuated greatly at loci D5S818 and D2S1338. There was little difference in the median Hb among the mixed gradients, apart from gradients 1:8 and 1:9. The Kruskal-Wallis rank sum test revealed significant differences in the distribution of Hb among the 16 STR loci (P=0.0063) and among the 9 mixed gradients (P=0.02257).
Fig. 5A presents the correlation between the 2,535 Hb values and the corresponding APH values in the mixed male DNA profiles. The blue solid line was fitted by locally weighted regression, and the grey region indicates the corresponding confidence interval. The non-parametric fit showed an inflection point at APH = 1,250 rfu. When the APH was <1,250 rfu, Hb varied between 0.75 and 0.87. When the APH was ≥1,250 rfu, Hb was almost stable. The green dotted line indicates the mean Hb of 0.878 for APH >1,250 rfu. Overall, 92.66% of Hb values were >0.6 and 83.04% were >0.7. When the APH was ≥1,250 rfu, 92.74% of Hb values were >0.6. Fig. 5B and C show the tendencies of APH and Hb at the 9 mixed gradients and the 16 loci, respectively, fitted by locally weighted regression, with the 2 red dotted lines indicating Hb = 0.6 and Hb = 0.7. Fig. 5B illustrates that >90% of Hb values were >0.6 at gradients from 1:1 to 1:8, whereas 82% were >0.6 at gradient 1:9. The proportion of data points with both high Hb and high APH was greater at mixed gradients 1:2, 1:6, 1:7 and 1:8 than at the other gradients. Fig. 5C illustrates that >90% of Hb values were >0.6 at all loci apart from D2S1338 and D5S818. In addition, the proportion of data points with both high Hb and high APH was greater at loci CSF1PO, D19S433, D21S11, D2S1338 and vWA than at the other loci, where APH was mostly concentrated below 2,500 rfu.
The distribution of ADO across the mixed gradients showed that few ADOs occurred at gradients from 1:1 to 1:3, whereas the number of ADOs increased sharply at gradients from 1:7 to 1:9 (Fig. 6). These results demonstrate that the number of ADOs correlated with the Mx value of the mixed DNA profiles and that the incidence of drop-out increases greatly at extremely imbalanced gradients (e.g., 1:7-1:9). Fig. 7 shows that many ADOs occurred at gradients from 1:5 to 1:9, with a corresponding APH of <1,000 rfu (left panel). Profiles without drop-out covered a wide APH range and had the highest median APH (right panel). Overall, a gradual decrease in APH was accompanied by an increase in the number of ADOs.
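The ADO observations above, namely ADO totals per gradient and the finding that every profile with drop-out had an APH below 1,000 rfu, can be summarized from per-profile records. The sketch below is minimal and assumes each profile has already been reduced to a (gradient, APH, number of ADOs) tuple. The function `summarize_ado` and the example records are hypothetical.

```python
from collections import defaultdict


def summarize_ado(profiles):
    """Per-gradient ADO totals plus the largest APH among profiles with any ADO.

    profiles: iterable of (gradient, aph, n_ado) tuples, one per mixed profile.
    """
    totals = defaultdict(int)
    max_aph_with_ado = 0
    for gradient, aph, n_ado in profiles:
        totals[gradient] += n_ado
        if n_ado > 0:
            max_aph_with_ado = max(max_aph_with_ado, aph)
    return dict(totals), max_aph_with_ado


profiles = [("1:1", 2400, 0), ("1:5", 950, 1), ("1:9", 610, 4), ("1:9", 720, 2)]
print(summarize_ado(profiles))
# ({'1:1': 0, '1:5': 1, '1:9': 6}, 950)
```

Checking whether the second value stays below 1,000 rfu is then a one-line test on the real data.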

[Table: No. of drop-out alleles (0-10, 12, 13) and corresponding No. of samples (245, 19, 8, 5, 1, 2, 2, 3, 3, 3, 2, ...)]

Discussion
Parameter analysis of this experimental model revealed a close linear correlation between PH and PA, the 2 quantitative parameters of mixed DNA analysis, in agreement with the conclusion of Tvedebrink et al (40). The Kruskal-Wallis rank sum test revealed differences in the Hb distribution among the 16 STR loci and the 9 mixed gradients. The tendencies of APH and Hb fitted by locally weighted regression showed differences in the Hb distribution among STR loci, suggesting that these loci differ in efficiency for mixed DNA analysis. ADO correlated with both APH and mixed gradient, and all profiles with drop-out had an APH of <1,000 rfu. Taken together, these results indicate that APH affects the distribution of Hb and drop-out, and that Hb is associated with the STR locus and the mixed gradient. Further studies are required to investigate the causes of the variation in forensic efficiency among Identifiler® STR loci. These include differences in the fluorescence sensitivity of the genetic analyzer, which may distort APH. The interlocus balance (Ci) of the STR loci should also be analyzed to reduce bias in mixed DNA analysis. Although such stringent criteria are not generally necessary for single-source DNA testing, our findings suggest that analyses of mixed STR profiles should meet certain rfu levels to ensure accurate and reliable interpretation. Our results also suggest that the forensic efficiency of the STR multiplex used should first be evaluated for mixed DNA analysis, and that calibration parameters should be introduced to correct the APH deviation of the STR loci during mixed DNA analysis.