Open Access

Application of mixsep software package: Performance verification of male-mixed DNA analysis

  • Authors:
    • Na Hu
    • Bin Cong
    • Tao Gao
    • Yu Chen
    • Junyi Shen
    • Shujin Li
    • Chunling Ma

  • Published online on: April 30, 2015; DOI: https://doi.org/10.3892/mmr.2015.3710
  • Pages: 2431-2442
  • Copyright: © Hu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY-NC 3.0).



Abstract

An experimental model of male-mixed DNA (n=297) was constructed according to the mixed DNA construction principle, using the Applied Biosystems (ABI) 7500 quantitative polymerase chain reaction system, with scientific validation of the mixture proportion (Mx; root-mean-square error ≤0.02). Locus separation accuracy was statistically analyzed using mixsep, a DNA mixture separation R package, and the analytical performance of mixsep was assessed by examining the data distribution patterns for different mixed gradients, short tandem repeat (STR) loci and mixed DNA types. The results showed that locus separation accuracy had a negative linear correlation with the mixed gradient (r=−0.7121). With increasing mixed gradient imbalance, locus separation accuracy first increased and then decreased, with the highest value detected at a gradient of 1:3 (≥90%). The mixed gradient, which is the theoretical Mx, was one of the primary factors influencing the success of mixed DNA analysis. Among the 16 STR loci detected by Identifiler®, the separation accuracy was relatively high (>88%) for loci D5S818, D8S1179 and FGA, whereas the median separation accuracy was lowest for the D7S820 locus. The STR loci with relatively large numbers of allelic drop-out (ADO; >15), namely D18S51, D19S433, FGA, TPOX and vWA, were all located in the yellow and red channels. These five loci had low allele peak heights, consistent with the low sensitivity of the ABI 3130xl Genetic Analyzer to yellow and red fluorescence. The locus separation accuracy of mixsep differed substantially depending on whether ADO loci were included; inclusion of ADO significantly reduced the analytical performance of mixsep, consistent with the lack of an ADO functional module in this software. The present study demonstrated that the mixsep software had a number of advantages, and it is recommended for the analysis of mixed DNA. The software was easy to operate and produced understandable results with a degree of controllability.

Introduction

The R programming language, which was developed in the 1990s as an implementation of the S language, is widely used in the field of statistics. R is free, open-source software and is part of the GNU Project. As an implementation of the S language, R provides a complete software system for data processing, statistical computing and graphics (1). It offers facilities for data storage and handling; a suite of operators for calculations on arrays (vector and matrix operations are particularly powerful); tools for statistical analysis; statistical graphics; and a simple but powerful programming language with input and output, conditional branching and loops.

The source code for R is freely downloadable, and precompiled binaries are available online. R is available for multiple platforms, including UNIX-like systems (such as FreeBSD and Linux), Windows and MacOS. R is used predominantly through commands, and several graphical user interfaces have been developed, of which RStudio is the most commonly used (http://www.rstudio.com) (2). In addition, the Comprehensive R Archive Network (CRAN; http://cran.r-project.org) distributes the R source code, precompiled binaries and documentation, as well as packages contributed by R users. There are >100 CRAN mirrors worldwide, which share the load of the primary R server. There are five CRAN mirrors in China, allowing Chinese users to download R packages quickly.

In bioinformatics, R is commonly used for the analysis of molecular biological data. The Bioconductor project (3), which provides R-based tools for genome analysis, was launched in 2001 and is updated twice per year (http://www.bioconductor.org). At present, Bioconductor is used in the bioinformatics analysis of high-throughput, microarray and sequence data, and offers a large number of metadata packages for pathways, microarrays, genetic markers and organisms (3–11). The purpose of the Bioconductor project is to provide powerful statistical analysis and graphics functions for genomic data analysis, to efficiently analyze metadata from various species and to provide a common platform for bioinformatics.

The mixsep package (12–15) is a DNA mixture separator for R, developed and maintained by Dr Torben Tvedebrink (Aalborg University, Aalborg, Denmark). It is a forensic genetics tool for the analysis of mixed DNA. The present study used mixsep version 0.2.1-2, updated on May 3, 2013, whose user interface is shown in Fig. 1 (URL, http://cran.r-project.org/web/packages/mixsep/index.html; reference manual, http://cran.r-project.org/web/packages/mixsep/mixsep.pdf) (12). The mixsep package uses a statistical model with a greedy algorithm (13) to separate and infer most two-person mixed DNA profiles (the separation results are often not unique) and then identifies the individual contributors. The model does not consider allelic drop-out (ADO; when a contributor's DNA is present at a low level, the fluorescence of an allele may be too weak to be distinguished from background, so that the allelic peak is lost and a heterozygote appears as a false homozygote), stutter or drop-in (DNA contamination). The mixsep package also includes a module for complex mixed DNA analysis (more than three people), which has shown limited analytical performance in experimental data validation.
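
The core idea of separating one locus of a two-person profile under the no drop-out, no drop-in and no stutter assumption can be illustrated with a deliberately simplified sketch. It is not mixsep's model or API: mixsep fits a statistical model of peak areas across loci with a greedy search (13), whereas the hypothetical function `rank_two_person_combinations` below enumerates every genotype pair that explains exactly the observed alleles at a single locus and scores each pair by least squares on peak-area fractions, with a grid search over the minor contributor's proportion (alpha).

```python
from itertools import combinations_with_replacement, product


def rank_two_person_combinations(peaks, grid=100):
    """Rank candidate genotype pairs for ONE locus of a two-person mixture.

    peaks: dict mapping allele label (str, e.g. "9.3") -> peak area.
    Assumes no drop-out, drop-in or stutter, so the two genotypes together
    must explain exactly the observed alleles. Returns a list of
    (sse, alpha, minor_genotype, major_genotype), best fit first.
    """
    alleles = sorted(peaks)
    total = sum(peaks.values())
    observed = {a: peaks[a] / total for a in alleles}
    genotypes = list(combinations_with_replacement(alleles, 2))
    # alpha = proportion of the minor contributor, searched on [0, 0.5]
    alphas = [i / grid for i in range(grid // 2 + 1)]
    results = []
    for minor, major in product(genotypes, genotypes):
        if set(minor) | set(major) != set(alleles):
            continue  # this pair would need drop-out or drop-in to fit
        best_sse, best_alpha = float("inf"), None
        for alpha in alphas:
            sse = 0.0
            for a in alleles:
                # each contributor's share is split equally over its two allele copies
                expected = (alpha * minor.count(a) + (1 - alpha) * major.count(a)) / 2
                sse += (observed[a] - expected) ** 2
            if sse < best_sse:
                best_sse, best_alpha = sse, alpha
        results.append((best_sse, best_alpha, minor, major))
    results.sort(key=lambda r: r[0])
    return results


# Hypothetical peak areas for a 1:3 mixture of genotypes 12,13 and 14,15
best = rank_two_person_combinations({"12": 125, "13": 125, "14": 375, "15": 375})[0]
print(best)  # (0.0, 0.25, ('12', '13'), ('14', '15'))
```

Returning the whole ranked list, rather than only the best pair, mirrors the point made above that separation results are often not unique: near-tied candidates remain visible.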

Materials and methods

DNA sample collection

Anti-coagulated blood samples (5 ml) were collected from 40 unrelated healthy males at the Blood Center of Hebei Province (Shijiazhuang, China).

Experimental design

DNA was extracted from each of the 40 whole blood samples and quantified using the ABI 7500 quantitative polymerase chain reaction (qPCR) system (Life Technologies Inc., Carlsbad, CA, USA). Single DNA samples with similar concentrations (difference <0.5 ng/µl) were then paired to generate simulated two-person male-mixed DNA samples. Because the paired samples had similar concentrations, different mixed DNA gradients could be prepared simply by adjusting the volumes of the DNA solutions. To avoid potential over-fitting in the statistical analysis caused by a single sample type and inadequate sample size, various combinations of mixed DNA samples were generated from different individuals, and each combination was prepared at multiple mixed gradients. This ensured that the influence of mixed DNA profiles and mixed gradients on the analytical performance of mixsep was objectively reflected. In addition, the concentration of the simulated mixed DNA stock solutions was adjusted to within 0.5–1.25 ng/µl (the working solution concentration) to provide the DNA template quantity required by the DNA testing kits.

Establishment of male-mixed DNA model
DNA extraction

DNA was extracted from 40 whole blood samples using an Invitrogen® PureLink™ Genomic DNA Mini kit (Life Technologies Inc.). Aliquots (20 µl) of the 40 DNA extracts (nos. 1-40) were diluted with nine volumes (180 µl) of Ambion® Nuclease-Free Water (Life Technologies Inc.) to obtain 10-fold dilutions of the DNA solutions (final volume, 200 µl). The Promega® stock solutions of the 9948 Male DNA and 2800M control DNA standards (10 ng/µl; 25 µl; Promega Corp., Madison, WI, USA) were each diluted with nine volumes (225 µl) of Ambion® Nuclease-Free Water to obtain 10-fold dilutions of the standard samples (final volume, 250 µl).
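
The dilution arithmetic above follows from C1V1 = C2V2: an F-fold dilution adds F − 1 volumes of diluent to one volume of sample. A minimal sketch (the function names are illustrative only) reproducing the two dilutions described:

```python
def diluent_volume(aliquot_ul, fold):
    """Diluent (water) volume that turns an aliquot into a `fold`-fold dilution.

    A fold-F dilution adds (F - 1) volumes of diluent per volume of sample.
    """
    if fold < 1:
        raise ValueError("fold must be at least 1")
    return aliquot_ul * (fold - 1)


def diluted_concentration(stock_ng_per_ul, fold):
    """Concentration after a fold-F dilution (C2 = C1 / F)."""
    return stock_ng_per_ul / fold


print(diluent_volume(20, 10), 20 + diluent_volume(20, 10))   # 180 200
print(diluent_volume(25, 10), 25 + diluent_volume(25, 10))   # 225 250
print(diluted_concentration(10, 10))                          # 1.0
```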

DNA quantification

DNA quantification was performed using the Quantifiler® Human DNA Quantification kit (Life Technologies Inc.), containing DNA standard (200 ng/µl), Human Primer mix, and PCR Reaction mix. Human Primer mix (10.5 µl/sample) and PCR Reaction mix (12.5 µl/sample) were mixed and dispensed into reaction wells (23 µl) followed by the addition of 2 µl sample or standard to each well, in order to obtain a 25-µl PCR reaction mixture. DNA quantification was repeated three times for each sample, and the mean of these was taken as the final DNA concentration.
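
Assuming the usual qPCR standard-curve approach, in which the threshold cycle (Ct) is linear in log10 of the DNA quantity, the conversion from Ct to concentration and the averaging of the three replicate determinations can be sketched as follows. The instrument software performs this internally; the standard quantities, slope and intercept below are hypothetical values, not the kit's actual dilution series.

```python
import math
from statistics import mean


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    x_bar, y_bar = mean(xs), mean(cts)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cts))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain a concentration from one Ct value."""
    return 10 ** ((ct - intercept) / slope)


def mean_concentration(replicate_cts, slope, intercept):
    """Mean of the per-replicate concentrations (triplicates in this study)."""
    return mean([quantity_from_ct(ct, slope, intercept) for ct in replicate_cts])


# Hypothetical standards lying exactly on Ct = -3.3 * log10(q) + 28
standards = [50.0, 5.0, 0.5, 0.05]
cts = [-3.3 * math.log10(q) + 28 for q in standards]
slope, intercept = fit_standard_curve(standards, cts)

# Three hypothetical replicate Cts for a sample of about 6 ng/µl
unknown_cts = [-3.3 * math.log10(6.0) + 28 + d for d in (-0.05, 0.0, 0.05)]
print(round(slope, 3), round(intercept, 3))
print(round(mean_concentration(unknown_cts, slope, intercept), 2))
```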

Principles of mixed DNA preparation

Simulated male-mixed DNA was prepared by classifying DNA quantification results of the 40 male samples (nos. 1-40) and the Promega Male-DNA standard; the classification criterion was that single DNA samples have similar concentrations (difference, ≤0.5 ng/µl). The prepared, simulated male-mixed DNA was quantified by ABI 7500 real-time PCR system (Applied Biosystems). The concentration of DNA templates was adjusted to 0.5–1.25 ng/µl as recommended in the instructions for the AmpFlSTR® Identifiler® PCR Amplification kit and the simulated mixed DNA was further diluted whenever necessary.
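
The pairing criterion can be written as a filter over all sample pairs. Table I shows that one individual may appear in more than one group (person no. 11 is used in groups 2, 3 and 9), so the sketch below lists every qualifying pair rather than forcing a one-to-one matching. The function name is hypothetical, and the example concentrations are taken from Table I.

```python
from itertools import combinations


def candidate_pairs(concentrations, max_diff=0.5):
    """Return (a, b, difference) for sample pairs within max_diff ng/µl.

    concentrations: dict mapping sample id (str) -> concentration (ng/µl).
    A tiny tolerance absorbs floating-point error at the 0.5 ng/µl boundary.
    """
    pairs = []
    for (a, ca), (b, cb) in combinations(sorted(concentrations.items()), 2):
        diff = abs(ca - cb)
        if diff <= max_diff + 1e-9:
            pairs.append((a, b, round(diff, 3)))
    return sorted(pairs, key=lambda p: p[2])


concs = {"1": 5.80, "3": 5.75, "4": 6.92, "20": 9.24, "26": 9.35}
print(candidate_pairs(concs))  # [('1', '3', 0.05), ('20', '26', 0.11)]
```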

Identifiler PCR and electrophoresis

The 25-µl PCR system contained 10.5 µl PCR Reaction mix, 5.5 µl Identifiler Primer set, 0.5 µl Gold® DNA Polymerase, 9.0 µl Nuclease-Free Water and 1 µl template DNA. Identifiler PCR amplification was performed under the following conditions: pre-denaturation at 95°C for 11 min; 28 cycles of denaturation at 94°C for 1 min, annealing at 59°C for 1 min and extension at 72°C for 1 min; and a final extension step at 60°C for 60 min. The AmpFlSTR® Identifiler (Life Technologies) PCR products were analyzed in a 10-µl electrophoresis mixture containing 0.25 µl GeneScan™ 500 LIZ® Size Standard, 9.25 µl Hi-Di™ formamide and 0.50 µl PCR product or Allelic Ladder. Capillary electrophoresis was performed on an ABI 3130xl Genetic Analyzer (Applied Biosystems Life Technologies, Foster City, CA, USA). All PCR reagents were purchased from Invitrogen Life Technologies Inc. (Carlsbad, CA, USA).

Software operation of mixsep
Rationale for use

For the significance level specified by the user, the mixsep package provided the optimal and alternative genotype combinations of short tandem repeat (STR) loci, estimated the mixture proportion (Mx), fitted the residual peak area errors and calculated the goodness of fit. In addition, mixsep identified and removed STR loci with poor goodness of fit, which contributed to the overall variance.

Package downloading and installation

R for Windows was obtained from http://cran.r-project.org/bin/windows/base/release.htm. The mixsep package was installed by following the instructions or by running the command ‘install.packages("mixsep", repos = "http://mirrors.ustc.edu.cn/CRAN/")’, and mixsep was loaded by running the command ‘library(mixsep)’.

Data formatting and loading

Experimental data were saved as a CSV file containing six variables: locus, allele, height, area, bp (fragment size in base pairs) and dye. In the majority of cases, data analysis was performed using the first four of these, as shown in Fig. 2. Data were loaded from the CSV file by clicking ‘Add file’.
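
A profile in this six-column layout can be produced and read back with Python's standard `csv` module. This is a sketch only: the exact header spelling mixsep expects is an assumption and should be checked against Fig. 2 and the mixsep reference manual, and the peak values are hypothetical.

```python
import csv
import io

# Column layout described in the text; header spelling is an assumption.
FIELDS = ["Locus", "Allele", "Height", "Area", "bp", "dye"]


def write_profile_csv(peaks):
    """Serialise a list of peak dicts to CSV text (missing bp/dye left blank)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(peaks)
    return buf.getvalue()


def read_profile_csv(text):
    """Group peaks by locus: {locus: [(allele, height, area), ...]}.

    Alleles stay strings so microvariants such as "9.3" survive unchanged.
    """
    profile = {}
    for row in csv.DictReader(io.StringIO(text)):
        profile.setdefault(row["Locus"], []).append(
            (row["Allele"], float(row["Height"]), float(row["Area"]))
        )
    return profile


peaks = [  # hypothetical peaks for one locus of a two-person mixture
    {"Locus": "TH01", "Allele": "6", "Height": 350, "Area": 3100},
    {"Locus": "TH01", "Allele": "9.3", "Height": 1200, "Area": 10400},
]
text = write_profile_csv(peaks)
print(text.splitlines()[0])    # Locus,Allele,Height,Area,bp,dye
print(read_profile_csv(text))  # {'TH01': [('6', 350.0, 3100.0), ('9.3', 1200.0, 10400.0)]}
```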

Variables and genetic marker selection

The locus and allele variables were required, height and area were alternatives (either could be used), and bp and dye were optional. A DNA testing kit (such as the Identifiler PCR Amplification kit) was selected before clicking ‘select column (and kit)’.

Selecting loci and alleles

By default, mixsep analyzed all loci and alleles. Specific loci and alleles were selected when necessary, and the parameter-setting interface was opened by clicking ‘continue’.

Parameter setting and mixed DNA analysis

The parameters included ‘Number of contributors’, ‘Search for alternatives’, ‘Specify significance level’ and ‘Use fixed profile’. Mixed DNA analysis was started by clicking ‘Analyze mixture!’.

Parameters of analytical performance for mixsep
Rationale for use

The primary function of mixsep, which lacks a functional module for ADO, is the separation of mixed DNA genotype combinations. Therefore, the STR loci of the simulated mixed DNA profiles (n = 4566; that is, 297 profiles × 16 loci = 4752 locus results, less the 186 loci with ADO) were statistically analyzed excluding ADO.
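
The sample sizes used throughout the analysis follow directly from the experimental design. The short check below derives them; the ADO counts are transcribed from Tables VI, VIII and X in the Results, and the three tabulations must agree on the total.

```python
groups, gradients, replicates, loci = 11, 9, 3, 16

profiles = groups * gradients * replicates   # simulated mixed DNA profiles
locus_profiles = profiles * loci             # locus-level separation results

# ADO counts transcribed from the Results tables
ado_by_gradient = [0, 2, 2, 15, 16, 9, 38, 59, 45]                       # Table VI
ado_by_group = [3, 1, 0, 5, 2, 0, 64, 69, 39, 0, 3]                      # Table VIII
ado_by_locus = [0, 10, 5, 10, 31, 27, 6, 5, 9, 1, 13, 2, 29, 4, 18, 16]  # Table X

total_ado = sum(ado_by_gradient)
assert total_ado == sum(ado_by_group) == sum(ado_by_locus)
analysed = locus_profiles - total_ado        # loci entering the accuracy statistics

print(profiles, locus_profiles, total_ado, analysed)  # 297 4752 186 4566
```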

Locus separation accuracy

Locus separation accuracy is the proportion of loci at which the genotype combination in the mixed DNA profiles was separated correctly.
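
A minimal sketch of this definition, assuming that a separation counts as correct when the inferred pair of genotypes matches the true pair regardless of allele order or contributor order (the text does not say whether contributor assignment was also scored). Loci with ADO are skipped, as described above; the function names are hypothetical.

```python
def canonical(combination):
    """Order-independent form of a two-person genotype combination.

    Alleles are sorted within each genotype and the two genotypes are sorted,
    so ("13", "12") + ("14", "14") equals ("14", "14") + ("12", "13").
    """
    return tuple(sorted(tuple(sorted(g)) for g in combination))


def locus_separation_accuracy(results):
    """Share of scored loci whose inferred combination matches the true one.

    results: iterable of (inferred, truth, has_ado); loci with ADO are skipped.
    """
    scored = [canonical(inferred) == canonical(truth)
              for inferred, truth, has_ado in results if not has_ado]
    return sum(scored) / len(scored) if scored else float("nan")


results = [
    ((("13", "12"), ("14", "14")), (("14", "14"), ("12", "13")), False),  # correct
    ((("12", "12"), ("13", "14")), (("12", "13"), ("12", "14")), False),  # wrong
    (None, (("9", "10"), ("11", "11")), True),                            # ADO, skipped
]
print(locus_separation_accuracy(results))  # 0.5
```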

Horizontal analysis

The mixed DNA profile was used as a unit for statistical analysis of locus separation accuracy in order to compare the distribution patterns of the DNA profile data in association with different mixed gradients and mixed sample types.

Vertical analysis

The STR locus was used as a unit for the statistical analysis of locus separation accuracy in order to compare the distribution patterns of DNA profile data in association with the 16 STR loci used in the present study.

The separation efficiency of mixsep in male-mixed DNA profiles was assessed using statistical analysis in the horizontal and vertical dimensions.
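
The horizontal and vertical analyses differ only in the unit used to group the locus-level results. A minimal sketch, assuming each result is stored as a record with hypothetical keys such as 'gradient', 'group' and 'locus', plus 'correct' and 'ado' flags; ADO loci are excluded, as described above.

```python
from collections import defaultdict


def accuracy_by(records, unit):
    """Locus separation accuracy grouped by `unit`, skipping loci with ADO.

    records: dicts carrying the grouping key(s) (e.g. 'gradient', 'group',
    'locus') plus 'correct' (bool) and 'ado' (bool); key names are hypothetical.
    unit: 'gradient' or 'group' for the horizontal analysis,
          'locus' for the vertical analysis.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for rec in records:
        if rec["ado"]:
            continue  # ADO loci are excluded from the accuracy statistic
        key = rec[unit]
        totals[key] += 1
        hits[key] += int(rec["correct"])
    return {key: hits[key] / totals[key] for key in totals}


records = [  # toy data, not results from this study
    {"gradient": "1:1", "locus": "FGA", "correct": True, "ado": False},
    {"gradient": "1:1", "locus": "vWA", "correct": False, "ado": False},
    {"gradient": "1:2", "locus": "FGA", "correct": True, "ado": False},
    {"gradient": "1:2", "locus": "vWA", "correct": False, "ado": True},
]
print(accuracy_by(records, "gradient"))  # {'1:1': 0.5, '1:2': 1.0}
print(accuracy_by(records, "locus"))     # {'FGA': 1.0, 'vWA': 0.0}
```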

Results

Preparation of simulated male-mixed DNA

The male DNA samples (n=40; nos. 1-40) and Promega male-DNA standards were classified according to the criterion of a DNA concentration difference of no greater than 0.5 ng/µl. The 22 single DNA samples that met this criterion were prepared into eleven groups of two-male mixed DNA samples. To include the Promega male-DNA standard in constructing simulated mixed DNA, the ten-fold-diluted 2800M control DNA working solution was further diluted twice, yielding a final concentration of 0.243 ng/µl (Table I). Each group of male-mixed DNA was prepared into nine mixed gradients, and the samples of each mixed gradient were amplified by PCR three times (thus, n=297). The mixed gradients of male-mixed DNA samples are shown in Table II.

Table I

DNA concentration in eleven groups of male-mixed DNA.

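| Sample no. | Sample 1 person no. | Sample 1 concentration (ng/µl) | Sample 2 person no. | Sample 2 concentration (ng/µl) | Difference (ng/µl) |
|---|---|---|---|---|---|
| 1 | 1 | 5.80 | 3 | 5.75 | 0.05 |
| 2 | 11 | 7.20 | 40 | 7.21 | 0.01 |
| 3 | 11 | 7.20 | 14 | 7.22 | 0.02 |
| 4 | 20 | 9.24 | 26 | 9.35 | 0.11 |
| 5 | 12 | 6.58 | 19 | 6.68 | 0.10 |
| 6 | 24 | 5.46 | 27 | 5.48 | 0.02 |
| 7 | 4 | 6.92 | 15 | 6.98 | 0.06 |
| 8 | 7 | 6.13 | 29 | 6.22 | 0.09 |
| 9 | 11 | 7.20 | 38 | 7.14 | 0.06 |
| 10 | 37 | 6.40 | 39 | 6.39 | 0.01 |
| 11 | 9948 | 0.223 | 2800M | 0.243 | 0.02 |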

Table II

Mixed gradients of simulated male-mixed DNA.

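| Mixed gradient | 1:1 | 1:2 | 1:3 | 1:4 | 1:5 | 1:6 | 1:7 | 1:8 | 1:9 |
|---|---|---|---|---|---|---|---|---|---|
| Volume sample 1 (µl) | 5 | 4 | 3 | 2 | 2 | 2 | 2 | 2 | 2 |
| Volume sample 2 (µl) | 5 | 8 | 9 | 8 | 10 | 12 | 14 | 16 | 18 |
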
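
The volumes in Table II determine the theoretical mixture proportion of sample 1 (the minor contributor). A minimal sketch, assuming equal concentrations for the two samples by default; with the Table I concentrations of group 1 (5.80 and 5.75 ng/µl), the 1:9 value moves only slightly, to about 0.101. The function name is illustrative.

```python
VOLUME_1 = [5, 4, 3, 2, 2, 2, 2, 2, 2]          # Table II, sample 1 (µl)
VOLUME_2 = [5, 8, 9, 8, 10, 12, 14, 16, 18]     # Table II, sample 2 (µl)


def theoretical_mx(v1, v2, c1=1.0, c2=1.0):
    """Proportion of sample 1 DNA in the mixture from volumes and concentrations."""
    q1, q2 = v1 * c1, v2 * c2
    return q1 / (q1 + q2)


for k, (v1, v2) in enumerate(zip(VOLUME_1, VOLUME_2), start=1):
    print(f"1:{k}", round(theoretical_mx(v1, v2), 4))

print(round(theoretical_mx(2, 18, 5.80, 5.75), 4))  # group 1 concentrations at 1:9
```

For every gradient 1:k the equal-concentration value equals 1/(1 + k), which is the theoretical alpha used in the validation below.
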
DNA quantity of male-mixed DNA

Selected simulated male-mixed DNA samples, including the eleven groups of male-mixed DNA at a mixed gradient of 1:9, were checked using an ABI 7500 qPCR system (Applied Biosystems). DNA quantification of each sample was repeated three times, and the mean values were taken as the DNA concentration (Table III).

Table III

DNA quantity in eleven groups of male-mixed DNA with mixed gradient of 1:9.

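| Sample no. | Quantity of mixed DNA (ng/µl) | Sample 1 person no. | Sample 1 concentration (ng/µl) | Sample 2 person no. | Sample 2 concentration (ng/µl) | Difference (ng/µl) |
|---|---|---|---|---|---|---|
| 1 | 8.32 | 1 | 5.80 | 3 | 5.75 | 2.57 |
| 2 | 10.94 | 11 | 7.20 | 40 | 7.21 | 3.74 |
| 3 | 10.51 | 11 | 7.20 | 14 | 7.22 | 3.31 |
| 4 | 13.38 | 20 | 9.24 | 26 | 9.35 | 4.14 |
| 5 | 9.60 | 12 | 6.58 | 19 | 6.68 | 3.02 |
| 6 | 7.01 | 24 | 5.46 | 27 | 5.48 | 1.55 |
| 7 | 9.62 | 4 | 6.92 | 15 | 6.98 | 2.70 |
| 8 | 8.68 | 7 | 6.13 | 29 | 6.22 | 2.55 |
| 9 | 10.38 | 11 | 7.20 | 38 | 7.14 | 3.24 |
| 10 | 9.31 | 37 | 6.40 | 39 | 6.39 | 2.92 |
| 11 | 0.337 | 9948 | 0.223 | 2800M | 0.243 | 0.114 |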

[i] The difference is the quantity of mixed DNA minus the concentration from either sample 1 or sample 2, depending on which value was smallest.

To fit the concentration range of template DNA recommended by the kit used in this study (0.5–1.25 ng/µl), the 99 male-mixed DNA working solutions (eleven groups of mixed DNA with nine mixed gradients each) were diluted appropriately. Based on the DNA quantification results (Table III), 2-µl aliquots of each mixed DNA working solution were diluted 10- or 15-fold with 9 or 14 volumes (18 or 28 µl) of Ambion Nuclease-Free Water. The mixed 9948 and 2800M DNA standards, with concentrations <0.5 ng/µl, were not diluted. The template volume was 2 µl for the mixed DNA of Sample 11, which was composed of the male-DNA standards (n=27), and 1 µl for the other groups, including the single DNA samples used to construct the male-mixed DNA.
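
The choice between 10- and 15-fold dilution can be expressed as a simple rule. The sketch assumes the rule is "use the smallest listed dilution that lands in 0.5–1.25 ng/µl", which the text does not state explicitly; `choose_dilution` is a hypothetical helper. Applied to the 1:9 concentrations in Table III, it selects 15-fold only for group 4, because 13.38/10 = 1.338 ng/µl exceeds the upper limit, whereas 13.38/15 ≈ 0.892 ng/µl does not.

```python
def choose_dilution(stock_ng_per_ul, folds=(10, 15), aliquot_ul=2.0,
                    target=(0.5, 1.25)):
    """Pick the first fold dilution that brings a stock into the target range.

    Returns (fold, water_ul, final_ng_per_ul), or None if no listed fold fits
    (e.g. a stock already below the range, which is used undiluted).
    """
    low, high = target
    for fold in folds:
        final = stock_ng_per_ul / fold
        if low <= final <= high:
            return fold, aliquot_ul * (fold - 1), final
    return None


for conc in (8.32, 13.38, 0.337):  # 1:9 values from Table III
    print(conc, choose_dilution(conc))
```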

Scientific validation of simulated mixed DNA model

Mx assessment compares the estimated Mx value of the mixsep package (the alpha value) with the pre-set mixed gradient of simulated mixed DNA (the theoretical Mx value) for scientific validation of the established experimental model.

In the present study, the estimated Mx values of mixsep were used as the estimated alpha and the pre-set mixed gradients of male-mixed DNA were used as the theoretical alpha. The distribution of estimated and theoretical alpha values in Identifiler (ID)-STR profiles of the mixed DNA was examined by excluding STR loci with ADO.

In Fig. 3, the red line indicates y=x and the blue line represents the locally weighted regression curve. Locally weighted regression is relatively robust to noise and therefore reflects the relationship between estimated and theoretical alpha values well. The results showed that for theoretical alpha values ≤0.33 (that is, mixed gradients of 1:2 to 1:9), the alpha estimated by mixsep was greater than the theoretical value, whereas at a gradient of 1:1 the estimated alpha was smaller than the theoretical value. This may reflect the normality assumption used by mixsep in constructing its statistical model, which leads to conservative estimates of relatively extreme mixture proportions (such as 1:5, 1:6, 1:7, 1:8 and 1:9) that are biased toward more balanced proportions.

Two values in Fig. 3 deviated markedly from the locally weighted regression curve. These corresponded to the third replicate at the 1:5 gradient and the first replicate at the 1:6 gradient of the mixed DNA samples of group no. 9, and both were obtained when running mixsep from source code. When mixsep was run from the software interface, the corresponding alpha values were 0.1742 and 0.1537, respectively, which lay close to the weighted regression curve and fitted the overall distribution. The reason for this discrepancy is unclear, since all other alpha values estimated by running mixsep from source code agreed with those obtained through the software interface, and no bug was found when running mixsep through the software interface. The results estimated through the software interface are therefore reported in this article.

Root mean square error (RMSE) statistics showed that, in the ID-STR profiles, relatively large RMSEs of the estimated alpha values were scattered across the eleven groups of male-mixed DNA samples, occurring most frequently in groups 8 and 9. By mixed gradient, the RMSE was relatively large at 1:1 (>0.02) and ranged from 0.01 to 0.02 at the other gradients. Theoretically, mixed DNA at a gradient of 1:1 cannot be accurately separated (although this is ignored in the statistical analysis). These results demonstrated that the RMSE between estimated and theoretical Mx was small (≤0.02) in the ID-STR profiles of the male-mixed DNA model established in the present study, so the ID-STR profile data were suitable for the analysis of mixed DNA.
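
The RMSE between estimated and theoretical alpha can be computed per mixed gradient as sketched below, taking the theoretical alpha of a 1:k mixture as 1/(1 + k). The alpha estimates are hypothetical values chosen to mimic the pattern described above (underestimation at 1:1, overestimation at 1:9); they are not results from this study.

```python
import math
from collections import defaultdict


def rmse(estimates, targets):
    """Root-mean-square error between paired estimates and targets."""
    if not estimates or len(estimates) != len(targets):
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, targets)) / len(estimates))


def rmse_by_gradient(runs):
    """runs: iterable of (k, estimated_alpha) for a 1:k gradient.

    The theoretical alpha of a 1:k mixture is taken as 1 / (1 + k).
    """
    by_k = defaultdict(list)
    for k, alpha_hat in runs:
        by_k[k].append(alpha_hat)
    return {k: rmse(v, [1 / (1 + k)] * len(v)) for k, v in sorted(by_k.items())}


# Hypothetical mixsep alpha estimates, not values from this study
runs = [(1, 0.48), (1, 0.47), (1, 0.49), (9, 0.11), (9, 0.115), (9, 0.112)]
for k, value in rmse_by_gradient(runs).items():
    print(f"1:{k}", round(value, 4))
```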

Performance analysis of mixsep
Horizontal analysis

The eleven groups of male-mixed DNA profiles (with three parallel tests) at each mixed gradient involved 528 STR loci (11 groups × 3 replicates × 16 loci). The statistics (Table VI) and distribution (Fig. 4) of locus separation accuracy and ADO number show that the ADO number increased from a gradient of 1:4 onward and peaked at gradients of 1:7, 1:8 and 1:9. The correlation coefficient between mixed gradient and locus separation accuracy was r=−0.7121 (P=0.03139), indicating a negative linear correlation between these two parameters. The correlation coefficient between mixed gradient and ADO number was r=−0.4244 (P=0.2549), indicating no significant correlation between these two parameters. Fig. 5 shows the distribution of average locus separation accuracy at different mixed gradients in the three parallel tests, whose results were generally consistent. Locus separation accuracy was lowest at a mixed gradient of 1:1; with increasing mixed gradient, accuracy first increased and then decreased. Specifically, locus separation accuracy was relatively high at gradients of 1:2, 1:3 and 1:4, but was low and fluctuated at gradients of 1:1 and 1:9. Accuracy was slightly higher in mixed DNA profiles when loci with ADO were excluded than when they were included.

Table VI

Statistics of locus separation accuracy and ADO number in male-mixed DNA at different mixed gradients.

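| Mixed gradient | 1:1 | 1:2 | 1:3 | 1:4 | 1:5 | 1:6 | 1:7 | 1:8 | 1:9 |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.6061 | 0.8878 | 0.9791 | 0.9181 | 0.8809 | 0.8304 | 0.8510 | 0.8166 | 0.7557 |
| ADO no. | 0 | 2 | 2 | 15 | 16 | 9 | 38 | 59 | 45 |
| Loci analyzed (no.) | 528 | 526 | 526 | 513 | 512 | 519 | 490 | 469 | 483 |
| Total loci (no.) | 528 | 528 | 528 | 528 | 528 | 528 | 528 | 528 | 528 |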

[i] ADO, allelic drop-out.
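
Because Table VI reports both the accuracy and the number of loci actually scored, the underlying counts and the correlation coefficients can be recomputed. This is a minimal sketch, assuming Pearson's r and coding each gradient 1:k by its theoretical Mx, 1/(1 + k); the text does not state how the gradient was coded. Under these assumptions, the accuracy row is reproduced from integer counts of correctly separated loci, and the coefficients come out at about −0.424 for mixed gradient versus accuracy and about −0.712 for mixed gradient versus ADO number, which readers can compare with the values reported in the text.

```python
import math

# Table VI values; gradient 1:k coded as its theoretical Mx, 1/(1+k) (assumption)
MX = [1 / (1 + k) for k in range(1, 10)]
ACCURACY = [0.6061, 0.8878, 0.9791, 0.9181, 0.8809, 0.8304, 0.8510, 0.8166, 0.7557]
ADO = [0, 2, 2, 15, 16, 9, 38, 59, 45]
TOTAL_LOCI = 528  # 11 groups x 3 replicates x 16 loci per gradient


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


scored = [TOTAL_LOCI - d for d in ADO]                       # "Loci analyzed" row
correct = [round(a * n) for a, n in zip(ACCURACY, scored)]   # correctly separated loci
print(correct)  # [320, 467, 515, 471, 451, 431, 417, 383, 365]
print(round(pearson_r(MX, ACCURACY), 4), round(pearson_r(MX, ADO), 4))
```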

The statistics (Table VII) and distribution (Fig. 6) of locus separation accuracy in the eleven groups of male-mixed DNA samples at different mixed gradients show that the distribution pattern of accuracy in every group of mixed DNA was generally consistent with the overall distribution described above. Accuracy was lowest at a gradient of 1:1 (with the exception of group no. 9) and, with increasing mixed gradient, first increased and then decreased. Among the eleven groups of mixed DNA, large fluctuations in locus separation accuracy were observed in groups no. 7, 9 and 11, which may have been due to variations in experimental operations. Accuracy was generally high in groups no. 1, 3 and 4. The overall level of locus separation accuracy differed among the eleven groups of mixed DNA, reflecting stochastic sampling effects.

Table VII

Statistics of locus separation accuracy in eleven groups of male-mixed DNA samples at different mixed gradients.

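| Mixed gradient | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Group 6 | Group 7 | Group 8 | Group 9 | Group 10 | Group 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1:1 | 0.6250 | 0.6042 | 0.6875 | 0.6250 | 0.6458 | 0.4583 | 0.5833 | 0.7500 | 0.7083 | 0.5000 | 0.4792 |
| 1:2 | 0.9375 | 0.7872 | 0.9375 | 0.8958 | 0.9583 | 0.9167 | 0.8125 | 0.9149 | 0.9375 | 0.8333 | 0.8333 |
| 1:3 | 0.9375 | 0.9792 | 0.9792 | 0.9792 | 1.0000 | 0.9792 | 1.0000 | 0.9574 | 1.0000 | 1.0000 | 0.9583 |
| 1:4 | 0.8958 | 0.8542 | 0.9375 | 0.9583 | 0.8750 | 0.9792 | 0.9767 | 0.9250 | 0.8913 | 0.9792 | 0.8333 |
| 1:5 | 0.8333 | 0.7917 | 0.8958 | 0.9583 | 0.8958 | 0.9375 | 0.9189 | 0.9302 | 0.7083 | 0.9375 | 0.8958 |
| 1:6 | 0.8542 | 0.8542 | 0.8750 | 0.8542 | 0.8750 | 0.8750 | 0.8333 | 0.8298 | 0.5500 | 0.9583 | 0.7292 |
| 1:7 | 0.8333 | 0.7917 | 0.9167 | 0.8750 | 0.8511 | 0.7917 | 0.8000 | 0.9286 | 0.7391 | 0.9583 | 0.8913 |
| 1:8 | 0.8696 | 0.7500 | 0.9167 | 0.8723 | 0.7872 | 0.8333 | 0.8000 | 0.7297 | 0.7857 | 0.7708 | 0.8298 |
| 1:9 | 0.8511 | 0.5833 | 0.8125 | 0.7727 | 0.8125 | 0.8333 | 0.6944 | 0.7692 | 0.6190 | 0.8542 | 0.6875 |

Vertical analysis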

Each group of male-mixed DNA profiles (with three parallel tests) involved 432 STR loci (9 gradients × 3 replicates × 16 loci). The statistics (Table VIII) and distribution (Fig. 7) of locus separation accuracy and ADO number show that accuracy was generally high (>80%) in the eleven groups of mixed DNA, with the exception of groups no. 2, 9 and 11. Owing to low average peak heights of the active alleles (APH), the ADO number was markedly greater in groups no. 7, 8 and 9 than in the other groups. In addition, locus separation accuracy differed considerably (by ~10%) depending on whether loci with ADO were included or excluded.

Table VIII

Statistics of overall locus separation accuracy and ADO number in eleven groups of male-mixed DNA.

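| Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.8485 | 0.7773 | 0.8843 | 0.8665 | 0.8558 | 0.8449 | 0.8261 | 0.8623 | 0.7761 | 0.8657 | 0.7925 |
| ADO no. | 3 | 1 | 0 | 5 | 2 | 0 | 64 | 69 | 39 | 0 | 3 |
| Loci analyzed (no.) | 429 | 431 | 432 | 427 | 430 | 432 | 368 | 363 | 393 | 432 | 429 |
| Total loci (no.) | 432 | 432 | 432 | 432 | 432 | 432 | 432 | 432 | 432 | 432 | 432 |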

[i] ADO, allelic drop-out.

In the mixed DNA experimental model, each locus yielded 33 values of locus separation accuracy at each mixed gradient (11 groups × 3 replicates). The statistics (Table IX) and distribution (Fig. 8) of locus separation accuracy for the 16 STR loci at each mixed gradient show that at a gradient of 1:1, accuracy was ≤70% for all loci except AMEL and D3S1358 (outliers are shown in the lower area of the box-whisker plot, Fig. 8). At gradients of 1:2, 1:3, 1:4 and 1:5, the accuracy of each locus was relatively high, particularly at 1:3 (≥90%), whereas at 1:8 and 1:9 accuracy fluctuated widely and declined to lower levels. According to the data distribution shown in the box-whisker plot, the average separation accuracy was lowest for the D7S820 locus among the 16 STR loci.

Table IX

Statistics of separation accuracy for 16 STR loci in male-mixed DNA profiles with different mixed gradients.

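| Mixed gradient | AMEL | CSF1PO | D13S317 | D16S539 | D18S51 | D19S433 | D21S11 | D2S1338 | D3S1358 | D5S818 | D7S820 | D8S1179 | FGA | TH01 | TPOX | vWA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1:1 | 1.0000 | 0.6667 | 0.5758 | 0.4242 | 0.5455 | 0.5455 | 0.5455 | 0.5758 | 0.8485 | 0.5758 | 0.6970 | 0.4545 | 0.5758 | 0.5758 | 0.6667 | 0.4242 |
| 1:2 | 0.9697 | 0.7879 | 0.9697 | 0.8788 | 1.0000 | 0.7576 | 0.9697 | 0.8182 | 0.7879 | 0.9697 | 0.8182 | 0.9697 | 0.8438 | 0.9091 | 0.9091 | 0.8485 |
| 1:3 | 0.9697 | 1.0000 | 1.0000 | 1.0000 | 0.9697 | 0.9062 | 1.0000 | 0.9394 | 0.9394 | 1.0000 | 1.0000 | 0.9697 | 1.0000 | 1.0000 | 1.0000 | 0.9697 |
| 1:4 | 0.8788 | 0.9375 | 0.8788 | 0.9688 | 1.0000 | 0.7667 | 0.9091 | 0.9375 | 0.8788 | 0.9394 | 0.7812 | 0.9697 | 0.9355 | 0.9697 | 0.9677 | 0.9677 |
| 1:5 | 0.9091 | 0.8485 | 0.8182 | 0.8788 | 0.9310 | 0.9000 | 0.8485 | 0.8788 | 0.8788 | 0.8788 | 0.7812 | 0.9394 | 0.9667 | 0.8182 | 0.9333 | 0.9032 |
| 1:6 | 0.8485 | 0.9032 | 0.8182 | 0.9062 | 0.9688 | 0.7667 | 0.8182 | 0.8182 | 0.8125 | 0.8788 | 0.6364 | 0.9394 | 0.8485 | 0.7879 | 0.7812 | 0.7576 |
| 1:7 | 0.7879 | 0.9032 | 0.7812 | 0.8387 | 0.8077 | 0.8214 | 0.8438 | 0.7812 | 0.8065 | 0.9697 | 0.7667 | 1.0000 | 0.9630 | 0.9032 | 0.7097 | 0.9310 |
| 1:8 | 0.8182 | 0.7742 | 0.8000 | 0.9667 | 0.8333 | 0.6667 | 0.7333 | 0.7500 | 0.7419 | 0.9375 | 0.6296 | 0.8750 | 0.9583 | 0.8710 | 0.7778 | 0.9286 |
| 1:9 | 0.7273 | 0.7000 | 0.7812 | 0.9000 | 0.7692 | 0.8148 | 0.8387 | 0.6452 | 0.8276 | 0.8485 | 0.4194 | 0.8438 | 0.9231 | 0.6364 | 0.6897 | 0.7667 |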

[i] STR, short tandem repeats.

The statistics (Table X) and distribution (Fig. 9) of locus separation accuracy in the 297 simulated male-mixed DNA profiles show that accuracy was relatively high for loci D5S818, D8S1179 and FGA (>88%) but relatively low for loci D19S433, D2S1338 and D7S820 (≤80%). The number of ADO was lowest for AMEL, D5S818 and D8S1179, and relatively high (>15) for loci D18S51, D19S433, FGA, TPOX and vWA. These five loci are all in the yellow and red channels and had lower APH, consistent with the relatively low sensitivity of the ABI 3130xl Genetic Analyzer to yellow and red fluorescence. There was no significant correlation between the accuracy of the 16 STR loci and the number of ADO (r=−0.3095; P=0.2434).

Table X

Overall separation accuracy and ADO number of 16 STR loci in male-mixed DNA profiles.

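| Locus | AMEL | CSF1PO | D13S317 | D16S539 | D18S51 | D19S433 | D21S11 | D2S1338 | D3S1358 | D5S818 | D7S820 | D8S1179 | FGA | TH01 | TPOX | vWA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.8788 | 0.8362 | 0.8253 | 0.8606 | 0.8722 | 0.7704 | 0.8351 | 0.7945 | 0.8368 | 0.8885 | 0.7289 | 0.8847 | 0.8843 | 0.8294 | 0.8280 | 0.8292 |
| ADO no. | 0 | 10 | 5 | 10 | 31 | 27 | 6 | 5 | 9 | 1 | 13 | 2 | 29 | 4 | 18 | 16 |
| Loci analyzed (no.) | 297 | 287 | 292 | 287 | 266 | 270 | 291 | 292 | 288 | 296 | 284 | 295 | 268 | 293 | 279 | 281 |
| Total loci (no.) | 297 | 297 | 297 | 297 | 297 | 297 | 297 | 297 | 297 | 297 | 297 | 297 | 297 | 297 | 297 | 297 |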

[i] ADO, allelic drop-out; STR, short tandem repeats.

The statistics (Table XI) and distribution (Fig. 10) of locus separation accuracy in the eleven groups of simulated male-mixed DNA profiles show that groups no. 1, and 7 contained relatively large numbers of loci with separation accuracy ≤0.5. The accuracy of loci D19S433, D2S1338 and D7S820 fluctuated considerably, and the median accuracy was lowest for D7S820. These results were generally consistent with the overall distribution of locus separation accuracy across the nine mixed gradients described above.

Table XI

Statistics of separation accuracy for 16 STR loci in 11 groups of male-mixed DNA profiles.

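| Group | AMEL | CSF1PO | D13S317 | D16S539 | D18S51 | D19S433 | D21S11 | D2S1338 | D3S1358 | D5S818 | D7S820 | D8S1179 | FGA | TH01 | TPOX | vWA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.8889 | 1.0000 | 0.8148 | 0.9630 | 0.8800 | 0.2308 | 0.8889 | 0.9630 | 0.7037 | 0.9259 | 0.9630 | 0.7778 | 0.8519 | 0.7407 | 1.0000 | 0.9630 |
| 2 | 0.8889 | 0.7407 | 0.6296 | 0.6667 | 0.9231 | 0.8889 | 0.9259 | 0.4074 | 0.8148 | 0.7778 | 0.4074 | 0.8519 | 1.0000 | 0.7407 | 0.9259 | 0.8519 |
| 3 | 0.8148 | 0.9630 | 0.7407 | 1.0000 | 0.9630 | 0.9630 | 0.9259 | 1.0000 | 0.9259 | 0.9259 | 0.5926 | 0.8519 | 0.8519 | 0.8889 | 0.8889 | 0.8519 |
| 4 | 0.8889 | 0.9231 | 0.8148 | 0.9259 | 0.9231 | 0.8519 | 0.9630 | 0.6667 | 0.8889 | 0.9630 | 0.6667 | 0.9630 | 0.9583 | 1.0000 | 0.7778 | 0.7037 |
| 5 | 0.8519 | 0.7407 | 0.7407 | 1.0000 | 1.0000 | 0.9259 | 0.6296 | 0.8889 | 0.8519 | 0.8889 | 0.8148 | 1.0000 | 0.7778 | 0.8889 | 0.8148 | 0.8889 |
| 6 | 0.8148 | 0.8519 | 1.0000 | 0.8889 | 0.7407 | 0.5556 | 0.8519 | 0.8889 | 0.6667 | 0.9630 | 0.9630 | 0.8889 | 0.9630 | 0.7037 | 0.8519 | 0.9259 |
| 7 | 0.9630 | 0.5185 | 0.8846 | 0.8519 | 0.8125 | 0.8462 | 0.9600 | 0.6667 | 1.0000 | 0.7778 | 0.8182 | 0.9231 | 0.9412 | 0.8750 | 0.5833 | 0.8824 |
| 8 | 0.9259 | 1.0000 | 0.9615 | 0.9500 | 1.0000 | 0.6667 | 0.8462 | 0.9545 | 0.6818 | 1.0000 | 0.9048 | 0.7692 | 0.8667 | 0.7308 | 0.9412 | 0.6818 |
| 9 | 0.9630 | 0.8261 | 0.7083 | 0.7083 | 0.7391 | 1.0000 | 0.6800 | 0.7778 | 0.8519 | 0.6923 | 0.6400 | 0.9259 | 0.6800 | 0.8889 | 0.5909 | 0.7692 |
| 10 | 0.8519 | 0.7407 | 0.9259 | 0.8889 | 0.9259 | 1.0000 | 0.6296 | 0.9259 | 0.9259 | 0.8889 | 0.7407 | 0.8519 | 0.9259 | 0.9259 | 0.8148 | 0.8889 |
| 11 | 0.8148 | 0.9259 | 0.8519 | 0.6296 | 0.7037 | 0.6667 | 0.8846 | 0.6296 | 0.8889 | 0.9630 | 0.5556 | 0.9259 | 0.9200 | 0.7407 | 0.8889 | 0.7037 |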

[i] STR, short tandem repeats.

Discussion

In the present study, an experimental model comprising eleven groups of male-mixed DNA (n=297) was established following the mixed DNA construction principle, using an ABI 7500 real-time PCR system, with scientific validation of the Mx parameter (RMSE ≤0.02). The locus separation accuracy of the mixsep package was statistically analyzed by horizontal and vertical analysis of the experimental data, with mixed DNA profiles and STR loci as units. The distributions of DNA profile data for different mixed gradients, STR loci and mixed DNA types were examined to assess the performance of the mixsep package in the analysis of mixed DNA.

Locus separation accuracy of mixsep had a negative linear correlation with the Mx value (r=−0.7121), with the exception of the 1:1 gradient; accuracy first increased and then decreased with increasing mixed gradient imbalance. Thus, the Mx value was one of the primary factors determining the success of mixed DNA analysis. Among the 16 STR loci, the number of ADO was relatively high (>15) in the D18S51, D19S433, FGA, TPOX and vWA loci. These five loci are all located in the yellow and red channels and had low APH, consistent with the low sensitivity of the ABI 3130xl Genetic Analyzer to yellow and red fluorescence. In addition, there was a large (~10%) but non-significant difference in locus separation accuracy depending on whether loci with ADO were included or excluded. The presence of ADO reduced the analytical performance of mixsep, consistent with the lack of an ADO functional module in this software.

The present study demonstrated that the mixsep software had a number of advantages. It was easy to operate and produced understandable results with a degree of controllability, presented intuitively as visual typing maps. Furthermore, the model was built on rational, well-reasoned assumptions and produced results with high validity. However, certain limitations remained in the use of mixsep, including bugs that may occasionally generate outliers in the data analysis, as well as graphics malfunctions. In addition, the software interface was inflexible to control, and the presentation of results was occasionally incomplete. Because of these limitations, the lack of analysis modules for stutter, drop-out and drop-in, and the unknown prior conditions in the model assumptions, further optimization and improvement of the mixsep package are necessary for it to produce consistently reliable results.

Acknowledgments

This study was supported by grants from the National Natural Science Foundation of China (no. 81273348), the National Key Technology R&D Program of China (no. 2012BAK02B01) and the Hebei Provincial Science and Technology Program of China (no. 12275648D).

References

1. R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria. URL: http://www.R-project.org. 2013.

2. RStudio: RStudio: Integrated development environment for R (Version 0.96.122). Computer software. Boston, MA, USA. URL: http://www.rstudio.org/. 2012.

3. Gentleman RC, Carey VJ, Bates DM, et al: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5:R80. 2004.

4. Smyth GK: Limma: linear models for microarray data. Bioinformatics and computational biology solutions using R and Bioconductor. Springer; Berlin, Germany: pp. 397–420. 2005.

5. Gautier L, Cope L, Bolstad BM and Irizarry RA: affy - analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 20:307–315. 2004.

6. Ellis B, Haaland P, Hahne F and Le Meur N: Basic structures for flow cytometry data. R package version 1.24.2. Computer software. 2009.

7. Pages H, Carlson M, Falcon S, Li N and Maintainer MBP: AnnotationDbi: Annotation Database Interface. R package version 1.20.7. Computer software. 2014.

8. Carlson M: hgu95av2.db: Affymetrix human genome U95 set annotation data (chip hgu95av2). R package version 2.8.0. Computer software. 2012.

9. Carlson M, Falcon S, Pages H, et al: A set of annotation maps describing the entire Gene Ontology. R package version 2.8.0. Computer software. 2010.

10. Shannon P: MotifDb: An annotated collection of protein-DNA binding sequence motifs. R package version 1.0.0. Computer software. URL: http://bioconductor.org/biocLite.R. 2012.

11. Pages H, Aboyoun P, Gentleman R and DebRoy S: String objects representing biological sequences, and matching algorithms. R package version 2.26.3. Computer software. URL: http://bioconductor.org/biocLite.R. 2009.

12. Tvedebrink T: mixsep: DNA mixture separation. R package version 0.2.1-2. Computer software. 2013.

13. Tvedebrink T: mixsep: An R-package for DNA mixture separation. Forensic Sci Int Genet Suppl Ser. 3:e486–e488. 2011.

14. Tvedebrink T, Eriksen PS, Mogensen HS and Morling N: Identifying contributors of DNA mixtures by means of quantitative information of STR typing. J Comput Biol. 19:887–902. 2012.

15. Tvedebrink T, Eriksen PS, Mogensen HS and Morling N: Evaluating the weight of evidence by using quantitative short tandem repeat data in DNA mixtures. J R Stat Soc Ser C Appl Stat. 59:855–874. 2010.

August-2015
Volume 12 Issue 2

Print ISSN: 1791-2997
Online ISSN: 1791-3004

Spandidos Publications style
Hu N, Cong B, Gao T, Chen Y, Shen J, Li S and Ma C: Application of mixsep software package: Performance verification of male-mixed DNA analysis. Mol Med Rep 12: 2431-2442, 2015
APA
Hu, N., Cong, B., Gao, T., Chen, Y., Shen, J., Li, S., & Ma, C. (2015). Application of mixsep software package: Performance verification of male-mixed DNA analysis. Molecular Medicine Reports, 12, 2431-2442. https://doi.org/10.3892/mmr.2015.3710
MLA
Hu, N., Cong, B., Gao, T., Chen, Y., Shen, J., Li, S., Ma, C. "Application of mixsep software package: Performance verification of male-mixed DNA analysis." Molecular Medicine Reports 12.2 (2015): 2431-2442.
Chicago
Hu, N., Cong, B., Gao, T., Chen, Y., Shen, J., Li, S., Ma, C. "Application of mixsep software package: Performance verification of male-mixed DNA analysis." Molecular Medicine Reports 12, no. 2 (2015): 2431-2442. https://doi.org/10.3892/mmr.2015.3710