Development and validation of an ultra-high sensitive next-generation sequencing assay for molecular diagnosis of clinical oncology

  • Authors:
    • Jiao Liang
    • Yaoguang She
    • Jiaqi Zhu
    • Longgang Wei
    • Lanying Zhang
    • Lianju Gao
    • Yan Wang
    • Jing Xing
    • Yang Guo
    • Xuehong Meng
    • Peiyu Li

  • Published online on: September 26, 2016     https://doi.org/10.3892/ijo.2016.3707
  • Pages: 2088-2104

Abstract

Dramatic improvements in the understanding of oncogenes have spurred the development of molecularly targeted therapies, creating an urgent need for comprehensive and rapid clinical genotyping. Next-generation sequencing (NGS) assays, with increasing performance and decreasing cost, are becoming more widely used in clinical diagnosis. However, the optimization and validation of NGS assays remain a challenge, especially for the detection of somatic variants at low mutant allele fraction (MAF). In the present study, we developed and validated the Novogene Comprehensive Panel (NCP), based on targeted capture for NGS analysis. Because the detection performance for single-nucleotide variants (SNVs) and small insertions and deletions (INDELs) correlates strongly with target coverage, we focused our deep sequencing strategy on these two variant types. To validate the capability of NCP in SNV and INDEL detection, we implemented a practical validation strategy with pooled cell lines. Deep sequencing of the pooled samples (>2000X average unique coverage across the target region) achieved >99% sensitivity and high specificity (positive predictive value, PPV >99%) for all types of variations with expected MAF >5%. Furthermore, given the high sensitivity of the assay and the possibility of false positives, we confirmed the accuracy of variants with MAF <5% in 35 formalin-fixed and paraffin-embedded (FFPE) tumor specimens using QuantStudio 3D Digital PCR (dPCR; Life Technologies) and obtained high consistency (32 of 35 mutations detected by NGS were verified). We also used the amplification refractory mutation system (ARMS) to verify variants with MAF ranging from 2 to 63% detected in 33 FFPE samples, and obtained a PPV of 100% for this assay. As a potential clinical diagnostic tool, NCP can robustly and comprehensively analyze clinically relevant genes with high sensitivity and low cost.

Introduction

Cancer is a genomic disease harboring a cocktail of mutated genes. Personalized medicine approaches based on molecular studies and cytogenetic analysis can direct therapies at mutated cancer driver genes (1–4). For example, crizotinib (PF-02341066), a small-molecule inhibitor of the anaplastic lymphoma kinase (ALK), and the kinase inhibitor vemurafenib (PLX4032) against BRAF (5–7) both have dramatic effects in most patients with the corresponding driver mutations. In fact, hundreds of frequent somatic mutations involved in multiple cellular pathways have been identified in different types of cancer during the past decades (8), and more comprehensive diagnostic approaches are needed to identify the individual driver mutations that have an important impact on tumor progression in different cancer patients (9) and thus could serve as therapeutic targets in clinical treatment. To assess the status of these biomarkers, several approaches have been implemented in clinical diagnosis, such as fluorescence in situ hybridization (FISH), immunohistochemistry (IHC) and Sanger sequencing (10–13). However, owing to their high cost and technical limitations, these methods make multiplexed assessment of driver somatic alterations unaffordable.

NGS has already been used to identify hundreds of driver mutations and to analyze tens of thousands of tumor samples in a high-throughput manner with increased performance and decreased cost (14–16), which makes it a feasible clinical testing approach. Indeed, commercial NGS-based assays have already been developed and validated to provide comprehensive genomic testing in the clinic (17–20). These assays usually perform well when detecting variants with high mutant allele frequencies (MAF >10%). However, variants with low MAF are common in tumor tissues for many reasons, including contamination by normal cells and intra-tumor heterogeneity (21,22). Therefore, it is critical to develop a robust clinical assay that can detect low allele frequency mutations. Here we developed an ultra-high sensitive NGS-based assay, which interrogates all 7011 exons of 483 cancer-related genes and 94 introns of 18 genes involved in rearrangements. Using the Illumina HiSeq X platform, hybridization-based capture of the target regions reached high coverage (>2000X) at acceptable cost. With in-house data analysis approaches, we could accurately distinguish low-MAF (0.5%) variants from sequencing errors. We used pools of mixed cell lines with known alterations to perform analytical validation, and 35 FFPE tissue samples to confirm by dPCR (23) the specificity of low-MAF variant detection in clinical samples. In addition, ARMS-PCR (24) was used to confirm the overall specificity of our assay.

Materials and methods

NCP NGS design

The Novo assay was developed to characterize SNV/INDEL, CNV and gene fusion in 483 cancer-related genes. These genes were selected based on the My Cancer Genome database (https://www.mycancergenome.org), the Catalogue of Somatic Mutations in Cancer (COSMIC) and other sources (18,25). Briefly, genes containing clinically important variants and genes that have been reported as cancer-related were included based on a record of reimbursement in sequencing. All exons of these 483 genes were targeted by hybridization-based capture (Table I). For the detection of structural rearrangements, introns spanning recurrent fusion breakpoints were also included. Agilent's proprietary algorithm and synthetic process were used to generate the baits. Hybrid selection was done using a pool of 120-mer RNA-based baits (Agilent SureSelect) with bait overlap exceeding 3-fold across the target region. In total, the 47660 hybrid baits capture a target region of 2.3 Mb, including 7011 exons and 94 introns.

Table I

Genes and transcripts ID targeted in hybridization capture.

| Gene symbol | Transcript ID | Gene symbol | Transcript ID | Gene symbol | Transcript ID |
|---|---|---|---|---|---|
| ABCB1 | NM_000927 | ETV6 | NM_001987 | NUP93 | NM_001242796 |
| ABCC1 | NM_004996 | EWSR1 | NM_001163287 | PAK1 | NM_001128620 |
| ABCC2 | NM_000392 | EZH2 | NM_001203248 | PAK3 | NM_001128173 |
| ABCC4 | NM_001105515 | FAM46C | NM_017709 | PALB2 | NM_024675 |
| ABCC6 | NM_001079528 | FANCA | NM_001018112 | PARP1 | NM_001618 |
| ABCG2 | NM_004827 | FANCC | NM_001243744 | PARP2 | NM_001042618 |
| ABL1 | NM_005157 | FANCD2 | NM_033084 | PAX5 | NM_001280551 |
| ACVR1B | NM_020327 | FANCE | NM_021922 | PBRM1 | NM_018313 |
| AKT1 | NM_005163 | FANCF | NM_022725 | PDCD1 | NM_005018 |
| AKT2 | NM_001243027 | FANCG | NM_004629 | PDGFRA | NM_006206 |
| AKT3 | NM_005465 | FANCL | NM_001114636 | PDGFRB | NM_002609 |
| ALK | NM_004304 | FBXW7 | NM_001257069 | PDK1 | NM_002610 |
| AMER1 | NM_152424 | FCGR3A | NM_001127595 | PHF6 | NM_032335 |
| APC | NM_000038 | FGF10 | NM_004465 | PHKA2 | NM_000292 |
| AR | NM_001011645 | FGF14 | NM_004115 | PIGF | NM_002643 |
| ARAF | NM_001256197 | FGF19 | NM_005117 | PIK3CA | NM_006218 |
| ARFRP1 | NM_001267546 | FGF23 | NM_020638 | PIK3CB | NM_001256045 |
| ARID1A | NM_139135 | FGF3 | NM_005247 | PIK3CG | NM_002649 |
| ARID1B | NM_020732 | FGF4 | NM_002007 | PIK3R1 | NM_001242466 |
| ARID2 | NM_152641 | FGF6 | NM_020996 | PIK3R2 | NM_005027 |
| ASXL1 | NM_001164603 | FGFR1 | NM_001174064 | PLK1 | NM_005030 |
| ATIC | NM_004044 | FGFR2 | NM_001144919 | PPARD | NM_177435 |
| ATM | NM_000051 | FGFR3 | NM_000142 | PPP1R13L | NM_001142502 |
| ATP7A | NM_000052 | FGFR4 | NM_022963 | PPP2R1A | NM_014225 |
| ATR | NM_001184 | FGR | NM_001042729 | PRDM1 | NM_182907 |
| ATRX | NM_000489 | FKBP1A | NM_054014 | PRDX4 | NM_006406 |
| AURKA | NM_198435 | FLT1 | NM_001160031 | PRKAA1 | NM_206907 |
| AURKB | NM_001256834 | FLT3 | NM_004119 | PRKAR1A | NM_002734 |
| AXIN1 | NM_003502 | FLT4 | NM_002020 | PRKCA | NM_002737 |
| AXL | NM_001278599 | FOXL2 | NM_023067 | PRKCB | NM_002738 |
| B2M | NM_004048 | FRK | NM_002031 | PRKCE | NM_005400 |
| BAIAP3 | NM_001199096 | FUBP1 | NM_003902 | PRKCG | NM_002739 |
| BAP1 | NM_004656 | FYN | NM_153048 | PRKDC | NM_006904 |
| BARD1 | NM_000465 | FZD7 | NM_003507 | PRRT2 | NM_001256443 |
| BCL2 | NM_000657 | GALNT14 | NM_001253827 | PTCH1 | NM_001083607 |
| BCL2L2 | NM_001199839 | GATA1 | NM_002049 | PTEN | NM_000314 |
| BCL6 | NM_001706 | GATA2 | NM_001145662 | PTK2 | NM_001199649 |
| BCOR | NM_017745 | GATA3 | NM_002051 | PTK6 | NM_001256358 |
| BCORL1 | NM_021946 | GCK | NM_033508 | PTPN11 | NM_080601 |
| BCR | NM_004327 | GID4 | NM_024052 | PTPRD | NM_130391 |
| BIRC5 | NM_001168 | GINS2 | NM_016095 | RAC2 | NM_002872 |
| BLK | NM_001715 | GNA11 | NM_002067 | RAD50 | NM_005732 |
| BLM | NM_000057 | GNA13 | NM_001282425 | RAD51 | NM_001164270 |
| BRAF | NM_004333 | GNAQ | NM_002072 | RAF1 | NM_002880 |
| BRCA1 | NM_007297 | GNAS | NM_016592 | RARA | NM_001024809 |
| BRCA2 | NM_000059 | GPC3 | NM_001164619 | RB1 | NM_000321 |
| BRIP1 | NM_032043 | GPR124 | NM_032777 | RET | NM_020630 |
| BSG | NM_001728 | GRIN2A | NM_001134408 | RICTOR | NM_001285440 |
| BTK | NM_000061 | GSK3B | NM_001146156 | RMDN2 | NM_001170793 |
| C11orf30 | NM_020193 | GSTM1 | NM_000561 | RNF43 | NM_017763 |
| C18orf56 | NM_001012716 | GSTM3 | NM_000849 | ROCK1 | NM_005406 |
| C8orf34 | NM_001195639 | GSTP1 | NM_000852 | ROS1 | NM_002944 |
| CAMK2G | NM_001204492 | GSTT1 | NM_000853 | RPL13 | NM_033251 |
| CAMKK2 | NM_172215 | H3F3A | NM_002107 | RPS6KA1 | NM_001006665 |
| CARD11 | NM_032415 | HCK | NM_001172132 | RPS6KB1 | NM_001272044 |
| CASP8 | NM_033356 | HGF | NM_001010934 | RPTOR | NM_001163034 |
| CBFB | NM_001755 | HIF1AN | NM_017902 | RRM1 | NM_001033 |
| CBL | NM_005188 | HIST1H3B | NM_003537 | RUNX1 | NM_001122607 |
| CBR1 | NM_001757 | HNF1A | NM_000545 | SDHA | NM_004168 |
| CBR3 | NM_001236 | HRAS | NM_005343 | SDHAF1 | NM_001042631 |
| CCND1 | NM_053056 | HSP90AA1 | NM_005348 | SDHAF2 | NM_017841 |
| CCND2 | NM_001759 | IDH1 | NM_005896 | SDHB | NM_003000 |
| CCND3 | NM_001136126 | IDH2 | NM_002168 | SDHC | NM_003001 |
| CCNE1 | NM_001238 | IGF1 | NM_001111285 | SDHD | NM_001276506 |
| CCR4 | NM_005508 | IGF1R | NM_000875 | SETD2 | NM_014159 |
| CD19 | NM_001770 | IGF2 | NM_000612 | SF3B1 | NM_001005526 |
| CD22 | NM_001185100 | IGF2R | NM_000876 | SGK1 | NM_005627 |
| CD274 | NM_001267706 | IKBKB | NM_001556 | SHH | NM_000193 |
| CD33 | NM_001177608 | IKBKE | NM_001193322 | SIK1 | NM_173354 |
| CD38 | NM_001775 | IKZF1 | NM_001220768 | SKP2 | NM_005983 |
| CD3EAP | NM_012099 | IL7R | NM_002185 | SLC10A2 | NM_000452 |
| CD52 | NM_001803 | INHBA | NM_002192 | SLC15A2 | NM_001145998 |
| CD74 | NM_004355 | INSR | NM_001079817 | SLC22A1 | NM_153187 |
| CD79A | NM_001783 | IRF4 | NM_001195286 | SLC22A16 | NM_033125 |
| CD79B | NM_000626 | IRS2 | NM_003749 | SLC22A2 | NM_003058 |
| CDA | NM_001785 | ITK | NM_005546 | SLC22A6 | NM_153277 |
| CDC73 | NM_024529 | JAK1 | NM_002227 | SLCO1B1 | NM_006446 |
| CDH1 | NM_004360 | JAK2 | NM_004972 | SLCO1B3 | NM_019844 |
| CDK1 | NM_001170407 | JAK3 | NM_000215 | SMAD2 | NM_001135937 |
| CDK12 | NM_016507 | JUN | NM_002228 | SMAD4 | NM_005359 |
| CDK2 | NM_001798 | KAT6A | NM_001099413 | SMARCA4 | NM_001128845 |
| CDK4 | NM_000075 | KDM5A | NM_001042603 | SMARCB1 | NM_003073 |
| CDK5 | NM_001164410 | KDM5C | NM_001146702 | SMO | NM_005631 |
| CDK6 | NM_001259 | KDM6A | NM_021140 | SOCS1 | NM_003745 |
| CDK7 | NM_001799 | KDR | NM_002253 | SOD2 | NM_000636 |
| CDK8 | NM_001260 | KEAP1 | NM_012289 | SOX10 | NM_006941 |
| CDK9 | NM_001261 | KIT | NM_000222 | SOX2 | NM_003106 |
| CDKN1B | NM_004064 | KITLG | NM_003994 | SOX9 | NM_000346 |
| CDKN2A | NM_001195132 | KLC3 | NM_177417 | SPEN | NM_015001 |
| CDKN2B | NM_078487 | KLHL6 | NM_130446 | SPG7 | NM_199367 |
| CDKN2C | NM_078626 | KMT2A | NM_001197104 | SPOP | NM_003563 |
| CEBPA | NM_001285829 | KMT2B | NM_014727 | SRC | NM_198291 |
| CHEK1 | NM_001274 | KMT2C | NM_170606 | SRD5A2 | NM_000348 |
| CHEK2 | NM_001257387 | KMT2D | NM_003482 | SRMS | NM_080823 |
| CHST3 | NM_004273 | KRAS | NM_033360 | STAG2 | NM_006603 |
| CIC | NM_015125 | LCK | NM_001042771 | STAT1 | NM_139266 |
| COMT | NM_007310 | LIMK1 | NM_001204426 | STAT2 | NM_005419 |
| CREBBP | NM_004380 | LMO1 | NM_002315 | STAT3 | NM_003150 |
| CRKL | NM_005207 | LRP1B | NM_018557 | STAT4 | NM_003151 |
| CRLF2 | NM_022148 | LRP2 | NM_004525 | STAT5A | NM_003152 |
| CSF1R | NM_005211 | LYN | NM_002350 | STAT5B | NM_012448 |
| CSK | NM_001127190 | MAP2K1 | NM_002755 | STAT6 | NM_001178080 |
| CSNK1A1 | NM_001271742 | MAP2K2 | NM_030662 | STEAP1 | NM_012449 |
| CTCF | NM_001191022 | MAP2K4 | NM_003010 | STK11 | NM_000455 |
| CTLA4 | NM_001037631 | MAP3K1 | NM_005921 | STK3 | NM_006281 |
| CTNNA1 | NM_001903 | MAP4K4 | NM_145687 | STK4 | NM_006282 |
| CTNNB1 | NM_001904 | MAP4K5 | NM_198794 | SUFU | NM_001178133 |
| CYBA | NM_000101 | MAPK1 | NM_138957 | SULT1A1 | NM_177534 |
| CYLD | NM_001042412 | MAPK10 | NM_138981 | SULT1A2 | NM_001054 |
| CYP19A1 | NM_000103 | MAPK14 | NM_139013 | SULT1C4 | NM_006588 |
| CYP1A1 | NM_000499 | MAPK8 | NM_002750 | SYK | NM_001174167 |
| CYP1A2 | NM_000761 | MAPK9 | NM_001135044 | TCF7L1 | NM_031283 |
| CYP1B1 | NM_000104 | MAPKAPK2 | NM_004759 | TCF7L2 | NM_001198525 |
| CYP2A6 | NM_000762 | MARK1 | NM_001286129 | TEK | NM_000459 |
| CYP2B6 | NM_000767 | MCL1 | NM_001197320 | TET2 | NM_017628 |
| CYP2C19 | NM_000769 | MDM2 | NM_001278462 | TGFBR1 | NM_004612 |
| CYP2C8 | NM_001198853 | MDM4 | NM_001278516 | TGFBR2 | NM_003242 |
| CYP2C9 | NM_000771 | MED12 | NM_005120 | TK1 | NM_003258 |
| CYP2D6 | NM_001025161 | MEF2B | NM_001145785 | TMPRSS2 | NM_005656 |
| CYP2E1 | NM_000773 | MEN1 | NM_130803 | TNF | NM_000594 |
| CYP3A4 | NM_001202855 | MERTK | NM_006343 | TNFAIP3 | NM_006290 |
| CYP3A5 | NM_001190484 | MET | NM_001127500 | TNFRSF10A | NM_003844 |
| CYP4B1 | NM_000779 | MITF | NM_001184968 | TNFRSF10B | NM_003842 |
| DAXX | NM_001254717 | MKNK2 | NM_199054 | TNFRSF14 | NM_003820 |
| DDR1 | NM_001202523 | MLH1 | NM_001167617 | TNFRSF8 | NM_001243 |
| DDR2 | NM_001014796 | MPL | NM_005373 | TNFSF11 | NM_003701 |
| DNMT1 | NM_001130823 | MRE11A | NM_005590 | TNFSF13B | NM_001145645 |
| DNMT3A | NM_153759 | MS4A1 | NM_152866 | TNK2 | NM_005781 |
| DOT1L | NM_032482 | MSH2 | NM_000251 | TOP1 | NM_003286 |
| DPYD | NM_001160301 | MSH6 | NM_001281494 | TP53 | NM_001276698 |
| DSCAM | NM_001389 | MST1R | NM_001244937 | TPMT | NM_000367 |
| E2F1 | NM_005225 | MTDH | NM_178812 | TPX2 | NM_012112 |
| EGF | NM_001178131 | MTHFR | NM_005957 | TSC1 | NM_001162426 |
| EGFL7 | NM_201446 | MTOR | NM_004958 | TSC2 | NM_000548 |
| EGFR | NM_201283 | MTRR | NM_002454 | TSHR | NM_001018036 |
| EGR1 | NM_001964 | MUTYH | NM_001048174 | TYMS | NM_001071 |
| EMC8 | NM_001142288 | MYC | NM_002467 | TYRO3 | NM_006293 |
| EML4 | NM_019063 | MYCL | NM_005376 | U2AF1 | NM_001025204 |
| ENOSF1 | NM_001126123 | MYCN | NM_005378 | UBE2I | NM_194259 |
| EP300 | NM_001429 | MYD88 | NM_001172566 | UGT1A1 | NM_000463 |
| EPHA1 | NM_005232 | NAT1 | NM_001160174 | UGT1A9 | NM_021027 |
| EPHA2 | NM_004431 | NAT2 | NM_000015 | UGT2B15 | NM_001076 |
| EPHA3 | NM_182644 | NCAM1 | NM_001076682 | UGT2B17 | NM_001077 |
| EPHA4 | NM_004438 | NCF4 | NM_013416 | UGT2B7 | NM_001074 |
| EPHA5 | NM_001281767 | NCOA3 | NM_001174088 | UMPS | NM_000373 |
| EPHA7 | NM_004440 | NCOR1 | NM_001190438 | VEGFA | NM_001171627 |
| EPHA8 | NM_001006943 | NEK11 | NM_145910 | VEGFB | NM_003377 |
| EPHB1 | NM_004441 | NF1 | NM_001128147 | VHL | NM_000551 |
| EPHB2 | NM_004442 | NF2 | NM_181830 | WEE1 | NM_001143976 |
| EPHB3 | NM_004443 | NFE2L2 | NM_001145413 | WISP3 | NM_198239 |
| EPHX1 | NM_000120 | NFKBIA | NM_020529 | WNK3 | NM_020922 |
| ERBB2 | NM_004448 | NKX2-1 | NM_003317 | WT1 | NM_001198552 |
| ERBB3 | NM_001005915 | NOS3 | NM_001160111 | XPC | NM_001145769 |
| ERBB4 | NM_005235 | NOTCH1 | NM_017617 | XPO1 | NM_003400 |
| ERCC1 | NM_202001 | NOTCH2 | NM_001200001 | XRCC1 | NM_006297 |
| ERCC2 | NM_001130867 | NPM1 | NM_001037738 | XRCC4 | NM_022406 |
| ERG | NM_001136155 | NQO1 | NM_000903 | YES1 | NM_005433 |
| ESR1 | NM_000125 | NRAS | NM_002524 | ZAP70 | NM_207519 |
| ETV1 | NM_001163151 | NTRK1 | NM_002529 | ZC3HAV1 | NM_024625 |
| ETV4 | NM_001261439 | NTRK2 | NM_001007097 | ZNF217 | NM_006526 |
| ETV5 | NM_004454 | NTRK3 | NM_001007156 | ZNF703 | NM_025069 |

Genes targeted for rearrangement detection

| Gene symbol | Transcript ID | Gene symbol | Transcript ID | Gene symbol | Transcript ID |
|---|---|---|---|---|---|
| ALK | NM_004304 | ETV6 | NM_001987 | MYC | NM_002467 |
| BCR | NM_004327 | EWSR1 | NM_001163287 | NTRK1 | NM_002529 |
| BRAF | NM_004333 | KMT2A | NM_001197104 | PDGFRA | NM_006206 |
| EGFR | NM_201283 | RAF1 | NM_002880 | ROS1 | NM_002944 |
| ETV1 | NM_001163151 | RARA | NM_001024809 | CRLF2 | NM_022148 |
| ETV4 | NM_001261439 | RET | NM_020630 | | |
| ETV5 | NM_004454 | TMPRSS2 | NM_005656 | | |

[i] The genes and transcripts targeted by the Novogene Comprehensive Panel. This assay covers all exons and introns spanning recurrent fusion breakpoints in v64 of the COSMIC database.

Clinical specimens

Tumor specimens were collected from non-small cell lung cancer (NSCLC) and breast cancer patients at the Chinese PLA General Hospital with informed consent, in accordance with the institutional ethics review rules. Before entering the assay, clinical samples had to meet the following standards to ensure reliable downstream analysis. At least 10 slices of 5-μm FFPE sections or tissues with a volume of >1 were required. For each sample, hematoxylin-eosin stained slides (Fig. 1) were prepared and reviewed by a pathologist to estimate tumor purity. All samples with <50% tumor purity were marked for tumor enrichment by microdissection to minimize contamination from normal cells (Fig. 2).

Cell line sample collection

Normal cell lines harboring the population distribution of known germ line variants were mixed, and multiplexed pools with low-MAF variants were used to assess and validate the limit of variant detection. First, to obtain the variant set for assessment, we sequenced 5 cell lines from the 1000 Genomes Project (26) individually and kept the SNP and INDEL sites in the dbSNP database (build 146) whose observed MAF was consistent with a homozygous (MAF >90%) or heterozygous (40%<MAF<60%) genotype. To estimate INDEL detection performance, 3 additional cell lines from the COSMIC database (http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/) were also sequenced individually to obtain the original MAF of cancer-related somatic variants in each sample. All 8 cell lines were mixed together in designed proportions, and the expected MAF of each variant was calculated from the mixing ratios (Table II). In total, we obtained 2625 variants spanning a range of expected MAF (0.5–20%) and INDEL lengths (1–40 base pairs, bp) as the gold standard (Table III). Cell lines obtained from the Coriell Institute (http://ccr.coriell.org/) and ATCC (http://www.atcc.org/) were routinely cultured in Dulbecco's modified Eagle's medium (DMEM) with 10% heat-inactivated fetal bovine serum (FBS; Invitrogen, Waltham, MA, USA) in a 75-cm² cell culture flask. The cells were seeded into cell culture flasks at a concentration of 1×10⁵ viable cells/ml and incubated at 37°C in a humidified atmosphere containing 5% CO2.

Table II

Mix ratio for cell lines.

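| Cell line | Volume | Ratio |
|---|---|---|
| GM19114 | 0.04 | 1 |
| GM19108 | 0.08 | 2 |
| RL95-2 | 0.08 | 2 |
| LOVO | 0.16 | 4 |
| GM18511 | 0.16 | 4 |
| HCT-15 | 0.32 | 8 |
| GM18488 | 0.64 | 16 |
| GM18957 | 6.52 | 163 |
| Total | 8 | 200 |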

[i] In order to get more gold-standard variants with mutant allele frequencies from 0.5 to 20%, cell lines were mixed in designed proportions.
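The calculation of expected MAF from the mixing ratios can be illustrated with a short sketch. It assumes (this is not stated in the text) that each cell line contributes DNA in proportion to its ratio in Table II, so that the pooled MAF is the ratio-weighted mean of the MAF measured in each pure cell line. The function name `expected_maf` is hypothetical.

```python
# Mixing ratios from Table II (parts out of 200).
MIX_RATIO = {
    "GM19114": 1, "GM19108": 2, "RL95-2": 2, "LOVO": 4,
    "GM18511": 4, "HCT-15": 8, "GM18488": 16, "GM18957": 163,
}
TOTAL_PARTS = sum(MIX_RATIO.values())  # 200


def expected_maf(original_maf_by_line):
    """Expected MAF in the pool, given the variant's MAF in each pure cell line.

    Assumption: DNA contribution is proportional to the mixing ratio, so the
    pooled MAF is the ratio-weighted mean of the per-line MAFs. Cell lines
    that lack the variant are simply omitted (treated as MAF 0).
    """
    weighted = sum(MIX_RATIO[line] * maf
                   for line, maf in original_maf_by_line.items())
    return weighted / TOTAL_PARTS


if __name__ == "__main__":
    # Heterozygous (MAF 0.5) only in HCT-15: 8/200 * 0.5
    print(f"{expected_maf({'HCT-15': 0.5}):.2%}")
    # Homozygous (MAF 1.0) only in GM19114: 1/200
    print(f"{expected_maf({'GM19114': 1.0}):.2%}")
```

Under this assumption, a homozygous variant private to the least abundant cell line (GM19114) lands at the 0.5% lower end of the gold-standard range, and a heterozygous variant private to the same line falls below it.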

Table III

Distribution of expected mutant allele frequencies in SNV and INDEL test set.

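| Expected mutant allele frequency | No. of sites (SNV) | No. of sites (INDEL) |
|---|---|---|
| <0.5% | 568 | 32 |
| 0.5–1% | 446 | 31 |
| 1–2% | 224 | 29 |
| 2–3% | 81 | 10 |
| 3–4% | 390 | 31 |
| 4–5% | 278 | 17 |
| 5–10% | 393 | 19 |
| >10% | 73 | 3 |
| Total | 2453 | 172 |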

[i] Mixed cell lines contained gold-standard variants with mutant allele frequencies ranging from 0.5 to 20%. These variants were used to calculate the detection performance of our assay.

Library preparation and sequencing

Genomic DNA was generally extracted using the DNeasy Blood & Tissue kit (Qiagen, Hilden, Germany). For FFPE samples specifically, DNA was isolated using the GeneRead DNA FFPE kit (Qiagen, Valencia, CA, USA) following the manufacturer's protocol. Besides purifying high yields of DNA from FFPE tissue sections, this kit removes deaminated cytosine to prevent false results in sequencing (27). The ratio of absorbance at 260 and 280 nm was used to assess the purity of the extracted DNA, and the Qubit® Quantitation Platform was used to quantify DNA. A Covaris S220 focused-ultrasonicator (Covaris, Woburn, MA, USA) was used to fragment genomic DNA (500 ng), and an Agilent Bioanalyzer 2100 (Agilent Technologies) was used to confirm an average fragment size of 200 to 400 base pairs (bp). Library preparation after fragmentation followed the instruction manual of the KAPA Hyper Prep kit. The protocol included: i) repairing the DNA ends; ii) adding an ‘A’ base to the DNA fragments; iii) ligating the paired-end adaptor; iv) purifying the sample using AMPure XP beads; and v) amplifying the adaptor-ligated library and purifying the sample using AMPure XP beads. The prepared library was hybridized with the NCP custom-designed baits as described for SureSelectQXT (Agilent Technologies), and the product was then amplified for 14 PCR cycles. The size range of the prepared library was assessed using the Agilent 2100 Bioanalyzer and qualified using the ABI StepOnePlus. The concentration of each library was quantified using the qPCR NGS Library Quantification kit, and its protocol was used to calculate the final pooling volume for sequencing. The products were sequenced on the Illumina HiSeq X platform with paired-end sequencing runs (2×150) under Illumina-recommended protocols.

Data analysis

Clean data were generated by data processing steps including base calling, demultiplexing and adapter trimming. All these steps were performed using the Illumina HiSeq X vendor software with default parameters. We then ran our in-house software for quality control (QC) of the clean data, which included: i) removing read pairs if either read contained >10% ‘N’ bases; ii) removing read pairs if >50% of the bases in either read had quality below Q10; iii) trimming the 3′ end of each read from the first base below Q20; and iv) removing reads shorter than 100 bp. Clean data after QC were mapped to the human reference genome (GRCh37) using BWA aligner v0.7.8 (28) with default parameters. PCR duplicate reads were removed using Picard 1.119 (http://picard.sourceforge.net/index.html). From the result, a collection of sequencing metrics was generated, including the total number of reads, the percentage of reads mapped, the number of on-target reads, the average target coverage and the percentage of the target region with >200X and >1000X coverage. Before SNV and INDEL calling, local realignment was performed using the Genome Analysis Toolkit (GATK version 2.7-2-g6bda569) (29,30) with default parameters and the ‘known sites’ recommended in the GATK best practice (https://software.broadinstitute.org/gatk/best-practices/). For SNV detection, we denote the reference allele and the coverage of each site as r and d, and the base-calling error rate of read i (i = 1…d) as e_i. We used a null model in which there is no SNV at the site and all non-reference alleles are sequencing errors. The number of variant bases (k) with e_i <1e−3 (associated Phred-like quality score q_i >30) at each site was then modeled with a binomial distribution. The probability under this null model was given by the following formula:

\[ P(X \ge k \mid d) = 1 - \sum_{i=0}^{k-1} P(X = i \mid d) \]

where P(X = i|d) is the probability of observing i variant bases among the d reads at the site. Assuming that sequencing errors are independent across reads and occur with probability e0 (e0 = 1e−3/3) for each non-reference allele, we obtain

\[ P(X = i \mid d) = \binom{d}{i} e_0^{\,i} (1 - e_0)^{d-i} \]

The P-value was then given by P(X≥k|d), and a cut-off (P-value <1e−6) was established to eliminate random sequencing errors. For INDEL detection, we simply kept variants with >10 supporting reads. We also employed several empirical filters to reduce systematic errors: strand bias (Fisher's exact test, P<1e−6), site median base quality (MBQ >30), site median mapping quality (MMQ >30) and variant MAF (MAF >0.5%). Variants that passed the filters were annotated with dbSNP b146, the My Cancer Genome database (https://www.mycancergenome.org) and the Oncomine database v1.4.1 to obtain clinically relevant information. However, cross-library contamination may occur, and no report would be generated if a sample contained >10 variants with low MAF (MAF ≤10%) in dbSNP. At the reporting stage, all annotated variants with MAF ≥5% would be reported and other cancer-related variants would be validated by dPCR. The whole workflow for the data analysis is outlined in Fig. 3. The parameters and descriptions used are listed in Table IV.
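The binomial test described above can be sketched in a few lines. This is a minimal illustration and not the in-house implementation: the function names are hypothetical, and the upper tail is summed directly in log space, which is mathematically equivalent to one minus the lower tail but avoids overflow in the binomial coefficient at depths in the thousands.

```python
import math

E0 = 1e-3 / 3      # per-allele sequencing error probability e0 from the text
P_CUTOFF = 1e-6    # P-value cut-off from the text


def log_binom_pmf(i, d, p):
    """log of C(d, i) * p**i * (1 - p)**(d - i), computed in log space."""
    log_comb = math.lgamma(d + 1) - math.lgamma(i + 1) - math.lgamma(d - i + 1)
    return log_comb + i * math.log(p) + (d - i) * math.log1p(-p)


def snv_p_value(k, d, e0=E0):
    """P(X >= k | d) under the null model that every variant base is an error."""
    if k <= 0:
        return 1.0
    if k > d:
        return 0.0
    tail = sum(math.exp(log_binom_pmf(i, d, e0)) for i in range(k, d + 1))
    return min(1.0, tail)


def is_snv_call(k, d):
    """True if k high-quality variant bases at depth d reject the null model."""
    return snv_p_value(k, d) < P_CUTOFF


if __name__ == "__main__":
    for k in (5, 10):
        print(k, snv_p_value(k, 2000), is_snv_call(k, 2000))
```

In this sketch, 10 high-quality variant bases at 2000X (MAF 0.5%) fall below the 1e−6 cut-off, whereas 5 variant bases at the same depth do not. The empirical filters (strand bias, MBQ, MMQ, MAF) would be applied separately.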

Table IV

Description of filters in data analysis.

| Data analysis | Description and thresholds |
|---|---|
| Quality control | Remove read pairs with low quality, which may lead to false positives in the downstream process. Four tests are used: i) remove read pairs in which either read contains >10% ‘N’ bases; ii) remove read pairs in which >50% of the bases in either read have quality below Q10; iii) trim the 3′ end of the read from the first base below Q20; and iv) remove reads <100 bp. |
| Mapping | Reads are mapped to the human reference using BWA aligner v0.7.8 with the BWA-MEM algorithm and relevant default parameters. |
| Realignment | The GATK realignment is used to correct misalignment due to the presence of an INDEL. This step uses two files, ‘Mills_and_1000G_gold_standard.indels.b37.sites.vcf’ and ‘1000G_phase1.indels.b37.vcf’ (https://software.broadinstitute.org/gatk/best-practices/), as known INDELs. The default parameters are used to perform the realignment. |
| Call SNV | A binomial test is used to separate true positives from noise. The P-value cut-off is 1e−6, and the probability of sequencing error is 1e−3/3. |
| Call INDEL | A cut-off of 10 supporting reads is used to call INDELs. |
| Hard filter | To further remove false positives, several hard filters are used: i) Fisher's exact test for strand bias, P-value <1e−6. Some false positives are generated in the sequencing step and are closely related to the preceding sequence (homopolymer or other special sequence); ii) site median base quality >30. Because the base quality of each read may not represent the true error rate, the median base quality of each site is used to evaluate this error rate; iii) site median mapping quality >30. This filter is used to avoid misalignment of repeat sequences that differ only slightly in the human reference and are easily mistaken for SNVs. |

[i] These filters were obtained from clinical samples and covered all special cases that we had met before. Therefore, it could identify true positive variants from most noise in sequencing.
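The four read-level QC rules in Table IV can be sketched as follows. This is an illustration only, not the in-house software: the function names are hypothetical, reads are represented as (sequence, integer Phred qualities) tuples, and two points the text leaves open are assumptions here, namely that the rules are applied in the listed order and that a pair is dropped when either trimmed read is shorter than 100 bp.

```python
N_FRACTION_MAX = 0.10      # rule i: more than 10% 'N' bases
LOW_Q_FRACTION_MAX = 0.50  # rule ii: more than 50% of bases below Q10
TRIM_Q = 20                # rule iii: trim from the first base below Q20
MIN_LENGTH = 100           # rule iv: minimum read length after trimming


def fails_pair_filters(seq, quals):
    """Rules i and ii for one read."""
    n_frac = seq.upper().count("N") / len(seq)
    low_q_frac = sum(q < 10 for q in quals) / len(quals)
    return n_frac > N_FRACTION_MAX or low_q_frac > LOW_Q_FRACTION_MAX


def trim_3prime(seq, quals):
    """Rule iii: cut the read at the first base whose quality is below Q20."""
    for pos, q in enumerate(quals):
        if q < TRIM_Q:
            return seq[:pos], quals[:pos]
    return seq, quals


def qc_read_pair(read1, read2):
    """Return the trimmed pair, or None if the pair is discarded."""
    if not read1[0] or not read2[0]:
        return None
    if fails_pair_filters(*read1) or fails_pair_filters(*read2):
        return None
    trimmed = (trim_3prime(*read1), trim_3prime(*read2))
    # Assumption: drop the whole pair if either trimmed read is too short.
    if any(len(seq) < MIN_LENGTH for seq, _ in trimmed):
        return None
    return trimmed


if __name__ == "__main__":
    good = ("A" * 150, [35] * 150)
    tail_drop = ("A" * 150, [35] * 120 + [10] * 30)
    kept = qc_read_pair(good, tail_drop)
    print(len(kept[0][0]), len(kept[1][0]))
```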

Comparison with other software

To measure the effect of our approach, we compared the pooled cell-line results with those of GATK, a widely used software package. We followed the ‘GATK best practice’: the ‘IndelRealigner’ parameter ‘LOD_Threshold_For_Cleaning’ was set to 0.3, ‘BaseRecalibrator’ was run with default parameters, and SNV/INDEL calling used ‘HaplotypeCaller’ with ‘standard_min_confidence_threshold_for_emitting’ set to 10 and ‘standard_min_confidence_threshold_for_calling’ set to 30.

Performance statistics calculation

For sensitivity estimation, gold-standard variants detected in the pools were assigned as true positives (TP), and those not detected as false negatives (FN). Sensitivity was calculated as TP/(TP+FN). For specificity estimation, variants called in a pool that were also detected in a pure sample were assigned as true positives (TP), and those not detected in any pure sample as false positives (FP). PPV was calculated as TP/(TP+FP).
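These statistics, together with the exact 95% binomial confidence intervals quoted in the result tables, can be reproduced with a short sketch. It assumes that the "exact" interval is the Clopper-Pearson interval, which the text does not name explicitly. The function names are hypothetical and the bounds are found by bisection on the binomial tail probabilities.

```python
import math


def binom_pmf(i, n, p):
    """Binomial probability mass, computed in log space for large n."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)


def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p); increasing in p."""
    return sum(binom_pmf(i, n, p) for i in range(x, n + 1))


def lower_tail(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); decreasing in p."""
    return sum(binom_pmf(i, n, p) for i in range(0, x + 1))


def _bisect(f, target, increasing):
    """Solve f(p) = target for p in [0, 1], with f monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes out of n trials."""
    lower = 0.0 if x == 0 else _bisect(lambda p: upper_tail(x, n, p), alpha / 2, True)
    upper = 1.0 if x == n else _bisect(lambda p: lower_tail(x, n, p), alpha / 2, False)
    return lower, upper


def sensitivity(tp, fn):
    return tp / (tp + fn)


def ppv(tp, fp):
    return tp / (tp + fp)


if __name__ == "__main__":
    # 73 of 73 gold-standard SNVs detected (MAF >10% class)
    print(sensitivity(73, 0), exact_ci(73, 73))
```

For 73 detected out of 73, the lower bound from this sketch is about 95.1%, which agrees with the interval reported for SNVs at MAF >10%.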

Mutation detection by dPCR

dPCR is a method for absolute quantification of clonally amplified nucleic acids (including DNA, cDNA, methylated DNA or RNA). In dPCR, a sample is partitioned so that individual nucleic acid molecules within the sample are localized and concentrated within many separate regions. After PCR amplification, nucleic acids can be quantified by counting the regions that contain PCR end-product (positive reactions). Here, we used the QuantStudio™ 3D Digital PCR System platform (Life Technologies) for SNP mutation quantitation. The first step is preparing and loading samples onto QuantStudio™ 3D Digital PCR 20K chips. Mutations were analyzed with TaqMan® SNP Genotyping Assays (Life Technologies), which contain TaqMan®-MGB probes and primers. We prepared 15 μl reaction mixes according to the manufacturer's instructions and loaded 14.5 μl onto each chip. The mix contains ROX® dye, which served as a passive reference. After the chips were loaded, we ran the Digital PCR 20K chips on a ProFlex™ 2x Flat PCR System under the following conditions: 96°C for 10 min, 39 cycles at 56°C for 2 min and at 98°C for 30 sec, followed by a final extension step at 56°C for 2 min. After thermocycling, we analyzed the chips using the dPCR instrument.

Mutation detection by ARMS-PCR

ARMS-PCR is a real-time PCR-based test that covers 29 EGFR hotspots in exons 18–21. The assay was performed according to the manufacturer's protocol for the ADx EGFR29 Mutation kit (Amoy Diagnostics, Co., Ltd., Xiamen, China) with the MX3000P (Stratagene, La Jolla, CA, USA) real-time PCR system. Template DNA (0.4 μl), 3.6 μl deionized water and 16 μl of the other reaction components were used in the real-time PCR reaction system. PCR was performed with initial denaturation at 95°C for 10 min, followed by 40 cycles of amplification (at 95°C for 30 sec and 61°C for 1 min). The results were analyzed according to the criteria defined in the manufacturer's instructions. Positive results were defined as [Ct(sample) − Ct(control)] < Ct(cut-off).

Results

Overview

NCP is an NGS-based clinical test for the detection of somatic cancer-related mutations. DNA was extracted from tumor tissues and FFPE samples; 500 ng was fragmented, captured using custom-designed hybridization-based biotinylated cRNA reagents and amplified via limited-cycle PCR to enrich 7,011 exons and 94 introns of 483 cancer-related genes (totaling ~2.3 million sites). We used clinical samples to develop the bioinformatics pipeline for data analysis (Table IV) and cell lines to validate the whole workflow. The 8 individual cell lines were sequenced on the Illumina HiSeq X platform, yielding an average of 13,330 Mb (SD=3,995 Mb) total bases with 38.09% on target (SD=4.78%); target regions were sequenced to 2148X (SD=537X) median coverage across targeted bases, with 99.05% (SD=0.28%) of targeted bases covered by at least 200 reads (Table V). The 2453 SNVs and 172 INDELs detected in the individual cell lines and consistent with the database were used to assess SNV/INDEL detection. Pools of mixed cell lines were used to determine the relationship between median coverage and performance; they yielded total bases of 4,762, 10,896 and 16,351 Mb and median coverages of 1,029X, 2,237X and 3,194X (Table VI). Because NGS derives its high sensitivity from high coverage, the hotspot mutations with MAF <5% detected by this assay in 35 FFPE samples were confirmed by dPCR. All samples used in this test are summarized in Table VII. Finally, 33 hotspot mutations detected by NGS in FFPE samples, with MAF from 2 to 63%, were tested by ARMS-PCR.

Table V

Summary of sequencing metrics for cell lines.

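| Cell line | Total read pairs (M) | Total bases (Mb) | Mapped bases (Mb) | Bases on target (Mb) | Covered at least 200X (%) | Median target coverage (X) |
|---|---|---|---|---|---|---|
| GM18511 | 151 | 22,595 | 13,567 | 7,920 | 99.60 | 3405.41 |
| GM18957 | 84 | 12,633 | 8,875 | 5,215 | 99.30 | 2242.34 |
| GM19114 | 61 | 9,130 | 7,216 | 4,438 | 99.10 | 1908.01 |
| GM19108 | 73 | 10,923 | 7,745 | 4,004 | 98.80 | 1721.50 |
| GM18488 | 82 | 12,295 | 8,810 | 4,472 | 98.90 | 1922.71 |
| RL95-2 | 83 | 12,405 | 8,893 | 4,161 | 98.80 | 1788.99 |
| HCT-15 | 88 | 13,217 | 9,254 | 4,811 | 99.00 | 2068.53 |
| LoVo | 90 | 13,444 | 9,453 | 4,950 | 98.90 | 2128.07 |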

[i] Pure cell lines used to establish the SNV and INDEL test set.
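The summary figures quoted in the Overview can be recomputed from Table V, which is a useful consistency check. The sketch below assumes that the on-target percentage is on-target bases divided by total bases and that the reported SD is the sample standard deviation (n−1); both assumptions reproduce the published values. The helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev

# Per cell line: (total bases Mb, on-target bases Mb, median target coverage X),
# taken from Table V.
TABLE_V = {
    "GM18511": (22595, 7920, 3405.41),
    "GM18957": (12633, 5215, 2242.34),
    "GM19114": (9130, 4438, 1908.01),
    "GM19108": (10923, 4004, 1721.50),
    "GM18488": (12295, 4472, 1922.71),
    "RL95-2": (12405, 4161, 1788.99),
    "HCT-15": (13217, 4811, 2068.53),
    "LoVo": (13444, 4950, 2128.07),
}


def summarize(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return mean(values), stdev(values)


total_bases = [row[0] for row in TABLE_V.values()]
on_target_pct = [100 * row[1] / row[0] for row in TABLE_V.values()]
median_cov = [row[2] for row in TABLE_V.values()]

if __name__ == "__main__":
    for label, values in [("total bases (Mb)", total_bases),
                          ("on-target (%)", on_target_pct),
                          ("median coverage (X)", median_cov)]:
        m, sd = summarize(values)
        print(f"{label}: mean={m:.2f}, SD={sd:.2f}")
```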

Table VI

Summary of sequencing metrics for mixed cell lines pool.

| Pool name | Total read pairs (M) | Total bases (Mb) | Mapped bases (Mb) | Bases on target (Mb) | Covered at least 200X (%) | Median target coverage (X) |
|---|---|---|---|---|---|---|
| 5G | 32 | 4,762 | 4,591 | 2,393 | 97.50 | 1028.96 |
| 10G | 73 | 10,896 | 10,316 | 5,202 | 99.20 | 2236.63 |
| 20G | 109 | 16,351 | 15,138 | 7,429 | 99.50 | 3194.21 |

[i] Cell line pools were used to calculate variants detection performance.

Table VII

Overview of study objectives and strategy.

| Objective | Sample set | No. of samples | Sample type | DNA input (ng) | Sequencing platform |
|---|---|---|---|---|---|
| Individual cell line SNP consistent with database gold standard | Cell lines with known SNPs and INDELs | 8 | Cell line | 500 | Hiseq-X |
| Cell line pools to validate SNP/INDEL performance | Cell lines at specific ratio in 3 pools | 3 | Cell line | 500 | |
| Confirm specificity (MAF <5%) | Clinical FFPE samples | 35 | FFPE | 300–500 | |
| Confirm specificity (all MAF) | Clinical FFPE samples | 33 | FFPE | 300–500 | |

[i] The first phase of this study was focused on analytical performance validation. It was performed by using 8 cell lines with known allele frequencies for analytical detection analysis. The second phase focused on clinical FFPE samples. Sixty-eight clinical samples were used to compare variants detection in NGS with other approaches. MAF, mutant allele frequency.

SNV detection performance

SNV detection was performed using a binomial method that allows low-MAF somatic mutations to be detected with high sensitivity across the 2.3 Mb assayed. For the mixed cell line pools, overall SNV detection performance was high; the results at different depths are shown in Table VIII. At an average depth of 2237X, 100% (95% CI, 95.1–100%) of SNVs at MAF >10% were successfully detected, as well as 99.8% (95% CI, 98.6–100%) of SNVs at MAF 5–10%. The sensitivity for SNVs with MAF between 0.5 and 5% was 92.2% (95% CI, 90.7–93.5%) (Fig. 4A and C and Table VIIIA). In addition, high sensitivity was accompanied by a good PPV (the fraction of SNV calls in the pools that were also detected in any of the individual cell lines; Table VIIIB) of 99.2% (95% CI, 99–99.4%). The false positives may arise because variants at such low MAF (<5%) can hardly be distinguished from sequencing noise. dPCR confirmation of cancer-related SNVs with MAF <5% reported by NGS is therefore necessary before reporting.

Table VIII

Summary of SNV detection performance (sensitivity, PPV).

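A, Summary of SNV detection performance (sensitivity)

| Average coverage | MAF <0.5% (n=568): FN | SEN (%) | CI (%) | MAF 0.5–5% (n=1419): FN | SEN (%) | CI (%) | MAF 5–10% (n=393): FN | SEN (%) | CI (%) | MAF >10% (n=73): FN | SEN (%) | CI (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3194 | 471 | 17.1 | 14.1–20.4 | 86 | 93.9 | 92.6–95.1 | 0 | 100 | 99.1–100 | 0 | 100 | 95.1–100 |
| 2237 | 446 | 21.5 | 18.2–25.1 | 111 | 92.2 | 90.7–93.5 | 1 | 99.8 | 98.6–100 | 0 | 100 | 95.1–100 |
| 1029 | 446 | 21.5 | 18.2–25.1 | 164 | 88.4 | 86.7–90.1 | 2 | 99.5 | 98.2–100 | 0 | 100 | 95.1–100 |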

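B, Summary of SNV detection performance (specificity)

| Average coverage | TP | FP (MAF ≥5%) | FP (MAF <5%) | PPV mean (%) | PPV CI (%) |
|---|---|---|---|---|---|
| 3194 | 5720 | 0 | 84 | 98.5 | 98.2–98.8 |
| 2237 | 5619 | 0 | 43 | 99.2 | 99.0–99.4 |
| 1029 | 4661 | 0 | 4 | 99.9 | 99.8–100 |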

[i] The SNV detection performance of our pipeline in analytical validation. False negatives were germ line SNPs in constituent cell lines that were not detected in mixed cell line data. False positives were SNV calls in pooled samples absent from pure cell lines. MAF, mutation allele frequency; FN, false negative; SEN, sensitivity; CI, confidence interval (calculated as the exact 95% binomial confidence interval).
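The exact 95% binomial (Clopper-Pearson) interval named in the footnote can be computed as sketched below. This is a standard-library illustration, not the authors' code; the function names and the bisection solver are choices made here. Sensitivity is taken as (n − FN)/n, using the counts in Table VIIIA.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)

def _solve_decreasing(f, target, iterations=100):
    """Bisection for f(p) == target on [0, 1], where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for successes out of n."""
    lower = 0.0 if successes == 0 else _solve_decreasing(
        lambda p: binom_cdf(successes - 1, n, p), 1.0 - alpha / 2.0)
    upper = 1.0 if successes == n else _solve_decreasing(
        lambda p: binom_cdf(successes, n, p), alpha / 2.0)
    return lower, upper

# Rows of Table VIIIA at 2237X: (MAF bin, number of sites, false negatives)
for label, n, fn in [("0.5-5%", 1419, 111), ("5-10%", 393, 1), (">10%", 73, 0)]:
    detected = n - fn
    low, high = clopper_pearson(detected, n)
    print(f"MAF {label}: SEN {100 * detected / n:.1f}% (95% CI {100 * low:.1f}-{100 * high:.1f}%)")
```

When every site is detected, the lower bound has the closed form 0.025^(1/n), which gives 95.1% for n=73 and 29.2% for n=3, matching the intervals reported in Tables VIII and IX.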

INDEL detection performance

For INDEL detection, we simply discarded variants supported by fewer than 10 reads. The results at different depths are shown in Table IX. At an average depth of 2237X, 100% (95% CI, 29.2–100%) of INDELs at MAF >10% were detected, as were 94.7% (95% CI, 74–99.9%) of INDELs with MAF of 5–10%. Sensitivity for low-MAF sites (0.5–5%) was 91.5% (95% CI, 85–95.9%); performance for variants with MAF <0.5% was also calculated (Fig. 4B and D and Table IXA). Few false-positive calls were observed, giving a PPV of 98.2% (95% CI, 97.2–98.9%) (Table IXB). As with SNV detection, because all false positives had MAF <10%, dPCR confirmation of cancer-related INDELs with MAF <10% is needed before reporting.
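The 10-read support threshold ties the lowest detectable MAF to sequencing depth. The sketch below works through that arithmetic; the function names are introduced here, and the calculation uses only the expected (mean) number of supporting reads, so it is a rough guide rather than a description of the authors' filter.

```python
import math

MIN_SUPPORTING_READS = 10  # threshold stated in the text

def passes_support_filter(supporting_reads, min_reads=MIN_SUPPORTING_READS):
    """Keep an INDEL call only if enough reads support it."""
    return supporting_reads >= min_reads

def expected_supporting_reads(depth, maf_percent):
    """Mean number of variant-supporting reads at a site with the given depth."""
    return depth * maf_percent / 100.0

def depth_for_expected_support(maf_percent, min_reads=MIN_SUPPORTING_READS):
    """Depth at which the expected supporting-read count reaches the threshold."""
    return math.ceil(min_reads * 100.0 / maf_percent)

for depth in (1029, 2237, 3194):
    reads = expected_supporting_reads(depth, maf_percent=0.5)
    print(f"{depth}X at MAF 0.5%: about {reads:.1f} supporting reads expected")

print(depth_for_expected_support(0.5))  # 2000
```

Read counts fluctuate around this mean and local depth varies across the target region, so sensitivity declines gradually rather than as a step. The arithmetic nevertheless suggests why calls at MAF near 0.5% are only marginally supported at roughly 2000X.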

Table IX

Summary of INDEL performance (sensitivity, PPV).

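A, Summary of small insert and deletion detection performance (sensitivity)

| Average coverage | MAF <0.5% (n=32): FN | SEN (%) | CI (%) | MAF 0.5–5% (n=118): FN | SEN (%) | CI (%) | MAF 5–10% (n=19): FN | SEN (%) | CI (%) | MAF >10% (n=3): FN | SEN (%) | CI (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3194 | 24 | 25.0 | 11.5–43.4 | 9 | 92.4 | 86–96.5 | 0 | 100 | 82.4–100 | 0 | 100 | 29.2–100 |
| 2237 | 26 | 18.8 | 7.2–36.4 | 10 | 91.5 | 85–95.9 | 1 | 94.7 | 74–99.9 | 0 | 100 | 29.2–100 |
| 1029 | 25 | 21.9 | 9.3–40 | 15 | 87.3 | 79.9–92.7 | 2 | 89.5 | 66.9–98.7 | 0 | 100 | 29.2–100 |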

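B, Summary of small insert and deletion detection performance (specificity)

| Average coverage | TP | FP (MAF >10%) | FP (MAF <10%) | PPV mean (%) | PPV CI (%) |
|---|---|---|---|---|---|
| 3194 | 1119 | 0 | 24 | 97.9 | 96.8–98.6 |
| 2237 | 1050 | 0 | 19 | 98.2 | 97.2–98.9 |
| 1029 | 794 | 0 | 13 | 98.4 | 97.2–99.1 |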

[i] The INDEL detection performance of our pipeline. INDEL calls in pooled samples that had the same base composition and position (±25 bp) as calls in the pure cell lines were considered true positives. False positives were INDEL calls in pooled samples that were absent from pure cell lines. MAF, mutation allele frequency; FN, false negative; SEN, sensitivity; CI, confidence interval (calculated as the exact 95% binomial confidence interval).
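The true-positive rule in the footnote can be sketched as follows. This is an illustration under stated assumptions: "same base composition" is interpreted here as identical reference and alternate alleles, and the `Indel` record, function names and toy coordinates are hypothetical. PPV is computed as TP/(TP + FP), which reproduces the values in Table IXB.

```python
from typing import Iterable, List, NamedTuple, Tuple

class Indel(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str

def matches(call: Indel, truth: Indel, window: int = 25) -> bool:
    """Same chromosome and alleles, with positions within +/- window bp."""
    return (call.chrom == truth.chrom
            and call.ref == truth.ref
            and call.alt == truth.alt
            and abs(call.pos - truth.pos) <= window)

def split_calls(pooled_calls: Iterable[Indel], pure_calls: Iterable[Indel],
                window: int = 25) -> Tuple[List[Indel], List[Indel]]:
    """Partition pooled-sample calls into true and false positives."""
    pure = list(pure_calls)
    true_pos, false_pos = [], []
    for call in pooled_calls:
        found = any(matches(call, truth, window) for truth in pure)
        (true_pos if found else false_pos).append(call)
    return true_pos, false_pos

def ppv(tp: int, fp: int) -> float:
    """Positive predictive value, TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else float("nan")

# Toy example with made-up coordinates
pure = [Indel("chr1", 1000, "ACGT", "A")]
pooled = [Indel("chr1", 1012, "ACGT", "A"), Indel("chr1", 5000, "G", "GTT")]
tp, fp = split_calls(pooled, pure)
print(len(tp), len(fp))

# Counts from Table IXB at 2237X: 1050 TP and 19 FP
print(f"PPV = {100 * ppv(1050, 19):.1f}%")
```

A positional tolerance is useful for INDELs because the same event can be reported at slightly different coordinates depending on how the aligner places the gap in repetitive sequence.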

Comparison with other bioinformatics approaches

We evaluated the performance of our bioinformatics pipeline with the cell line models above, focusing on two key steps of our approach. First, we applied statistical models that distinguish a mutation at low MAF from random errors in Illumina sequencing. Second, we used a priori knowledge to identify systematic errors, which are typically accompanied by specific characteristics such as strand bias and low base or mapping quality. To measure the effect of our approach, we compared the pooled cell line results with those of GATK, a widely used software package. At an average depth of 2237X, the GATK sensitivity for SNVs with MAF >10% was 64.38% (95% CI, 52.3–75.3%) and that for SNVs with MAF of 5–10% was under 10%, although the PPV was 100% (95% CI, 99.7–100%). The sensitivity for INDELs with MAF >10% was 67% (95% CI, 9.4–99.2%), again with a high PPV of 100% (95% CI, 99–100%) (Tables X and XI). This is possibly because GATK is designed for whole-genome or whole-exome sequencing data with relatively low depth and for variants with high allele frequency, which underlines that appropriate filters are critical for the analysis of ultra-deep sequencing data. Indeed, compared with the slight performance gains from increased coverage depth, the effect of appropriate filters was remarkable in this test.
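The second step, hard filtering on features that tend to accompany systematic errors, might look like the sketch below. The text names strand bias and low base or mapping quality but gives no thresholds, so every cut-off and field name here is a hypothetical placeholder rather than a value used in the NCP pipeline.

```python
from dataclasses import dataclass

@dataclass
class CandidateCall:
    alt_forward: int             # variant-supporting reads on the forward strand
    alt_reverse: int             # variant-supporting reads on the reverse strand
    mean_base_quality: float     # mean Phred quality of the variant bases
    mean_mapping_quality: float  # mean mapping quality of supporting reads

def minor_strand_fraction(call: CandidateCall) -> float:
    """Fraction of supporting reads that come from the less represented strand."""
    total = call.alt_forward + call.alt_reverse
    if total == 0:
        return 0.0
    return min(call.alt_forward, call.alt_reverse) / total

def passes_hard_filters(call: CandidateCall,
                        min_minor_strand: float = 0.1,   # hypothetical cut-off
                        min_base_quality: float = 30.0,  # hypothetical cut-off
                        min_mapping_quality: float = 40.0) -> bool:  # hypothetical
    """Reject calls showing strand bias or low base/mapping quality."""
    return (minor_strand_fraction(call) >= min_minor_strand
            and call.mean_base_quality >= min_base_quality
            and call.mean_mapping_quality >= min_mapping_quality)

balanced = CandidateCall(alt_forward=12, alt_reverse=10,
                         mean_base_quality=36.0, mean_mapping_quality=60.0)
one_strand = CandidateCall(alt_forward=22, alt_reverse=0,
                           mean_base_quality=36.0, mean_mapping_quality=60.0)
print(passes_hard_filters(balanced), passes_hard_filters(one_strand))
```

Filters of this kind complement the statistical test: a systematic artifact can produce many supporting reads and so pass a read-count test, yet still be recognizable because all of its reads share one strand or have poor quality.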

Table X

Summary of SNV detection performance by GATK (sensitivity, PPV).

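A, Summary of SNV detection performance by GATK (sensitivity)

| Average coverage | MAF <0.5% (n=568): FN | SEN (%) | CI (%) | MAF 0.5–5% (n=1419): FN | SEN (%) | CI (%) | MAF 5–10% (n=393): FN | SEN (%) | CI (%) | MAF >10% (n=73): FN | SEN (%) | CI (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3194 | 567 | 0.18 | 0–1 | 1417 | 0.14 | 0–0.5 | 374 | 4.83 | 2.9–7.4 | 18 | 75.34 | 63.9–84.7 |
| 2237 | 568 | 0.00 | 0–0.6 | 1417 | 0.14 | 0–0.5 | 370 | 5.85 | 3.7–8.7 | 26 | 64.38 | 52.3–75.3 |
| 1029 | 567 | 0.18 | 0–1 | 1416 | 0.21 | 0–0.6 | 375 | 4.58 | 2.7–7.1 | 25 | 65.75 | 53.7–76.5 |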

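B, Summary of SNV detection performance by GATK (specificity)

| Average coverage | TP | FP (MAF ≥5%) | FP (MAF <5%) | PPV mean (%) | PPV CI (%) |
|---|---|---|---|---|---|
| 3194 | 2212 | 1 | 0 | 100.0 | 99.7–100 |
| 2237 | 2213 | 1 | 0 | 100.0 | 99.7–100 |
| 1029 | 2188 | 0 | 0 | 100.0 | 99.8–100 |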

[i] The SNV detection performance of GATK pipeline in mixed cell lines. False negatives were germ line SNPs in constituent cell lines that were not detected in mixed cell line data. False positives were SNV calls in pooled samples that were absent from pure cell lines. CI, confidence intervals (calculated as the exact 95% binomial confidence interval); MAF, mutation allele frequency. FN, false negative; SEN, sensitivity.

Table XI

Summary of INDEL detection performance by GATK (sensitivity, PPV).

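A, Summary of INDEL detection performance by GATK (sensitivity)

| Average coverage | MAF <0.5% (n=32): FN | SEN (%) | CI (%) | MAF 0.5–5% (n=118): FN | SEN (%) | CI (%) | MAF 5–10% (n=19): FN | SEN (%) | CI (%) | MAF >10% (n=3): FN | SEN (%) | CI (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3194 | 31 | 3.13 | 0.1–16.2 | 116 | 1.69 | 0.2–6 | 16 | 15.79 | 3.4–39.6 | 0 | 100.00 | 29.2–100 |
| 2237 | 31 | 3.13 | 0.1–16.2 | 116 | 1.69 | 0.2–6 | 18 | 5.26 | 0.1–26 | 1 | 67 | 9.4–99.2 |
| 1029 | 31 | 3.13 | 0.1–16.2 | 116 | 1.69 | 0.2–6 | 17 | 10.53 | 1.3–33.1 | 0 | 100.00 | 29.2–100 |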

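B, Summary of INDEL detection performance by GATK (specificity)

| Average coverage | TP | FP (MAF >10%) | FP (MAF <10%) | PPV mean (%) | PPV CI (%) |
|---|---|---|---|---|---|
| 3194 | 385 | 0 | 0 | 100.0 | 99–100 |
| 2237 | 386 | 0 | 0 | 100.0 | 99–100 |
| 1029 | 380 | 0 | 0 | 100.0 | 99–100 |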

[i] The INDEL detection performance of the GATK pipeline. INDEL calls in pooled samples that had the same base composition and position (±25 bp) as calls in the pure cell lines were considered true positives. False positives were INDEL calls in pooled samples that were absent from pure cell lines. MAF, mutation allele frequency; FN, false negative; SEN, sensitivity; CI, confidence interval (calculated as the exact 95% binomial confidence interval).

Concordance between NGS and other approaches

The above studies demonstrate that the NGS-based test has the performance characteristics necessary to accurately detect SNVs and INDELs. We further validated test accuracy by comparison with dPCR. To assess the accuracy of low-MAF SNV and INDEL detection in routine clinical cancer samples, we selected 35 FFPE resection specimens (31 non-small cell lung cancers, 1 parathyroid carcinoma and 3 breast cancers) previously tested by NGS for hotspot mutations in PIK3CA, EGFR, KRAS and BRAF; every hotspot mutation detected by NGS with MAF <5% was tested by dPCR. Of the 35 variants, 32 (PPV=91.43%; 95% CI, 76.94–98.20%) were confirmed as true positives by dPCR (Tables XII and XIII). The three variants not detected by dPCR were present at <3% MAF in NGS. The MAFs measured by the two technologies are shown in Fig. 5. Finally, we randomly selected 33 FFPE samples (NSCLC) with hotspot mutations and performed ARMS-PCR to verify the overall PPV of our assay. All 33 mutations were detected by ARMS-PCR, giving a PPV of 100% (95% CI, 89.42–100%; Table XIV).

Table XII

3D digital PCR correlation results.

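| Genes and exons | NGS (no.) | Supported by 3D digital PCR (no.) |
|---|---|---|
| PIK3CA exon 9 | 1 | 1 |
| PIK3CA exon 10 | 3 | 3 |
| PIK3CA exon 21 | 1 | 1 |
| EGFR exon 18 | 1 | 0 |
| EGFR exon 19 | 6 | 6 |
| EGFR exon 20 | 11 | 10 |
| EGFR exon 21 | 5 | 4 |
| KRAS exon 2 | 5 | 5 |
| BRAF exon 15 | 1 | 1 |
| KRAS exon 3 | 1 | 1 |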

[i] The concordance between NGS and 3D digital PCR for variants with mutant allele frequency under 5%.

Table XIII

Summary of concordance between NGS and 3D Digital PCR.

| Sample id | Mutation | NGS (%) | dPCR (%) | Cancer type | Stage |
|---|---|---|---|---|---|
| d001 | EGFR:exon19:c.2235_2249del:p.746_750del | 5.00 | 12.64 | NSCLC | - |
| d002 | PIK3CA:exon21:c.A3140G:p.H1047R | 3.00 | 2.09 | Breast cancer | - |
| d003 | KRAS:exon2:c.G35A:p.G12D | 2.61 | 5.07 | NSCLC | 4 |
| d004 | EGFR:exon19:c.2235_2249del:p.746_750del | 2.00 | 5.26 | NSCLC | 4 |
| d005 | BRAF:p.V600E:c.1799T>A | 2.00 | 2.25 | NSCLC | 4 |
| d006 | EGFR:exon21:c.T2573G:p.L858R | 1.00 | 0.00 | NSCLC | 4 |
| d007 | EGFR:exon20:c.C2369T:p.T790M | 1.00 | 0.62 | NSCLC | 4 |
| d008 | KRAS:exon2:c.G35A:p.G12D | 1.00 | 0.93 | Parathyroid carcinoma | 4 |
| d009 | PIK3CA:exon9:c.1633G>A:p.E545K | 0.92 | 0.68 | NSCLC | - |
| d010 | EGFR:exon20:c.C2369T:p.T790M | 0.90 | 1.05 | NSCLC | 4 |
| d011 | EGFR:exon19:c.2236_2250del:p.746_750del | 0.79 | 0.84 | NSCLC | 3 |
| d012 | KRAS:exon2:c.G35A:p.G12D | 0.77 | 1.20 | NSCLC | 4 |
| d013 | EGFR:exon20:c.C2369T:p.T790M | 0.73 | 0.90 | NSCLC | 2 |
| d014 | EGFR:exon19:c.2235_2249delGGAATTAAGAGAAGC:p.E746_A750del | 0.71 | 1.29 | NSCLC | 4 |
| d015 | EGFR:exon21:c.T2573G:p.L858R | 0.71 | 0.38 | NSCLC | 4 |
| d016 | KRAS:p.G12C:c.34G>T | 0.68 | 0.08 | NSCLC | 4 |
| d017 | PIK3CA:exon10:c.G1633A:p.E545K | 0.64 | 0.57 | NSCLC | 4 |
| d018 | EGFR:exon21:c.2573T>G:p.L858R | 0.50 | 0.29 | NSCLC | |
| d019 | KRAS:exon2:c.G37T:p.G13C | 0.47 | 0.44 | NSCLC | - |
| d020 | PIK3CA:c.1633G>A:p.E545K | 0.42 | 0.37 | NSCLC | 4 |
| d021 | PIK3CA:exon10:c.G1624A:p.E542K | 0.41 | 0.73 | Breast cancer | 3 |
| d022 | EGFR:exon19:c.2235_2249del:p.745_750del | 0.40 | 0.32 | NSCLC | 4 |
| d023 | EGFR:exon21:c.2573T>G:p.L858R | 0.38 | 0.33 | NSCLC | 4 |
| d024 | EGFR:p.L858R:c.2573T>G | 0.32 | 0.25 | NSCLC | - |
| d025 | EGFR:exon20:c.C2369T:p.T790M | 0.32 | 0.31 | NSCLC | 4 |
| d026 | EGFR:exon18:c.2155G>T:p.G719C | 0.30 | 0.00 | Breast cancer | 3 |
| d027 | EGFR:exon20:c.C2369T:p.T790M | 0.27 | 0.22 | NSCLC | 4 |
| d028 | EGFR:exon20:c.C2369T:p.T790M | 0.25 | 0.22 | NSCLC | 4 |
| d029 | EGFR:exon19:c.2236_2250del:p.746_750del | 0.24 | 0.34 | NSCLC | 3 |
| d030 | KRAS:c.35G>A:p.G12D | 0.18 | 0.17 | NSCLC | 4 |
| d031 | EGFR:exon20:c.C2369T:p.T790M | 0.16 | 0.00 | NSCLC | 4 |
| d032 | EGFR:exon20:c.C2369T:p.T790M | 0.10 | 0.08 | NSCLC | - |
| d033 | EGFR:exon20:c.C2369T:p.T790M | 0.09 | 0.04 | NSCLC | 4 |
| d034 | EGFR:exon20:c.C2369T:p.T790M | 0.09 | 0.10 | NSCLC | 4 |
| d035 | EGFR:exon20:c.C2369T:p.T790M | 0.07 | 0.03 | NSCLC | 4 |

[i] The mutant allele frequency of each variant detected by NGS and 3D Digital PCR is shown. dPCR, 3D Digital PCR; NSCLC, non-small cell lung cancer.
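The concordance in Table XIII can be summarized with a short sketch that counts dPCR-confirmed variants and computes a Pearson correlation between the two MAF measurements. The value pairs are transcribed from the table; the choice of Pearson correlation on untransformed percentages is an assumption made here, since the text does not say which statistic or scale the authors used, so the result need not equal the coefficient they report.

```python
import math

# (NGS MAF %, dPCR MAF %) for samples d001 to d035, transcribed from Table XIII
PAIRS = [
    (5.00, 12.64), (3.00, 2.09), (2.61, 5.07), (2.00, 5.26), (2.00, 2.25),
    (1.00, 0.00), (1.00, 0.62), (1.00, 0.93), (0.92, 0.68), (0.90, 1.05),
    (0.79, 0.84), (0.77, 1.20), (0.73, 0.90), (0.71, 1.29), (0.71, 0.38),
    (0.68, 0.08), (0.64, 0.57), (0.50, 0.29), (0.47, 0.44), (0.42, 0.37),
    (0.41, 0.73), (0.40, 0.32), (0.38, 0.33), (0.32, 0.25), (0.32, 0.31),
    (0.30, 0.00), (0.27, 0.22), (0.25, 0.22), (0.24, 0.34), (0.18, 0.17),
    (0.16, 0.00), (0.10, 0.08), (0.09, 0.04), (0.09, 0.10), (0.07, 0.03),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# A variant counts as confirmed when dPCR measured a non-zero MAF
confirmed = sum(1 for _, dpcr in PAIRS if dpcr > 0)
ngs_values = [ngs for ngs, _ in PAIRS]
dpcr_values = [dpcr for _, dpcr in PAIRS]
r = pearson(ngs_values, dpcr_values)

print(f"{confirmed} of {len(PAIRS)} variants confirmed by dPCR")
print(f"Pearson r on raw MAF percentages: {r:.2f}")
```

Because a few samples with the highest MAF dominate an untransformed correlation, a rank-based or log-scale comparison may describe agreement across the sub-1% range more faithfully.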

Table XIV

Summary of concordance between NGS and ARMS-PCR.

| Sample id | Mutation | NGS (%) | ΔCt | Results | Cancer type | Stage |
|---|---|---|---|---|---|---|
| a001 | EGFR:exon20:c.C2369T:p.T790M | 2.00 | 6.64 | Positive | NSCLC | 4 |
| a002 | EGFR:exon21:c.T2573G:p.L858R | 14.00 | 5.2 | Positive | NSCLC | 4 |
| a003 | EGFR:exon21:c.T2573G:p.L858R | 14.00 | 4.47 | Positive | NSCLC | 4 |
| a004 | EGFR:exon21:c.T2573G:p.L858R | 3.00 | 7.38 | Positive | NSCLC | 3 |
| a005 | EGFR:exon21:c.T2573G:p.L858R | 4.00 | 6.01 | Positive | NSCLC | 4 |
| a006 | EGFR:exon21:c.T2573G:p.L858R | 13.00 | 4.51 | Positive | NSCLC | 3 |
| a007 | EGFR:exon21:c.T2573G:p.L858R | 9.00 | 5.45 | Positive | NSCLC | - |
| a008 | EGFR:exon21:c.T2573G:p.L858R | 2.00 | 10.81 | Positive | NSCLC | 3 |
| a009 | EGFR:exon21:c.T2573G:p.L858R | 16.00 | 3.96 | Positive | NSCLC | 4 |
| a010 | EGFR:exon21:c.T2573G:p.L858R | 63.00 | 2.13 | Positive | NSCLC | 4 |
| a011 | EGFR:exon21:c.T2573G:p.L858R | 31.00 | 2.32 | Positive | NSCLC | 4 |
| a012 | EGFR:exon21:c.T2573G:p.L858R | 13.00 | 8.3 | Positive | NSCLC | 4 |
| a013 | EGFR:exon21:c.T2582A:p.L861Q | 8.00 | 13.61 | Positive | NSCLC | 4 |
| a014 | EGFR:exon20:c.C2369T:p.T790M | 13.00 | 5.47 | Positive | NSCLC | 4 |
| a015 | EGFR:exon19:c.2235_2249del:p.745_750del | 15.00 | 2.21 | Positive | NSCLC | - |
| a016 | EGFR:exon19:c.2235_2249del:p.745_750del | 9.00 | 3.36 | Positive | NSCLC | 4 |
| a017 | EGFR:exon19:c.2235_2249del:p.745_750del | 7.00 | 3.28 | Positive | NSCLC | 4 |
| a018 | EGFR:exon19:c.2239_2256del:p.747_752del | 12.00 | 7.9 | Positive | NSCLC | 4 |
| a019 | EGFR:exon19:c.2236_2250del:p.746_750del | 8.00 | 5.39 | Positive | NSCLC | 4 |
| a020 | EGFR:exon19:c.2236_2250del:p.746_750del | 13.00 | 4.57 | Positive | NSCLC | 3 |
| a021 | EGFR:exon19:c.2236_2250del:p.746_750del | 10.00 | 4.18 | Positive | NSCLC | 3 |
| a022 | EGFR:exon19:c.2254_2277del:p.752_759del | 8.00 | 3.1 | Positive | NSCLC | 4 |
| a023 | EGFR:exon19:c.2237_2254del:p.746_752del | 9.00 | 3.99 | Positive | NSCLC | 4 |
| a024 | EGFR:exon19:c.2237_2254del:p.746_752del | 12.00 | 3.22 | Positive | NSCLC | 4 |
| a025 | EGFR:exon19:c.2238_2252del:p.746_751del | 15.00 | 2.91 | Positive | NSCLC | - |
| a026 | EGFR:exon19:c.2235_2249del:p.745_750del | 7.00 | 3.36 | Positive | NSCLC | 4 |
| a027 | EGFR:exon19:c.2235_2249del:p.745_750del | 11.00 | 2.95 | Positive | NSCLC | 4 |
| a028 | EGFR:exon19:c.2240_2254del:p.747_752del | 20.00 | 3.88 | Positive | NSCLC | 4 |
| a029 | EGFR:exon19:c.2236_2250del:p.746_750del | 19.00 | 5.21 | Positive | NSCLC | 4 |
| a030 | EGFR:exon19:c.2235_2249del:p.745_750del | 16.00 | 2.33 | Positive | NSCLC | 4 |
| a031 | EGFR:exon19:c.2235_2249del:p.745_750del | 9.05 | 3.06 | Positive | NSCLC | - |
| a032 | EGFR:exon19:c.2237_2253del:p.746_751del | 11.00 | 3.22 | Positive | NSCLC | 4 |
| a033 | EGFR:exon19:c.2235_2249del:p.745_750del | 13.00 | 2.8 | Positive | NSCLC | 4 |

[i] The specificity of our assay in clinical samples. Thirty-three randomly selected FFPE tissues with positive detection in NGS were tested by ARMS-PCR. The ΔCt was the Ct value of sample minus control and the cut-off for T790M, L858R, L861Q, 19-Del were 8, 11, 12, 11, respectively. ΔCt, Ct (sample) − Ct (control). NSCLC, non-small cell lung cancer.

Discussion

Cancer diagnostics is developing rapidly (31). Routine tests such as FISH and IHC can detect only a limited number of known variants and rely heavily on the clinician's experience. PCR-based approaches, such as Sanger sequencing or the dPCR used in this study, cannot test multiple sites in one run. Furthermore, Sanger sequencing cannot detect variants with MAF under 10% (32), and dPCR consumes a large amount of sample; both remain problems for clinical application. NGS-based tests, with increased access and decreased cost, offer advantages in the comprehensive detection of cancer-related mutations (33–35). Detecting low-frequency mutations requires an NGS-based test with high sensitivity. However, high sensitivity tends to come with false positives, which may lead to suboptimal treatment. Finally, other factors, such as DNA damage and contamination in clinical samples (36,37), make thorough validation of an NGS assay critical.

In the present study, we developed and validated an NGS-based assay, using germline mutations in 1000 Genomes cell lines and certain somatic INDELs in the COSMIC database to simulate the tumor heterogeneity or impurity of clinical samples. We mixed these samples to measure the analytical sensitivity and PPV of the NCP assay at low MAF, and used 3 pools to relate median coverage to variant detection performance. The performance of our test was high for variants with MAF >5%. In the cell line model with 2236X median coverage, sensitivity was 99.8% for SNVs and 94.7% for INDELs, with PPVs of 99 and 98%, respectively. For variants with MAF of 0.5–5%, sensitivity was 92.2% for SNVs and 91.5% for INDELs, which was less than desirable. Because of the complexity of the 483 genes covered, it was difficult to ensure sensitivity for such low-MAF variants. On the other hand, we confirmed low-MAF detections by dPCR, which can specifically identify rare mutations. We also compared our bioinformatics pipeline with GATK (29,30), a common pipeline widely used in genotype analysis. With GATK, the overall PPV was high at the expense of sensitivity, which may be because such approaches were developed to call germline variants. The results highlight that an appropriate filtering approach is critical for low-MAF variant detection. Indeed, the filters mattered more than increased coverage depth, as shown in the tests at different coverage. For specificity analysis, each called variant was classified as a false positive if a matching alteration was not detected in the pure sample. However, this approach cannot recognize false positives generated by systematic errors, since such errors may recur in both the pooled and the pure samples. Given the high sensitivity of this technology, high-throughput clinical trials are required to confirm its reliability for the molecular diagnosis of cancer (38). Therefore, 35 patient specimens previously tested by the NCP assay and carrying variants with MAF <5% were tested in parallel by dPCR. The correlation coefficient between NGS and dPCR was low (0.78), and 32 of 35 (91.43%) NGS-detected variants could be confirmed by dPCR. The discordance was possibly due to heterogeneity in the tumor specimens or to false positives in NGS; dPCR verification is therefore needed for such low-MAF variants before reporting. In addition to the low-MAF variants, we used ARMS-PCR to test 33 randomly selected FFPE samples with hotspot mutations detected by NGS and obtained a high concordance (PPV=100%).

Taken together, we used high sequencing coverage and a statistical test with several hard filters derived from clinical samples to separate low-MAF SNVs/INDELs from false positives. To balance the cost of NGS against the accuracy of low-MAF variant calls, we sequenced pooled cell line models carrying known germline SNPs at different data sizes to determine how accuracy varies with data size. From this test, we established a target median coverage (2000X) that meets the analysis requirement, although low-MAF variant calls still need to be confirmed by dPCR. The overall performance of the assay was good in the ARMS-PCR test. However, unlike other NGS-based approaches (17–20,39), our validation does not yet cover the other variant types required in clinical use; detecting multiple variant types is one of the most important advantages of NGS over traditional approaches. Furthermore, owing to the DNA requirement of dPCR verification and the limited quantity of DNA that can be extracted from plasma (40,41), this combined NGS-dPCR approach can be used only with FFPE samples, not plasma. Plasma sequencing is non-invasive and can overcome tumor heterogeneity (42–44), but it still needs more study. To reduce sequencing errors that are confounded with rare mutations, an NGS method termed Duplex Sequencing has been developed in recent years and may be useful in future plasma sequencing (45–47). In addition, given the capability of the NGS test to detect low-MAF variants, the correlation between the NGS clinical report and the effect of targeted therapy needs further assessment (48). Finally, our NCP assay can provide more mutation information and thus expand the treatment choices for patients, but more work is needed for future cancer diagnostics.

References

1. Renfro LA, An MW and Mandrekar SJ: Precision oncology: A new era of cancer clinical trials. Cancer Lett. S0304-3835(16)30163-X. 2016.
2. Arteaga CL and Baselga J: Impact of genomics on personalized cancer medicine. Clin Cancer Res. 18:612–618. 2012.
3. MacConaill LE, Van Hummelen P, Meyerson M and Hahn WC: Clinical implementation of comprehensive strategies to characterize cancer genomes: Opportunities and challenges. Cancer Discov. 1:297–311. 2011.
4. Romano E, Schwartz GK, Chapman PB, Wolchock JD and Carvajal RD: Treatment implications of the emerging molecular classification system for melanoma. Lancet Oncol. 12:913–922. 2011.
5. Kwak EL, Bang YJ, Camidge DR, Shaw AT, Solomon B, Maki RG, Ou SH, Dezube BJ, Jänne PA, Costa DB, et al: Anaplastic lymphoma kinase inhibition in non-small-cell lung cancer. N Engl J Med. 363:1693–1703. 2010.
6. Shaw AT, Kim DW, Nakagawa K, Seto T, Crinó L, Ahn MJ, De Pas T, Besse B, Solomon BJ, Blackhall F, et al: Crizotinib versus chemotherapy in advanced ALK-positive lung cancer. N Engl J Med. 368:2385–2394. 2013.
7. Bollag G, Tsai J, Zhang J, Zhang C, Ibrahim P, Nolop K and Hirth P: Vemurafenib: The first drug approved for BRAF-mutant cancer. Nat Rev Drug Discov. 11:873–886. 2012.
8. Garraway LA and Lander ES: Lessons from the cancer genome. Cell. 153:17–37. 2013.
9. Pao W: New approaches to targeted therapy in lung cancer. Proc Am Thorac Soc. 9:72–73. 2012.
10. Thomas RK, Baker AC, Debiasi RM, Winckler W, Laframboise T, Lin WM, Wang M, Feng W, Zander T, MacConaill L, et al: High-throughput oncogene mutation profiling in human cancer. Nat Genet. 39:347–351. 2007.
11. MacConaill LE, Campbell CD, Kehoe SM, Bass AJ, Hatton C, Niu L, Davis M, Yao K, Hanna M, Mondal C, et al: Profiling critical cancer gene mutations in clinical tumor samples. PLoS One. 4:e7887. 2009.
12. Tao YF, Wu D, Pang L, Zhao WL, Lu J, Wang N, Wang J, Feng X, Li YH, Ni J, et al: Analyzing the gene expression profile of pediatric acute myeloid leukemia with real-time PCR arrays. Cancer Cell Int. 12:1946–1958. 2012.
13. McCourt CM, Boyle D, James J and Salto-Tellez M: Immunohistochemistry in the era of personalised medicine. J Clin Pathol. 66:58–61. 2013.
14. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, et al: COSMIC: Mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39(Database issue): D945–D950. 2010.
15. Lawrence MS, Stojanov P, Mermel CH, Robinson JT, Garraway LA, Golub TR, Meyerson M, Gabriel SB, Lander ES and Getz G: Discovery and saturation analysis of cancer genes across 21 tumour types. Nature. 505:495–501. 2014.
16. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, Carter SL, Stewart C, Mermel CH, Roberts SA, et al: Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 499:214–218. 2013.
17. Frampton GM, Fichtenholtz A, Otto GA, Wang K, Downing SR, He J, Schnall-Levin M, White J, Sanford EM, An P, et al: Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nat Biotechnol. 31:1023–1031. 2013.
18. Hovelson DH, McDaniel AS, Cani AK, Johnson B, Rhodes K, Williams PD, Bandla S, Bien G, Choppa P, Hyland F, et al: Development and validation of a scalable next-generation sequencing system for assessing relevant somatic variants in solid tumors. Neoplasia. 17:385–399. 2015.
19. Choudhary A, Mambo E, Sanford T, Boedigheimer M, Twomey B, Califano J, Hadd A, Oliner KS, Beaudenon S, Latham GJ, et al: Evaluation of an integrated clinical workflow for targeted next-generation sequencing of low-quality tumor DNA using a 51-gene enrichment panel. BMC Med Genomics. 7:62. 2014.
20. Cheng DT, Mitchell TN, Zehir A, Shah RH, Benayed R, Syed A, Chandramohan R, Liu ZY, Won HH, Scott SN, et al: Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): A hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology. J Mol Diagn. 17:251–264. 2015.
21. Cibulskis K, McKenna A, Fennell T, Banks E, DePristo M and Getz G: ContEst: Estimating cross-contamination of human samples in next-generation sequencing data. Bioinformatics. 27:2601–2602. 2011.
22. Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, Martinez P, Matthews N, Stewart A, Tarpey P, et al: Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med. 366:883–892. 2012.
23. Kinz E, Leiherer A, Lang AH, Drexel H and Muendlein A: Accurate quantitation of JAK2 V617F allele burden by array-based digital PCR. Int J Lab Hematol. 37:217–224. 2015.
24. Shao D, Lin Y, Liu J, Wan L, Liu Z, Cheng S, Fei L, Deng R, Wang J, Chen X, et al: A targeted next-generation sequencing method for identifying clinically relevant mutation profiles in lung adenocarcinoma. Sci Rep. 6:22338. 2016.
25. Forbes SA, Tang G, Bindal N, Bamford S, Dawson E, Cole C, Kok CY, Jia M, Ewing R, Menzies A, et al: COSMIC (the Catalogue of Somatic Mutations in Cancer): A resource to investigate acquired mutations in human cancer. Nucleic Acids Res. 8(Database issue): D652–D657. 2009.
26. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME and McVean GA; 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 467:1061–1073. 2010.
27. Warrick JI, Hovelson DH, Amin A, Liu CJ, Cani AK, McDaniel AS, Yadati V, Quist MJ, Weizer AZ, Brenner JC, et al: Tumor evolution and progression in multifocal and paired non-invasive/invasive urothelial carcinoma. Virchows Arch. 466:297–311. 2015.
28. Li H and Durbin R: Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 26:589–595. 2010.
29. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al: The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297–1303. 2010.
30. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, et al: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 43:491–498. 2011.
31. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G and Durbin R; 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics. 25:2078–2079. 2009.
32. Arsenic R, Treue D, Lehmann A, Hummel M, Dietel M, Denkert C and Budczies J: Comparison of targeted next-generation sequencing and Sanger sequencing for the detection of PIK3CA mutations in breast cancer. BMC Clin Pathol. 15:20. 2015.
33. Borad MJ, Champion MD, Egan JB, Liang WS, Fonseca R, Bryce AH, McCullough AE, Barrett MT, Hunt K, Patel MD, et al: Integrated genomic characterization reveals novel, therapeutically relevant drug targets in FGFR and EGFR pathways in sporadic intrahepatic cholangiocarcinoma. PLoS Genet. 10:e1004135. 2014.
34. Hadd AG, Houghton J, Choudhary A, Sah S, Chen L, Marko AC, Sanford T, Buddavarapu K, Krosting J, Garmire L, et al: Targeted, high-depth, next-generation sequencing of cancer genes in formalin-fixed, paraffin-embedded and fine-needle aspiration tumor specimens. J Mol Diagn. 15:234–247. 2013.
35. Roychowdhury S, Iyer MK, Robinson DR, Lonigro RJ, Wu YM, Cao X, Kalyana-Sundaram S, Sam L, Balbin OA, Quist MJ, et al: Personalized oncology through integrative high-throughput sequencing: A pilot study. Sci Transl Med. 3:111ra121. 2011.
36. Kerick M, Isau M, Timmermann B, Sültmann H, Herwig R, Krobitsch S, Schaefer G, Verdorfer I, Bartsch G, Klocker H, et al: Targeted high throughput sequencing in clinical cancer settings: Formaldehyde fixed-paraffin embedded (FFPE) tumor tissues, input amount and tumor heterogeneity. BMC Med Genomics. 4:68. 2011.
37. Schweiger MR, Kerick M, Timmermann B, Albrecht MW, Borodina T, Parkhomchuk D, Zatloukal K and Lehrach H: Genome-wide massively parallel sequencing of formaldehyde fixed-paraffin embedded (FFPE) tumor tissues for copy-number- and mutation-analysis. PLoS One. 4:e5548. 2009.
38. Chevrier S, Arnould L, Ghiringhelli F, Coudert B, Fumoleau P and Boidot R: Next-generation sequencing analysis of lung and colon carcinomas reveals a variety of genetic alterations. Int J Oncol. 45:1167–1174. 2014.
39. Cottrell CE, Al-Kateb H, Bredemeyer AJ, Duncavage EJ, Spencer DH, Abel HJ, Lockwood CM, Hagemann IS, O'Guin SM, Burcea LC, et al: Validation of a next-generation sequencing assay for clinical molecular oncology. J Mol Diagn. 16:89–105. 2014.
40. Haber DA and Velculescu VE: Blood-based analyses of cancer: Circulating tumor cells and circulating tumor DNA. Cancer Discov. 4:650–661. 2014.
41. Arnedos M, Vicier C, Loi S, Lefebvre C, Michiels S, Bonnefoi H and Andre F: Precision medicine for metastatic breast cancer--limitations and solutions. Nat Rev Clin Oncol. 12:693–704. 2015.
42. Newman AM, Bratman SV, To J, Wynne JF, Eclov NC, Modlin LA, Liu CL, Neal JW, Wakelee HA, Merritt RE, et al: An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med. 20:548–554. 2014.
43. Ignatiadis M and Dawson SJ: Circulating tumor cells and circulating tumor DNA for precision medicine: Dream or reality? Ann Oncol. 25:2304–2313. 2014.
44. Lipson EJ, Velculescu VE, Pritchard TS, Sausen M, Pardoll DM, Topalian SL and Diaz LA Jr: Circulating tumor DNA analysis as a real-time method for monitoring tumor burden in melanoma patients undergoing treatment with immune checkpoint blockade. J Immunother Cancer. 2:42. 2014.
45. Schmitt MW, Kennedy SR, Salk JJ, Fox EJ, Hiatt JB and Loeb LA: Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci USA. 109:14508–14513. 2012.
46. Kennedy SR, Schmitt MW, Fox EJ, Kohrn BF, Salk JJ, Ahn EH, Prindle MJ, Kuong KJ, Shen JC, Risques RA, et al: Detecting ultralow-frequency mutations by Duplex Sequencing. Nat Protoc. 9:2586–2606. 2014.
47. Newman AM, Lovejoy AF, Klass DM, Kurtz DM, Chabon JJ, Scherer F, Stehr H, Liu CL, Bratman SV, Say C, et al: Integrated digital error suppression for improved detection of circulating tumor DNA. Nat Biotechnol. 34:547–555. 2016.
48. Luo H, Li H, Hu Z, Wu H, Liu C, Li Y, Zhang X, Lin P, Hou Q, Ding G, et al: Noninvasive diagnosis and monitoring of mutations by deep sequencing of circulating tumor DNA in esophageal squamous cell carcinoma. Biochem Biophys Res Commun. 471:596–602. 2016.

November-2016
Volume 49 Issue 5

Print ISSN: 1019-6439
Online ISSN:1791-2423

Liang J, She Y, Zhu J, Wei L, Zhang L, Gao L, Wang Y, Xing J, Guo Y, Meng X and Li P: Development and validation of an ultra-high sensitive next-generation sequencing assay for molecular diagnosis of clinical oncology. Int J Oncol 49: 2088-2104, 2016