<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
<article xml:lang="en" article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink">
<?release-delay 0|0?>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Molecular Medicine Reports</journal-id>
<journal-title-group>
<journal-title>Molecular Medicine Reports</journal-title>
</journal-title-group>
<issn pub-type="ppub">1791-2997</issn>
<issn pub-type="epub">1791-3004</issn>
<publisher>
<publisher-name>D.A. Spandidos</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3892/mmr.2017.6336</article-id>
<article-id pub-id-type="publisher-id">mmr-15-05-2489</article-id>
<article-categories>
<subj-group>
<subject>Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A three-caller pipeline for variant analysis of cancer whole-exome sequencing data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Liu</surname><given-names>Ze-Kun</given-names></name>
<xref rid="af1-mmr-15-05-2489" ref-type="aff"/>
<xref rid="fn1-mmr-15-05-2489" ref-type="author-notes">&#x002A;</xref></contrib>
<contrib contrib-type="author"><name><surname>Shang</surname><given-names>Yu-Kui</given-names></name>
<xref rid="af1-mmr-15-05-2489" ref-type="aff"/>
<xref rid="fn1-mmr-15-05-2489" ref-type="author-notes">&#x002A;</xref></contrib>
<contrib contrib-type="author"><name><surname>Chen</surname><given-names>Zhi-Nan</given-names></name>
<xref rid="af1-mmr-15-05-2489" ref-type="aff"/>
<xref rid="c1-mmr-15-05-2489" ref-type="corresp"/></contrib>
<contrib contrib-type="author"><name><surname>Bian</surname><given-names>Huijie</given-names></name>
<xref rid="af1-mmr-15-05-2489" ref-type="aff"/>
<xref rid="c1-mmr-15-05-2489" ref-type="corresp"/></contrib>
</contrib-group>
<aff id="af1-mmr-15-05-2489">Department of Cell Biology and National Translational Science Center for Molecular Medicine, Fourth Military Medical University, Xi&#x0027;an, Shaanxi 710032, P.R. China</aff>
<author-notes>
<corresp id="c1-mmr-15-05-2489"><italic>Correspondence to</italic>: Professor Huijie Bian or Professor Zhi-Nan Chen, Department of Cell Biology and National Translational Science Center for Molecular Medicine, Fourth Military Medical University, 169 Changle West Road, Xi&#x0027;an, Shaanxi 710032, P.R. China, E-mail: <email>hjbian@fmmu.edu.cn</email>, E-mail: <email>znchen@fmmu.edu.cn</email></corresp>
<fn id="fn1-mmr-15-05-2489"><label>&#x002A;</label><p>Contributed equally</p></fn>
</author-notes>
<pub-date pub-type="ppub"><month>05</month><year>2017</year></pub-date>
<pub-date pub-type="epub"><day>16</day><month>03</month><year>2017</year></pub-date>
<volume>15</volume>
<issue>5</issue>
<fpage>2489</fpage>
<lpage>2494</lpage>
<history>
<date date-type="received"><day>17</day><month>02</month><year>2016</year></date>
<date date-type="accepted"><day>02</day><month>02</month><year>2017</year></date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; Liu et al.</copyright-statement>
<copyright-year>2017</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by-nc-nd/4.0/">Creative Commons Attribution-NonCommercial-NoDerivs License</ext-link>, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.</license-p></license>
</permissions>
<abstract>
<p>Rapid advancements in next generation sequencing (NGS) technologies, coupled with a dramatic decrease in cost, have made NGS one of the leading approaches in cancer research, and it is increasingly used in clinical practice for cancer diagnosis and treatment. Somatic (cancer-only) single nucleotide variants and small insertions and deletions (indels) are the simplest classes of mutation; however, their identification in whole exome sequencing data is complicated by germline polymorphisms, tumor heterogeneity and errors in sequencing and analysis. A growing number of software tools and methodological guidelines are being published for the analysis of sequencing data. The algorithms most commonly applied to identify variants are MuTect, VarScan and the Genome Analysis Toolkit; however, any one of these algorithms used alone yields incomplete genomic information. To address this issue, the present study developed a systematic pipeline, named the three-caller pipeline, that combines the three algorithms to analyze whole exome sequencing data of hepatocellular carcinoma (HCC). Applying the three-caller pipeline to the whole exome data of HCC improved the detection of true positive mutations, and a total of 75 tumor-specific somatic variants were identified. Functional enrichment analysis revealed that the mutated genes were enriched in cell adhesion and regulation of Ras GTPase activity. This pipeline provides an effective approach to identify variants from NGS data for subsequent functional analyses.</p>
</abstract>
<kwd-group>
<kwd>hepatocellular carcinoma</kwd>
<kwd>somatic mutation</kwd>
<kwd>whole-exome sequencing</kwd>
<kwd>pipeline</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>It is well established that tumorigenesis is attributed to chromosomal instability or accumulated genetic changes, including structural variations, copy number variants, single nucleotide variants (SNVs) and small insertions and deletions (indels) (<xref rid="b1-mmr-15-05-2489" ref-type="bibr">1</xref>&#x2013;<xref rid="b3-mmr-15-05-2489" ref-type="bibr">3</xref>). Somatic mutations are defined as mutations that are present in the tumor but absent from the corresponding adjacent normal tissue (<xref rid="b4-mmr-15-05-2489" ref-type="bibr">4</xref>). Somatic mutation calling is a critical step in cancer genome characterization and clinical genotyping. Next-generation sequencing (NGS) has become a popular strategy for genotyping, enabling more precise mutation detection than traditional methods owing to its high resolution and high throughput. Whole-genome sequencing provides comprehensive information on variants across the genome, whereas whole-exome sequencing (WES) cost-effectively targets only the coding regions and is currently offered by more laboratories (<xref rid="b5-mmr-15-05-2489" ref-type="bibr">5</xref>). WES of tumor samples and matched normal controls can rapidly identify protein-altering mutations across hundreds of patients, potentially enabling the discovery of recurrent events that drive tumor development and growth. Identification of somatic mutations from WES data is an increasingly common technique in cancer genomics, and a large number of somatic alterations have been identified by WES across a wide range of tumor types (<xref rid="b6-mmr-15-05-2489" ref-type="bibr">6</xref>&#x2013;<xref rid="b9-mmr-15-05-2489" ref-type="bibr">9</xref>). 
In hepatocellular carcinoma (HCC), the most prevalent mutations observed are in the p53 tumor suppressor gene (TP53), Wnt/&#x03B2;-catenin signaling pathway regulatory genes (catenin &#x03B2;1 and AXIN 1), chromatin remodeling complex components [AT-rich interactive domain (ARID) 2 and ARID1A] and the Janus kinase (JAK)/signal transducer and activator of transcription pathway gene JAK1, as well as hepatitis B virus (HBV) integrations into myeloid/lymphoid or mixed-lineage leukemia 4, telomerase reverse transcriptase and cyclin E1 (<xref rid="b10-mmr-15-05-2489" ref-type="bibr">10</xref>,<xref rid="b11-mmr-15-05-2489" ref-type="bibr">11</xref>).</p>
<p>Accurate calling of somatic mutations from WES data remains one of the major challenges in cancer genomics owing to various sources of error, including artifacts introduced during polymerase chain reaction (PCR) amplification or targeted capture, machine sequencing errors and incorrect local alignment of reads (<xref rid="b12-mmr-15-05-2489" ref-type="bibr">12</xref>). Tumor heterogeneity and normal tissue contamination create additional difficulties for the identification of tumor-specific somatic mutations (<xref rid="b12-mmr-15-05-2489" ref-type="bibr">12</xref>,<xref rid="b13-mmr-15-05-2489" ref-type="bibr">13</xref>). In recent years, several methods have been developed to improve the accuracy of somatic mutation calling. Although their methodologies differ, each program aims to identify tumor-specific variants by comparing the tumor variant data with variant data from the paired adjacent tissue of the same patient and with known germline polymorphisms, such as those catalogued in dbSNP. Currently, the most popular computational algorithms are MuTect (<xref rid="b14-mmr-15-05-2489" ref-type="bibr">14</xref>), VarScan2 (<xref rid="b15-mmr-15-05-2489" ref-type="bibr">15</xref>) and the Genome Analysis Toolkit (GATK) (<xref rid="b16-mmr-15-05-2489" ref-type="bibr">16</xref>). GATK calls variants in the tumor and adjacent tissues separately and then subtracts the variants identified in the adjacent tissue from those identified in the tumor. MuTect and VarScan2 directly compare the tumor with the adjacent tissue at each candidate site, which in some cases improves the accuracy of variant calling. MuTect uses a Bayesian model to detect somatic mutations sensitively at low allele fractions, whereas VarScan2 applies a heuristic and statistical approach to identify high-quality variants (<xref rid="b12-mmr-15-05-2489" ref-type="bibr">12</xref>). 
However, it remains unclear which strategy is best for accurately calling genomic variants, and how much combining these tools improves the detection of true positive mutations.</p>
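<p>The difference between the two strategies can be illustrated with a minimal Python sketch. It implements only the call-then-subtract approach attributed to GATK above and is not GATK code: variants are keyed by chromosome, position, reference and alternate allele (an assumed, simplified key), and any tumor call also present in the adjacent-tissue call set is discarded. Joint callers such as MuTect and VarScan2 instead weigh read evidence from both samples at each site, which this sketch does not attempt.</p>
```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Simplified variant key; real VCF records carry many more fields."""
    chrom: str
    pos: int
    ref: str
    alt: str


def subtract_normal(tumor_calls, normal_calls):
    """Return tumor calls absent from the normal call set (hypothetical helper).

    Each sample is assumed to have been called independently beforehand.
    A germline variant missed in the normal sample (e.g. because of low
    coverage there) survives the subtraction as a false 'somatic' call.
    """
    normal_set = set(normal_calls)
    return sorted(v for v in tumor_calls if v not in normal_set)
```
<p>For example, with a tumor call set containing chr1:100 A>G and chr2:200 C>T and a normal call set containing only chr1:100 A>G, only the chr2 variant is returned.</p>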
<p>The present study combined several somatic mutation algorithms and optimized the parameters of each in order to identify novel and recurrent mutations more effectively and rapidly. One case of hepatocellular carcinoma (HCC) was used to illustrate the whole-exome analysis pipeline and to identify key somatic mutations in HCC.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and methods</title>
<sec>
<title/>
<sec>
<title>Patient</title>
<p>A puncture biopsy sample of HCC tumor tissue and paired adjacent tissue were obtained from a 57-year-old male patient at Youan Hospital, Capital Medical University of China (Beijing, China); the study complied with the principles of the Declaration of Helsinki. The patient was infected with HBV and had received no radiotherapy or chemotherapy prior to radiofrequency ablation.</p>
</sec>
<sec>
<title>NGS platforms</title>
<p>DNA was extracted using an E.Z.N.A.<sup>&#x00AE;</sup> Tissue DNA Kit (Omega Bio-Tek, Inc., Norcross, GA, USA), and exome capture was performed on the extracted DNA using the Agilent Human All Exon 50 Mb kit (Agilent Technologies, Inc., Santa Clara, CA, USA), following the protocols recommended by the manufacturer. Paired-end DNA fragments were sequenced in parallel on an Illumina HiSeq 2000 (Illumina, Inc., San Diego, CA, USA), which generates a large volume of data at high speed (<xref rid="b17-mmr-15-05-2489" ref-type="bibr">17</xref>,<xref rid="b18-mmr-15-05-2489" ref-type="bibr">18</xref>). Library construction and sequencing yielded a large quantity of raw data.</p>
</sec>
<sec>
<title>Quality evaluation of the raw reads</title>
<p>Raw reads generated by a sequencer are usually affected by adverse factors, including adaptor contamination, poor base quality and guanine-cytosine (GC) bias (<xref rid="b19-mmr-15-05-2489" ref-type="bibr">19</xref>). Once the raw data were obtained, read quality was assessed and adaptors were clipped using fastq-mcf (version 1.04.636; <uri xlink:href="http://www.github.com/ExpressionAnalysis/ea-utils/blob/wiki/FastqMcf.md">www.github.com/ExpressionAnalysis/ea-utils/blob/wiki/FastqMcf.md</uri>). The sequencing data were then processed using the FastQC tool (<uri xlink:href="http://www.bioinformatics.babraham.ac.uk/projects/fastqc/">http://www.bioinformatics.babraham.ac.uk/projects/fastqc/</uri>) to analyze the distribution of GC content and base quality scores.</p>
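<p>fastq-mcf and FastQC perform these checks internally. To show what two of the reported quantities mean, the following minimal Python sketch computes the mean Phred quality and the GC fraction of each read in a FASTQ file. It assumes well-formed four-line records and Phred+33 quality encoding (the Illumina 1.8 and later convention); older Illumina pipelines used an offset of 64, which the <monospace>offset</monospace> parameter allows. The function names are hypothetical.</p>
```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def gc_fraction(sequence):
    """Fraction of G and C among A/C/G/T bases; ambiguous bases (N) are ignored."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def summarize_fastq(lines, offset=33):
    """Return (read_id, mean_quality, gc_fraction) for each 4-line FASTQ record."""
    # Drop trailing newlines and blank lines, then walk the records 4 lines at a time.
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    results = []
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        scores = phred_scores(qual, offset)
        results.append((header[1:], sum(scores) / len(scores), gc_fraction(seq)))
    return results
```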
</sec>
<sec>
<title>Alignment and removal of PCR duplicates</title>
<p>Following quality control, the processed reads were aligned to the hg19 reference genome provided by the University of California Santa Cruz (Santa Cruz, CA, USA) (<xref rid="b20-mmr-15-05-2489" ref-type="bibr">20</xref>). Millions of short reads were efficiently aligned to the reference genome with default parameters using the Burrows-Wheeler Aligner (BWA), which is based on the Burrows-Wheeler transform (<xref rid="b21-mmr-15-05-2489" ref-type="bibr">21</xref>). The aligned reads were stored as BAM files (.bam) using SAMtools (<xref rid="b22-mmr-15-05-2489" ref-type="bibr">22</xref>), which was also used to sort and index the BAM files to save space and facilitate subsequent processing. Picard (<uri xlink:href="http://picard.sourceforge.net/index.shtml">http://picard.sourceforge.net/index.shtml</uri>) was then used together with BamTools to filter out mismatched and improperly aligned reads, and Picard was also used to remove read duplicates arising from PCR amplification during library preparation. The read distribution and coverage were evaluated using the Picard CalculateHsMetrics tool. Local realignment and base quality score recalibration were performed using GATK (version 2.8; Broad Institute, Cambridge, MA, USA; <uri xlink:href="http://www.broadinstitute.org/gatk/">www.broadinstitute.org/gatk/</uri>). The resulting data were used for subsequent variant identification.</p>
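<p>Duplicate removal was performed with Picard. As a rough illustration of the idea only, the Python sketch below uses a simplified single-end model: reads that share chromosome, 5' position and strand are treated as copies of one original fragment, and all but the copy with the highest summed base quality are flagged. This is not Picard's implementation, which works on read pairs using the unclipped 5' positions of both mates and supports several scoring strategies; the <monospace>Read</monospace> record and function name are hypothetical.</p>
```python
from collections import namedtuple

# Simplified single-end read: 5' alignment position, strand and base qualities.
Read = namedtuple("Read", ["name", "chrom", "five_prime", "strand", "qualities"])


def find_duplicates(reads):
    """Return the names of reads flagged as PCR duplicates.

    Reads sharing (chromosome, 5' position, strand) are grouped; within each
    group the read with the highest summed base quality is kept and the rest
    are flagged. Ties keep the first read seen.
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.five_prime, read.strand)
        score = sum(read.qualities)
        if key not in best or score > best[key][0]:
            best[key] = (score, read.name)
    kept = {name for _score, name in best.values()}
    return {read.name for read in reads if read.name not in kept}
```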
</sec>
<sec>
<title>Variant identification</title>
<p>A key step in the analysis of cancer exome sequencing data is the identification of variants. The depth of sequence coverage influences which somatic mutation algorithms are appropriate for variant identification. When identifying somatic mutations, the differing abilities of GATK (version 2.8.1), MuTect (version 1.1.4; Broad Institute; <uri xlink:href="http://www.broadinstitute.org/cancer/cga/mutect">http://www.broadinstitute.org/cancer/cga/mutect</uri>) and VarScan (version 2.3.6; <uri xlink:href="http://varscan.sourceforge.net/">http://varscan.sourceforge.net/</uri>) to detect variants at different allele frequencies were taken into consideration, and the three tools were combined in a joint analysis strategy (the three-caller pipeline approach).</p>
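<p>One way to combine three call sets is to record, for every variant, which callers reported it, and then select variants by the number of supporting callers. The Python sketch below does this under stated assumptions: variants are reduced to hashable keys (here, chromosome and position tuples), and the selection rule is left as a parameter because the exact merging rule of the three-caller pipeline is not specified here. <monospace>min_callers=1</monospace> yields the union of all callers; <monospace>min_callers=2</monospace> a majority vote. Both function names are hypothetical.</p>
```python
from collections import defaultdict


def tally_support(calls_by_caller):
    """Map each variant key to the set of callers that reported it."""
    support = defaultdict(set)
    for caller, calls in calls_by_caller.items():
        for key in calls:
            support[key].add(caller)
    return dict(support)


def select_variants(support, min_callers=1):
    """Keep variants reported by at least `min_callers` tools (sorted for stable output)."""
    return sorted(key for key, callers in support.items() if len(callers) >= min_callers)
```
<p>Keeping the per-variant caller sets also makes it straightforward to draw an overlap diagram of the three call sets.</p>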
</sec>
<sec>
<title>Variant annotation</title>
<p>Oncotator (<uri xlink:href="http://portals.broadinstitute.org/oncotator/">http://portals.broadinstitute.org/oncotator/</uri>) was used to annotate the screened variants (<xref rid="b23-mmr-15-05-2489" ref-type="bibr">23</xref>). All candidate mutations were inspected visually using the Integrative Genomics Viewer (IGV) (<xref rid="b24-mmr-15-05-2489" ref-type="bibr">24</xref>) and confirmed by Sanger sequencing in the paired samples. PolyPhen-2 (<uri xlink:href="http://www.genetics.bwh.harvard.edu/pph2/index.shtml">www.genetics.bwh.harvard.edu/pph2/index.shtml</uri>) and Sorting Intolerant From Tolerant (SIFT; <uri xlink:href="http://www.sift.jcvi.org/">www.sift.jcvi.org/</uri>) were used together to predict whether mutations affected protein function, based on the structure and function of the protein and on the conservation of amino acid residues across species.</p>
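<p>How the two predictions are combined is an analysis choice. The following Python sketch assumes the conventional SIFT cutoff (scores of 0.05 or lower are predicted deleterious) and treats the PolyPhen-2 labels "probably damaging" and "possibly damaging" as damaging; whether to require agreement between the tools or accept either one is exposed as a parameter rather than presented as the rule used in this study. The function and constant names are hypothetical.</p>
```python
# Conventional SIFT cutoff: scores of 0.05 or lower are predicted deleterious.
SIFT_CUTOFF = 0.05
# PolyPhen-2 qualitative predictions treated here as "damaging".
POLYPHEN_DAMAGING = {"probably damaging", "possibly damaging"}


def is_predicted_damaging(sift_score, polyphen_label, require_both=False):
    """Combine a SIFT score with a PolyPhen-2 prediction label.

    require_both=False flags a variant if either tool predicts damage;
    require_both=True requires both tools to agree.
    """
    sift_hit = SIFT_CUTOFF >= sift_score  # i.e. score at or below the cutoff
    polyphen_hit = polyphen_label.lower() in POLYPHEN_DAMAGING
    if require_both:
        return sift_hit and polyphen_hit
    return sift_hit or polyphen_hit
```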
</sec>
<sec>
<title>Gene functional enrichment analysis</title>
<p>The screened gene sets were subjected to functional annotation analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID) (<xref rid="b25-mmr-15-05-2489" ref-type="bibr">25</xref>), which draws on resources including the Kyoto Encyclopedia of Genes and Genomes and the Gene Ontology databases. The significance of gene group enrichment was assessed using a modified Fisher&#x0027;s exact test, and P&#x003C;0.05 was considered to indicate a statistically significant difference.</p>
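<p>DAVID's modified Fisher&#x0027;s exact test is the EASE score, which computes the one-sided Fisher P value after reducing the number of list genes annotated with the term by one, making enrichment driven by very few genes less likely to appear significant. The Python sketch below reproduces that calculation from hypergeometric terms; the function names and argument order are illustrative, and DAVID itself performs this computation internally.</p>
```python
from math import comb


def fisher_upper_tail(k, n, K, N):
    """One-sided Fisher exact P value for observing k or more annotated list genes.

    N: background genes, K: background genes annotated with the term,
    n: genes in the submitted list, k: list genes annotated with the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum hypergeometric probabilities for k, k+1, ..., upper annotated hits.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def ease_score(k, n, K, N):
    """EASE score: the Fisher P value with the list hit count reduced by one."""
    return fisher_upper_tail(max(k - 1, 0), n, K, N)
```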
</sec>
</sec>
</sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title/>
<sec>
<title>Establishment of the three-caller pipeline and analysis of HCC data</title>
<p>WES data from one HCC tumor and its paired adjacent tissue were analyzed using the three-caller approach. Coverage across all targeted exonic regions was 96.30X for the tumor and 79.18X for the paired adjacent tissue, with 93.4&#x0025; of targeted bases covered at &#x2265;20X and &#x2265;99.1&#x0025; covered at &#x2265;2X. To identify the somatic mutations, a workflow was created with the following steps: i) Quality evaluation of the raw reads; ii) mapping of reads to a reference genome; iii) somatic mutation identification with the three-caller approach; iv) variant annotation; v) data visualization; and vi) pathway analysis (<xref rid="f1-mmr-15-05-2489" ref-type="fig">Fig. 1</xref>).</p>
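<p>Coverage figures of this kind are reported by Picard CalculateHsMetrics in fields such as MEAN_TARGET_COVERAGE and PCT_TARGET_BASES_20X. As a minimal sketch of what they measure, the Python function below takes per-base depths over the target regions (assumed to include zero-depth bases, for example from <monospace>samtools depth -a</monospace> restricted to the capture intervals) and returns the mean depth and the fraction of bases at or above each threshold. The function name and default thresholds are illustrative.</p>
```python
def coverage_summary(depths, thresholds=(2, 20)):
    """Return the mean depth and the fraction of bases at or above each threshold.

    depths: per-base read depths over the targeted regions, including zero-depth bases.
    """
    if not depths:
        raise ValueError("no target bases supplied")
    n = len(depths)
    mean_depth = sum(depths) / n
    fractions = {t: sum(1 for d in depths if d >= t) / n for t in thresholds}
    return mean_depth, fractions
```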
</sec>
<sec>
<title>Detecting SNVs in an HCC sample</title>
<p>Variants were filtered with GATK using the following parameters: low coverage (DP &#x003C;5), low quality (QUAL &#x003E;30.0 and QUAL &#x003C;50.0), very low quality (QUAL &#x003C;30.0), hard to validate [MQ0 &#x2265;4 and MQ0/(1.0&#x002A;DP) &#x003E;0.1] and low quality-by-depth (QD &#x003C;1.5). The exome data of the samples were filtered with these parameters and the results were stored in a VCF file. GATK was primarily used to identify somatic mutations in the sequencing data, including SNVs and indels.</p>
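<p>In GATK these filters are supplied as VariantFiltration expressions. The Python sketch below applies the same thresholds to a single record to show how each filter is evaluated; it assumes a low-quality band of 30.0 to 50.0, as in widely circulated GATK VariantFiltration examples that use this same set of filters. The record layout (a dictionary with QUAL, DP, MQ0 and QD) and the filter labels are illustrative assumptions, not GATK output.</p>
```python
def failed_filters(rec):
    """Return the names of the hard filters that one variant record fails.

    rec: dict with numeric QUAL, DP, MQ0 and QD values.
    Comparisons are written with the threshold on the left.
    """
    qual, dp, mq0, qd = rec["QUAL"], rec["DP"], rec["MQ0"], rec["QD"]
    failed = []
    if 5 > dp:                       # DP below 5
        failed.append("LowCoverage")
    if qual > 30.0 and 50.0 > qual:  # QUAL between 30 and 50
        failed.append("LowQual")
    if 30.0 > qual:                  # QUAL below 30
        failed.append("VeryLowQual")
    if mq0 >= 4 and dp > 0 and mq0 / (1.0 * dp) > 0.1:
        failed.append("HardToValidate")
    if 1.5 > qd:                     # QD below 1.5
        failed.append("LowQD")
    return failed
```
<p>A record with an empty list passes all filters.</p>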
<p>To identify low allelic-fraction mutations, MuTect was used because it performs better at low allele fractions and low coverage (<xref rid="b12-mmr-15-05-2489" ref-type="bibr">12</xref>). To estimate how sensitivity depends on allele fraction and sequencing depth, an analysis strategy based on published data was established (<xref rid="b14-mmr-15-05-2489" ref-type="bibr">14</xref>). As shown in <xref rid="f2-mmr-15-05-2489" ref-type="fig">Fig. 2</xref>, the sensitivity of mutation detection by MuTect reached &#x003E;90&#x0025; at an allele frequency of 10&#x0025; and approached 80&#x0025; at an allele frequency of 5&#x0025;, in both cases at a sequencing depth of &#x003E;80X.</p>
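<p>For intuition on why sensitivity rises with depth and allele fraction, the Python sketch below uses a deliberately simplified model: the number of variant-supporting reads at a site is assumed to follow a binomial distribution with n equal to the depth and p equal to the allele fraction, sequencing error and tumor purity are ignored, and a site counts as detectable when at least a hypothetical minimum number of variant reads (three by default) is observed. This is not MuTect&#x0027;s LOD-based power calculation and is not expected to reproduce Fig. 2 exactly.</p>
```python
from math import comb


def detection_power(depth, allele_fraction, min_alt_reads=3):
    """Probability of observing at least `min_alt_reads` variant reads at a site.

    Simplified binomial model: alternate reads ~ Binomial(depth, allele_fraction),
    with no sequencing error or purity adjustment.
    """
    p = allele_fraction
    # Probability of seeing fewer than min_alt_reads variant reads.
    p_fewer = sum(
        comb(depth, i) * p**i * (1 - p) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_fewer
```
<p>Under this model, <monospace>detection_power(80, 0.10)</monospace> is about 0.99 and <monospace>detection_power(80, 0.05)</monospace> about 0.77, illustrating how sensitivity falls at lower allele fractions for a fixed depth.</p>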
<p>SNV calling with MuTect was run under Java (version 1.6.0_45; <uri xlink:href="http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase6-419409.html">www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase6-419409.html</uri>), and the default parameters of MuTect were kept to identify mutations. The input resources for MuTect were the hg19 reference sequence, dbSNP build 135 and COSMIC v54. MuTect was used only to identify somatic point mutations; GATK (version 1.5) was used to analyze indels. SNVs located in exome regions were retained if they had &#x2265;20X coverage in the tumor, &#x2265;4 reads supporting the alternate allele and an allelic fraction of the altered base of &#x2265;4. The paired normal sample was also required to have &#x2265;10X coverage at the same base. Because MuTect characterizes many SNVs with low coverage or low allelic fraction, SNVs from low-purity samples were not rejected outright.</p>
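<p>The post-calling thresholds described above can be expressed as a single predicate. In the Python sketch below, each candidate site is assumed to be a dictionary with hypothetical keys for target membership, tumor depth, alternate-read count and normal depth. The allelic-fraction cutoff is stated in the text as &#x2265;4 without a unit, so the sketch leaves it as a required argument instead of guessing a value.</p>
```python
def passes_postfilter(site, min_alt_fraction, min_tumor_depth=20,
                      min_alt_reads=4, min_normal_depth=10):
    """Apply tumor/normal depth and alternate-read thresholds to one candidate SNV.

    site: dict with 'in_target' (bool), 'tumor_depth', 'tumor_alt' and
    'normal_depth'. min_alt_fraction is a fraction (e.g. 0.1 for 10%).
    """
    if not site["in_target"]:
        return False
    tumor_depth, alt = site["tumor_depth"], site["tumor_alt"]
    # The depth check runs first, so the division never sees a zero depth.
    return (
        tumor_depth >= min_tumor_depth
        and alt >= min_alt_reads
        and alt / tumor_depth >= min_alt_fraction
        and site["normal_depth"] >= min_normal_depth
    )
```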
<p>VarScan outperformed the other tools at higher allelic fractions. Minimum coverage thresholds of 6X for the tumor and 8X for the normal sample were set, with a minimum variant frequency of 20&#x0025;. Subsequently, to eliminate false positives, sites with &#x2265;20X coverage in the tumor, of which &#x2265;10 reads supported the variant allele, were preferentially analyzed.</p>
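<p>The caller-level thresholds appear to correspond to the VarScan 2 <monospace>somatic</monospace> options <monospace>--min-coverage-tumor</monospace>, <monospace>--min-coverage-normal</monospace> and <monospace>--min-var-freq</monospace>. The stricter follow-up filter can then be applied to the output, as in the Python sketch below. The column names (tumor_reads1 for reference-supporting reads, tumor_reads2 for variant-supporting reads, tumor_var_freq and somatic_status) are assumed from the VarScan 2 somatic output format and should be checked against the header of the actual files; the function names are hypothetical.</p>
```python
def parse_freq(value):
    """Convert a VarScan-style frequency string such as '23.5%' to a fraction."""
    return float(value.rstrip("%")) / 100.0


def keep_varscan_call(row, min_tumor_depth=20, min_alt_reads=10, min_freq=0.20):
    """Post-filter one VarScan 2 somatic output row (a dict of strings)."""
    ref_reads = int(row["tumor_reads1"])
    alt_reads = int(row["tumor_reads2"])
    return (
        row["somatic_status"] == "Somatic"
        and ref_reads + alt_reads >= min_tumor_depth
        and alt_reads >= min_alt_reads
        and parse_freq(row["tumor_var_freq"]) >= min_freq
    )
```
<p>Rows could be read with <monospace>csv.DictReader</monospace> using a tab delimiter and passed to this function one at a time.</p>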
<p>The three-algorithm strategy identified 75 candidate somatic variants (<xref rid="f3-mmr-15-05-2489" ref-type="fig">Fig. 3</xref>), including 50 nonsynonymous mutations, 2 nonsense mutations, 20 synonymous mutations and 3 indels. The ratio of nonsynonymous to synonymous somatic SNVs was 2.5.</p>
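<p>Counts of this kind can be tallied directly from annotation output. The Python sketch below assumes MAF-style Variant_Classification labels (such as Missense_Mutation, Nonsense_Mutation and Silent) of the kind produced by Oncotator, and, following the text, computes the nonsynonymous to synonymous ratio from missense and silent SNVs only, counting nonsense SNVs and indels separately. The indel labels used in the usage check are placeholders, since the indel types are not given.</p>
```python
from collections import Counter

INDEL_CLASSES = {"Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins"}


def summarize_classes(classifications):
    """Count variant classes and compute the nonsynonymous:synonymous SNV ratio."""
    counts = Counter(classifications)
    missense = counts["Missense_Mutation"]
    silent = counts["Silent"]
    return {
        "nonsynonymous": missense,
        "nonsense": counts["Nonsense_Mutation"],
        "synonymous": silent,
        "indels": sum(counts[c] for c in INDEL_CLASSES),
        "ns_ratio": missense / silent if silent else float("inf"),
    }
```
<p>With 50 missense, 2 nonsense, 20 silent and 3 indel labels, the function returns a ratio of 2.5.</p>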
</sec>
<sec>
<title>Analysis of somatic mutations</title>
<p>The predicted functional impact of amino acid substitutions was assessed using PolyPhen-2 and SIFT (<xref rid="tI-mmr-15-05-2489" ref-type="table">Table I</xref>). The P94Q mutation was predicted to affect the function of cell division cycle 7 protein, which may be associated with neoplastic transformation in some tumors and may affect protein serine/threonine kinase activity. All putative somatic mutations were validated manually using IGV. A T&#x003E;C transition at position 9056725 in mucin 16 (MUC16) was identified (<xref rid="f4-mmr-15-05-2489" ref-type="fig">Fig. 4</xref>) and subsequently validated by Sanger sequencing.</p>
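<p>Whether a single-base change such as T>C is a transition or a transversion follows from base chemistry: a transition exchanges a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T), whereas a transversion swaps between the two classes. A small illustrative Python helper makes the rule explicit; its name is hypothetical.</p>
```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def substitution_type(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    if pair.issubset(PURINES) or pair.issubset(PYRIMIDINES):
        return "transition"
    return "transversion"
```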
</sec>
<sec>
<title>Pathway analysis</title>
<p>The 75 genes with tumor-specific mutations showed significant functional enrichment for cell adhesion and regulation of Ras GTPase activity (P&#x003C;0.05; <xref rid="tII-mmr-15-05-2489" ref-type="table">Table II</xref>). Notably, genes involved in cell adhesion showed the strongest enrichment (P=0.0089), indicating that mutations in cell adhesion genes may serve pivotal roles in HCC development.</p>
</sec>
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>WES technologies have provided extensive profiles of genomic mutations in cancers; however, how to process the generated datasets effectively for downstream analyses remains a problem. The accuracy of variant calling is currently influenced by a number of factors. Firstly, the limited specificity and sensitivity of existing high-throughput sequencing may prevent the generation of accurate mutation profiles (<xref rid="b26-mmr-15-05-2489" ref-type="bibr">26</xref>). Secondly, the BWA algorithm may produce incorrect read alignments. Finally, each of the three variant-calling tools, MuTect, VarScan and GATK, has its own limitations. GATK is a semi-automated algorithm for calling somatic variants. VarScan preferentially identifies high-quality SNVs, whereas MuTect performs better for low-quality SNVs. Some true SNVs are difficult to distinguish from artifacts because of factors including clonal heterogeneity, strand bias, low allele frequencies, tumor contamination, high GC content of genomic regions, sequencing errors and non-specific short read mapping (<xref rid="b12-mmr-15-05-2489" ref-type="bibr">12</xref>).</p>
<p>Comparison of the SNV calls made by GATK, MuTect and VarScan revealed that only a few SNVs were called by more than one tool (<xref rid="f3-mmr-15-05-2489" ref-type="fig">Fig. 3</xref>), which made it difficult to select candidate SNVs for further validation. This disagreement was partly due to the different prior assumptions and error models underlying each algorithm. Further development of more robust and accurate calling algorithms is therefore required (<xref rid="b27-mmr-15-05-2489" ref-type="bibr">27</xref>); however, combining MuTect/GATK with VarScan produced more accurate SNV calls. In light of these limitations in genomic studies, the three-caller strategy was designed to obtain accurate mutation information for clinical assessment.</p>
<p>The present study integrated different software programs into a modular pipeline for processing somatic SNVs and indels. A series of software tools was used to perform read alignment, data filtering, duplicate removal and realignment, as well as recalibration with Java-based tools. In the HCC case, the WES analysis proceeded from the acquisition of raw data to the selection of several candidate genes, suggesting a potential effect of cancer-associated somatic mutations on tumor progression. The mutation set-based analysis revealed a number of potential somatic events in HCC, including in the CUB and sushi multiple domains 1, FRAS1-related extracellular matrix 1 and MUC16 genes. Mutations at different positions within the same gene, or in different genes, may have disparate functional consequences, such as activating or inactivating effects, and may alter the physicochemical properties and structure of the encoded proteins compared with their wild-type counterparts. Functional enrichment analysis revealed enrichment of cancer-specific mutations in biological processes including cell adhesion and regulation of Ras GTPase activity. Experimental validation is required for variants that may affect interactions with other proteins and disrupt crucial signaling pathways (<xref rid="b28-mmr-15-05-2489" ref-type="bibr">28</xref>).</p>
<p>In conclusion, the pipeline for HCC exome sequencing data analysis demonstrated in the present study provides a convenient strategy for identifying potentially functional tumor-specific mutations, which may improve our understanding of the mechanisms underlying HCC development.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>The present study was supported by the National Natural Science Foundation of China (grant no. 31571434), the National High Technology Research and Development Program of China (grant no. 2012AA02A205) and the National Basic Research Program of China (grant no. 2015CB553701).</p>
</ack>
<ref-list>
<title>References</title>
<ref id="b1-mmr-15-05-2489"><label>1</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lengauer</surname><given-names>C</given-names></name><name><surname>Kinzler</surname><given-names>KW</given-names></name><name><surname>Vogelstein</surname><given-names>B</given-names></name></person-group><article-title>Genetic instabilities in human cancers</article-title><source>Nature</source><volume>396</volume><fpage>643</fpage><lpage>649</lpage><year>1998</year><pub-id pub-id-type="doi">10.1038/25292</pub-id><pub-id pub-id-type="pmid">9872311</pub-id></element-citation></ref>
<ref id="b2-mmr-15-05-2489"><label>2</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bass</surname><given-names>AJ</given-names></name><name><surname>Lawrence</surname><given-names>MS</given-names></name><name><surname>Brace</surname><given-names>LE</given-names></name><name><surname>Ramos</surname><given-names>AH</given-names></name><name><surname>Drier</surname><given-names>Y</given-names></name><name><surname>Cibulskis</surname><given-names>K</given-names></name><name><surname>Sougnez</surname><given-names>C</given-names></name><name><surname>Voet</surname><given-names>D</given-names></name><name><surname>Saksena</surname><given-names>G</given-names></name><name><surname>Sivachenko</surname><given-names>A</given-names></name><etal/></person-group><article-title>Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion</article-title><source>Nat Genet</source><volume>43</volume><fpage>964</fpage><lpage>968</lpage><year>2011</year><pub-id pub-id-type="doi">10.1038/ng.936</pub-id><pub-id pub-id-type="pmid">21892161</pub-id><pub-id pub-id-type="pmcid">3802528</pub-id></element-citation></ref>
<ref id="b3-mmr-15-05-2489"><label>3</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chapman</surname><given-names>MA</given-names></name><name><surname>Lawrence</surname><given-names>MS</given-names></name><name><surname>Keats</surname><given-names>JJ</given-names></name><name><surname>Cibulskis</surname><given-names>K</given-names></name><name><surname>Sougnez</surname><given-names>C</given-names></name><name><surname>Schinzel</surname><given-names>AC</given-names></name><name><surname>Harview</surname><given-names>CL</given-names></name><name><surname>Brunet</surname><given-names>JP</given-names></name><name><surname>Ahmann</surname><given-names>GJ</given-names></name><name><surname>Adli</surname><given-names>M</given-names></name><etal/></person-group><article-title>Initial genome sequencing and analysis of multiple myeloma</article-title><source>Nature</source><volume>471</volume><fpage>467</fpage><lpage>472</lpage><year>2011</year><pub-id pub-id-type="doi">10.1038/nature09837</pub-id><pub-id pub-id-type="pmid">21430775</pub-id><pub-id pub-id-type="pmcid">3560292</pub-id></element-citation></ref>
<ref id="b4-mmr-15-05-2489"><label>4</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jia</surname><given-names>D</given-names></name><name><surname>Dong</surname><given-names>R</given-names></name><name><surname>Jing</surname><given-names>Y</given-names></name><name><surname>Xu</surname><given-names>D</given-names></name><name><surname>Wang</surname><given-names>Q</given-names></name><name><surname>Chen</surname><given-names>L</given-names></name><name><surname>Li</surname><given-names>Q</given-names></name><name><surname>Huang</surname><given-names>Y</given-names></name><name><surname>Zhang</surname><given-names>Y</given-names></name><name><surname>Zhang</surname><given-names>Z</given-names></name><etal/></person-group><article-title>Exome sequencing of hepatoblastoma reveals novel mutations and cancer genes in the Wnt pathway and ubiquitin ligase complex</article-title><source>Hepatology</source><volume>60</volume><fpage>1686</fpage><lpage>1696</lpage><year>2014</year><pub-id pub-id-type="doi">10.1002/hep.27243</pub-id><pub-id pub-id-type="pmid">24912477</pub-id></element-citation></ref>
<ref id="b5-mmr-15-05-2489"><label>5</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Biesecker</surname><given-names>LG</given-names></name><name><surname>Green</surname><given-names>RC</given-names></name></person-group><article-title>Diagnostic clinical genome and exome sequencing</article-title><source>N Engl J Med</source><volume>371</volume><fpage>1170</fpage><year>2014</year><pub-id pub-id-type="pmid">25229935</pub-id></element-citation></ref>
<ref id="b6-mmr-15-05-2489"><label>6</label><element-citation publication-type="journal"><comment>Cancer Genome Atlas Research Network</comment><person-group person-group-type="author"><name><surname>Hammerman</surname><given-names>PS</given-names></name><name><surname>Lawrence</surname><given-names>MS</given-names></name><name><surname>Voet</surname><given-names>D</given-names></name><name><surname>Jing</surname><given-names>R</given-names></name><name><surname>Cibulskis</surname><given-names>K</given-names></name><name><surname>Sivachenko</surname><given-names>A</given-names></name><name><surname>Stojanov</surname><given-names>P</given-names></name><name><surname>McKenna</surname><given-names>A</given-names></name><name><surname>Lander</surname><given-names>ES</given-names></name><etal/></person-group><article-title>Comprehensive genomic characterization of squamous cell lung cancers</article-title><source>Nature</source><volume>489</volume><fpage>519</fpage><lpage>525</lpage><year>2012</year><pub-id pub-id-type="doi">10.1038/nature11404</pub-id><pub-id pub-id-type="pmid">22960745</pub-id><pub-id pub-id-type="pmcid">3466113</pub-id></element-citation></ref>
<ref id="b7-mmr-15-05-2489"><label>7</label><element-citation publication-type="journal"><comment>Cancer Genome Atlas Network</comment><person-group person-group-type="author"><name><surname>Muzny</surname><given-names>DM</given-names></name><name><surname>Bainbridge</surname><given-names>MN</given-names></name><name><surname>Chang</surname><given-names>K</given-names></name><name><surname>Dinh</surname><given-names>HH</given-names></name><name><surname>Drummond</surname><given-names>JA</given-names></name><name><surname>Fowler</surname><given-names>G</given-names></name><name><surname>Kovar</surname><given-names>CL</given-names></name><name><surname>Lewis</surname><given-names>LR</given-names></name><name><surname>Morgan</surname><given-names>MB</given-names></name><etal/></person-group><article-title>Comprehensive molecular characterization of human colon and rectal cancer</article-title><source>Nature</source><volume>487</volume><fpage>330</fpage><lpage>337</lpage><year>2012</year><pub-id pub-id-type="doi">10.1038/nature11252</pub-id><pub-id pub-id-type="pmid">22810696</pub-id><pub-id pub-id-type="pmcid">3401966</pub-id></element-citation></ref>
<ref id="b8-mmr-15-05-2489"><label>8</label><element-citation publication-type="journal"><comment>Cancer Genome Atlas Network</comment><person-group person-group-type="author"><name><surname>Koboldt</surname><given-names>DC</given-names></name><name><surname>Fulton</surname><given-names>RS</given-names></name><name><surname>McLellan</surname><given-names>MD</given-names></name><name><surname>Schmidt</surname><given-names>H</given-names></name><name><surname>Kalicki-Veizer</surname><given-names>J</given-names></name><name><surname>McMichael</surname><given-names>JF</given-names></name><name><surname>Fulton</surname><given-names>LL</given-names></name><name><surname>Dooling</surname><given-names>DJ</given-names></name><name><surname>Ding</surname><given-names>L</given-names></name><etal/></person-group><article-title>Comprehensive molecular portraits of human breast tumours</article-title><source>Nature</source><volume>490</volume><fpage>61</fpage><lpage>70</lpage><year>2012</year><pub-id pub-id-type="doi">10.1038/nature11412</pub-id><pub-id pub-id-type="pmid">23000897</pub-id><pub-id pub-id-type="pmcid">3465532</pub-id></element-citation></ref>
<ref id="b9-mmr-15-05-2489"><label>9</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Litchfield</surname><given-names>K</given-names></name><name><surname>Summersgill</surname><given-names>B</given-names></name><name><surname>Yost</surname><given-names>S</given-names></name><name><surname>Sultana</surname><given-names>R</given-names></name><name><surname>Labreche</surname><given-names>K</given-names></name><name><surname>Dudakia</surname><given-names>D</given-names></name><name><surname>Renwick</surname><given-names>A</given-names></name><name><surname>Seal</surname><given-names>S</given-names></name><name><surname>Al-Saadi</surname><given-names>R</given-names></name><name><surname>Broderick</surname><given-names>P</given-names></name><etal/></person-group><article-title>Whole-exome sequencing reveals the mutational spectrum of testicular germ cell tumours</article-title><source>Nat Commun</source><volume>6</volume><fpage>5973</fpage><year>2015</year><pub-id pub-id-type="doi">10.1038/ncomms6973</pub-id><pub-id pub-id-type="pmid">25609015</pub-id><pub-id pub-id-type="pmcid">4338546</pub-id></element-citation></ref>
<ref id="b10-mmr-15-05-2489"><label>10</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>Z</given-names></name></person-group><article-title>Genomic landscape of liver cancer</article-title><source>Nat Genet</source><volume>44</volume><fpage>1075</fpage><lpage>1077</lpage><year>2012</year><pub-id pub-id-type="doi">10.1038/ng.2412</pub-id><pub-id pub-id-type="pmid">23011223</pub-id></element-citation></ref>
<ref id="b11-mmr-15-05-2489"><label>11</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kan</surname><given-names>Z</given-names></name><name><surname>Zheng</surname><given-names>H</given-names></name><name><surname>Liu</surname><given-names>X</given-names></name><name><surname>Li</surname><given-names>S</given-names></name><name><surname>Barber</surname><given-names>TD</given-names></name><name><surname>Gong</surname><given-names>Z</given-names></name><name><surname>Gao</surname><given-names>H</given-names></name><name><surname>Hao</surname><given-names>K</given-names></name><name><surname>Willard</surname><given-names>MD</given-names></name><name><surname>Xu</surname><given-names>J</given-names></name><etal/></person-group><article-title>Whole-genome sequencing identifies recurrent mutations in hepatocellular carcinoma</article-title><source>Genome Res</source><volume>23</volume><fpage>1422</fpage><lpage>1433</lpage><year>2013</year><pub-id pub-id-type="doi">10.1101/gr.154492.113</pub-id><pub-id pub-id-type="pmid">23788652</pub-id><pub-id pub-id-type="pmcid">3759719</pub-id></element-citation></ref>
<ref id="b12-mmr-15-05-2489"><label>12</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>Q</given-names></name><name><surname>Jia</surname><given-names>P</given-names></name><name><surname>Li</surname><given-names>F</given-names></name><name><surname>Chen</surname><given-names>H</given-names></name><name><surname>Ji</surname><given-names>H</given-names></name><name><surname>Hucks</surname><given-names>D</given-names></name><name><surname>Dahlman</surname><given-names>KB</given-names></name><name><surname>Pao</surname><given-names>W</given-names></name><name><surname>Zhao</surname><given-names>Z</given-names></name></person-group><article-title>Detecting somatic point mutations in cancer genome sequencing data: A comparison of mutation callers</article-title><source>Genome Med</source><volume>5</volume><fpage>91</fpage><year>2013</year><pub-id pub-id-type="doi">10.1186/gm495</pub-id><pub-id pub-id-type="pmid">24112718</pub-id><pub-id pub-id-type="pmcid">3971343</pub-id></element-citation></ref>
<ref id="b13-mmr-15-05-2489"><label>13</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gerlinger</surname><given-names>M</given-names></name><name><surname>Rowan</surname><given-names>AJ</given-names></name><name><surname>Horswell</surname><given-names>S</given-names></name><name><surname>Larkin</surname><given-names>J</given-names></name><name><surname>Endesfelder</surname><given-names>D</given-names></name><name><surname>Gronroos</surname><given-names>E</given-names></name><name><surname>Martinez</surname><given-names>P</given-names></name><name><surname>Matthews</surname><given-names>N</given-names></name><name><surname>Stewart</surname><given-names>A</given-names></name><name><surname>Tarpey</surname><given-names>P</given-names></name><etal/></person-group><article-title>Intratumor heterogeneity and branched evolution revealed by multiregion sequencing</article-title><source>N Engl J Med</source><volume>366</volume><fpage>883</fpage><lpage>892</lpage><year>2012</year><pub-id pub-id-type="doi">10.1056/NEJMoa1113205</pub-id><pub-id pub-id-type="pmid">22397650</pub-id><pub-id pub-id-type="pmcid">4878653</pub-id></element-citation></ref>
<ref id="b14-mmr-15-05-2489"><label>14</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cibulskis</surname><given-names>K</given-names></name><name><surname>Lawrence</surname><given-names>MS</given-names></name><name><surname>Carter</surname><given-names>SL</given-names></name><name><surname>Sivachenko</surname><given-names>A</given-names></name><name><surname>Jaffe</surname><given-names>D</given-names></name><name><surname>Sougnez</surname><given-names>C</given-names></name><name><surname>Gabriel</surname><given-names>S</given-names></name><name><surname>Meyerson</surname><given-names>M</given-names></name><name><surname>Lander</surname><given-names>ES</given-names></name><name><surname>Getz</surname><given-names>G</given-names></name></person-group><article-title>Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples</article-title><source>Nat Biotechnol</source><volume>31</volume><fpage>213</fpage><lpage>219</lpage><year>2013</year><pub-id pub-id-type="doi">10.1038/nbt.2514</pub-id><pub-id pub-id-type="pmid">23396013</pub-id><pub-id pub-id-type="pmcid">3833702</pub-id></element-citation></ref>
<ref id="b15-mmr-15-05-2489"><label>15</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Koboldt</surname><given-names>DC</given-names></name><name><surname>Zhang</surname><given-names>Q</given-names></name><name><surname>Larson</surname><given-names>DE</given-names></name><name><surname>Shen</surname><given-names>D</given-names></name><name><surname>McLellan</surname><given-names>MD</given-names></name><name><surname>Lin</surname><given-names>L</given-names></name><name><surname>Miller</surname><given-names>CA</given-names></name><name><surname>Mardis</surname><given-names>ER</given-names></name><name><surname>Ding</surname><given-names>L</given-names></name><name><surname>Wilson</surname><given-names>RK</given-names></name></person-group><article-title>VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing</article-title><source>Genome Res</source><volume>22</volume><fpage>568</fpage><lpage>576</lpage><year>2012</year><pub-id pub-id-type="doi">10.1101/gr.129684.111</pub-id><pub-id pub-id-type="pmid">22300766</pub-id><pub-id pub-id-type="pmcid">3290792</pub-id></element-citation></ref>
<ref id="b16-mmr-15-05-2489"><label>16</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>McKenna</surname><given-names>A</given-names></name><name><surname>Hanna</surname><given-names>M</given-names></name><name><surname>Banks</surname><given-names>E</given-names></name><name><surname>Sivachenko</surname><given-names>A</given-names></name><name><surname>Cibulskis</surname><given-names>K</given-names></name><name><surname>Kernytsky</surname><given-names>A</given-names></name><name><surname>Garimella</surname><given-names>K</given-names></name><name><surname>Altshuler</surname><given-names>D</given-names></name><name><surname>Gabriel</surname><given-names>S</given-names></name><name><surname>Daly</surname><given-names>M</given-names></name><name><surname>DePristo</surname><given-names>MA</given-names></name></person-group><article-title>The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data</article-title><source>Genome Res</source><volume>20</volume><fpage>1297</fpage><lpage>1303</lpage><year>2010</year><pub-id pub-id-type="doi">10.1101/gr.107524.110</pub-id><pub-id pub-id-type="pmid">20644199</pub-id><pub-id pub-id-type="pmcid">2928508</pub-id></element-citation></ref>
<ref id="b17-mmr-15-05-2489"><label>17</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mardis</surname><given-names>ER</given-names></name></person-group><article-title>Next-generation DNA sequencing methods</article-title><source>Annu Rev Genomics Hum Genet</source><volume>9</volume><fpage>387</fpage><lpage>402</lpage><year>2008</year><pub-id pub-id-type="doi">10.1146/annurev.genom.9.081307.164359</pub-id><pub-id pub-id-type="pmid">18576944</pub-id></element-citation></ref>
<ref id="b18-mmr-15-05-2489"><label>18</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Metzker</surname><given-names>ML</given-names></name></person-group><article-title>Sequencing technologies - the next generation</article-title><source>Nat Rev Genet</source><volume>11</volume><fpage>31</fpage><lpage>46</lpage><year>2010</year><pub-id pub-id-type="doi">10.1038/nrg2626</pub-id><pub-id pub-id-type="pmid">19997069</pub-id></element-citation></ref>
<ref id="b19-mmr-15-05-2489"><label>19</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dohm</surname><given-names>JC</given-names></name><name><surname>Lottaz</surname><given-names>C</given-names></name><name><surname>Borodina</surname><given-names>T</given-names></name><name><surname>Himmelbauer</surname><given-names>H</given-names></name></person-group><article-title>Substantial biases in ultra-short read data sets from high-throughput DNA sequencing</article-title><source>Nucleic Acids Res</source><volume>36</volume><fpage>e105</fpage><year>2008</year><pub-id pub-id-type="doi">10.1093/nar/gkn425</pub-id><pub-id pub-id-type="pmid">18660515</pub-id><pub-id pub-id-type="pmcid">2532726</pub-id></element-citation></ref>
<ref id="b20-mmr-15-05-2489"><label>20</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nielsen</surname><given-names>R</given-names></name><name><surname>Paul</surname><given-names>JS</given-names></name><name><surname>Albrechtsen</surname><given-names>A</given-names></name><name><surname>Song</surname><given-names>YS</given-names></name></person-group><article-title>Genotype and SNP calling from next-generation sequencing data</article-title><source>Nat Rev Genet</source><volume>12</volume><fpage>443</fpage><lpage>451</lpage><year>2011</year><pub-id pub-id-type="doi">10.1038/nrg2986</pub-id><pub-id pub-id-type="pmid">21587300</pub-id><pub-id pub-id-type="pmcid">3593722</pub-id></element-citation></ref>
<ref id="b21-mmr-15-05-2489"><label>21</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Li</surname><given-names>H</given-names></name><name><surname>Durbin</surname><given-names>R</given-names></name></person-group><article-title>Fast and accurate long-read alignment with Burrows-Wheeler transform</article-title><source>Bioinformatics</source><volume>26</volume><fpage>589</fpage><lpage>595</lpage><year>2010</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btp698</pub-id><pub-id pub-id-type="pmid">20080505</pub-id><pub-id pub-id-type="pmcid">2828108</pub-id></element-citation></ref>
<ref id="b22-mmr-15-05-2489"><label>22</label><element-citation publication-type="journal"><comment>1000 Genome Project Data Processing Subgroup</comment><person-group person-group-type="author"><name><surname>Li</surname><given-names>H</given-names></name><name><surname>Handsaker</surname><given-names>B</given-names></name><name><surname>Wysoker</surname><given-names>A</given-names></name><name><surname>Fennell</surname><given-names>T</given-names></name><name><surname>Ruan</surname><given-names>J</given-names></name><name><surname>Homer</surname><given-names>N</given-names></name><name><surname>Marth</surname><given-names>G</given-names></name><name><surname>Abecasis</surname><given-names>G</given-names></name><name><surname>Durbin</surname><given-names>R</given-names></name></person-group><article-title>The sequence alignment/map format and SAMtools</article-title><source>Bioinformatics</source><volume>25</volume><fpage>2078</fpage><lpage>2079</lpage><year>2009</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btp352</pub-id><pub-id pub-id-type="pmid">19505943</pub-id><pub-id pub-id-type="pmcid">2723002</pub-id></element-citation></ref>
<ref id="b23-mmr-15-05-2489"><label>23</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ramos</surname><given-names>AH</given-names></name><name><surname>Lichtenstein</surname><given-names>L</given-names></name><name><surname>Gupta</surname><given-names>M</given-names></name><name><surname>Lawrence</surname><given-names>MS</given-names></name><name><surname>Pugh</surname><given-names>TJ</given-names></name><name><surname>Saksena</surname><given-names>G</given-names></name><name><surname>Meyerson</surname><given-names>M</given-names></name><name><surname>Getz</surname><given-names>G</given-names></name></person-group><article-title>Oncotator: Cancer variant annotation tool</article-title><source>Hum Mutat</source><volume>36</volume><fpage>E2423</fpage><lpage>E2429</lpage><year>2015</year><pub-id pub-id-type="doi">10.1002/humu.22771</pub-id><pub-id pub-id-type="pmid">25703262</pub-id></element-citation></ref>
<ref id="b24-mmr-15-05-2489"><label>24</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Robinson</surname><given-names>JT</given-names></name><name><surname>Thorvaldsd&#x00F3;ttir</surname><given-names>H</given-names></name><name><surname>Winckler</surname><given-names>W</given-names></name><name><surname>Guttman</surname><given-names>M</given-names></name><name><surname>Lander</surname><given-names>ES</given-names></name><name><surname>Getz</surname><given-names>G</given-names></name><name><surname>Mesirov</surname><given-names>JP</given-names></name></person-group><article-title>Integrative genomics viewer</article-title><source>Nat Biotechnol</source><volume>29</volume><fpage>24</fpage><lpage>26</lpage><year>2011</year><pub-id pub-id-type="doi">10.1038/nbt.1754</pub-id><pub-id pub-id-type="pmid">21221095</pub-id><pub-id pub-id-type="pmcid">3346182</pub-id></element-citation></ref>
<ref id="b25-mmr-15-05-2489"><label>25</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>da Huang</surname><given-names>W</given-names></name><name><surname>Sherman</surname><given-names>BT</given-names></name><name><surname>Lempicki</surname><given-names>RA</given-names></name></person-group><article-title>Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources</article-title><source>Nat Protoc</source><volume>4</volume><fpage>44</fpage><lpage>57</lpage><year>2009</year><pub-id pub-id-type="doi">10.1038/nprot.2008.211</pub-id><pub-id pub-id-type="pmid">19131956</pub-id></element-citation></ref>
<ref id="b26-mmr-15-05-2489"><label>26</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Totoki</surname><given-names>Y</given-names></name><name><surname>Tatsuno</surname><given-names>K</given-names></name><name><surname>Yamamoto</surname><given-names>S</given-names></name><name><surname>Arai</surname><given-names>Y</given-names></name><name><surname>Hosoda</surname><given-names>F</given-names></name><name><surname>Ishikawa</surname><given-names>S</given-names></name><name><surname>Tsutsumi</surname><given-names>S</given-names></name><name><surname>Sonoda</surname><given-names>K</given-names></name><name><surname>Totsuka</surname><given-names>H</given-names></name><name><surname>Shirakihara</surname><given-names>T</given-names></name><etal/></person-group><article-title>High-resolution characterization of a hepatocellular carcinoma genome</article-title><source>Nat Genet</source><volume>43</volume><fpage>464</fpage><lpage>469</lpage><year>2011</year><pub-id pub-id-type="doi">10.1038/ng.804</pub-id><pub-id pub-id-type="pmid">21499249</pub-id></element-citation></ref>
<ref id="b27-mmr-15-05-2489"><label>27</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lawrence</surname><given-names>MS</given-names></name><name><surname>Stojanov</surname><given-names>P</given-names></name><name><surname>Polak</surname><given-names>P</given-names></name><name><surname>Kryukov</surname><given-names>GV</given-names></name><name><surname>Cibulskis</surname><given-names>K</given-names></name><name><surname>Sivachenko</surname><given-names>A</given-names></name><name><surname>Carter</surname><given-names>SL</given-names></name><name><surname>Stewart</surname><given-names>C</given-names></name><name><surname>Mermel</surname><given-names>CH</given-names></name><name><surname>Roberts</surname><given-names>SA</given-names></name><etal/></person-group><article-title>Mutational heterogeneity in cancer and the search for new cancer-associated genes</article-title><source>Nature</source><volume>499</volume><fpage>214</fpage><lpage>218</lpage><year>2013</year><pub-id pub-id-type="doi">10.1038/nature12213</pub-id><pub-id pub-id-type="pmid">23770567</pub-id><pub-id pub-id-type="pmcid">3919509</pub-id></element-citation></ref>
<ref id="b28-mmr-15-05-2489"><label>28</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kwon</surname><given-names>SM</given-names></name><name><surname>Cho</surname><given-names>H</given-names></name><name><surname>Choi</surname><given-names>JH</given-names></name><name><surname>Jee</surname><given-names>BA</given-names></name><name><surname>Jo</surname><given-names>Y</given-names></name><name><surname>Woo</surname><given-names>HG</given-names></name></person-group><article-title>Perspectives of integrative cancer genomics in next generation sequencing era</article-title><source>Genomics Inform</source><volume>10</volume><fpage>69</fpage><lpage>73</lpage><year>2012</year><pub-id pub-id-type="doi">10.5808/GI.2012.10.2.69</pub-id><pub-id pub-id-type="pmid">23105932</pub-id><pub-id pub-id-type="pmcid">3480879</pub-id></element-citation></ref>
</ref-list>
</back>
<floats-group>
<fig id="f1-mmr-15-05-2489" position="float">
<label>Figure 1.</label>
<caption><p>Flowchart of the process used to identify somatic mutations from Illumina sequencing data. Following library preparation, samples were sequenced on the Illumina HiSeq 2000 platform. Reads were then quality-checked and aligned to the hg19 reference genome, after which variants were called with the three-caller strategy. Identified somatic mutations were annotated to infer their biological function and association with disease. BWA, Burrows-Wheeler Aligner; GATK, Genome Analysis Toolkit.</p></caption>
<graphic xlink:href="MMR-15-05-2489-g00.jpg"/>
</fig>
<fig id="f2-mmr-15-05-2489" position="float">
<label>Figure 2.</label>
<caption><p>Mutation detection sensitivity calculated by MuTect. Sensitivity was calculated for specified combinations of allele frequency and sequencing depth.</p></caption>
<graphic xlink:href="MMR-15-05-2489-g01.tif"/>
</fig>
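<boxed-text position="float">
<p>The caption does not give MuTect's formula. As a rough illustration of why detection sensitivity depends jointly on allele frequency and depth, the Python sketch below assumes a simple binomial model in which a variant counts as detectable when at least a chosen number of reads (the hypothetical parameter min_alt_reads) carry the alternate allele. This is a simplification, not MuTect's own LOD-based power calculation, and it ignores sequencing error.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from math import comb


def detection_sensitivity(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """Probability of seeing at least `min_alt_reads` alternate-allele reads.

    Simplified binomial model (assumption, not MuTect's LOD-based calculation):
    each of the `depth` reads independently carries the alternate allele with
    probability `allele_fraction`; sequencing errors are ignored.
    """
    if depth < 0 or not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("depth must be >= 0 and allele_fraction in [0, 1]")
    # P(X >= k) = 1 - sum over i < k of P(X = i), with X ~ Binomial(depth, allele_fraction)
    p_below = sum(
        comb(depth, i) * allele_fraction**i * (1.0 - allele_fraction) ** (depth - i)
        for i in range(min(min_alt_reads, depth + 1))
    )
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 1.0 - p_below)


if __name__ == "__main__":
    # Hypothetical grid: depths, allele fractions and a 3-read threshold chosen for illustration
    for depth in (20, 50, 100):
        row = [detection_sensitivity(depth, af, 3) for af in (0.05, 0.1, 0.2)]
        print(depth, [f"{s:.2f}" for s in row])
```
]]></preformat>
<p>Raising either the depth or the allele fraction increases the probability of observing enough supporting reads, which is the qualitative dependence a sensitivity plot of this kind captures; the printed grid is illustrative only.</p>
</boxed-text>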
<fig id="f3-mmr-15-05-2489" position="float">
<label>Figure 3.</label>
<caption><p>Identification of somatic variants. The Venn diagram depicts the number of somatic variants identified by GATK, MuTect and VarScan in a pair of hepatocellular carcinoma samples using the three-caller strategy. Of the 75 somatic variants identified in total, only 2 (2.7&#x0025;) were called by more than one algorithm (GATK and MuTect), indicating that the callers are largely complementary and that combining all 3 detects more variants than any single caller. GATK, Genome Analysis Toolkit.</p></caption>
<graphic xlink:href="MMR-15-05-2489-g02.tif"/>
</fig>
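<boxed-text position="float">
<p>The overlap summarized in a Venn diagram of this kind can be computed from per-caller call sets. The Python sketch below is an assumption about how such a comparison might be done: each variant is keyed by a hypothetical (chromosome, position, ref, alt) tuple, and the example call sets are invented for illustration. It reports the size of the union and how many variants were called by at least two callers.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def summarize_calls(calls: Dict[str, Set[Variant]]) -> Tuple[int, int, float]:
    """Return (union size, variants called by two or more callers, shared fraction)."""
    union: Set[Variant] = set().union(*calls.values())
    # Keep variants reported by at least two of the callers
    shared = {v for v in union if sum(v in s for s in calls.values()) >= 2}
    fraction = len(shared) / len(union) if union else 0.0
    return len(union), len(shared), fraction


if __name__ == "__main__":
    # Hypothetical call sets, for illustration only
    calls = {
        "GATK": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
        "MuTect": {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")},
        "VarScan": {("chr4", 400, "T", "C")},
    }
    print(summarize_calls(calls))  # (4, 1, 0.25)
    # The caption's figures: 2 shared variants out of 75 in total
    print(f"{2 / 75:.1%}")  # 2.7%
```
]]></preformat>
<p>Keying variants on exact position and alleles is the simplest choice; real call sets may need indel normalization (left-alignment) before they can be compared this way.</p>
</boxed-text>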
<fig id="f4-mmr-15-05-2489" position="float">
<label>Figure 4.</label>
<caption><p>Identification of a MUC16 variant in a pair of HCC samples. The figure shows exome sequencing reads from the HCC tumor and paired adjacent tissue. The blue letter C indicates a non-reference allele, corresponding to a point mutation (T&#x003E;C) at position 9056725 in MUC16. MUC16, mucin 16; HCC, hepatocellular carcinoma.</p></caption>
<graphic xlink:href="MMR-15-05-2489-g03.tif"/>
</fig>
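<boxed-text position="float">
<p>A change such as the T&#x003E;C substitution in MUC16 is typically regarded as somatic when the alternate base is supported by tumor reads but absent from the matched normal tissue. The Python sketch below computes per-sample allele fractions from the base calls observed at one position; the thresholds (min_tumor_af, max_normal_af) and the example reads are hypothetical, and the callers used in this study apply considerably more elaborate statistical models.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from collections import Counter
from typing import Iterable


def allele_fraction(bases: Iterable[str], alt: str) -> float:
    """Fraction of reads at one position whose base matches `alt`."""
    counts = Counter(b.upper() for b in bases)
    depth = sum(counts.values())
    return counts[alt.upper()] / depth if depth else 0.0


def looks_somatic(tumor_bases: Iterable[str], normal_bases: Iterable[str], alt: str,
                  min_tumor_af: float = 0.05, max_normal_af: float = 0.02) -> bool:
    """Hypothetical rule: alt allele present in tumor and (nearly) absent in normal."""
    return (allele_fraction(tumor_bases, alt) >= min_tumor_af
            and allele_fraction(normal_bases, alt) <= max_normal_af)


if __name__ == "__main__":
    # Hypothetical read bases at a single T>C site (not the article's data)
    tumor = "T" * 30 + "C" * 10
    normal = "T" * 40
    print(allele_fraction(tumor, "C"))        # 0.25
    print(looks_somatic(tumor, normal, "C"))  # True
```
]]></preformat>
</boxed-text>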
<table-wrap id="tI-mmr-15-05-2489" position="float">
<label>Table I.</label>
<caption><p>Selected somatic mutations with SIFT and PolyPhen predictions of their effect on protein function.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">HUGO symbol</th>
<th align="center" valign="bottom">Amino acid change</th>
<th align="center" valign="bottom">SIFT prediction</th>
<th align="center" valign="bottom">SIFT score</th>
<th align="center" valign="bottom">PolyPhen prediction</th>
<th align="center" valign="bottom">PolyPhen score</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">CSMD1</td>
<td align="left" valign="top">Q2192R</td>
<td align="left" valign="top">Damaging</td>
<td align="center" valign="top">0.04</td>
<td align="left" valign="top">Probably damaging</td>
<td align="center" valign="top">0.973</td>
</tr>
<tr>
<td align="left" valign="top">FREM1</td>
<td align="left" valign="top">H822Q</td>
<td align="left" valign="top">Damaging</td>
<td align="center" valign="top">0.01</td>
<td align="left" valign="top">Probably damaging</td>
<td align="center" valign="top">0.972</td>
</tr>
<tr>
<td align="left" valign="top">GP5</td>
<td align="left" valign="top">I230N</td>
<td align="left" valign="top">Damaging</td>
<td align="center" valign="top">0</td>
<td align="left" valign="top">Probably damaging</td>
<td align="center" valign="top">0.997</td>
</tr>
<tr>
<td align="left" valign="top">KCNA1</td>
<td align="left" valign="top">E422K</td>
<td align="left" valign="top">Tolerated</td>
<td align="center" valign="top">0.06</td>
<td align="left" valign="top">Benign</td>
<td align="center" valign="top">0.013</td>
</tr>
<tr>
<td align="left" valign="top">CDC7</td>
<td align="left" valign="top">P94Q</td>
<td align="left" valign="top">Damaging</td>
<td align="center" valign="top">0</td>
<td align="left" valign="top">Probably damaging</td>
<td align="center" valign="top">1</td>
</tr>
<tr>
<td align="left" valign="top">DMBT1</td>
<td align="left" valign="top">R2343W</td>
<td align="left" valign="top">Damaging</td>
<td align="center" valign="top">0.02</td>
<td align="left" valign="top">Probably damaging</td>
<td align="center" valign="top">0.998</td>
</tr>
<tr>
<td align="left" valign="top">FAT2</td>
<td align="left" valign="top">V3602I</td>
<td align="left" valign="top">Tolerated</td>
<td align="center" valign="top">0.13</td>
<td align="left" valign="top">Benign</td>
<td align="center" valign="top">0.118</td>
</tr>
<tr>
<td align="left" valign="top">C10orf90</td>
<td align="left" valign="top">R188W</td>
<td align="left" valign="top">Tolerated</td>
<td align="center" valign="top">0.08</td>
<td align="left" valign="top">Benign</td>
<td align="center" valign="top">0.015</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tfn1-mmr-15-05-2489"><p>CSMD1, CUB and sushi multiple domains 1; FREM1, FRAS1-related extracellular matrix 1; GP5, glycoprotein V platelet; KCNA1, potassium voltage-gated channel subfamily A member 1; CDC7, cell division cycle 7; DMBT1, deleted in malignant brain tumors 1; FAT2, FAT atypical cadherin 2; C10orf90, chromosome 10 open reading frame 90; SIFT, Sorting Intolerant From Tolerant; PolyPhen, Polymorphism Phenotyping.</p></fn>
</table-wrap-foot>
</table-wrap>
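<boxed-text position="float">
<p>The SIFT and PolyPhen labels in Table I follow from score thresholds. The Python sketch below reapplies thresholds to the scores in the table, assuming the commonly used SIFT cutoff of 0.05 (scores below it are labelled damaging) and the PolyPhen-2 HumDiv cut-offs of 0.957 and 0.453; the article does not state which PolyPhen-2 model or cut-offs were used.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def sift_label(score: float, cutoff: float = 0.05) -> str:
    """SIFT: scores below the (assumed) cutoff are predicted to affect protein function."""
    return "Damaging" if score < cutoff else "Tolerated"


def polyphen_label(score: float) -> str:
    """PolyPhen-2 categories using HumDiv thresholds (assumed; not stated in the article)."""
    if score >= 0.957:
        return "Probably damaging"
    if score >= 0.453:
        return "Possibly damaging"
    return "Benign"


# (gene, SIFT score, PolyPhen score) copied from Table I
TABLE_I = [
    ("CSMD1", 0.04, 0.973),
    ("FREM1", 0.01, 0.972),
    ("GP5", 0.0, 0.997),
    ("KCNA1", 0.06, 0.013),
    ("CDC7", 0.0, 1.0),
    ("DMBT1", 0.02, 0.998),
    ("FAT2", 0.13, 0.118),
    ("C10orf90", 0.08, 0.015),
]

if __name__ == "__main__":
    for gene, sift, polyphen in TABLE_I:
        print(f"{gene}\t{sift_label(sift)}\t{polyphen_label(polyphen)}")
```
]]></preformat>
<p>Under these assumed thresholds, every row reproduces the SIFT and PolyPhen labels reported in Table I.</p>
</boxed-text>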
<table-wrap id="tII-mmr-15-05-2489" position="float">
<label>Table II.</label>
<caption><p>Functional categories enriched among genes carrying tumor-specific mutations.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Biological process</th>
<th align="center" valign="bottom">Count</th>
<th align="center" valign="bottom">P-value</th>
<th align="center" valign="bottom">Genes</th>
<th align="center" valign="bottom">Fold enrichment</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Cell adhesion</td>
<td align="center" valign="top">8</td>
<td align="center" valign="top">0.0089</td>
<td align="left" valign="top">GP5, LGALS3BP, FREM1, FAT2, FCGBP, COL5A3, PCDHGB4, MUC16</td>
<td align="center" valign="top">3.29</td>
</tr>
<tr>
<td align="left" valign="top">Regulation of Ras GTPase activity</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">0.0487</td>
<td align="left" valign="top">TBC1D3, AGAP3, TBC1D3B, AGAP4</td>
<td align="center" valign="top">8.3</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tfn2-mmr-15-05-2489"><p>GP5, glycoprotein V platelet; LGALS3BP, galectin 3 binding protein; FREM1, FRAS1-related extracellular matrix 1; FAT2, FAT atypical cadherin 2; FCGBP, Fc fragment of IgG binding protein; COL5A3, collagen type V &#x03B1;3 chain; PCDHGB4, protocadherin &#x03B3; subfamily B, 4; MUC16, mucin 16; TBC1D3 and TBC1D3B, TBC1 domain family member 3 and 3B; AGAP3 and AGAP4, ArfGAP with GTPase domain, ankyrin repeat and PH domain 3 and 4.</p></fn>
</table-wrap-foot>
</table-wrap>
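<boxed-text position="float">
<p>Table II reports a count, a P-value and a fold enrichment for each category, presumably from the DAVID functional annotation tool cited in the reference list. Fold enrichment compares the share of list genes in a category with the share of background genes in it. DAVID describes its P-value (the EASE score) as a modified one-sided Fisher exact test in which one gene is removed from the list count. The Python sketch below implements both under those assumptions using the hypergeometric distribution; the counts in the example are hypothetical because the table does not report the list or background sizes.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): n genes drawn from N background genes, K of which are in the category."""
    upper = min(n, K)
    # math.comb returns 0 when more genes are requested than exist outside the category
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """(k / n) / (K / N): category share in the gene list relative to the background."""
    return (k / n) / (K / N)


def ease_score(k: int, n: int, K: int, N: int) -> float:
    """Conservative one-sided Fisher p-value computed with one list hit removed (k - 1)."""
    return hypergeom_upper_tail(k - 1, n, K, N)


if __name__ == "__main__":
    # Hypothetical counts for illustration; the article does not report n, K or N
    k, n, K, N = 8, 60, 1000, 20000
    print(f"fold enrichment = {fold_enrichment(k, n, K, N):.2f}")
    print(f"Fisher p = {hypergeom_upper_tail(k, n, K, N):.4g}")
    print(f"EASE score = {ease_score(k, n, K, N):.4g}")
```
]]></preformat>
<p>Because the EASE score drops one hit before testing, it is always at least as large as the plain Fisher p-value, which makes it a more conservative filter for small gene lists.</p>
</boxed-text>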
</floats-group>
</article>
