<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
<article xml:lang="en" article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink">
<?release-delay 0|0?>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Molecular Medicine Reports</journal-id>
<journal-title-group>
<journal-title>Molecular Medicine Reports</journal-title>
</journal-title-group>
<issn pub-type="ppub">1791-2997</issn>
<issn pub-type="epub">1791-3004</issn>
<publisher>
<publisher-name>D.A. Spandidos</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3892/mmr.2021.11890</article-id>
<article-id pub-id-type="publisher-id">MMR-0-0-11890</article-id>
<article-categories>
<subj-group>
<subject>Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Benchmarking of next and third generation sequencing technologies and their associated algorithms for <italic>de novo</italic> genome assembly</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Gavrielatos</surname><given-names>Marios</given-names></name>
<xref rid="af1-mmr-0-0-11890" ref-type="aff">1</xref>
<xref rid="af2-mmr-0-0-11890" ref-type="aff">2</xref></contrib>
<contrib contrib-type="author"><name><surname>Kyriakidis</surname><given-names>Konstantinos</given-names></name>
<xref rid="af3-mmr-0-0-11890" ref-type="aff">3</xref>
<xref rid="af4-mmr-0-0-11890" ref-type="aff">4</xref></contrib>
<contrib contrib-type="author"><name><surname>Spandidos</surname><given-names>Demetrios A.</given-names></name>
<xref rid="af5-mmr-0-0-11890" ref-type="aff">5</xref></contrib>
<contrib contrib-type="author"><name><surname>Michalopoulos</surname><given-names>Ioannis</given-names></name>
<xref rid="af1-mmr-0-0-11890" ref-type="aff">1</xref>
<xref rid="c1-mmr-0-0-11890" ref-type="corresp"/></contrib>
</contrib-group>
<aff id="af1-mmr-0-0-11890"><label>1</label>Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece</aff>
<aff id="af2-mmr-0-0-11890"><label>2</label>Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, 15701 Athens, Greece</aff>
<aff id="af3-mmr-0-0-11890"><label>3</label>School of Pharmacy, Aristotle University of Thessaloniki (AUTh), 54124 Thessaloniki, Greece</aff>
<aff id="af4-mmr-0-0-11890"><label>4</label>Genomics and Epigenomics Translational Research (GENeTres), Centre for Interdisciplinary Research and Innovation, 57001 Thessaloniki, Greece</aff>
<aff id="af5-mmr-0-0-11890"><label>5</label>Laboratory of Clinical Virology, Medical School, University of Crete, 71003 Heraklion, Greece</aff>
<author-notes>
<corresp id="c1-mmr-0-0-11890"><italic>Correspondence to</italic>: Dr Ioannis Michalopoulos, Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, 4 Soranou Efessiou, 11527 Athens, Greece, E-mail: <email>imichalop@bioacademy.gr</email></corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>04</month>
<year>2021</year></pub-date>
<pub-date pub-type="epub">
<day>02</day>
<month>02</month>
<year>2021</year></pub-date>
<volume>23</volume>
<issue>4</issue>
<elocation-id>251</elocation-id>
<history>
<date date-type="received"><day>04</day><month>11</month><year>2020</year></date>
<date date-type="accepted"><day>21</day><month>01</month><year>2021</year></date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; Gavrielatos et al.</copyright-statement>
<copyright-year>2021</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by-nc-nd/4.0/">Creative Commons Attribution-NonCommercial-NoDerivs License</ext-link>, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.</license-p></license>
</permissions>
<abstract>
<p>Genome assemblers are computational tools for <italic>de novo</italic> genome assembly, based on a plenitude of primary sequencing data. The quality of genome assemblies is estimated by their contiguity and the occurrences of misassemblies (duplications, deletions, translocations or inversions). The rapid development of sequencing technologies has enabled the rise of novel <italic>de novo</italic> genome assembly strategies. The ultimate goal of such strategies is to utilise the features of each sequencing platform in order to address the existing weaknesses of each sequencing type and compose a complete and correct genome map. In the present study, the hybrid strategy, which is based on Illumina short paired-end reads and Nanopore long reads, was benchmarked using MaSuRCA and Wengan assemblers. Moreover, the long-read assembly strategy was benchmarked using Canu, which is based on Nanopore reads, and using Hifiasm and HiCanu, which are based on PacBio HiFi reads. The assemblies were performed on a computational cluster with limited computational resources. Their outputs were evaluated in terms of accuracy and computational performance. PacBio HiFi assembly strategy outperforms the other ones, while Hi-C scaffolding, which is based on chromatin 3D structure, is required in order to increase continuity, accuracy and completeness when large and complex genomes, such as the human one, are assembled. The use of Hi-C data is also necessary while using the hybrid assembly strategy. The results revealed that HiFi sequencing enabled the rise of novel algorithms which require less genome coverage than that of the other strategies, making the assembly a less computationally demanding task. Taken together, these developments may lead to the democratisation of genome assembly projects which are now approachable by smaller labs with limited technical and financial resources.</p>
</abstract>
<kwd-group>
<kwd><italic>de novo</italic> genome assembly</kwd>
<kwd>next generation sequencing</kwd>
<kwd>third generation sequencing</kwd>
<kwd>genomics</kwd>
<kwd>benchmarking</kwd>
<kwd>bioinformatics</kwd>
</kwd-group></article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>The first human genome draft (<xref rid="b1-mmr-0-0-11890" ref-type="bibr">1</xref>) was based on Sanger sequencing technology (<xref rid="b2-mmr-0-0-11890" ref-type="bibr">2</xref>), cost $2.7 billion and was completed over a period of 10 years (<xref rid="b3-mmr-0-0-11890" ref-type="bibr">3</xref>). In comparison, the sequencing of the human genome (~3 Gbp haploid genome size) in a next generation sequencing (NGS) platform, where millions of reads are efficiently mapped to the reference genome, currently costs &#x003C;$1,000 and can be performed in &#x003C;2 days (<xref rid="b4-mmr-0-0-11890" ref-type="bibr">4</xref>). Short-read <italic>de novo</italic> genome assemblers have difficulty producing large and reliable contigs, particularly in low complexity regions such as centromeres, telomeres and other repetitive regions (<xref rid="b5-mmr-0-0-11890" ref-type="bibr">5</xref>,<xref rid="b6-mmr-0-0-11890" ref-type="bibr">6</xref>). To address this issue, third generation sequencing (<xref rid="b7-mmr-0-0-11890" ref-type="bibr">7</xref>) technologies have been developed. Nanopore (<uri xlink:href="https://nanoporetech.com/">https://nanoporetech.com/</uri>) (<xref rid="b8-mmr-0-0-11890" ref-type="bibr">8</xref>,<xref rid="b9-mmr-0-0-11890" ref-type="bibr">9</xref>) and PacBio (<uri xlink:href="https://www.pacb.com/">https://www.pacb.com/</uri>) (<xref rid="b10-mmr-0-0-11890" ref-type="bibr">10</xref>) sequencing platforms were launched around 2010. Third generation sequencers sequence single molecules in real time (<xref rid="b10-mmr-0-0-11890" ref-type="bibr">10</xref>) without the need for PCR amplification and thus avoid PCR bias (<xref rid="b11-mmr-0-0-11890" ref-type="bibr">11</xref>,<xref rid="b12-mmr-0-0-11890" ref-type="bibr">12</xref>). 
The main drawback of long reads is lower accuracy compared to Illumina short-reads: Typical Nanopore and PacBio Sequel I long-reads have an average accuracy of 90&#x0025; (<xref rid="b13-mmr-0-0-11890" ref-type="bibr">13</xref>) compared to 99.9&#x0025; of typical Illumina short-reads (<xref rid="b4-mmr-0-0-11890" ref-type="bibr">4</xref>). As a consequence, assemblies produced only by long-reads were more contiguous, but they also contained more errors, which made genome annotation, variant calling and other genome analyses, challenging tasks (<xref rid="b6-mmr-0-0-11890" ref-type="bibr">6</xref>,<xref rid="b12-mmr-0-0-11890" ref-type="bibr">12</xref>).</p>
<p>By following the hybrid assembly strategy (<xref rid="b14-mmr-0-0-11890" ref-type="bibr">14</xref>,<xref rid="b15-mmr-0-0-11890" ref-type="bibr">15</xref>), the advantages of the two generations are combined, incorporating the information contained in the two read types and overcoming their drawbacks. Recent advances in long-read sequencing by PacBio have shown very promising results: Sequel System II was released in 2019 with an upgraded SMRT flow cell that was first introduced in 2013 (<xref rid="b16-mmr-0-0-11890" ref-type="bibr">16</xref>), which was able to increase the sequencing yield up to 8-fold. However, the greatest breakthrough was the advent of circular consensus sequencing (CCS) (<xref rid="b17-mmr-0-0-11890" ref-type="bibr">17</xref>) which sequences the same circular DNA molecule 10 times, to produce a highly accurate (99.9&#x0025;) high-fidelity (HiFi) consensus read, while increasing unique molecular yield and insert size (up to 25 Kbp). At the same time, recent advances in Nanopore&#x0027;s base identification algorithm, Bonito (<uri xlink:href="https://github.com/nanoporetech/bonito">https://github.com/nanoporetech/bonito</uri>) (<xref rid="b18-mmr-0-0-11890" ref-type="bibr">18</xref>), have led to greater than 97&#x0025; base accuracy.</p>
<p>Usually, the primary genome assembly is very fragmented and some contigs are misassembled. For this reason, the completion of the assembly requires the construction of scaffolds (<xref rid="b19-mmr-0-0-11890" ref-type="bibr">19</xref>). To this end, Hi-C sequencing method provides chromosomal conformation information necessary to assemble chromosome-level scaffolds. The general principle of this method is based on the proximity and contacts of chromosomal regions in the cell nucleus. The frequency of contacts is higher between regions of the same chromosome; thus, different chromosomes can be distinguished during the assembly (<xref rid="b20-mmr-0-0-11890" ref-type="bibr">20</xref>). The result of this method is a collection of pairs of reads of chimeric fragments that can be mapped to the assembly, joining very remote areas.</p>
<p>Using the recent sequencing and scaffolding technologies, it is now possible to construct new reference genomes and finish the assembly of existing ones, by closing gaps in the centromeres, telomeres and other low complexity regions. For this reason, new projects have been launched and new consortia have been formed (<xref rid="b21-mmr-0-0-11890" ref-type="bibr">21</xref>&#x2013;<xref rid="b23-mmr-0-0-11890" ref-type="bibr">23</xref>). The telomere to telomere (T2T) consortium (<uri xlink:href="https://sites.google.com/ucsc.edu/t2tworkinggroup/">https://sites.google.com/ucsc.edu/t2tworkinggroup/</uri>) (<xref rid="b24-mmr-0-0-11890" ref-type="bibr">24</xref>,<xref rid="b25-mmr-0-0-11890" ref-type="bibr">25</xref>) aims to finish the entire human genome by producing chromosomes without gaps. Almost two decades after the first draft of the human genome by the International Human Genome Sequencing Consortium, T2T published a completed human genome with the exception of five known gaps within the rDNA arrays (<uri xlink:href="https://genomeinformatics.github.io/CHM13v1/">https://genomeinformatics.github.io/CHM13v1/</uri>).</p>
<p>The development of sequencing technologies and assembly and scaffolding algorithms, as well as the sharp increase of publicly available data (<uri xlink:href="https://www.ncbi.nlm.nih.gov/genbank/statistics/">https://www.ncbi.nlm.nih.gov/genbank/statistics/</uri>), democratised <italic>de novo</italic> genome assembly projects by making them more approachable to smaller labs. The present study aimed to compare genome assembly pipelines, which use different assembly strategies, evaluating them in terms of accuracy, speed and computational power needed. Finally, the need for scaffold construction, incorporating Hi-C sequencing data was also evaluated.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and methods</title>
<sec>
<title/>
<sec>
<title>Data acquisition and experimental overview</title>
<p>Primary sequencing data were downloaded for 3 organisms, <italic>Drosophila virilis, Drosophila melanogaster</italic> and <italic>Homo sapiens</italic> (<xref rid="tI-mmr-0-0-11890" ref-type="table">Table I</xref>). Some FASTQ files were subsampled using the Reformat tool from BBtools (<uri xlink:href="https://sourceforge.net/projects/bbmap/">https://sourceforge.net/projects/bbmap/</uri>). Following the hybrid assembly strategy, which combines short paired-end Illumina reads with long Nanopore reads, the low complexity genome of <italic>Drosophila virilis</italic> and the high complexity genome of <italic>Homo sapiens</italic> were assembled, using read data downloaded from the European Nucleotide Archive (ENA) (<xref rid="b26-mmr-0-0-11890" ref-type="bibr">26</xref>) and the T2T Consortium, respectively. The <italic>Drosophila melanogaster</italic> genome was assembled following the long-read assembly strategy using only HiFi reads retrieved from ENA. Finally, Hi-C reads were used to create the scaffolds of our assemblies. It is important to note that the sequencing data used to assemble the <italic>Homo sapiens</italic> genome derive from CHM13hTERT, which is a female haploid cell line; thus, there will be no Y chromosome in the final assemblies. The experiments were performed on the Biomedical Research Foundation, Academy of Athens (BRFAA) computer cluster that consists of 24 nodes of 128 GB RAM each. Each node consists of 2 Intel<sup>&#x00AE;</sup> Xeon<sup>&#x00AE;</sup> Silver 4116 processors with 12 cores per processor and 2 threads per core (i.e., 48 CPUs per node). Additionally, the <italic>Homo sapiens</italic> assembly by Wengan was performed on an Aristotle University of Thessaloniki (AUTh) computational system on a single node which consists of 4 AMD Opteron&#x2122; 6274 processors with 16 cores per processor and 1 thread per core (i.e., 64 CPUs) and 256 GB RAM.</p>
<p>The pipeline is divided into 3 parts: In the first stage of the current workflow (<xref rid="f1-mmr-0-0-11890" ref-type="fig">Fig. 1</xref>), different assemblers were used for the genome construction. In the second stage, the scaffolding, Hi-C data were combined with the initial assembly, in order to increase its continuity and accuracy. In the last stage, the final assembly was assessed and evaluated with the use of various tools.</p>
</sec>
<sec>
<title>Genome assembly</title>
<p>In order to assess the hybrid assembly strategy, the present study chose to evaluate two pipelines, MaSuRCA (version 3.3.5) (<xref rid="b27-mmr-0-0-11890" ref-type="bibr">27</xref>,<xref rid="b28-mmr-0-0-11890" ref-type="bibr">28</xref>) and Wengan (version 0.1) (<xref rid="b29-mmr-0-0-11890" ref-type="bibr">29</xref>). MaSuRCA workflow offers three different assemblers, CABOG (<xref rid="b30-mmr-0-0-11890" ref-type="bibr">30</xref>), SOAPdenovo (<xref rid="b31-mmr-0-0-11890" ref-type="bibr">31</xref>) and Flye (<xref rid="b32-mmr-0-0-11890" ref-type="bibr">32</xref>). The pipeline was tested using CABOG and Flye assemblers, which are designed for long-read assembly. Wengan pipeline is based on DiscovarDenovo assembler (<xref rid="b33-mmr-0-0-11890" ref-type="bibr">33</xref>).</p>
<p>Canu (version 2.0) (<xref rid="b34-mmr-0-0-11890" ref-type="bibr">34</xref>) is a long-read assembler, designed to use long high-noise single-molecule sequencing data, such as Nanopore and PacBio reads. Its workflow is based on the Celera assembler (<xref rid="b35-mmr-0-0-11890" ref-type="bibr">35</xref>) which was used in the Human Genome Project to produce the first draft of the human genome. Hifiasm (version 0.13) (<xref rid="b36-mmr-0-0-11890" ref-type="bibr">36</xref>) and HiCanu (Canu version 2.1.1) (<xref rid="b37-mmr-0-0-11890" ref-type="bibr">37</xref>) are long-read assemblers exclusively for HiFi reads. The main difference between HiFi assemblers and the ones mentioned previously, is that Hifiasm and HiCanu produce phased assemblies. A phased assembly is a haplotype-resolved assembly, where high complexity regions, such as genes, will be separated into two different alleles (<xref rid="b36-mmr-0-0-11890" ref-type="bibr">36</xref>,<xref rid="b38-mmr-0-0-11890" ref-type="bibr">38</xref>). HiCanu is a modified version of Canu, adapted to take advantage of the characteristics of HiFi reads. Hifiasm produces two different files for the primary and alternative assembly, whereas HiCanu combines the primary and the alternative assembly in the same FASTA file.</p>
</sec>
<sec>
<title>Scaffolding</title>
<p>In order to test the necessity of scaffolding, a scaffolder was used to improve the assembly continuity and completeness, as follows: Hi-C data are mapped to the primary assembly by the Arima mapping pipeline (<xref rid="b39-mmr-0-0-11890" ref-type="bibr">39</xref>), to produce a BAM file which is subsequently converted to a BED file. SALSA (version 2.2) (<xref rid="b40-mmr-0-0-11890" ref-type="bibr">40</xref>) uses this BED file, which contains the mapping information of Hi-C reads on the assembly, to scaffold the primary assembly.</p>
</sec>
<sec>
<title>Quality control metrics</title>
<p>For the quality control of the assemblies produced, different evaluation tools were used. These tools produce and present the qualitative and quantitative characteristics of the assemblies in a comprehensible way. QUAST (version 5.0.2) (<xref rid="b41-mmr-0-0-11890" ref-type="bibr">41</xref>), a genome assembly evaluation tool, produces various metrics for our assemblies, using a reference genome (<xref rid="tII-mmr-0-0-11890" ref-type="table">Table II</xref>). The standard assembly statistics include the calculation of N50/NG50 and L50/LG50 values (<xref rid="b42-mmr-0-0-11890" ref-type="bibr">42</xref>), as follows: N50 (or NG50) is the size of the contig, where at least 50&#x0025; of the genome assembly size (or the reference genome size), is contained in contigs of equal or larger size than this contig. Higher N50/NG50 values signify more contiguous assemblies. L50 (or LG50) is the smallest number of contigs whose length sum makes up for at least 50&#x0025; of the genome assembly length (or reference genome length). Lower L50/LG50 values signify more contiguous assemblies. Furthermore, QUAST makes use of BUSCO (Quast version 5.0.2) (<xref rid="b43-mmr-0-0-11890" ref-type="bibr">43</xref>), to assess genome assembly and annotation completeness, based on evolutionarily-informed expectations of gene content of near-universal single-copy orthologs.</p>
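<p>The N50/NG50 and L50/LG50 definitions above can be illustrated with a minimal Python sketch. It is not the QUAST implementation; the function name <italic>nx_lx</italic> and the toy contig lengths are hypothetical and serve only to make the definitions concrete.</p>
<preformat preformat-type="code">
```python
def nx_lx(contig_lengths, total_length=None, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    With total_length=None the assembly size is used (N50/L50).
    Passing a reference genome size as total_length gives NG50/LG50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if total_length is None:
        total_length = sum(lengths)
    threshold = fraction * total_length
    running = 0
    # Walk from the longest contig downwards until the cumulative
    # length reaches the requested fraction of the total.
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= threshold:
            return length, count
    # The contigs never cover the requested fraction of the reference.
    return None, None


# Hypothetical toy assembly: 7 contigs, 300 bp in total.
contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(contigs)                      # relative to assembly size
ng50, lg50 = nx_lx(contigs, total_length=400)  # relative to a 400 bp reference
print(f"N50 = {n50}, L50 = {l50}")
print(f"NG50 = {ng50}, LG50 = {lg50}")
```
</preformat>
<p>In this toy case the two longest contigs (80 and 70) already sum to half of the 300 bp assembly, so N50 is 70 and L50 is 2, whereas a third contig is needed to reach half of the 400 bp reference, so NG50 drops to 50 and LG50 rises to 3.</p>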
</sec>
<sec>
<title>Genome consistency plots</title>
<p>JupiterPlot (version 1.0) (<xref rid="b44-mmr-0-0-11890" ref-type="bibr">44</xref>) is a workflow that uses Circos (<xref rid="b45-mmr-0-0-11890" ref-type="bibr">45</xref>) to generate a genome assembly consistency plot between a reference genome and a genome assembly. The chromosomes of the reference genome are represented as coloured arcs on the left half circle of the plot, whereas the contigs/scaffolds of the assembled genome are represented as outlined white arcs on the right half circle. The number and size of white arcs is indicative of the genome contiguity. JupiterPlot represents synteny between the reference and the assembled genome, indicating corresponding contiguous regions as ribbons whose width is proportional to their sequence length. In this manner, assembly errors and chromosomal misassemblies can be visually identified: A ribbon in twisted position represents an inversion, a ribbon which crosses over other ribbons represents a translocation, a lack of a ribbon connecting a region of the reference genome represents a deletion and the overlap of two ribbons connecting the same reference genome region represents a duplication. Although in other cases these misassemblies may represent genuine chromosomal aberrations, in our case they represent assembly errors due to low sequence complexity of repetitive regions such as centromeres, telomeres, etc., low sequencing coverage and weaknesses of each assembly algorithm.</p>
</sec>
</sec>
</sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title/>
<sec>
<title>Drosophila genome assemblies</title>
<p>Primary (unscaffolded) MaSuRCA (CABOG or Flye) hybrid assemblies are by far the most fragmented of all <italic>Drosophila virilis</italic> assemblies, based on N50/NG50 and L50/LG50 values (<xref rid="tIII-mmr-0-0-11890" ref-type="table">Table III</xref>) and manual inspection of genome assembly consistency plots (<xref rid="f2-mmr-0-0-11890" ref-type="fig">Fig. 2</xref>). Canu, based exclusively on long Nanopore data, produced the most contiguous primary assembly. MaSuRCA/CABOG produced the most misassembled contigs, while the Wengan hybrid assembler produced the fewest. All assemblies except that of Canu present very high rates of gene completeness, similar to those of the reference genomes (<xref rid="tIV-mmr-0-0-11890" ref-type="table">Table IV</xref>). The sizes of all <italic>Drosophila virilis</italic> primary assemblies are comparable to each other and very similar to that of the reference genome. Wengan is the fastest hybrid assembler and produced the <italic>Drosophila virilis</italic> genome 71 times faster than Canu, while the average CPU usage of Wengan is lower than that of the other assemblers (<xref rid="tV-mmr-0-0-11890" ref-type="table">Table V</xref>). Hi-C-based scaffolding improved the contiguity and reduced the misassemblies of all assemblies, but it did not improve the gene completeness or alter the final assembly size.</p>
<p>In <italic>Drosophila melanogaster</italic> primary assemblies, Hifiasm outperformed HiCanu, producing less fragmented and misassembled contigs (<xref rid="f3-mmr-0-0-11890" ref-type="fig">Figs. 3</xref> and <xref rid="f4-mmr-0-0-11890" ref-type="fig">4</xref>). As HiCanu produces phased assemblies, the vast majority of single-copy genes appeared as completed and duplicated in BUSCO analysis (<xref rid="tIV-mmr-0-0-11890" ref-type="table">Table IV</xref>). Nevertheless, the sum of completed single and duplicated BUSCOs in Hifiasm and HiCanu was practically identical to that of the reference genome. While using the 11 Kbp insert size and 37&#x00D7; coverage data, Hifiasm produced <italic>Drosophila melanogaster</italic> genome faster than HiCanu. However, as the coverage was increased, the assembly time of Hifiasm increased more rapidly than that of HiCanu: The assembly time of Hifiasm and HiCanu using 24 Kbp insert size and 40&#x00D7; coverage data was approximately the same, while HiCanu was 12&#x00D7; faster than Hifiasm, when 24 Kbp insert size and 92&#x00D7; coverage was used. The average CPU usage of HiCanu was also smaller than that of Hifiasm (<xref rid="tV-mmr-0-0-11890" ref-type="table">Table V</xref>). SALSA scaffolding based on Hi-C data, slightly improved Hifiasm assemblies, while it ameliorated the contiguity of HiCanu ones. It also slightly limited the misassemblies of HiCanu outputs. It did not influence the gene completeness of any assembly. Insert size (11 and 24 Kbp) and coverage (37&#x00D7;, 40&#x00D7; and 92&#x00D7;) did not influence the outcome of Hifiasm; however, a small deterioration in assembly contiguity at the 92&#x00D7; coverage was noted. On the other hand, a higher insert size and coverage improved HiCanu performance.</p>
<p>Overall, Hifiasm performed most effectively in the primary assembly of <italic>Drosophila melanogaster</italic> genome (which is comparable to that of <italic>Drosophila virilis</italic>), in terms of genome contiguity, accuracy and completeness. At 37&#x00D7; and 40&#x00D7; coverages, Hifiasm was also the fastest assembler; however, the CPU usage of Wengan and HiCanu was half of that of Hifiasm. The combination of Hi-C data had a minimal effect on the improvement of Hifiasm assembly. Among hybrid assemblers, Wengan performed best when combined with SALSA.</p>
</sec>
<sec>
<title>Homo sapiens genome assemblies</title>
<p>The human genome is much more complex than that of <italic>Drosophila</italic>; thus, its assembly is a more demanding task which requires many more computational resources. The MaSuRCA and Wengan hybrid assemblers and the Canu long-read assembler were not able to complete the assembly of the human genome, even at half of the original Illumina and Nanopore coverage, on the BRFAA cluster with 128 GB RAM. Wengan, though, was able to produce a human genome assembly on the AUTh computational system with 256 GB RAM, when FASTQ files were subsampled by half (<xref rid="f5-mmr-0-0-11890" ref-type="fig">Fig. 5</xref>). The incorporation of Hi-C data improved the genome continuity and completeness, while reducing misassemblies (<xref rid="tVI-mmr-0-0-11890" ref-type="table">Table VI</xref>).</p>
<p>Hifiasm was unable to assemble the human genome on the BRFAA cluster when the original 30&#x00D7; coverage of HiFi data was used. Nevertheless, it succeeded in producing a notable assembly on the same computational system with subsampled data (16&#x00D7; coverage), in contrast to HiCanu, which failed to run because of insufficient memory, even with the subsampled data. Hifiasm failed to produce a contig for chromosome 22. SALSA improved the contiguity, accuracy and completeness of the Hifiasm assembly (<xref rid="tVI-mmr-0-0-11890" ref-type="table">Table VI</xref>). The longest chromosomes of the genome are well assembled; however, four of the smallest autosomal chromosomes (chr 16, 19, 21, 22) are missing (<xref rid="f5-mmr-0-0-11890" ref-type="fig">Fig. 5</xref>).</p>
<p>Hifiasm outperformed HiCanu, Canu, Wengan and MaSuRCA, as it managed to run in low resources and low coverage, producing superior primary and scaffolded assemblies to those of Wengan.</p>
</sec>
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>The use of a reference genome in the study of medical genetics, with the help of novel tools and methods, can aid the identification of novel drug-sequence variant interactions (<xref rid="b46-mmr-0-0-11890" ref-type="bibr">46</xref>) and of variants which may underlie a variety of genetic diseases, such as cancer (<xref rid="b47-mmr-0-0-11890" ref-type="bibr">47</xref>), and can support further analyses (<xref rid="b48-mmr-0-0-11890" ref-type="bibr">48</xref>). By studying these variants, the heterogeneity of different populations can be analysed and better understood (<xref rid="b49-mmr-0-0-11890" ref-type="bibr">49</xref>).</p>
<p>To propose an optimised <italic>de novo</italic> genome assembly workflow, in the present study, factors such as the maximum assembly contiguity, accuracy and completeness were taken into account, without ignoring other parameters crucial for the execution of the sequencing experiments and the production of the assemblies, such as financial, computational power and time limitations.</p>
<p>These findings suggest that the assembly exclusively based on long highly accurate PacBio HiFi reads outperforms Illumina-Nanopore hybrid and Nanopore assembly. <italic>De novo</italic> genome assemblers which use HiFi reads require lower amounts of data compared to other strategies. It has been reported that a 30&#x00D7; genome coverage, using HiFi data, is sufficient in order to produce high quality assemblies (<xref rid="b18-mmr-0-0-11890" ref-type="bibr">18</xref>,<xref rid="b50-mmr-0-0-11890" ref-type="bibr">50</xref>). The present study revealed that even a 16&#x00D7; coverage of the human genome was adequate for that purpose. Thus, subsampling in the Hifiasm assembly strategy allows the adaptation of sequencing data to the computational resources available as follows: Sequencing data with a coverage of no higher than 40&#x00D7; can be produced, as the current findings and previous experience from other Hifiasm users (<uri xlink:href="https://downloads.pacbcloud.com/public/dataset/redwood2020/hifiasm/v12/">https://downloads.pacbcloud.com/public/dataset/redwood2020/hifiasm/v12/</uri>) suggest, and if the computational system fails to run, the data can be subsampled using the divide and conquer approach, until the computational resources are adequate for the analysis. However, if the subsampled data correspond to &#x003C;30&#x00D7; coverage, the final assembly may deteriorate, as observed in the <italic>Homo sapiens</italic> assembly, where chromosome 22 is missing from the primary assembly and chromosomes 16, 19, 21 and 22 from the final assembly, after the scaffolding and correction process. On the other hand, it has been reported that a hybrid assembly would need 50&#x00D7; Illumina short-read coverage and 30&#x00D7; Nanopore long-read coverage of the genome (<xref rid="b15-mmr-0-0-11890" ref-type="bibr">15</xref>,<xref rid="b51-mmr-0-0-11890" ref-type="bibr">51</xref>,<xref rid="b52-mmr-0-0-11890" ref-type="bibr">52</xref>). 
In the case of the human genome, notable results were produced with a 34&#x00D7; Illumina and 30&#x00D7; Nanopore coverage. Therefore, the volume of data used for HiFi assemblies is much smaller. As the volume of data decreases, so do the computational requirements for CPU power and particularly memory. In addition, the use of highly accurate long reads bypasses several computationally demanding, time-consuming steps of the assembly workflow.</p>
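<p>The coverage arithmetic behind this comparison can be sketched in a few lines of Python. This is an illustrative calculation only: the function names are hypothetical, the genome size is the approximate 3 Gbp haploid figure quoted in the Introduction, and the resulting fraction is not tied to the options of any particular subsampling tool.</p>
<preformat preformat-type="code">
```python
def coverage(total_bases, genome_size_bp):
    """Mean sequencing depth: sequenced bases divided by genome size."""
    if not genome_size_bp > 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


def subsample_fraction(current_cov, target_cov):
    """Fraction of reads to keep in order to go from current_cov to target_cov."""
    if not current_cov > 0:
        raise ValueError("current coverage must be positive")
    # Never request more reads than are available.
    return min(1.0, target_cov / current_cov)


GENOME_SIZE_BP = 3_000_000_000  # approximate haploid human genome size (assumption)

# HiFi data: original 30x coverage subsampled to 16x.
hifi_fraction = subsample_fraction(30, 16)
hifi_bases = 16 * GENOME_SIZE_BP

# Hybrid data: 34x Illumina plus 30x Nanopore.
hybrid_bases = (34 + 30) * GENOME_SIZE_BP

ratio = hybrid_bases / hifi_bases
print(f"Keep {hifi_fraction:.2f} of the HiFi reads")
print(f"Hybrid input is {ratio:.1f} times larger than the HiFi input")
```
</preformat>
<p>Under these assumptions, roughly half of the HiFi reads are retained, and the hybrid strategy has to process about four times as many bases as the 16&#x00D7; HiFi assembly.</p>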
<p>In the hybrid assembly strategy, Wengan performed most effectively in terms of accuracy and speed. Among the hybrid assemblers, Wengan produced the most contiguous <italic>Drosophila virilis</italic> assemblies. Although no hybrid assembler produced a human genome assembly on the BRFAA cluster, Wengan was the only assembler that managed to construct a primary assembly on the AUTh computational system.</p>
<p>The assembler we recommend for HiFi reads is Hifiasm, as it outperformed HiCanu on a small genome and succeeded in producing a notable assembly of a large genome, whereas HiCanu failed to run. Hifiasm performed equally well irrespective of insert size and coverage, while HiCanu output improved with increasing insert size and coverage. We recommend the use of Hifiasm or HiCanu assemblers, depending on the available computational resources as well as the organism&#x0027;s genome size and complexity. Hifiasm produced the most contiguous assemblies and its assembly strategy is highly efficient in terms of computational power and time on a single node of the cluster. For this reason, Hifiasm is also used by the Human Pangenome Project (<uri xlink:href="https://humanpangenome.org/">https://humanpangenome.org/</uri>). On the other hand, HiCanu offers the possibility to run the assembly on a grid when using a computational cluster. Distributing the tasks across different nodes allows the use of more computational resources than running on a central resource, and jobs can be executed in parallel, speeding up the assembly. Even when running on the grid, HiCanu was unable to produce a human genome assembly, as the main bottleneck of all assemblers is RAM size. Finally, by following the PacBio HiFi assembly strategy for small genomes, we utilise only one sample preparation and one sequencing technology, in contrast to the Illumina/Nanopore hybrid strategy where we need to make three sample preparations (Illumina, Nanopore and Hi-C) and utilise two sequencing technologies (Illumina sequencing for short genomic and Hi-C reads and Nanopore for long genomic reads). For larger genomes, similar to the human one, the PacBio HiFi assembly strategy relies on two sample preparations and two sequencing technologies (Illumina sequencing for short Hi-C reads and PacBio long genomic reads).</p>
<p>Our analysis suggests that the use of additional information for scaffolding is not necessary for small genomes (such as insect genomes); however, it offers a noticeable improvement for larger and more complex genomes (such as the human genome and higher plant genomes). The computational resources required for scaffolding, even for the most complex genomes, are far lower than those required for the assembly step. Ideally, multiple types of data are used, as they seem to exploit different genome features. The successive use of 10&#x00D7; (<uri xlink:href="https://www.10xgenomics.com/">https://www.10xgenomics.com/</uri>) (<xref rid="b53-mmr-0-0-11890" ref-type="bibr">53</xref>,<xref rid="b54-mmr-0-0-11890" ref-type="bibr">54</xref>), Bionano (<uri xlink:href="https://bionanogenomics.com/">https://bionanogenomics.com/</uri>) (<xref rid="b55-mmr-0-0-11890" ref-type="bibr">55</xref>) and Hi-C data is expected to generate the most accurate scaffolds (<xref rid="b25-mmr-0-0-11890" ref-type="bibr">25</xref>,<xref rid="b56-mmr-0-0-11890" ref-type="bibr">56</xref>). Although the use of 10&#x00D7; and Bionano data is not imperative, Hi-C sequencing reads are highly recommended for complex genomes, in order to increase the contiguity of the assembly, while improving its accuracy by reducing major misassemblies and translocations.</p>
<p>The development of sequencing technologies has led to a great reduction in sequencing cost. The purchase of a sequencer is no longer compulsory for genome assembly projects, as different institutes provide a variety of sequencing services at prices that are affordable for many laboratories. PacBio, Illumina and Nanopore each offer a network of certified sequencing service providers, some of which are certified for more than one of these sequencing technologies. Moreover, the purchase of a computational cluster is no longer necessary, as bioinformatics infrastructures, such as ELIXIR (<xref rid="b57-mmr-0-0-11890" ref-type="bibr">57</xref>), can offer researchers the computational resources necessary for the accomplishment of demanding tasks, such as a <italic>de novo</italic> genome assembly.</p>
<p>The major bottlenecks in genome assembly projects were the computationally demanding assembly algorithms and the high cost of sequencing. The development of new assembly algorithms, which require much less computational power and memory, is the result of major improvements in long-read accuracy by PacBio. The future of genomics relies on long reads to resolve low-complexity regions of genomes and to perform telomere-to-telomere assemblies. Alongside the advances in read accuracy, third generation sequencing has led to a reduction in sequencing cost. Furthermore, the increasing availability of genomic data in public databases (<xref rid="b58-mmr-0-0-11890" ref-type="bibr">58</xref>), such as the Sequence Read Archive (SRA) (<xref rid="b59-mmr-0-0-11890" ref-type="bibr">59</xref>), allows researchers to find and use a variety of raw sequencing data for their species of interest, already produced by others, for the primary assembly and/or the scaffolding process. Finally, it is important to note that all assembly algorithms and methods utilised in this work are constantly being updated to improve their performance and computational efficiency, allowing even the reanalysis of older data and the discovery of novel information. In addition, as basecallers are also constantly updated, re-basecalling raw signal files (for example, fast5-formatted files in Nanopore) can produce more accurate reads.</p>
<p>In conclusion, continuous advancements in all the fields mentioned above are leading towards the democratisation of <italic>de novo</italic> genome assembly projects, by enabling scientific laboratories with limited technical and financial resources to perform a great variety of genomic studies, without the need for expensive sequencing equipment and computational infrastructure.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>The analyses in this work were performed using the computing cluster of the Greek Genome Centre of the Biomedical Research Foundation, Academy of Athens, and the High Performance Computing Infrastructure and Resources of the Aristotle University of Thessaloniki (AUTh).</p>
</ack>
<sec>
<title>Funding</title>
<p>No funding was received.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of data and materials</title>
<p>The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.</p>
</sec>
<sec>
<title>Authors&#x0027; contributions</title>
<p>MG and KK analysed and interpreted the data. IM conceived and coordinated the current study. DAS was also involved in the conception of the study. MG and IM assessed the authenticity of all the raw data to ensure its legitimacy. All authors contributed to the writing and revision of the work and approved the final manuscript.</p>
</sec>
<sec>
<title>Ethics approval and consent to participate</title>
<p>Not applicable.</p>
</sec>
<sec>
<title>Patient consent for publication</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Competing interests</title>
<p>DAS is the Editor-in-Chief for the journal, but had no personal involvement in the reviewing process, or any influence in terms of adjudicating on the final decision, for this article. The other authors declare that they have no competing interests.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="b1-mmr-0-0-11890"><label>1</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lander</surname><given-names>ES</given-names></name><name><surname>Linton</surname><given-names>LM</given-names></name><name><surname>Birren</surname><given-names>B</given-names></name><name><surname>Nusbaum</surname><given-names>C</given-names></name><name><surname>Zody</surname><given-names>MC</given-names></name><name><surname>Baldwin</surname><given-names>J</given-names></name><name><surname>Devon</surname><given-names>K</given-names></name><name><surname>Dewar</surname><given-names>K</given-names></name><name><surname>Doyle</surname><given-names>M</given-names></name><name><surname>FitzHugh</surname><given-names>W</given-names></name><etal/><collab collab-type="corp-author">International Human Genome Sequencing Consortium</collab></person-group><article-title>Initial sequencing and analysis of the human genome</article-title><source>Nature</source><volume>409</volume><fpage>860</fpage><lpage>921</lpage><year>2001</year><pub-id pub-id-type="doi">10.1038/35057062</pub-id><pub-id pub-id-type="pmid">11237011</pub-id></element-citation></ref>
<ref id="b2-mmr-0-0-11890"><label>2</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sanger</surname><given-names>F</given-names></name><name><surname>Nicklen</surname><given-names>S</given-names></name><name><surname>Coulson</surname><given-names>AR</given-names></name></person-group><article-title>DNA sequencing with chain-terminating inhibitors</article-title><source>Proc Natl Acad Sci USA</source><volume>74</volume><fpage>5463</fpage><lpage>5467</lpage><year>1977</year><pub-id pub-id-type="doi">10.1073/pnas.74.12.5463</pub-id><pub-id pub-id-type="pmid">271968</pub-id></element-citation></ref>
<ref id="b3-mmr-0-0-11890"><label>3</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kent</surname><given-names>WJ</given-names></name><name><surname>Haussler</surname><given-names>D</given-names></name></person-group><article-title>Assembly of the working draft of the human genome with GigAssembler</article-title><source>Genome Res</source><volume>11</volume><fpage>1541</fpage><lpage>1548</lpage><year>2001</year><pub-id pub-id-type="doi">10.1101/gr.183201</pub-id><pub-id pub-id-type="pmid">11544197</pub-id></element-citation></ref>
<ref id="b4-mmr-0-0-11890"><label>4</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Shendure</surname><given-names>J</given-names></name><name><surname>Balasubramanian</surname><given-names>S</given-names></name><name><surname>Church</surname><given-names>GM</given-names></name><name><surname>Gilbert</surname><given-names>W</given-names></name><name><surname>Rogers</surname><given-names>J</given-names></name><name><surname>Schloss</surname><given-names>JA</given-names></name><name><surname>Waterston</surname><given-names>RH</given-names></name></person-group><article-title>DNA sequencing at 40: Past, present and future</article-title><source>Nature</source><volume>550</volume><fpage>345</fpage><lpage>353</lpage><year>2017</year><pub-id pub-id-type="doi">10.1038/nature24286</pub-id><pub-id pub-id-type="pmid">29019985</pub-id></element-citation></ref>
<ref id="b5-mmr-0-0-11890"><label>5</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Salzberg</surname><given-names>SL</given-names></name><name><surname>Yorke</surname><given-names>JA</given-names></name></person-group><article-title>Beware of mis-assembled genomes</article-title><source>Bioinformatics</source><volume>21</volume><fpage>4320</fpage><lpage>4321</lpage><year>2005</year><pub-id pub-id-type="doi">10.1093/bioinformatics/bti769</pub-id><pub-id pub-id-type="pmid">16332717</pub-id></element-citation></ref>
<ref id="b6-mmr-0-0-11890"><label>6</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chaisson</surname><given-names>MJ</given-names></name><name><surname>Wilson</surname><given-names>RK</given-names></name><name><surname>Eichler</surname><given-names>EE</given-names></name></person-group><article-title>Genetic variation and the de novo assembly of human genomes</article-title><source>Nat Rev Genet</source><volume>16</volume><fpage>627</fpage><lpage>640</lpage><year>2015</year><pub-id pub-id-type="doi">10.1038/nrg3933</pub-id><pub-id pub-id-type="pmid">26442640</pub-id></element-citation></ref>
<ref id="b7-mmr-0-0-11890"><label>7</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>van Dijk</surname><given-names>EL</given-names></name><name><surname>Jaszczyszyn</surname><given-names>Y</given-names></name><name><surname>Naquin</surname><given-names>D</given-names></name><name><surname>Thermes</surname><given-names>C</given-names></name></person-group><article-title>The Third Revolution in sequencing technology</article-title><source>Trends Genet</source><volume>34</volume><fpage>666</fpage><lpage>681</lpage><year>2018</year><pub-id pub-id-type="doi">10.1016/j.tig.2018.05.008</pub-id><pub-id pub-id-type="pmid">29941292</pub-id></element-citation></ref>
<ref id="b8-mmr-0-0-11890"><label>8</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kasianowicz</surname><given-names>JJ</given-names></name><name><surname>Brandin</surname><given-names>E</given-names></name><name><surname>Branton</surname><given-names>D</given-names></name><name><surname>Deamer</surname><given-names>DW</given-names></name></person-group><article-title>Characterization of individual polynucleotide molecules using a membrane channel</article-title><source>Proc Natl Acad Sci USA</source><volume>93</volume><fpage>13770</fpage><lpage>13773</lpage><year>1996</year><pub-id pub-id-type="doi">10.1073/pnas.93.24.13770</pub-id><pub-id pub-id-type="pmid">8943010</pub-id></element-citation></ref>
<ref id="b9-mmr-0-0-11890"><label>9</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Haque</surname><given-names>F</given-names></name><name><surname>Li</surname><given-names>J</given-names></name><name><surname>Wu</surname><given-names>HC</given-names></name><name><surname>Liang</surname><given-names>XJ</given-names></name><name><surname>Guo</surname><given-names>P</given-names></name></person-group><article-title>Solid-state and biological nanopore for real-time sensing of single chemical and sequencing of DNA</article-title><source>Nano Today</source><volume>8</volume><fpage>56</fpage><lpage>74</lpage><year>2013</year><pub-id pub-id-type="doi">10.1016/j.nantod.2012.12.008</pub-id><pub-id pub-id-type="pmid">23504223</pub-id></element-citation></ref>
<ref id="b10-mmr-0-0-11890"><label>10</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Eid</surname><given-names>J</given-names></name><name><surname>Fehr</surname><given-names>A</given-names></name><name><surname>Gray</surname><given-names>J</given-names></name><name><surname>Luong</surname><given-names>K</given-names></name><name><surname>Lyle</surname><given-names>J</given-names></name><name><surname>Otto</surname><given-names>G</given-names></name><name><surname>Peluso</surname><given-names>P</given-names></name><name><surname>Rank</surname><given-names>D</given-names></name><name><surname>Baybayan</surname><given-names>P</given-names></name><name><surname>Bettman</surname><given-names>B</given-names></name><etal/></person-group><article-title>Real-time DNA sequencing from single polymerase molecules</article-title><source>Science</source><volume>323</volume><fpage>133</fpage><lpage>138</lpage><year>2009</year><pub-id pub-id-type="doi">10.1126/science.1162986</pub-id><pub-id pub-id-type="pmid">19023044</pub-id></element-citation></ref>
<ref id="b11-mmr-0-0-11890"><label>11</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Aird</surname><given-names>D</given-names></name><name><surname>Ross</surname><given-names>MG</given-names></name><name><surname>Chen</surname><given-names>W-S</given-names></name><name><surname>Danielsson</surname><given-names>M</given-names></name><name><surname>Fennell</surname><given-names>T</given-names></name><name><surname>Russ</surname><given-names>C</given-names></name><name><surname>Jaffe</surname><given-names>DB</given-names></name><name><surname>Nusbaum</surname><given-names>C</given-names></name><name><surname>Gnirke</surname><given-names>A</given-names></name></person-group><article-title>Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries</article-title><source>Genome Biol</source><volume>12</volume><fpage>R18</fpage><year>2011</year><pub-id pub-id-type="doi">10.1186/gb-2011-12-2-r18</pub-id><pub-id pub-id-type="pmid">21338519</pub-id></element-citation></ref>
<ref id="b12-mmr-0-0-11890"><label>12</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jain</surname><given-names>M</given-names></name><name><surname>Koren</surname><given-names>S</given-names></name><name><surname>Miga</surname><given-names>KH</given-names></name><name><surname>Quick</surname><given-names>J</given-names></name><name><surname>Rand</surname><given-names>AC</given-names></name><name><surname>Sasani</surname><given-names>TA</given-names></name><name><surname>Tyson</surname><given-names>JR</given-names></name><name><surname>Beggs</surname><given-names>AD</given-names></name><name><surname>Dilthey</surname><given-names>AT</given-names></name><name><surname>Fiddes</surname><given-names>IT</given-names></name><etal/></person-group><article-title>Nanopore sequencing and assembly of a human genome with ultra-long reads</article-title><source>Nat Biotechnol</source><volume>36</volume><fpage>338</fpage><lpage>345</lpage><year>2018</year><pub-id pub-id-type="doi">10.1038/nbt.4060</pub-id><pub-id pub-id-type="pmid">29431738</pub-id></element-citation></ref>
<ref id="b13-mmr-0-0-11890"><label>13</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname><given-names>Y</given-names></name><name><surname>Yuan</surname><given-names>J</given-names></name><name><surname>Kolmogorov</surname><given-names>M</given-names></name><name><surname>Shen</surname><given-names>MW</given-names></name><name><surname>Chaisson</surname><given-names>M</given-names></name><name><surname>Pevzner</surname><given-names>PA</given-names></name></person-group><article-title>Assembly of long error-prone reads using de Bruijn graphs</article-title><source>Proc Natl Acad Sci USA</source><volume>113</volume><fpage>E8396</fpage><lpage>E8405</lpage><year>2016</year><pub-id pub-id-type="doi">10.1073/pnas.1604560113</pub-id><pub-id pub-id-type="pmid">27956617</pub-id></element-citation></ref>
<ref id="b14-mmr-0-0-11890"><label>14</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tan</surname><given-names>MH</given-names></name><name><surname>Austin</surname><given-names>CM</given-names></name><name><surname>Hammer</surname><given-names>MP</given-names></name><name><surname>Lee</surname><given-names>YP</given-names></name><name><surname>Croft</surname><given-names>LJ</given-names></name><name><surname>Gan</surname><given-names>HM</given-names></name></person-group><article-title>Finding Nemo: Hybrid assembly with Oxford Nanopore and Illumina reads greatly improves the clownfish (<italic>Amphiprion ocellaris</italic>) genome assembly</article-title><source>Gigascience</source><volume>7</volume><fpage>1</fpage><lpage>6</lpage><year>2018</year><pub-id pub-id-type="doi">10.1093/gigascience/gix137</pub-id></element-citation></ref>
<ref id="b15-mmr-0-0-11890"><label>15</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nowak</surname><given-names>RM</given-names></name><name><surname>Jastrz&#x0119;bski</surname><given-names>JP</given-names></name><name><surname>Ku&#x015B;mirek</surname><given-names>W</given-names></name><name><surname>Sa&#x0142;amatin</surname><given-names>R</given-names></name><name><surname>Rydzanicz</surname><given-names>M</given-names></name><name><surname>Sobczyk-Kopcio&#x0142;</surname><given-names>A</given-names></name><name><surname>Sulima-Celi&#x0144;ska</surname><given-names>A</given-names></name><name><surname>Paukszto</surname><given-names>&#x0141;</given-names></name><name><surname>Makowczenko</surname><given-names>KG</given-names></name><name><surname>P&#x0142;oski</surname><given-names>R</given-names></name><etal/></person-group><article-title>Hybrid de novo whole-genome assembly and annotation of the model tapeworm Hymenolepis diminuta</article-title><source>Sci Data</source><volume>6</volume><fpage>302</fpage><year>2019</year><pub-id pub-id-type="doi">10.1038/s41597-019-0311-3</pub-id><pub-id pub-id-type="pmid">31796747</pub-id></element-citation></ref>
<ref id="b16-mmr-0-0-11890"><label>16</label><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Korlach</surname><given-names>J</given-names></name><name><surname>Turner</surname><given-names>SW</given-names></name></person-group><article-title>Zero-Mode Waveguides</article-title><source>Encyclopedia of Biophysics</source><person-group person-group-type="editor"><name><surname>Roberts</surname><given-names>GC</given-names></name></person-group><publisher-name>Springer</publisher-name><publisher-loc>Heidelberg</publisher-loc><fpage>2793</fpage><lpage>2795</lpage><year>2013</year><pub-id pub-id-type="doi">10.1007/978-3-642-16712-6_499</pub-id></element-citation></ref>
<ref id="b17-mmr-0-0-11890"><label>17</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wenger</surname><given-names>AM</given-names></name><name><surname>Peluso</surname><given-names>P</given-names></name><name><surname>Rowell</surname><given-names>WJ</given-names></name><name><surname>Chang</surname><given-names>PC</given-names></name><name><surname>Hall</surname><given-names>RJ</given-names></name><name><surname>Concepcion</surname><given-names>GT</given-names></name><name><surname>Ebler</surname><given-names>J</given-names></name><name><surname>Fungtammasan</surname><given-names>A</given-names></name><name><surname>Kolesnikov</surname><given-names>A</given-names></name><name><surname>Olson</surname><given-names>ND</given-names></name><etal/></person-group><article-title>Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome</article-title><source>Nat Biotechnol</source><volume>37</volume><fpage>1155</fpage><lpage>1162</lpage><year>2019</year><pub-id pub-id-type="doi">10.1038/s41587-019-0217-9</pub-id><pub-id pub-id-type="pmid">31406327</pub-id></element-citation></ref>
<ref id="b18-mmr-0-0-11890"><label>18</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Silvestre-Ryan</surname><given-names>J</given-names></name><name><surname>Holmes</surname><given-names>I</given-names></name></person-group><article-title>Pair consensus decoding improves accuracy of neural network basecallers for nanopore sequencing</article-title><source>Genome Biol</source><volume>22</volume><fpage>38</fpage><year>2021</year><pub-id pub-id-type="doi">10.1186/s13059-020-02255-1</pub-id><pub-id pub-id-type="pmid">33468205</pub-id></element-citation></ref>
<ref id="b19-mmr-0-0-11890"><label>19</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ghurye</surname><given-names>J</given-names></name><name><surname>Pop</surname><given-names>M</given-names></name></person-group><article-title>Modern technologies and algorithms for scaffolding assembled genomes</article-title><source>PLoS Comput Biol</source><volume>15</volume><fpage>e1006994</fpage><year>2019</year><pub-id pub-id-type="doi">10.1371/journal.pcbi.1006994</pub-id><pub-id pub-id-type="pmid">31166948</pub-id></element-citation></ref>
<ref id="b20-mmr-0-0-11890"><label>20</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lieberman-Aiden</surname><given-names>E</given-names></name><name><surname>van Berkum</surname><given-names>NL</given-names></name><name><surname>Williams</surname><given-names>L</given-names></name><name><surname>Imakaev</surname><given-names>M</given-names></name><name><surname>Ragoczy</surname><given-names>T</given-names></name><name><surname>Telling</surname><given-names>A</given-names></name><name><surname>Amit</surname><given-names>I</given-names></name><name><surname>Lajoie</surname><given-names>BR</given-names></name><name><surname>Sabo</surname><given-names>PJ</given-names></name><name><surname>Dorschner</surname><given-names>MO</given-names></name><etal/></person-group><article-title>Comprehensive mapping of long-range interactions reveals folding principles of the human genome</article-title><source>Science</source><volume>326</volume><fpage>289</fpage><lpage>293</lpage><year>2009</year><pub-id pub-id-type="doi">10.1126/science.1181369</pub-id><pub-id pub-id-type="pmid">19815776</pub-id></element-citation></ref>
<ref id="b21-mmr-0-0-11890"><label>21</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Auton</surname><given-names>A</given-names></name><name><surname>Brooks</surname><given-names>LD</given-names></name><name><surname>Durbin</surname><given-names>RM</given-names></name><name><surname>Garrison</surname><given-names>EP</given-names></name><name><surname>Kang</surname><given-names>HM</given-names></name><name><surname>Korbel</surname><given-names>JO</given-names></name><name><surname>Marchini</surname><given-names>JL</given-names></name><name><surname>McCarthy</surname><given-names>S</given-names></name><name><surname>McVean</surname><given-names>GA</given-names></name><name><surname>Abecasis</surname><given-names>GR</given-names></name><collab collab-type="corp-author">1000 Genomes Project Consortium</collab></person-group><article-title>A global reference for human genetic variation</article-title><source>Nature</source><volume>526</volume><fpage>68</fpage><lpage>74</lpage><year>2015</year><pub-id pub-id-type="doi">10.1038/nature15393</pub-id><pub-id pub-id-type="pmid">26432245</pub-id></element-citation></ref>
<ref id="b22-mmr-0-0-11890"><label>22</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Koepfli</surname><given-names>KP</given-names></name><name><surname>Paten</surname><given-names>B</given-names></name><name><surname>O&#x0027;Brien</surname><given-names>SJ</given-names></name><collab collab-type="corp-author">Genome 10K Community of Scientists</collab></person-group><article-title>The Genome 10K Project: A way forward</article-title><source>Annu Rev Anim Biosci</source><volume>3</volume><fpage>57</fpage><lpage>111</lpage><year>2015</year><pub-id pub-id-type="doi">10.1146/annurev-animal-090414-014900</pub-id><pub-id pub-id-type="pmid">25689317</pub-id></element-citation></ref>
<ref id="b23-mmr-0-0-11890"><label>23</label><element-citation publication-type="journal"><person-group person-group-type="author"><collab collab-type="corp-author">ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium</collab></person-group><article-title>Pan-cancer analysis of whole genomes</article-title><source>Nature</source><volume>578</volume><fpage>82</fpage><lpage>93</lpage><year>2020</year><pub-id pub-id-type="doi">10.1038/s41586-020-1969-6</pub-id><pub-id pub-id-type="pmid">32025007</pub-id></element-citation></ref>
<ref id="b24-mmr-0-0-11890"><label>24</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Logsdon</surname><given-names>GA</given-names></name><name><surname>Vollger</surname><given-names>MR</given-names></name><name><surname>Hsieh</surname><given-names>P</given-names></name><name><surname>Mao</surname><given-names>Y</given-names></name><name><surname>Liskovykh</surname><given-names>MA</given-names></name><name><surname>Koren</surname><given-names>S</given-names></name><name><surname>Nurk</surname><given-names>S</given-names></name><name><surname>Mercuri</surname><given-names>L</given-names></name><name><surname>Dishuck</surname><given-names>PC</given-names></name><name><surname>Rhie</surname><given-names>A</given-names></name></person-group><article-title>The structure, function, and evolution of a complete human chromosome 8</article-title><source>bioRxiv</source><month>Sep</month><day>8</day><year>2020</year><comment>(Epub ahead of print). https://doi.org/10.1101/2020.09.08.285395</comment></element-citation></ref>
<ref id="b25-mmr-0-0-11890"><label>25</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Miga</surname><given-names>KH</given-names></name><name><surname>Koren</surname><given-names>S</given-names></name><name><surname>Rhie</surname><given-names>A</given-names></name><name><surname>Vollger</surname><given-names>MR</given-names></name><name><surname>Gershman</surname><given-names>A</given-names></name><name><surname>Bzikadze</surname><given-names>A</given-names></name><name><surname>Brooks</surname><given-names>S</given-names></name><name><surname>Howe</surname><given-names>E</given-names></name><name><surname>Porubsky</surname><given-names>D</given-names></name><name><surname>Logsdon</surname><given-names>GA</given-names></name><etal/></person-group><article-title>Telomere-to-telomere assembly of a complete human X chromosome</article-title><source>Nature</source><volume>585</volume><fpage>79</fpage><lpage>84</lpage><year>2020</year><pub-id pub-id-type="doi">10.1038/s41586-020-2547-7</pub-id><pub-id pub-id-type="pmid">32663838</pub-id></element-citation></ref>
<ref id="b26-mmr-0-0-11890"><label>26</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Amid</surname><given-names>C</given-names></name><name><surname>Alako</surname><given-names>BTF</given-names></name><name><surname>Balavenkataraman Kadhirvelu</surname><given-names>V</given-names></name><name><surname>Burdett</surname><given-names>T</given-names></name><name><surname>Burgin</surname><given-names>J</given-names></name><name><surname>Fan</surname><given-names>J</given-names></name><name><surname>Harrison</surname><given-names>PW</given-names></name><name><surname>Holt</surname><given-names>S</given-names></name><name><surname>Hussein</surname><given-names>A</given-names></name><name><surname>Ivanov</surname><given-names>E</given-names></name><etal/></person-group><article-title>The European nucleotide archive in 2019</article-title><source>Nucleic Acids Res</source><volume>48</volume><fpage>D70</fpage><lpage>D76</lpage><year>2020</year><pub-id pub-id-type="pmid">31722421</pub-id></element-citation></ref>
<ref id="b27-mmr-0-0-11890"><label>27</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zimin</surname><given-names>AV</given-names></name><name><surname>Mar&#x00E7;ais</surname><given-names>G</given-names></name><name><surname>Puiu</surname><given-names>D</given-names></name><name><surname>Roberts</surname><given-names>M</given-names></name><name><surname>Salzberg</surname><given-names>SL</given-names></name><name><surname>Yorke</surname><given-names>JA</given-names></name></person-group><article-title>The MaSuRCA genome assembler</article-title><source>Bioinformatics</source><volume>29</volume><fpage>2669</fpage><lpage>2677</lpage><year>2013</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btt476</pub-id><pub-id pub-id-type="pmid">23990416</pub-id></element-citation></ref>
<ref id="b28-mmr-0-0-11890"><label>28</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zimin</surname><given-names>AV</given-names></name><name><surname>Puiu</surname><given-names>D</given-names></name><name><surname>Luo</surname><given-names>MC</given-names></name><name><surname>Zhu</surname><given-names>T</given-names></name><name><surname>Koren</surname><given-names>S</given-names></name><name><surname>Mar&#x00E7;ais</surname><given-names>G</given-names></name><name><surname>Yorke</surname><given-names>JA</given-names></name><name><surname>Dvo&#x0159;&#x00E1;k</surname><given-names>J</given-names></name><name><surname>Salzberg</surname><given-names>SL</given-names></name></person-group><article-title>Hybrid assembly of the large and highly repetitive genome of <italic>Aegilops tauschii</italic>, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm</article-title><source>Genome Res</source><volume>27</volume><fpage>787</fpage><lpage>792</lpage><year>2017</year><pub-id pub-id-type="doi">10.1101/gr.213405.116</pub-id><pub-id pub-id-type="pmid">28130360</pub-id></element-citation></ref>
<ref id="b29-mmr-0-0-11890"><label>29</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Di Genova</surname><given-names>A</given-names></name><name><surname>Buena-Atienza</surname><given-names>E</given-names></name><name><surname>Ossowski</surname><given-names>S</given-names></name><name><surname>Sagot</surname><given-names>MF</given-names></name></person-group><article-title>Wengan: Efficient and high quality hybrid de novo assembly of human genomes</article-title><source>bioRxiv</source><month>Nov</month><day>25</day><year>2019</year><comment>(Epub ahead of print). https://doi.org/10.1101/840447</comment></element-citation></ref>
<ref id="b30-mmr-0-0-11890"><label>30</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Miller</surname><given-names>JR</given-names></name><name><surname>Delcher</surname><given-names>AL</given-names></name><name><surname>Koren</surname><given-names>S</given-names></name><name><surname>Venter</surname><given-names>E</given-names></name><name><surname>Walenz</surname><given-names>BP</given-names></name><name><surname>Brownley</surname><given-names>A</given-names></name><name><surname>Johnson</surname><given-names>J</given-names></name><name><surname>Li</surname><given-names>K</given-names></name><name><surname>Mobarry</surname><given-names>C</given-names></name><name><surname>Sutton</surname><given-names>G</given-names></name></person-group><article-title>Aggressive assembly of pyrosequencing reads with mates</article-title><source>Bioinformatics</source><volume>24</volume><fpage>2818</fpage><lpage>2824</lpage><year>2008</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btn548</pub-id><pub-id pub-id-type="pmid">18952627</pub-id></element-citation></ref>
<ref id="b31-mmr-0-0-11890"><label>31</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname><given-names>R</given-names></name><name><surname>Liu</surname><given-names>B</given-names></name><name><surname>Xie</surname><given-names>Y</given-names></name><name><surname>Li</surname><given-names>Z</given-names></name><name><surname>Huang</surname><given-names>W</given-names></name><name><surname>Yuan</surname><given-names>J</given-names></name><name><surname>He</surname><given-names>G</given-names></name><name><surname>Chen</surname><given-names>Y</given-names></name><name><surname>Pan</surname><given-names>Q</given-names></name><name><surname>Liu</surname><given-names>Y</given-names></name><etal/></person-group><article-title>SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler</article-title><source>Gigascience</source><volume>1</volume><fpage>18</fpage><year>2012</year><pub-id pub-id-type="doi">10.1186/2047-217X-1-18</pub-id><pub-id pub-id-type="pmid">23587118</pub-id></element-citation></ref>
<ref id="b32-mmr-0-0-11890"><label>32</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kolmogorov</surname><given-names>M</given-names></name><name><surname>Yuan</surname><given-names>J</given-names></name><name><surname>Lin</surname><given-names>Y</given-names></name><name><surname>Pevzner</surname><given-names>PA</given-names></name></person-group><article-title>Assembly of long, error-prone reads using repeat graphs</article-title><source>Nat Biotechnol</source><volume>37</volume><fpage>540</fpage><lpage>546</lpage><year>2019</year><pub-id pub-id-type="doi">10.1038/s41587-019-0072-8</pub-id><pub-id pub-id-type="pmid">30936562</pub-id></element-citation></ref>
<ref id="b33-mmr-0-0-11890"><label>33</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Weisenfeld</surname><given-names>NI</given-names></name><name><surname>Yin</surname><given-names>S</given-names></name><name><surname>Sharpe</surname><given-names>T</given-names></name><name><surname>Lau</surname><given-names>B</given-names></name><name><surname>Hegarty</surname><given-names>R</given-names></name><name><surname>Holmes</surname><given-names>L</given-names></name><name><surname>Sogoloff</surname><given-names>B</given-names></name><name><surname>Tabbaa</surname><given-names>D</given-names></name><name><surname>Williams</surname><given-names>L</given-names></name><name><surname>Russ</surname><given-names>C</given-names></name><etal/></person-group><article-title>Comprehensive variation discovery in single human genomes</article-title><source>Nat Genet</source><volume>46</volume><fpage>1350</fpage><lpage>1355</lpage><year>2014</year><pub-id pub-id-type="doi">10.1038/ng.3121</pub-id><pub-id pub-id-type="pmid">25326702</pub-id></element-citation></ref>
<ref id="b34-mmr-0-0-11890"><label>34</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Koren</surname><given-names>S</given-names></name><name><surname>Walenz</surname><given-names>BP</given-names></name><name><surname>Berlin</surname><given-names>K</given-names></name><name><surname>Miller</surname><given-names>JR</given-names></name><name><surname>Bergman</surname><given-names>NH</given-names></name><name><surname>Phillippy</surname><given-names>AM</given-names></name></person-group><article-title>Canu: Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation</article-title><source>Genome Res</source><volume>27</volume><fpage>722</fpage><lpage>736</lpage><year>2017</year><pub-id pub-id-type="doi">10.1101/gr.215087.116</pub-id><pub-id pub-id-type="pmid">28298431</pub-id></element-citation></ref>
<ref id="b35-mmr-0-0-11890"><label>35</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Myers</surname><given-names>EW</given-names></name><name><surname>Sutton</surname><given-names>GG</given-names></name><name><surname>Delcher</surname><given-names>AL</given-names></name><name><surname>Dew</surname><given-names>IM</given-names></name><name><surname>Fasulo</surname><given-names>DP</given-names></name><name><surname>Flanigan</surname><given-names>MJ</given-names></name><name><surname>Kravitz</surname><given-names>SA</given-names></name><name><surname>Mobarry</surname><given-names>CM</given-names></name><name><surname>Reinert</surname><given-names>KH</given-names></name><name><surname>Remington</surname><given-names>KA</given-names></name><etal/></person-group><article-title>A whole-genome assembly of <italic>Drosophila</italic></article-title><source>Science</source><volume>287</volume><fpage>2196</fpage><lpage>2204</lpage><year>2000</year><pub-id pub-id-type="doi">10.1126/science.287.5461.2196</pub-id><pub-id pub-id-type="pmid">10731133</pub-id></element-citation></ref>
<ref id="b36-mmr-0-0-11890"><label>36</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname><given-names>H</given-names></name><name><surname>Concepcion</surname><given-names>GT</given-names></name><name><surname>Feng</surname><given-names>X</given-names></name><name><surname>Zhang</surname><given-names>H</given-names></name><name><surname>Li</surname><given-names>H</given-names></name></person-group><article-title>Haplotype-resolved de novo assembly with phased assembly graphs</article-title><source>arXiv</source><month>Aug</month><day>3</day><year>2020</year><comment>(Epub ahead of print)</comment></element-citation></ref>
<ref id="b37-mmr-0-0-11890"><label>37</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nurk</surname><given-names>S</given-names></name><name><surname>Walenz</surname><given-names>BP</given-names></name><name><surname>Rhie</surname><given-names>A</given-names></name><name><surname>Vollger</surname><given-names>MR</given-names></name><name><surname>Logsdon</surname><given-names>GA</given-names></name><name><surname>Grothe</surname><given-names>R</given-names></name><name><surname>Miga</surname><given-names>KH</given-names></name><name><surname>Eichler</surname><given-names>EE</given-names></name><name><surname>Phillippy</surname><given-names>AM</given-names></name><name><surname>Koren</surname><given-names>S</given-names></name></person-group><article-title>HiCanu: Accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads</article-title><source>Genome Res</source><volume>30</volume><fpage>1291</fpage><lpage>1305</lpage><year>2020</year><pub-id pub-id-type="doi">10.1101/gr.263566.120</pub-id><pub-id pub-id-type="pmid">32801147</pub-id></element-citation></ref>
<ref id="b38-mmr-0-0-11890"><label>38</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chin</surname><given-names>C-S</given-names></name><name><surname>Peluso</surname><given-names>P</given-names></name><name><surname>Sedlazeck</surname><given-names>FJ</given-names></name><name><surname>Nattestad</surname><given-names>M</given-names></name><name><surname>Concepcion</surname><given-names>GT</given-names></name><name><surname>Clum</surname><given-names>A</given-names></name><name><surname>Dunn</surname><given-names>C</given-names></name><name><surname>O&#x0027;Malley</surname><given-names>R</given-names></name><name><surname>Figueroa-Balderas</surname><given-names>R</given-names></name><name><surname>Morales-Cruz</surname><given-names>A</given-names></name><etal/></person-group><article-title>Phased diploid genome assembly with single-molecule real-time sequencing</article-title><source>Nat Methods</source><volume>13</volume><fpage>1050</fpage><lpage>1054</lpage><year>2016</year><pub-id pub-id-type="doi">10.1038/nmeth.4035</pub-id><pub-id pub-id-type="pmid">27749838</pub-id></element-citation></ref>
<ref id="b39-mmr-0-0-11890"><label>39</label><element-citation publication-type="online"><collab collab-type="corp-author">Arima Genomics, Inc.</collab><article-title>Arima-HiC Mapping Pipeline. San Diego</article-title><year>2019</year><source>GitHub</source><uri>https://github.com/ArimaGenomics/mapping_pipeline/blob/master/Arima_Mapping_UserGuide_A160156_v02.pdf</uri></element-citation></ref>
<ref id="b40-mmr-0-0-11890"><label>40</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ghurye</surname><given-names>J</given-names></name><name><surname>Rhie</surname><given-names>A</given-names></name><name><surname>Walenz</surname><given-names>BP</given-names></name><name><surname>Schmitt</surname><given-names>A</given-names></name><name><surname>Selvaraj</surname><given-names>S</given-names></name><name><surname>Pop</surname><given-names>M</given-names></name><name><surname>Phillippy</surname><given-names>AM</given-names></name><name><surname>Koren</surname><given-names>S</given-names></name></person-group><article-title>Integrating Hi-C links with assembly graphs for chromosome-scale assembly</article-title><source>PLoS Comput Biol</source><volume>15</volume><fpage>e1007273</fpage><year>2019</year><pub-id pub-id-type="doi">10.1371/journal.pcbi.1007273</pub-id><pub-id pub-id-type="pmid">31433799</pub-id></element-citation></ref>
<ref id="b41-mmr-0-0-11890"><label>41</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gurevich</surname><given-names>A</given-names></name><name><surname>Saveliev</surname><given-names>V</given-names></name><name><surname>Vyahhi</surname><given-names>N</given-names></name><name><surname>Tesler</surname><given-names>G</given-names></name></person-group><article-title>QUAST: Quality assessment tool for genome assemblies</article-title><source>Bioinformatics</source><volume>29</volume><fpage>1072</fpage><lpage>1075</lpage><year>2013</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btt086</pub-id><pub-id pub-id-type="pmid">23422339</pub-id></element-citation></ref>
<ref id="b42-mmr-0-0-11890"><label>42</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Earl</surname><given-names>D</given-names></name><name><surname>Bradnam</surname><given-names>K</given-names></name><name><surname>St John</surname><given-names>J</given-names></name><name><surname>Darling</surname><given-names>A</given-names></name><name><surname>Lin</surname><given-names>D</given-names></name><name><surname>Fass</surname><given-names>J</given-names></name><name><surname>Yu</surname><given-names>HO</given-names></name><name><surname>Buffalo</surname><given-names>V</given-names></name><name><surname>Zerbino</surname><given-names>DR</given-names></name><name><surname>Diekhans</surname><given-names>M</given-names></name><etal/></person-group><article-title>Assemblathon 1: A competitive assessment of de novo short read assembly methods</article-title><source>Genome Res</source><volume>21</volume><fpage>2224</fpage><lpage>2241</lpage><year>2011</year><pub-id pub-id-type="doi">10.1101/gr.126599.111</pub-id><pub-id pub-id-type="pmid">21926179</pub-id></element-citation></ref>
<ref id="b43-mmr-0-0-11890"><label>43</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Seppey</surname><given-names>M</given-names></name><name><surname>Manni</surname><given-names>M</given-names></name><name><surname>Zdobnov</surname><given-names>EM</given-names></name></person-group><article-title>BUSCO: Assessing genome assembly and annotation completeness</article-title><source>Methods Mol Biol</source><volume>1962</volume><fpage>227</fpage><lpage>245</lpage><year>2019</year><pub-id pub-id-type="doi">10.1007/978-1-4939-9173-0_14</pub-id><pub-id pub-id-type="pmid">31020564</pub-id></element-citation></ref>
<ref id="b44-mmr-0-0-11890"><label>44</label><element-citation publication-type="online"><person-group person-group-type="author"><name><surname>Chu</surname><given-names>J</given-names></name></person-group><article-title>Jupiter Plot: A circos-based tool to visualize genome assembly consistency</article-title><year>2018</year><source>GitHub</source><uri>https://github.com/JustinChu/JupiterPlot</uri></element-citation></ref>
<ref id="b45-mmr-0-0-11890"><label>45</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Krzywinski</surname><given-names>M</given-names></name><name><surname>Schein</surname><given-names>J</given-names></name><name><surname>Birol</surname><given-names>I</given-names></name><name><surname>Connors</surname><given-names>J</given-names></name><name><surname>Gascoyne</surname><given-names>R</given-names></name><name><surname>Horsman</surname><given-names>D</given-names></name><name><surname>Jones</surname><given-names>SJ</given-names></name><name><surname>Marra</surname><given-names>MA</given-names></name></person-group><article-title>Circos: An information aesthetic for comparative genomics</article-title><source>Genome Res</source><volume>19</volume><fpage>1639</fpage><lpage>1645</lpage><year>2009</year><pub-id pub-id-type="doi">10.1101/gr.092759.109</pub-id><pub-id pub-id-type="pmid">19541911</pub-id></element-citation></ref>
<ref id="b46-mmr-0-0-11890"><label>46</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kyriakidis</surname><given-names>K</given-names></name><name><surname>Charalampidou</surname><given-names>A</given-names></name><name><surname>Natsiavas</surname><given-names>P</given-names></name><name><surname>Vizirianakis</surname><given-names>IS</given-names></name><name><surname>Malousi</surname><given-names>A</given-names></name></person-group><article-title>Linking exome sequencing data with drug response aberrations</article-title><source>Stud Health Technol Inform</source><volume>264</volume><fpage>1845</fpage><lpage>1846</lpage><year>2019</year><pub-id pub-id-type="pmid">31438372</pub-id></element-citation></ref>
<ref id="b47-mmr-0-0-11890"><label>47</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wei</surname><given-names>X</given-names></name><name><surname>Ju</surname><given-names>X</given-names></name><name><surname>Yi</surname><given-names>X</given-names></name><name><surname>Zhu</surname><given-names>Q</given-names></name><name><surname>Qu</surname><given-names>N</given-names></name><name><surname>Liu</surname><given-names>T</given-names></name><name><surname>Chen</surname><given-names>Y</given-names></name><name><surname>Jiang</surname><given-names>H</given-names></name><name><surname>Yang</surname><given-names>G</given-names></name><name><surname>Zhen</surname><given-names>R</given-names></name><etal/></person-group><article-title>Identification of sequence variants in genetic disease-causing genes using targeted next-generation sequencing</article-title><source>PLoS One</source><volume>6</volume><fpage>e29500</fpage><year>2011</year><pub-id pub-id-type="doi">10.1371/journal.pone.0029500</pub-id><pub-id pub-id-type="pmid">22216297</pub-id></element-citation></ref>
<ref id="b48-mmr-0-0-11890"><label>48</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kanakoglou</surname><given-names>DS</given-names></name><name><surname>Michalettou</surname><given-names>TD</given-names></name><name><surname>Vasileiou</surname><given-names>C</given-names></name><name><surname>Gioukakis</surname><given-names>E</given-names></name><name><surname>Maneta</surname><given-names>D</given-names></name><name><surname>Kyriakidis</surname><given-names>KV</given-names></name><name><surname>Georgakilas</surname><given-names>AG</given-names></name><name><surname>Michalopoulos</surname><given-names>I</given-names></name></person-group><article-title>Effects of high-dose ionizing radiation in human gene expression: A meta-analysis</article-title><source>Int J Mol Sci</source><volume>21</volume><fpage>1938</fpage><year>2020</year><pub-id pub-id-type="doi">10.3390/ijms21061938</pub-id></element-citation></ref>
<ref id="b49-mmr-0-0-11890"><label>49</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>DePristo</surname><given-names>MA</given-names></name><name><surname>Banks</surname><given-names>E</given-names></name><name><surname>Poplin</surname><given-names>R</given-names></name><name><surname>Garimella</surname><given-names>KV</given-names></name><name><surname>Maguire</surname><given-names>JR</given-names></name><name><surname>Hartl</surname><given-names>C</given-names></name><name><surname>Philippakis</surname><given-names>AA</given-names></name><name><surname>del Angel</surname><given-names>G</given-names></name><name><surname>Rivas</surname><given-names>MA</given-names></name><name><surname>Hanna</surname><given-names>M</given-names></name><etal/></person-group><article-title>A framework for variation discovery and genotyping using next-generation DNA sequencing data</article-title><source>Nat Genet</source><volume>43</volume><fpage>491</fpage><lpage>498</lpage><year>2011</year><pub-id pub-id-type="doi">10.1038/ng.806</pub-id><pub-id pub-id-type="pmid">21478889</pub-id></element-citation></ref>
<ref id="b50-mmr-0-0-11890"><label>50</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vollger</surname><given-names>MR</given-names></name><name><surname>Logsdon</surname><given-names>GA</given-names></name><name><surname>Audano</surname><given-names>PA</given-names></name><name><surname>Sulovari</surname><given-names>A</given-names></name><name><surname>Porubsky</surname><given-names>D</given-names></name><name><surname>Peluso</surname><given-names>P</given-names></name><name><surname>Wenger</surname><given-names>AM</given-names></name><name><surname>Concepcion</surname><given-names>GT</given-names></name><name><surname>Kronenberg</surname><given-names>ZN</given-names></name><name><surname>Munson</surname><given-names>KM</given-names></name><etal/></person-group><article-title>Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads</article-title><source>Ann Hum Genet</source><volume>84</volume><fpage>125</fpage><lpage>140</lpage><year>2020</year><pub-id pub-id-type="doi">10.1111/ahg.12364</pub-id><pub-id pub-id-type="pmid">31711268</pub-id></element-citation></ref>
<ref id="b51-mmr-0-0-11890"><label>51</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>Z</given-names></name><name><surname>Erickson</surname><given-names>DL</given-names></name><name><surname>Meng</surname><given-names>J</given-names></name></person-group><article-title>Benchmarking hybrid assembly approaches for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing</article-title><source>BMC Genomics</source><volume>21</volume><fpage>631</fpage><year>2020</year><pub-id pub-id-type="doi">10.1186/s12864-020-07041-8</pub-id><pub-id pub-id-type="pmid">32928108</pub-id></element-citation></ref>
<ref id="b52-mmr-0-0-11890"><label>52</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname><given-names>LK</given-names></name><name><surname>Sahasrabudhe</surname><given-names>R</given-names></name><name><surname>Gill</surname><given-names>JA</given-names></name><name><surname>Roach</surname><given-names>JL</given-names></name><name><surname>Froenicke</surname><given-names>L</given-names></name><name><surname>Brown</surname><given-names>CT</given-names></name><name><surname>Whitehead</surname><given-names>A</given-names></name></person-group><article-title>Draft genome assemblies using sequencing reads from Oxford Nanopore Technology and Illumina platforms for four species of North American Fundulus killifish</article-title><source>Gigascience</source><volume>9</volume><fpage>giaa067</fpage><year>2020</year><pub-id pub-id-type="doi">10.1093/gigascience/giaa067</pub-id></element-citation></ref>
<ref id="b53-mmr-0-0-11890"><label>53</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Coombe</surname><given-names>L</given-names></name><name><surname>Zhang</surname><given-names>J</given-names></name><name><surname>Vandervalk</surname><given-names>BP</given-names></name><name><surname>Chu</surname><given-names>J</given-names></name><name><surname>Jackman</surname><given-names>SD</given-names></name><name><surname>Birol</surname><given-names>I</given-names></name><name><surname>Warren</surname><given-names>RL</given-names></name></person-group><article-title>ARKS: Chromosome-scale scaffolding of human genome drafts with linked read kmers</article-title><source>BMC Bioinformatics</source><volume>19</volume><fpage>234</fpage><year>2018</year><pub-id pub-id-type="doi">10.1186/s12859-018-2243-x</pub-id><pub-id pub-id-type="pmid">29925315</pub-id></element-citation></ref>
<ref id="b54-mmr-0-0-11890"><label>54</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yeo</surname><given-names>S</given-names></name><name><surname>Coombe</surname><given-names>L</given-names></name><name><surname>Warren</surname><given-names>RL</given-names></name><name><surname>Chu</surname><given-names>J</given-names></name><name><surname>Birol</surname><given-names>I</given-names></name></person-group><article-title>ARCS: Scaffolding genome drafts with linked reads</article-title><source>Bioinformatics</source><volume>34</volume><fpage>725</fpage><lpage>731</lpage><year>2018</year><pub-id pub-id-type="doi">10.1093/bioinformatics/btx675</pub-id><pub-id pub-id-type="pmid">29069293</pub-id></element-citation></ref>
<ref id="b55-mmr-0-0-11890"><label>55</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lam</surname><given-names>ET</given-names></name><name><surname>Hastie</surname><given-names>A</given-names></name><name><surname>Lin</surname><given-names>C</given-names></name><name><surname>Ehrlich</surname><given-names>D</given-names></name><name><surname>Das</surname><given-names>SK</given-names></name><name><surname>Austin</surname><given-names>MD</given-names></name><name><surname>Deshpande</surname><given-names>P</given-names></name><name><surname>Cao</surname><given-names>H</given-names></name><name><surname>Nagarajan</surname><given-names>N</given-names></name><name><surname>Xiao</surname><given-names>M</given-names></name><etal/></person-group><article-title>Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly</article-title><source>Nat Biotechnol</source><volume>30</volume><fpage>771</fpage><lpage>776</lpage><year>2012</year><pub-id pub-id-type="doi">10.1038/nbt.2303</pub-id><pub-id pub-id-type="pmid">22797562</pub-id></element-citation></ref>
<ref id="b56-mmr-0-0-11890"><label>56</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wallberg</surname><given-names>A</given-names></name><name><surname>Bunikis</surname><given-names>I</given-names></name><name><surname>Pettersson</surname><given-names>OV</given-names></name><name><surname>Mosbech</surname><given-names>MB</given-names></name><name><surname>Childers</surname><given-names>AK</given-names></name><name><surname>Evans</surname><given-names>JD</given-names></name><name><surname>Mikheyev</surname><given-names>AS</given-names></name><name><surname>Robertson</surname><given-names>HM</given-names></name><name><surname>Robinson</surname><given-names>GE</given-names></name><name><surname>Webster</surname><given-names>MT</given-names></name></person-group><article-title>A hybrid de novo genome assembly of the honeybee, <italic>Apis mellifera</italic>, with chromosome-length scaffolds</article-title><source>BMC Genomics</source><volume>20</volume><fpage>275</fpage><year>2019</year><pub-id pub-id-type="doi">10.1186/s12864-019-5642-0</pub-id><pub-id pub-id-type="pmid">30961563</pub-id></element-citation></ref>
<ref id="b57-mmr-0-0-11890"><label>57</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Crosswell</surname><given-names>LC</given-names></name><name><surname>Thornton</surname><given-names>JM</given-names></name></person-group><article-title>ELIXIR: A distributed infrastructure for European biological data</article-title><source>Trends Biotechnol</source><volume>30</volume><fpage>241</fpage><lpage>242</lpage><year>2012</year><pub-id pub-id-type="doi">10.1016/j.tibtech.2012.02.002</pub-id><pub-id pub-id-type="pmid">22417641</pub-id></element-citation></ref>
<ref id="b58-mmr-0-0-11890"><label>58</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kodama</surname><given-names>Y</given-names></name><name><surname>Shumway</surname><given-names>M</given-names></name><name><surname>Leinonen</surname><given-names>R</given-names></name><collab collab-type="corp-author">International Nucleotide Sequence Database Collaboration</collab></person-group><article-title>The sequence read archive: Explosive growth of sequencing data</article-title><source>Nucleic Acids Res</source><volume>40</volume><fpage>D54</fpage><lpage>D56</lpage><year>2012</year><pub-id pub-id-type="doi">10.1093/nar/gkr854</pub-id><pub-id pub-id-type="pmid">22009675</pub-id></element-citation></ref>
<ref id="b59-mmr-0-0-11890"><label>59</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Leinonen</surname><given-names>R</given-names></name><name><surname>Sugawara</surname><given-names>H</given-names></name><name><surname>Shumway</surname><given-names>M</given-names></name><collab collab-type="corp-author">International Nucleotide Sequence Database Collaboration</collab></person-group><article-title>The sequence read archive</article-title><source>Nucleic Acids Res</source><volume>39</volume><fpage>D19</fpage><lpage>D21</lpage><year>2011</year><pub-id pub-id-type="doi">10.1093/nar/gkq1019</pub-id><pub-id pub-id-type="pmid">21062823</pub-id></element-citation></ref>
</ref-list>
</back>
<floats-group>
<fig id="f1-mmr-0-0-11890" position="float">
<label>Figure 1.</label>
<caption><p>Pipeline stages and tools used in each step of the workflow.</p></caption>
<graphic xlink:href="mmr-23-04-11890-g00.tif"/>
</fig>
<fig id="f2-mmr-0-0-11890" position="float">
<label>Figure 2.</label>
<caption><p>Comparison of <italic>Drosophila virilis</italic> assemblies. The hybrid assemblers MaSuRCA (CABOG and Flye) and Wengan used Illumina short reads and Nanopore long reads for the assembly, while Canu, a long-read assembler, used Nanopore long reads for the same purpose. SALSA improved contiguity in all assemblies.</p></caption>
<graphic xlink:href="mmr-23-04-11890-g01.tif"/>
</fig>
<fig id="f3-mmr-0-0-11890" position="float">
<label>Figure 3.</label>
<caption><p>Comparison of <italic>Drosophila melanogaster</italic> Hifiasm assemblies. Hifiasm produced three assemblies from PacBio HiFi long reads of different insert sizes (11 Kbp, 24 Kbp) and coverages (37&#x00D7;, 40&#x00D7;, 92&#x00D7;). A region at one terminus of chr 2L appears translocated in the assemblies produced from the 11 Kbp insert size at 37&#x00D7; coverage and from the 24 Kbp insert size at 92&#x00D7; coverage. The same region appears deleted in the assembly produced from the 24 Kbp insert size at 40&#x00D7; coverage before SALSA scaffolding, and inverted in the same assembly after SALSA scaffolding.</p></caption>
<graphic xlink:href="mmr-23-04-11890-g02.tif"/>
</fig>
<fig id="f4-mmr-0-0-11890" position="float">
<label>Figure 4.</label>
<caption><p>Comparison of <italic>Drosophila melanogaster</italic> HiCanu assemblies. HiCanu produced three assemblies from PacBio HiFi long reads of different insert sizes (11 Kbp, 24 Kbp) and coverages (37&#x00D7;, 40&#x00D7;, 92&#x00D7;). Deletions of major regions or entire chromosomes are found in all assemblies. Apparent duplications, such as those of major parts of chr 3L in the assemblies produced from the 24 Kbp insert size at 40&#x00D7; coverage, result from phasing.</p></caption>
<graphic xlink:href="mmr-23-04-11890-g03.tif"/>
</fig>
<fig id="f5-mmr-0-0-11890" position="float">
<label>Figure 5.</label>
<caption><p>Comparison of <italic>Homo sapiens</italic> assemblies. The Wengan hybrid assembler used 34&#x00D7; Illumina short reads and 30&#x00D7; Nanopore long reads for the assembly, while Hifiasm used 16&#x00D7; PacBio HiFi long reads.</p></caption>
<graphic xlink:href="mmr-23-04-11890-g04.tif"/>
</fig>
<table-wrap id="tI-mmr-0-0-11890" position="float">
<label>Table I.</label>
<caption><p>ENA accessions and T2T links of primary sequencing data.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Organism</th>
<th align="center" valign="bottom">Genome size (Mbp)</th>
<th align="center" valign="bottom">Illumina paired-end sequencing (coverage)</th>
<th align="center" valign="bottom">Illumina Hi-C sequencing (coverage)</th>
<th align="center" valign="bottom">Nanopore reads (coverage)</th>
<th align="center" valign="bottom">PacBio/HiFi reads (coverage)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><italic>Drosophila virilis</italic></td>
<td align="center" valign="top">169</td>
<td align="left" valign="top">SRR1536175 (108&#x00D7;)</td>
<td align="left" valign="top">SRR7029394 (67&#x00D7;)</td>
<td align="left" valign="top">SRR7167958 (50&#x00D7;)</td>
<td/>
</tr>
<tr>
<td align="left" valign="top"><italic>Drosophila melanogaster</italic></td>
<td align="center" valign="top">140</td>
<td/>
<td/>
<td/>
<td align="left" valign="top">SRR9969842 (37&#x00D7;), SRR10238607 (subsampled to 92&#x00D7;)</td>
</tr>
<tr>
<td align="left" valign="top"><italic>Homo sapiens</italic></td>
<td align="center" valign="top">3,200</td>
<td align="left" valign="top">SRR3189741 SRR3189742 (Combined and subsampled to 34&#x00D7;)</td>
<td align="left" valign="top"><uri xlink:href="https://github.com/nanopore-wgs-consortium/CHM13#hi-c-data">https://github.com/nanopore-wgs-consortium/CHM13#hi-c-data</uri> (40&#x00D7;)</td>
<td align="left" valign="top"><uri xlink:href="https://github.com/nanopore-wgs-consortium/CHM13#oxford-nanopore-data">https://github.com/nanopore-wgs-consortium/CHM13#oxford-nanopore-data</uri> (Subsampled to 30&#x00D7;)</td>
<td align="left" valign="top">SRR11292120 SRR11292121 SRR11292122 SRR11292123 (Combined and subsampled to 16&#x00D7;)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tfn1-mmr-0-0-11890"><p>Where more than one FASTQ file was used, the files were combined and randomly subsampled to a lower coverage. Mbp, megabase pairs.</p></fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="tII-mmr-0-0-11890" position="float">
<label>Table II.</label>
<caption><p>Reference genomes used for the evaluation of the assemblies.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Organism</th>
<th align="center" valign="bottom">Reference genome</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><italic>Drosophila virilis</italic></td>
<td align="left" valign="top">GCA_007989325.1_vir160_genomic.fna</td>
</tr>
<tr>
<td/>
<td align="left" valign="top"><uri xlink:href="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/007/989/325/GCA_007989325.1_vir160/">https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/007/989/325/GCA_007989325.1_vir160/</uri></td>
</tr>
<tr>
<td align="left" valign="top"><italic>Drosophila melanogaster</italic></td>
<td align="left" valign="top">GCA_002300595.1_Dmel_A4_1.0_genomic.fna</td>
</tr>
<tr>
<td/>
<td align="left" valign="top"><uri xlink:href="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/300/595/GCA_002300595.1_Dmel_A4_1.0/">https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/300/595/GCA_002300595.1_Dmel_A4_1.0/</uri></td>
</tr>
<tr>
<td align="left" valign="top"><italic>Homo sapiens</italic></td>
<td align="left" valign="top">chm13.draft_v1.0.fasta</td>
</tr>
<tr>
<td/>
<td align="left" valign="top"><uri xlink:href="https://s3.amazonaws.com/nanopore-human-wgs/chm13/assemblies/chm13.draft_v1.0.fasta.gz">https://s3.amazonaws.com/nanopore-human-wgs/chm13/assemblies/chm13.draft_v1.0.fasta.gz</uri></td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tIII-mmr-0-0-11890" position="float">
<label>Table III.</label>
<caption><p>Metrics of <italic>Drosophila</italic> assemblies.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Assemblers</th>
<th align="center" valign="bottom">Contigs/scaffolds</th>
<th align="center" valign="bottom">Genome assembly size (bp)</th>
<th align="center" valign="bottom">N50 (bp)</th>
<th align="center" valign="bottom">NG50 (bp)</th>
<th align="center" valign="bottom">L50</th>
<th align="center" valign="bottom">LG50</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">MaSuRCA (CABOG)</td>
<td align="center" valign="top">1,016</td>
<td align="center" valign="top">167,374,624</td>
<td align="center" valign="top">366,859</td>
<td align="center" valign="top">359,873</td>
<td align="center" valign="top">127</td>
<td align="center" valign="top">131</td>
</tr>
<tr>
<td align="left" valign="top">MaSuRCA (CABOG)/SALSA (Arima)</td>
<td align="center" valign="top">532</td>
<td align="center" valign="top">167,617,624</td>
<td align="center" valign="top">3,400,369</td>
<td align="center" valign="top">3,400,369</td>
<td align="center" valign="top">15</td>
<td align="center" valign="top">15</td>
</tr>
<tr>
<td align="left" valign="top">MaSuRCA (Flye)</td>
<td align="center" valign="top">689</td>
<td align="center" valign="top">163,000,738</td>
<td align="center" valign="top">419,467</td>
<td align="center" valign="top">406,899</td>
<td align="center" valign="top">113</td>
<td align="center" valign="top">121</td>
</tr>
<tr>
<td align="left" valign="top">MaSuRCA (Flye)/SALSA (Arima)</td>
<td align="center" valign="top">230</td>
<td align="center" valign="top">163,230,238</td>
<td align="center" valign="top">5,261,864</td>
<td align="center" valign="top">5,258,634</td>
<td align="center" valign="top">9</td>
<td align="center" valign="top">10</td>
</tr>
<tr>
<td align="left" valign="top">Wengan</td>
<td align="center" valign="top">329</td>
<td align="center" valign="top">153,989,049</td>
<td align="center" valign="top">3,232,846</td>
<td align="center" valign="top">3,013,042</td>
<td align="center" valign="top">13</td>
<td align="center" valign="top">16</td>
</tr>
<tr>
<td align="left" valign="top">Wengan/SALSA (Arima)</td>
<td align="center" valign="top">229</td>
<td align="center" valign="top">154,046,842</td>
<td align="center" valign="top">21,036,706</td>
<td align="center" valign="top">16,232,289</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">4</td>
</tr>
<tr>
<td align="left" valign="top">Canu</td>
<td align="center" valign="top">425</td>
<td align="center" valign="top">169,315,961</td>
<td align="center" valign="top">4,435,749</td>
<td align="center" valign="top">4,435,749</td>
<td align="center" valign="top">10</td>
<td align="center" valign="top">10</td>
</tr>
<tr>
<td align="left" valign="top">Canu/SALSA (Arima)</td>
<td align="center" valign="top">488</td>
<td align="center" valign="top">176,029,265</td>
<td align="center" valign="top">25,182,285</td>
<td align="center" valign="top">25,182,285</td>
<td align="center" valign="top">4</td>
<td align="center" valign="top">4</td>
</tr>
<tr>
<td align="left" valign="top">Hifiasm</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 11 Kbp</td>
<td align="center" valign="top">314</td>
<td align="center" valign="top">149,971,598</td>
<td align="center" valign="top">23,693,975</td>
<td align="center" valign="top">23,693,975</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 37&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td align="center" valign="top">149</td>
<td align="center" valign="top">164,010,561</td>
<td align="center" valign="top">21,707,601</td>
<td align="center" valign="top">24,110,342</td>
<td align="center" valign="top">4</td>
<td align="center" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 40&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td align="center" valign="top">186</td>
<td align="center" valign="top">169,871,295</td>
<td align="center" valign="top">23,943,049</td>
<td align="center" valign="top">24,211,538</td>
<td align="center" valign="top">4</td>
<td align="center" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 92&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">Hifiasm/SALSA</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 11 Kbp</td>
<td align="center" valign="top">308</td>
<td align="center" valign="top">149,976,098</td>
<td align="center" valign="top">23,693,975</td>
<td align="center" valign="top">23,693,975</td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 37&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td align="center" valign="top">141</td>
<td align="center" valign="top">164,015,561</td>
<td align="center" valign="top">24,110,342</td>
<td align="center" valign="top">24,620,248</td>
<td align="center" valign="top">4</td>
<td align="center" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 40&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td align="center" valign="top">183</td>
<td align="center" valign="top">169,876,757</td>
<td align="center" valign="top">23,943,049</td>
<td align="center" valign="top">24,211,538</td>
<td align="center" valign="top">4</td>
<td align="center" valign="top">3</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 92&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">HiCanu</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 11 Kbp</td>
<td align="center" valign="top">1,792</td>
<td align="center" valign="top">295,986,869</td>
<td align="center" valign="top">2,513,964</td>
<td align="center" valign="top">6,791,534</td>
<td align="center" valign="top">24</td>
<td align="center" valign="top">7</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 37&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td align="center" valign="top">1,024</td>
<td align="center" valign="top">322,211,690</td>
<td align="center" valign="top">6,752,429</td>
<td align="center" valign="top">17,694,921</td>
<td align="center" valign="top">12</td>
<td align="center" valign="top">4</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 40&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td align="center" valign="top">1,269</td>
<td align="center" valign="top">337,795,659</td>
<td align="center" valign="top">11,255,983</td>
<td align="center" valign="top">26,987,095</td>
<td align="center" valign="top">8</td>
<td align="center" valign="top">2</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 92&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">HiCanu/SALSA</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 11 Kbp</td>
<td align="center" valign="top">1,747</td>
<td align="center" valign="top">296,025,369</td>
<td align="center" valign="top">5,836,825</td>
<td align="center" valign="top">10,646,076</td>
<td align="center" valign="top">14</td>
<td align="center" valign="top">4</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 37&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td align="center" valign="top">1,023</td>
<td align="center" valign="top">322,224,690</td>
<td align="center" valign="top">12,833,112</td>
<td align="center" valign="top">30,402,815</td>
<td align="center" valign="top">7</td>
<td align="center" valign="top">2</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 40&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td align="center" valign="top">1,281</td>
<td align="center" valign="top">337,778,159</td>
<td align="center" valign="top">6,830,725</td>
<td align="center" valign="top">16,844,691</td>
<td align="center" valign="top">12</td>
<td align="center" valign="top">4</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 92&#x00D7;</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tfn2-mmr-0-0-11890"><p>Hybrid assemblies (MaSuRCA and Wengan) and the long-read Nanopore assembly (Canu) were based on the <italic>Drosophila virilis</italic> genome (size: 169,773,245 bp). HiFi PacBio assemblies (Hifiasm and HiCanu) were based on the <italic>Drosophila melanogaster</italic> genome (size: 145,940,863 bp). Hifiasm and HiCanu assemblies were performed using three combinations of insert size and coverage.</p></fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="tIV-mmr-0-0-11890" position="float">
<label>Table IV.</label>
<caption><p>BUSCO values of <italic>Drosophila</italic> assemblies.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Assemblers</th>
<th align="center" valign="bottom">Complete and single-copy BUSCOs (S)</th>
<th align="center" valign="bottom">Complete and duplicated BUSCOs (D)</th>
<th align="center" valign="bottom">Fragmented BUSCOs (F)</th>
<th align="center" valign="bottom">Missing BUSCOs (M)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><italic>Drosophila virilis</italic> reference genome</td>
<td align="center" valign="top">98.0&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
<td align="center" valign="top">0.8&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">MaSuRCA (CABOG)</td>
<td align="center" valign="top">96.1&#x0025;</td>
<td align="center" valign="top">1.5&#x0025;</td>
<td align="center" valign="top">0.8&#x0025;</td>
<td align="center" valign="top">1.6&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">MaSuRCA (CABOG)/SALSA (Arima)</td>
<td align="center" valign="top">96.1&#x0025;</td>
<td align="center" valign="top">1.4&#x0025;</td>
<td align="center" valign="top">0.8&#x0025;</td>
<td align="center" valign="top">1.7&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">MaSuRCA (Flye)</td>
<td align="center" valign="top">98.2&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
<td align="center" valign="top">0.8&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">MaSuRCA (Flye)/SALSA (Arima)</td>
<td align="center" valign="top">98.0&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
<td align="center" valign="top">0.8&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">Wengan</td>
<td align="center" valign="top">98.0&#x0025;</td>
<td align="center" valign="top">0.4&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
<td align="center" valign="top">0.9&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">Wengan/SALSA (Arima)</td>
<td align="center" valign="top">97.9&#x0025;</td>
<td align="center" valign="top">0.3&#x0025;</td>
<td align="center" valign="top">0.8&#x0025;</td>
<td align="center" valign="top">1.0&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">Canu</td>
<td align="center" valign="top">62.7&#x0025;</td>
<td align="center" valign="top">0.2&#x0025;</td>
<td align="center" valign="top">21.3&#x0025;</td>
<td align="center" valign="top">15.8&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">Canu/SALSA (Arima)</td>
<td align="center" valign="top">64.0&#x0025;</td>
<td align="center" valign="top">0.3&#x0025;</td>
<td align="center" valign="top">20.7&#x0025;</td>
<td align="center" valign="top">15.0&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top"><italic>Drosophila melanogaster</italic> reference genome</td>
<td align="center" valign="top">97.9&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
<td align="center" valign="top">0.9&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">Hifiasm</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 11 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 37&#x00D7;</td>
<td align="center" valign="top">98.1&#x0025;</td>
<td align="center" valign="top">0.6&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
<td align="center" valign="top">0.6&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 40&#x00D7;</td>
<td align="center" valign="top">98.2&#x0025;</td>
<td align="center" valign="top">0.4&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 90&#x00D7;</td>
<td align="center" valign="top">98.1&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">Hifiasm/SALSA</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 11 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 37&#x00D7;</td>
<td align="center" valign="top">98.1&#x0025;</td>
<td align="center" valign="top">0.6&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
<td align="center" valign="top">0.6&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 40&#x00D7;</td>
<td align="center" valign="top">98.2&#x0025;</td>
<td align="center" valign="top">0.4&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 90&#x00D7;</td>
<td align="center" valign="top">98.2&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
<td align="center" valign="top">0.6&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">HiCanu</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 11 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 37&#x00D7;</td>
<td align="center" valign="top">4.8&#x0025;</td>
<td align="center" valign="top">94.1&#x0025;</td>
<td align="center" valign="top">0.6&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 40&#x00D7;</td>
<td align="center" valign="top">3.8&#x0025;</td>
<td align="center" valign="top">95.2&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 90&#x00D7;</td>
<td align="center" valign="top">3.2&#x0025;</td>
<td align="center" valign="top">95.5&#x0025;</td>
<td align="center" valign="top">0.7&#x0025;</td>
<td align="center" valign="top">0.6&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">HiCanu/SALSA</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 11 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 37&#x00D7;</td>
<td align="center" valign="top">42.3&#x0025;</td>
<td align="center" valign="top">56.7&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 40&#x00D7;</td>
<td align="center" valign="top">37.3&#x0025;</td>
<td align="center" valign="top">61.6&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
<td align="center" valign="top">0.6&#x0025;</td>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 90&#x00D7;</td>
<td align="center" valign="top">39.0&#x0025;</td>
<td align="center" valign="top">59.9&#x0025;</td>
<td align="center" valign="top">0.5&#x0025;</td>
<td align="center" valign="top">0.6&#x0025;</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tV-mmr-0-0-11890" position="float">
<label>Table V.</label>
<caption><p>Assembly time and CPU usage comparison.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Organism</th>
<th align="center" valign="bottom">Assemblers</th>
<th align="center" valign="bottom">CPU time (sec)</th>
<th align="center" valign="bottom">CPU usage</th>
<th align="center" valign="bottom">Elapsed (wall clock) time (h:mm:ss)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><italic>Drosophila virilis</italic></td>
<td align="left" valign="top">MaSuRCA (CABOG)</td>
<td align="center" valign="top">1,638,637.72</td>
<td align="center" valign="top">3,954&#x0025;</td>
<td align="center" valign="top">11:30:39</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">MaSuRCA (Flye)</td>
<td align="center" valign="top">1,344,633.10</td>
<td align="center" valign="top">3,961&#x0025;</td>
<td align="center" valign="top">9:25:44</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">Canu</td>
<td align="center" valign="top">9,934,418.98</td>
<td align="center" valign="top">3,532&#x0025;</td>
<td align="center" valign="top">78:07:27</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">Wengan</td>
<td align="center" valign="top">198,241.94</td>
<td align="center" valign="top">2,831&#x0025;</td>
<td align="center" valign="top">1:56:42</td>
</tr>
<tr>
<td align="left" valign="top"><italic>Drosophila melanogaster</italic></td>
<td align="left" valign="top">Hifiasm</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 11 Kbp</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 37&#x00D7;</td>
<td align="center" valign="top">163,816.92</td>
<td align="center" valign="top">4,098&#x0025;</td>
<td align="center" valign="top">1:06:37</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 40&#x00D7;</td>
<td align="center" valign="top">215,855.05</td>
<td align="center" valign="top">4,287&#x0025;</td>
<td align="center" valign="top">1:23:54</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 90&#x00D7;</td>
<td align="center" valign="top">4,271,030.94</td>
<td align="center" valign="top">4,313&#x0025;</td>
<td align="center" valign="top">25:40:58</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">HiCanu</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 11 Kbp</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 37&#x00D7;</td>
<td align="center" valign="top">85,224.85</td>
<td align="center" valign="top">1,752&#x0025;</td>
<td align="center" valign="top">1:21:03</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 40&#x00D7;</td>
<td align="center" valign="top">107,146.65</td>
<td align="center" valign="top">2,235&#x0025;</td>
<td align="center" valign="top">1:19:53</td>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Insert size: 24 Kbp</td>
<td/>
<td/>
<td/>
</tr>
<tr>
<td/>
<td align="left" valign="top">&#x00A0;&#x00A0;Coverage: 90&#x00D7;</td>
<td align="center" valign="top">176,649.77</td>
<td align="center" valign="top">1,646&#x0025;</td>
<td align="center" valign="top">2:58:46</td>
</tr>
<tr>
<td align="left" valign="top"><italic>Homo sapiens</italic></td>
<td align="left" valign="top">Hifiasm</td>
<td align="center" valign="top">1,272,271.15</td>
<td align="center" valign="top">4,113&#x0025;</td>
<td align="center" valign="top">8:35:29</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tVI-mmr-0-0-11890" position="float">
<label>Table VI.</label>
<caption><p><italic>Homo sapiens</italic> assembly metrics.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Assemblers</th>
<th align="center" valign="bottom">Contigs/scaffolds</th>
<th align="center" valign="bottom">Genome assembly size (bp)</th>
<th align="center" valign="bottom">N50 (bp)</th>
<th align="center" valign="bottom">NG50 (bp)</th>
<th align="center" valign="bottom">L50</th>
<th align="center" valign="bottom">LG50</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Reference</td>
<td align="center" valign="top">24</td>
<td align="center" valign="top">3,056,916,522</td>
<td align="center" valign="top">154,259,625</td>
<td/>
<td align="center" valign="top">8</td>
<td/>
</tr>
<tr>
<td align="left" valign="top">Wengan</td>
<td align="center" valign="top">2,000</td>
<td align="center" valign="top">2,845,883,522</td>
<td align="center" valign="top">39,733,923</td>
<td align="center" valign="top">36,783,291</td>
<td align="center" valign="top">23</td>
<td align="center" valign="top">26</td>
</tr>
<tr>
<td align="left" valign="top">Wengan/SALSA (Arima)</td>
<td align="center" valign="top">1,689</td>
<td align="center" valign="top">2,845,883,522</td>
<td align="center" valign="top">59,573,195</td>
<td align="center" valign="top">56,310,190</td>
<td align="center" valign="top">15</td>
<td align="center" valign="top">17</td>
</tr>
<tr>
<td align="left" valign="top">Hifiasm</td>
<td align="center" valign="top">498</td>
<td align="center" valign="top">3,045,796,332</td>
<td align="center" valign="top">45,256,540</td>
<td align="center" valign="top">45,256,540</td>
<td align="center" valign="top">20</td>
<td align="center" valign="top">20</td>
</tr>
<tr>
<td align="left" valign="top">Hifiasm/SALSA (Arima)</td>
<td align="center" valign="top">431</td>
<td align="center" valign="top">3,045,840,332</td>
<td align="center" valign="top">61,206,687</td>
<td align="center" valign="top">61,206,687</td>
<td align="center" valign="top">15</td>
<td align="center" valign="top">15</td>
</tr>
</tbody>
</table>
</table-wrap>
</floats-group>
</article>
