<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
<article xml:lang="en" article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">IJMM</journal-id>
<journal-title-group>
<journal-title>International Journal of Molecular Medicine</journal-title></journal-title-group>
<issn pub-type="ppub">1107-3756</issn>
<issn pub-type="epub">1791-244X</issn>
<publisher>
<publisher-name>D.A. Spandidos</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3892/ijmm.2017.2942</article-id>
<article-id pub-id-type="publisher-id">ijmm-39-05-1063</article-id>
<article-categories>
<subj-group>
<subject>Articles</subject></subj-group></article-categories>
<title-group>
<article-title>Difficulty in obtaining the complete mRNA coding sequence at 5&#x02032; region (5&#x02032; end mRNA artifact): Causes, consequences in biology and medicine and possible solutions for obtaining the actual amino acid sequence of proteins (Review)</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Vitale</surname><given-names>Lorenza</given-names></name><xref rid="af1-ijmm-39-05-1063" ref-type="aff">1</xref></contrib>
<contrib contrib-type="author">
<name><surname>Caracausi</surname><given-names>Maria</given-names></name><xref rid="af1-ijmm-39-05-1063" ref-type="aff">1</xref></contrib>
<contrib contrib-type="author">
<name><surname>Casadei</surname><given-names>Raffaella</given-names></name><xref rid="af2-ijmm-39-05-1063" ref-type="aff">2</xref></contrib>
<contrib contrib-type="author">
<name><surname>Pelleri</surname><given-names>Maria Chiara</given-names></name><xref rid="af1-ijmm-39-05-1063" ref-type="aff">1</xref><xref ref-type="corresp" rid="c1-ijmm-39-05-1063"/></contrib>
<contrib contrib-type="author">
<name><surname>Piovesan</surname><given-names>Allison</given-names></name><xref rid="af1-ijmm-39-05-1063" ref-type="aff">1</xref></contrib></contrib-group>
<aff id="af1-ijmm-39-05-1063">
<label>1</label>Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, I-40126 Bologna</aff>
<aff id="af2-ijmm-39-05-1063">
<label>2</label>Department for Life Quality Studies, University of Bologna, I-47921 Rimini, Italy</aff>
<author-notes>
<corresp id="c1-ijmm-39-05-1063">Correspondence to: Dr Maria Chiara Pelleri, Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, I-40126 Bologna (BO), Italy, E-mail: <email>mariachiara.pelleri2@unibo.it</email></corresp></author-notes>
<pub-date pub-type="ppub">
<month>05</month>
<year>2017</year></pub-date>
<pub-date pub-type="epub">
<day>06</day>
<month>04</month>
<year>2017</year></pub-date>
<volume>39</volume>
<issue>5</issue>
<fpage>1063</fpage>
<lpage>1071</lpage>
<history>
<date date-type="received">
<day>22</day>
<month>11</month>
<year>2016</year></date>
<date date-type="accepted">
<day>16</day>
<month>03</month>
<year>2017</year></date></history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2017, Spandidos Publications</copyright-statement>
<copyright-year>2017</copyright-year></permissions>
<abstract>
<p>The known difficulty in obtaining the actual full length, complete sequence of a messenger RNA (mRNA) may lead to the erroneous determination of its coding sequence at the 5&#x02032; region (5&#x02032; end mRNA artifact), and consequently to the wrong assignment of the translation start codon, leading to the inaccurate prediction of the encoded polypeptide at its amino terminus. Among the known human genes whose study was affected by this artifact, we can include disco interacting protein 2 homolog A (<italic>DIP2A</italic>; <italic>KIAA0184</italic>), Down syndrome critical region 1 (<italic>DSCR1</italic>), SON DNA binding protein (<italic>SON</italic>), trefoil factor 3 (<italic>TFF3</italic>) and URB1 ribosome biogenesis 1 homolog (<italic>URB1</italic>; <italic>KIAA0539</italic>) on chromosome 21, as well as receptor for activated C kinase 1 (<italic>RACK1</italic>, also known as <italic>GNB2L1</italic>), glutaminyl-tRNA synthetase (<italic>QARS</italic>) and tyrosyl-DNA phosphodiesterase 2 (<italic>TDP2</italic>) along with another 474 loci, including interleukin 16 (<italic>IL16</italic>). In this review, we discuss the causes of this issue, its quantitative incidence in biomedical research, the consequences in biology and medicine, and the possible solutions for obtaining the actual amino acid sequence of proteins in the post-genomics era.</p></abstract>
<kwd-group>
<kwd>messenger RNA 5&#x02032; region</kwd>
<kwd>full-length cDNA</kwd>
<kwd>coding sequence</kwd>
<kwd>protein prediction</kwd>
<kwd>protein sequence</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="other">
<title>1. Introduction</title>
<p>Since the late 1990s, the availability of public, large databases containing growing information about genes, gene products (RNAs and proteins), genomes and molecular functions has radically changed the traditional approach to gene discovery and characterization. Combining the deposited data about informational molecules (<xref ref-type="bibr" rid="b1-ijmm-39-05-1063">1</xref>,<xref ref-type="bibr" rid="b2-ijmm-39-05-1063">2</xref>) obtained from multiple species is a straightforward method to gain rapid knowledge about the structure of an organism's genes and gene products, which in turn may be used to obtain clues as to the function of each individual gene. While this possibility has allowed the generation of an amount of data incomparable to what was obtained by classic molecular biology methods used in the pre-genomic era (<xref ref-type="bibr" rid="b3-ijmm-39-05-1063">3</xref>), the fact that the quality and degree of the information available for an individual gene may tend to decrease is less evident. For example, if we consider the characterization of the messenger RNA (mRNA) expressed by a human locus, through the 1980s and 1990s it was typical to obtain accurate information about the total mRNA size and tissue distribution by northern blot analysis and about the transcription initiation sites by S1 nuclease mapping, primer extension and run-off assays (<xref ref-type="bibr" rid="b4-ijmm-39-05-1063">4</xref>). In later years, mRNA full-length sequences were obtained by tailored experiments designed for polymerase chain reaction (PCR) amplification of DNA complementary to RNA (cDNA) ends &#x0005B;rapid amplification of cDNA ends (RACE)&#x0005D;, alternative splicing information by cDNA <italic>in vivo</italic> and <italic>in vitro</italic> cloning of individual RNA isoforms, and protein sequences by <italic>in vitro</italic> translation and polypeptide biochemical analysis. 
Indeed, genes were usually studied on a one-by-one basis, and it was possible to cross-check data obtained through different methods (<xref ref-type="bibr" rid="b5-ijmm-39-05-1063">5</xref>). An example would be the comparison of the mRNA length deduced from northern blotting (taking into account the polyadenylated tail) with that of the isolated cDNA (<xref ref-type="bibr" rid="b6-ijmm-39-05-1063">6</xref>), or the comparison of the molecular weight of a known protein (<xref ref-type="bibr" rid="b7-ijmm-39-05-1063">7</xref>) with that of the polypeptide predicted to be encoded by the open reading frame (ORF)/coding sequence (CDS) of its corresponding cDNA.</p>
<p>New large-scale methods cannot always reach the resolution of previous ones; therefore, while they set a new standard in the methods used in genetics, more detailed analysis aimed at characterizing each individual gene remains necessary in order to avoid incomplete or erroneous knowledge of the gene structure and function. However, the genome-scale information has been in turn invaluable in effectively directing further investigations needed for each genomic locus using classical methods. This has been shown in particular for the human genome, by a large corpus of millions of short sequences (a few hundred base pairs in length) which has been derived by partial, single-pass sequencing of the cDNA clones from RNA of specific tissues (<xref ref-type="bibr" rid="b8-ijmm-39-05-1063">8</xref>). These have accumulated in the expressed sequence tag (EST) database since its creation &gt;20 years ago (<xref ref-type="bibr" rid="b9-ijmm-39-05-1063">9</xref>). A variety of EST-based methods (<xref ref-type="bibr" rid="b10-ijmm-39-05-1063">10</xref>,<xref ref-type="bibr" rid="b11-ijmm-39-05-1063">11</xref>) were then used for the rapid <italic>in silico</italic> cloning of genes (<xref ref-type="bibr" rid="b12-ijmm-39-05-1063">12</xref>), determining differential gene expression (<xref ref-type="bibr" rid="b13-ijmm-39-05-1063">13</xref>), characterizing alternative forms of transcripts derived from alternative splicing (<xref ref-type="bibr" rid="b14-ijmm-39-05-1063">14</xref>,<xref ref-type="bibr" rid="b15-ijmm-39-05-1063">15</xref>), and defining at least one complete ORF (<xref ref-type="bibr" rid="b16-ijmm-39-05-1063">16</xref>) for each mRNA. This last point is a well-known issue in molecular biology and genomics, with relevant consequences for the prediction of the gene product structure and function, and will be analyzed in detail in this review.</p></sec>
<sec sec-type="other">
<title>2. The 5&#x02032; end mRNA artifact</title>
<p>According to the classic molecular biology central dogma, the final effector of the genetic information is the protein (a chain of amino acids) encoded by a given gene; thus it is crucial to know the basic, primary structure of the protein (its amino acid sequence). A landmark in this field was the sequencing of the two amino acid chains composing insulin by Sanger (<xref ref-type="bibr" rid="b17-ijmm-39-05-1063">17</xref>). Polypeptide sequencing has the advantage of determining the natural primary structure of the polypeptide chain, and in particular the actual first amino acid of the sequence, thanks to the ability of fluorodinitrobenzene to react with the N-terminal amino group at one end of the chain. A key subsequent advancement was the recognition that, due to the colinearity of nucleic acids and proteins and to the mechanisms of mRNA translation, the amino-terminal amino acids are encoded by the 5&#x02032; end of the mRNA (<xref ref-type="bibr" rid="b18-ijmm-39-05-1063">18</xref>). Therefore, when Sanger <italic>et al</italic> proposed a new effective method to sequence DNA (<xref ref-type="bibr" rid="b19-ijmm-39-05-1063">19</xref>), it became evident that it was much more convenient to sequence the nucleic acids rather than the proteins, and that the amino acid sequence of gene products could be conveniently deduced from the nucleotide sequence of the corresponding cloned cDNA.
This change of experimental paradigm led to 'reverse genetics' (<xref ref-type="bibr" rid="b20-ijmm-39-05-1063">20</xref>), the passage from nucleic acid sequences to their functions rather than the contrary, as in classic genetics. It has had the fundamental consequence that, since the late 1970s, the vast majority of protein sequences have no longer been directly determined, but have instead been predicted following sequencing of the corresponding cDNAs, according to the rules for recognition of the start codon (first-AUG rule, optimal sequence context) and the genetic code (<xref ref-type="bibr" rid="b21-ijmm-39-05-1063">21</xref>).</p>
<p>While this advancement greatly sped up the pace of the availability of protein sequences, it should be kept in mind that all standard experimental methods for the cloning of cDNA are affected by a potential inability to effectively clone the 5&#x02032; region of mRNA in its completeness (<xref ref-type="bibr" rid="b22-ijmm-39-05-1063">22</xref>). This is due to the reverse transcriptase (RT) failure to extend first-strand cDNA along the full length of the mRNA template toward its 5&#x02032; end (<xref ref-type="bibr" rid="b22-ijmm-39-05-1063">22</xref>) (<xref rid="f1-ijmm-39-05-1063" ref-type="fig">Fig. 1</xref>), an operation whose success depends on the natural processivity of the enzyme, as well as its quality, the integrity of the RNA, the secondary structures assumed by the 5&#x02032; region of the mRNA hampering the RT progression and the reaction conditions (<xref ref-type="bibr" rid="b23-ijmm-39-05-1063">23</xref>).</p>
<p>It should be highlighted here that, due to the intrinsic functional mechanisms of the polymerases able to generate DNA copies of mRNAs, cDNA is typically obtained through a primer starting polymerization from the 3&#x02032; region of the mRNA &#x0005B;e.g., a poly(dT) oligonucleotide pairing with the poly(A) tail present in the vast majority of mRNAs&#x0005D;. This implies that a cDNA collection is by definition enriched in the 3&#x02032; regions of the mRNAs, and consequently the prediction of the amino acid sequence at the carboxy terminus of the gene product is expected to be more accurate than that at the amino terminus. This problem was recognized early on, in the publication of the first sequenced human cDNA, that for the &#x003B2; chain of hemoglobin, in 1977: the 5&#x02032;-untranslated region (UTR) was the last region to be reported, in December (<xref ref-type="bibr" rid="b24-ijmm-39-05-1063">24</xref>), following previous descriptions of the 3&#x02032;-UTR in April (<xref ref-type="bibr" rid="b25-ijmm-39-05-1063">25</xref>) and of the CDS in July (<xref ref-type="bibr" rid="b26-ijmm-39-05-1063">26</xref>). As stated in that report: 'cloning cDNA has proven to be a most valuable technique for sequencing mRNA (<xref ref-type="bibr" rid="b27-ijmm-39-05-1063">27</xref>,<xref ref-type="bibr" rid="b28-ijmm-39-05-1063">28</xref>). During the construction of double-stranded cDNA, however, a considerable number of 5&#x02032;-non-coding region sequences are lost. The independent sequencing of this region will therefore be a necessary step to complete our knowledge of the primary structure of any mRNA' (<xref ref-type="bibr" rid="b24-ijmm-39-05-1063">24</xref>). Okayama and Berg clearly wrote in 1982: 'obtaining cloned cDNAs with complete 5&#x02032;-UTR and protein-coding sequences is rare, particularly if the mRNA codes for a large protein.
Although such truncated cDNAs are still useful as hybridization probes, they cannot direct the synthesis of complete proteins after their introduction into bacterial or mammalian cells via appropriate expression vectors' (<xref ref-type="bibr" rid="b23-ijmm-39-05-1063">23</xref>).</p>
<p>A flourishing of reports in the 1980s presented the determination of the so-called 'cDNA full-length sequence' for many human genes. For the reasons discussed, the concept of the 'full-length sequence' becomes <italic>de facto</italic> equivalent to that of 'completeness of the mRNA sequence at its 5&#x02032; end', and remains an open issue in molecular biology, as a cDNA incompletely representing the 5&#x02032; end of the corresponding mRNA may lead to the incorrect assignment of the first AUG codon. In these cases, had an additional upstream AUG, in frame with the previously determined one, been identified in a more complete mRNA 5&#x02032; end, it would have been considered the actual translation start codon, thus extending the predicted amino terminus sequence of the product. Assignment of an incorrect start codon leads to a series of subsequent relevant errors in the experimental study of the corresponding cDNA. We therefore introduced the term '5&#x02032; end mRNA artifact' to refer to the incorrect assignment of the first translation codon (AUG sequence) in an mRNA, due to the incomplete determination of its 5&#x02032; end sequence (<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>).</p>
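<p>The effect of a 5&#x02032;-truncated cDNA on start codon assignment can be illustrated with a minimal Python sketch. The sequences and the function name first_atg_orf are hypothetical and serve only as an illustration; the function simply applies the first-ATG rule to whatever cDNA sequence it is given (DNA alphabet, so AUG is written ATG).</p>
<preformat preformat-type="code"><![CDATA[
```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_atg_orf(cdna):
    """Return the ORF starting at the first ATG of the sequence (first-ATG rule).

    The stop codon is not included; None is returned if no ATG is present.
    """
    start = cdna.find("ATG")
    if start == -1:
        return None
    codons = []
    # Read codon by codon, in frame with the first ATG, until a stop codon
    for i in range(start, len(cdna) - 2, 3):
        codon = cdna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return "".join(codons)


# Hypothetical full-length transcript: 5'-UTR, true ATG, two codons,
# an internal in-frame ATG, two more codons, stop codon.
full_length = "GGCACC" + "ATG" + "GCTGAA" + "ATG" + "CCTGGA" + "TAA"

# Hypothetical cDNA clone that lost its first 8 bases (incomplete
# first-strand synthesis), so the true ATG is no longer intact.
truncated = full_length[8:]

print(first_atg_orf(full_length))  # ORF predicted from the complete sequence
print(first_atg_orf(truncated))    # shorter ORF starting at the internal ATG
```
]]></preformat>
<p>In this toy case, the ORF predicted from the truncated clone is a 3&#x02032; portion of the one predicted from the full-length sequence, so the predicted product lacks its actual amino terminus.</p>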
<p>From the experimental point of view, the recognition of this technical issue, although often without systematic investigation of its possible consequences for genome annotation, has led to the development of several methods to determine the full-length mRNA sequence on a large scale. Some were based on the presence of the 'cap' at the true 5&#x02032; end of the mRNA &#x0005B;reviewed in (<xref ref-type="bibr" rid="b30-ijmm-39-05-1063">30</xref>)&#x0005D;, such as 5&#x02032; cap trapping (<xref ref-type="bibr" rid="b31-ijmm-39-05-1063">31</xref>) and cap analysis of gene expression (CAGE) (<xref ref-type="bibr" rid="b32-ijmm-39-05-1063">32</xref>). Systematic empirical annotation of a set of transcript products by 5&#x02032; RACE (<xref ref-type="bibr" rid="b33-ijmm-39-05-1063">33</xref>) has also been employed, as has, following the introduction of microarray-based platforms, hybridization of RNA on high-density tiling arrays (<xref ref-type="bibr" rid="b34-ijmm-39-05-1063">34</xref>). However, these techniques were found to be experimentally labor-intensive and they have not been routinely applied.</p>
<p>Concurrently, the growing incorporation of information derived from individual cDNA and large-scale sequencing projects, including those specifically designed to characterize mRNA 5&#x02032; ends (<xref ref-type="bibr" rid="b31-ijmm-39-05-1063">31</xref>,<xref ref-type="bibr" rid="b35-ijmm-39-05-1063">35</xref>,<xref ref-type="bibr" rid="b36-ijmm-39-05-1063">36</xref>), led to a continuous refinement and improvement of completeness at the 5&#x02032; region of deposited and verified mRNA reference sequences (e.g., RefSeq, <ext-link xlink:href="https://www.ncbi.nlm.nih.gov/refseq/" ext-link-type="uri">https://www.ncbi.nlm.nih.gov/refseq/</ext-link>), as well as of the corresponding protein-coding sequences. Therefore, it became possible to exploit the data from EST or other large-scale RNA sequencing projects to verify whether sequence analysis could be optimized to reveal the extension of the 5&#x02032; region of known mRNAs and possibly the consequential redefinition of the amino acid sequence of the encoded products.</p>
<p>The recent availability of massive RNA-sequencing (RNA-Seq) methods for the generation of transcriptome sequence databases (<xref ref-type="bibr" rid="b37-ijmm-39-05-1063">37</xref>) offers a new potential tool to deal with the issue, although to date it appears not to have been systematically used for this purpose. Moreover, information about sequences possibly extending the knowledge of the 5&#x02032; end of mRNA is not easily derivable from RNA-Seq data in comparison with the EST-based approach, due to the short sequence reads typically obtained by this technique, as well as the difficulty in building full-length transcript structures.</p>
<p>Furthermore, a ribosome footprinting profiling strategy based upon high-throughput sequencing of ribosome-protected mRNA fragments has been developed, enabling the genome-wide investigation of translation (<xref ref-type="bibr" rid="b38-ijmm-39-05-1063">38</xref>). This technique, used in combination with initiation-specific translation inhibitors, allows the identification of translation initiation with subcodon or even single-nucleotide resolution and was successfully exploited in order to predict also additional upstream AUG codons (<xref ref-type="bibr" rid="b39-ijmm-39-05-1063">39</xref>&#x02013;<xref ref-type="bibr" rid="b41-ijmm-39-05-1063">41</xref>).</p>
<p>Finally, we should note the existence of ORFs and out-of-frame AUGs located in the 5&#x02032;-UTR, upstream of the main coding region (<xref ref-type="bibr" rid="b42-ijmm-39-05-1063">42</xref>). These situations are different from the artifact reported herein as they do not extend the known coding region, but are implicated in the regulation of gene expression by modulating mRNA stability and translation (<xref ref-type="bibr" rid="b42-ijmm-39-05-1063">42</xref>,<xref ref-type="bibr" rid="b43-ijmm-39-05-1063">43</xref>).</p></sec>
<sec sec-type="other">
<title>3. Systematic identification of incomplete 5&#x02032; end region in human known mRNAs</title>
<p>The possibility that more precise knowledge of the mRNA 5&#x02032; end sequence may lead to the correction of a previously accepted predicted product first appeared in several reports as anecdotal evidence, found by chance for single genes that were under detailed investigation. For example, the mRNA CDS was extended in this way for the <italic>RANBP9</italic>/<italic>RanBPM</italic> gene (RAN binding protein 9, on 6p23), where the study performed by Nishitani <italic>et al</italic> (<xref ref-type="bibr" rid="b44-ijmm-39-05-1063">44</xref>) allowed the addition of 230 new amino acids. In the case of the nuclear factor, erythroid 2-like 3 (<italic>NFE2L3</italic>) gene (on 7p15.2), the corresponding #AB010812.1 mRNA sequence of 2,174 bp in length derived from Kobayashi <italic>et al</italic> (<xref ref-type="bibr" rid="b45-ijmm-39-05-1063">45</xref>) was replaced by the sequence #AF134891.1 of 2,618 bp, leading to the addition of 294 new amino acids to the predicted protein. The study performed by Nomura <italic>et al</italic> (<xref ref-type="bibr" rid="b46-ijmm-39-05-1063">46</xref>) for the <italic>SP2</italic> gene (Sp2 transcription factor, on 17q21.32) allowed the release of the #D28588.1 mRNA sequence entry recording a CDS of 3,288 bp leading to the addition of 111 new amino acids compared to the previous #M97190 entry of 2,063 bp provided by Kingsley and Winoto (<xref ref-type="bibr" rid="b47-ijmm-39-05-1063">47</xref>). The coding nature of these extensions was also supported by very high similarity with the respective murine orthologs (<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>). These and other similar reports suggested that a high-throughput approach was desirable to discover all incomplete CDSs (<xref rid="tI-ijmm-39-05-1063" ref-type="table">Table I</xref>).</p>
<p>As a first approach to the issue, and owing to our interest in an integrated route to identifying new pathogenesis-based therapeutic approaches for trisomy 21 (Down syndrome) (<xref ref-type="bibr" rid="b48-ijmm-39-05-1063">48</xref>,<xref ref-type="bibr" rid="b49-ijmm-39-05-1063">49</xref>), our group focused on the known, well-characterized genes present in the original map of human chromosome 21 (Hsa21). We manually analyzed 109 RefSeq mRNA sequences catalogued as 'category: known' by Hattori <italic>et al</italic> (<xref ref-type="bibr" rid="b50-ijmm-39-05-1063">50</xref>), and linked to at least one published report, for the presence of an in-frame stop codon upstream of the described ATG. In 49 cases, the finding of such a stop codon allowed the exclusion of the possibility that the recorded 5&#x02032;-UTR sequence may actually be part of a longer CDS (<xref ref-type="bibr" rid="b51-ijmm-39-05-1063">51</xref>). The sequences of the remaining 60 mRNAs, in which bases in the 5&#x02032;-UTR could on the contrary be consistent with the presence of translated codons, were systematically aligned with sequences available in databanks using the Basic Local Alignment Search Tool (BLAST software, <ext-link xlink:href="http://www.ncbi.nih.gov/BLAST/" ext-link-type="uri">http://www.ncbi.nih.gov/BLAST/</ext-link>), leading to the discovery of a total of 20 genes for which EST (or also non-EST RNA sequence) homology suggested the existence of mRNAs more complete at the 5&#x02032; terminus. These putatively encode protein products longer at their amino terminus, due to the presence of a previously unknown start codon in frame with and upstream of the described one (<xref rid="f2-ijmm-39-05-1063" ref-type="fig">Fig. 2</xref>).
Experimental evidence for the existence of these transcripts was finally obtained, following RT-PCR and sequencing, for five loci: Down syndrome critical region 1 (<italic>DSCR1</italic>) &#x0005B;now regulator of calcineurin 1 (<italic>RCAN1</italic>)&#x0005D;, disco interacting protein 2 homolog A (<italic>DIP2A</italic>; <italic>KIAA0184</italic>), URB1 ribosome biogenesis 1 homolog (<italic>URB1</italic>; <italic>KIAA0539</italic>), SON DNA binding protein (<italic>SON</italic>) and trefoil factor 3 (<italic>TFF3</italic>) (<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>). In these cases, both of the following conditions occurred: an extension of the described exon 1 predicted new coding codons upstream of the known AUG; and a novel AUG was present upstream of these codons, in frame with the previously described AUG and without any intervening stop codon. This suggests that, following the rules of translation initiation &#x0005B;reviewed by Kozak (<xref ref-type="bibr" rid="b21-ijmm-39-05-1063">21</xref>)&#x0005D;, the actual CDS should be considered as the one included between the novel 'first-AUG' and the known stop (<xref rid="f2-ijmm-39-05-1063" ref-type="fig">Fig. 2</xref>).
It was observed that no known mechanism prevents the newly identified start codon from being the actual point of translation initiation, as the use of 'internal' AUGs, enabling additional initiation events at downstream AUG codons in some mRNAs, may occur only in three well-defined circumstances (<xref ref-type="bibr" rid="b21-ijmm-39-05-1063">21</xref>): re-initiation, which does not apply to the mRNAs investigated by this approach, as the newly determined AUG is not part of a small upstream ORF separated from the main ORF by a stop codon; context-dependent leaky scanning, which may be excluded as we considered the concordance with the Kozak sequence (<xref ref-type="bibr" rid="b21-ijmm-39-05-1063">21</xref>,<xref ref-type="bibr" rid="b52-ijmm-39-05-1063">52</xref>) for the novel AUGs, observing full (sometimes better) compatibility with the use of the novel AUG (<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>); and a third mechanism, the use of internal ribosome entry site (IRES) sequence modules, adopted only by some known viral mRNAs.</p>
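<p>The two conditions described above (new in-frame codons upstream of the known AUG, and a novel upstream in-frame AUG with no intervening stop codon) can be sketched in Python as follows. This is a simplified illustration, not the procedure used in the cited studies: the function names and the toy sequences are hypothetical, and the context check reduces the Kozak consensus to its two most commonly cited positions (a purine at -3 and a G at +4), which is an assumption made here for brevity.</p>
<preformat preformat-type="code"><![CDATA[
```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_upstream_start(mrna, known_atg):
    """Return the index of the most upstream ATG that is in frame with the
    known ATG and not separated from it by a stop codon, or None."""
    best = None
    pos = known_atg - 3
    while pos >= 0:
        codon = mrna[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # an in-frame stop closes the reading frame upstream
        if codon == "ATG":
            best = pos  # keep walking: a more upstream ATG may still exist
        pos -= 3
    return best


def kozak_strength(mrna, atg):
    """Simplified context score: purine at -3 and G at +4 (A of ATG = +1)."""
    minus3 = mrna[atg - 3] if atg >= 3 else None
    plus4 = mrna[atg + 3] if atg + 3 < len(mrna) else None
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[hits]


# Hypothetical 5'-extended sequence; the previously described ATG is at index 18
extended = "GCCACC" + "ATG" + "GAGCTGGCC" + "ATG" + "GCT" + "TAA"
# Same sequence with an in-frame stop (TAG) between the two ATGs
blocked = "GCCACC" + "ATG" + "TAGCTGGCC" + "ATG" + "GCT" + "TAA"

novel = find_upstream_start(extended, 18)
print(novel, kozak_strength(extended, novel))
print(find_upstream_start(blocked, 18))
```
]]></preformat>
<p>In the first toy sequence a candidate novel start codon is reported together with its context class; in the second, the intervening stop codon means that no extension of the CDS is proposed.</p>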
<p>These positive results prompted the extension of the approach to the whole set of human RefSeq mRNAs known at the time (n=13,124), following automation by a simple program to detect the presence or the absence of an in-frame stop in the described 5&#x02032;-UTR of an mRNA. The percentage of mRNAs lacking such a stop codon in this set (51%) was very similar to that found for the Hsa21 gene set (55%), leading to the estimate that, in proportion, the CDS of 556 known human mRNAs might be incomplete at the 5&#x02032; end (<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>).</p>
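<p>A screen of this kind can be sketched in a few lines of Python. The code below is only an illustrative reconstruction under stated assumptions, not the original program: the record names, sequences and function names are hypothetical, and each record is assumed to provide the mRNA sequence (DNA alphabet) and the 0-based position of the annotated start codon.</p>
<preformat preformat-type="code"><![CDATA[
```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_inframe_upstream_stop(mrna, cds_start):
    """True if the annotated 5'-UTR contains a stop codon in frame with the
    annotated start codon (the CDS cannot then extend further upstream)."""
    for pos in range(cds_start - 3, -1, -3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            return True
    return False


def classify(records):
    """Split records into those closed by an upstream in-frame stop and
    those that remain candidates for an incomplete CDS at the 5' end."""
    closed, candidates = [], []
    for name, (sequence, cds_start) in records.items():
        if has_inframe_upstream_stop(sequence, cds_start):
            closed.append(name)
        else:
            candidates.append(name)
    return closed, candidates


# Hypothetical toy records: name -> (sequence, index of annotated ATG)
toy_records = {
    "mRNA_A": ("CCTGAGCCATGGCCTAA", 8),  # in-frame TGA upstream of the ATG
    "mRNA_B": ("CCGGAGCCATGGCCTAA", 8),  # no stop codon in the 5'-UTR
    "mRNA_C": ("CTGAGGCCATGGCCTAA", 8),  # TGA present, but out of frame
}

closed, candidates = classify(toy_records)
print("closed:", closed)
print("candidates:", candidates)
print(f"{100 * len(candidates) / len(toy_records):.0f}% lack an in-frame stop")
```
]]></preformat>
<p>Only the candidates then need to be compared with EST or other RNA sequences, which is what makes such a pre-screen useful on a set of thousands of mRNAs.</p>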
<p>This approach required manual curation to analyze in detail, by sequence comparison, any mRNA that was a candidate for having an incomplete CDS at the 5&#x02032; region. An improvement of the algorithm was then published and applied with success to zebrafish &#x0005B;see below (<xref ref-type="bibr" rid="b53-ijmm-39-05-1063">53</xref>)&#x0005D;, showing that the automated detection of putative additional bases at the known 5&#x02032; end of a set of mRNAs, following elaboration of multiple results of sequence comparison analysis (by the BLAST tool), was possible. Some technical limitations of the computing environment used made the implementation of this pipeline difficult for the much more numerous human sequences, which hampered progress in this direction for a while. Further improvement of the automated EST-based approach (5&#x02032;_ORF_Extender 2.0, freely available at <ext-link xlink:href="http://apollo11.isto.unibo.it/software/" ext-link-type="uri">http://apollo11.isto.unibo.it/software/</ext-link>) finally made the systematic identification (<xref rid="f2-ijmm-39-05-1063" ref-type="fig">Fig. 2</xref>) of CDSs at the 5&#x02032; end of all human known mRNAs possible, parsing &gt;7 million BLAT alignments and thus finding 477 human loci out of 18,665 analyzed (<xref rid="tI-ijmm-39-05-1063" ref-type="table">Table I</xref>), with an extension of their RNA 5&#x02032; CDS identified in detail (<xref ref-type="bibr" rid="b54-ijmm-39-05-1063">54</xref>). In addition, in this study, a proof-of-concept confirmation was obtained by <italic>in vitro</italic> cloning and sequencing for some example genes: <italic>GNB2L1</italic> &#x0005B;now receptor for activated C kinase 1 (<italic>RACK1</italic>)&#x0005D;, glutaminyl-tRNA synthetase (<italic>QARS</italic>) and tyrosyl-DNA phosphodiesterase 2 (<italic>TDP2</italic>) cDNAs.
On the other hand, a list was generated of 20,775 human mRNAs in which the presence of an in-frame stop codon upstream of the known start codon indicates that the CDS is complete at the 5&#x02032; end in its current form (<xref ref-type="bibr" rid="b54-ijmm-39-05-1063">54</xref>). This approach could also be aimed at identifying alternative 5&#x02032;-UTR sequences, but the stretch of bases aligned upstream of the novel AUG is usually too short to allow this type of investigation. In addition, should the stretch be long enough, the analysis would require an ad hoc algorithm able to discriminate mRNA isoforms of this type, including mapping of the newly determined 5&#x02032;-UTR to the genome to derive the alternative transcription/splicing events responsible for the different 5&#x02032;-UTR sequences.</p>
<p>While this review is more focused on human mRNAs because of the possible repercussions in medicine, it should be noted that similar results are to be expected for the genomes of other organisms due to the sharing of common molecular techniques, whose limitations are at the basis of the artifact. Indeed, studies on two of the most commonly used model organisms for the investigation of the human genome, <italic>Danio rerio</italic> (zebrafish) and <italic>Mus musculus</italic> (domestic mouse), have confirmed this expectation. A novel proposed automated approach (5&#x02032;_ORF_Extender 1.0) was able to systematically compare available ESTs with all the zebrafish experimentally determined mRNA sequences, identify additional sequence stretches at the 5&#x02032; region and scan for the presence of all conditions needed to define a new, extended putative ORF. The tool identified 285 (3.3%) mRNAs with putatively incomplete ORFs at the 5&#x02032; region and, in three selected example cases (<italic>selt1a</italic>, <italic>unc119.2</italic> and <italic>nppa</italic> or selenoprotein T 1a, unc-119 lipid binding chaperone B homolog 2 and natriuretic peptide A, respectively), the extended coding region at the 5&#x02032; end was experimentally demonstrated (<xref ref-type="bibr" rid="b53-ijmm-39-05-1063">53</xref>). As regards the mouse mRNAs, the application of the improved method used for human transcripts (<xref ref-type="bibr" rid="b54-ijmm-39-05-1063">54</xref>) showed that in 351 mouse loci, out of 20,221 analyzed, an extension of the mRNA 5&#x02032;-coding region could be identified. Experimental confirmation was obtained by <italic>in vitro</italic> cloning and sequencing for adenomatosis polyposis coli 2 (<italic>Apc2</italic>) and MAP kinase-interacting serine/threonine kinase 2 (<italic>Mknk2</italic>) cDNAs, and a list of 16,330 mouse mRNAs with estimated complete CDS at the 5&#x02032; end was provided (<xref ref-type="bibr" rid="b55-ijmm-39-05-1063">55</xref>).
Remarkably, 82% of the results were original and had not been identified by the annotation pipelines used in the main mouse genome databases and genome browser (<xref ref-type="bibr" rid="b55-ijmm-39-05-1063">55</xref>). The prevalence of the 5&#x02032; end mRNA artifact may thus be considered approximately constant from lower vertebrates to humans because the methods used to characterize the corresponding mRNAs are the same or very similar (<xref rid="tI-ijmm-39-05-1063" ref-type="table">Table I</xref>).</p>
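<p>The counts quoted above can be put on a common percentage scale with a short calculation, sketched here in Python. Only figures stated in the text are used; the function name is hypothetical, and it should be kept in mind that the human and mouse counts refer to loci whereas the zebrafish figure refers to mRNAs, so the comparison is only indicative.</p>
<preformat preformat-type="code"><![CDATA[
```python
def percent_extended(extended, analyzed):
    """Percentage of analyzed loci with an identified 5' CDS extension."""
    return round(100 * extended / analyzed, 1)


# Counts as quoted in the text
human = percent_extended(477, 18665)
mouse = percent_extended(351, 20221)
# Zebrafish is quoted directly as a percentage (285 mRNAs, 3.3%)
zebrafish = 3.3

for species, value in (("zebrafish", zebrafish), ("human", human), ("mouse", mouse)):
    print(f"{species}: {value}%")
```
]]></preformat>
<p>The resulting values (about 2.6% for human and 1.7% for mouse, next to the 3.3% quoted for zebrafish) all lie within a range of a few percent of the analyzed sequences.</p>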
<p>The identification of the most upstream definable start codon does not exclude that a downstream AUG codon may also be used by the ribosome, a phenomenon known as alternative translation (<xref ref-type="bibr" rid="b56-ijmm-39-05-1063">56</xref>). It has been shown that alternative translation start sites tend to be conserved in eukaryotic genomes, providing a functional mechanism under selection for increased efficiency of translation and/or for obtainment of different N-terminal protein variants (<xref ref-type="bibr" rid="b57-ijmm-39-05-1063">57</xref>). It has also already been noted that this type of analysis cannot formally exclude that the extended ORF may derive from alternative transcription starting site (due to alternative promoter usage) and/or splicing of the investigated locus (<xref ref-type="bibr" rid="b53-ijmm-39-05-1063">53</xref>). However, it reveals in any case that additional coding sequences not previously identified exist, as may be confirmed by phylogenetic comparison at the amino acid level (<xref ref-type="bibr" rid="b53-ijmm-39-05-1063">53</xref>). As in the case of any other computer prediction, further investigation is required, <italic>in silico</italic> but especially <italic>in vitro</italic>, for a fine characterization of the putative model.</p>
<p>While the published approaches rely on algorithms assuming that the start codon has an AUG sequence, it should be noted that in a minor percentage of mRNA CDSs the start codon may have alternative sequences, particularly CUG, UUG, GUG, ACG, AUA and AUU (<xref ref-type="bibr" rid="b58-ijmm-39-05-1063">58</xref>). Indeed, recent experiments have confirmed this phenomenon and suggested that it may be more frequent than was previously assumed. Therefore, when the use of a non-AUG codon is known or suspected, further analysis not included in standard pipelines should be performed in individual cases to identify in-frame upstream non-AUG start codons, which may also be responsible for encoding proteins longer than those previously described.</p></sec>
<sec sec-type="other">
<title>4. Consequences of 5&#x02032; end mRNA artifact in biology and medicine</title>
<p>The 5&#x02032; end mRNA artifact is expected, and has been demonstrated, to cause a chain of consequences in biomedical research, which are listed and discussed below (<xref rid="tII-ijmm-39-05-1063" ref-type="table">Table II</xref>). The first obvious issue associated with the artifact is its negative effect on the study of product structure and function (<xref ref-type="bibr" rid="b59-ijmm-39-05-1063">59</xref>). The possibility that a large number of studies are based on incorrect starting data is real. For instance, this occurred in the functional characterization of a polypeptide expressed from its predicted incomplete DNA (<xref ref-type="bibr" rid="b60-ijmm-39-05-1063">60</xref>) and in a functional study of the cytokine interleukin 16 (IL16) (<xref ref-type="bibr" rid="b61-ijmm-39-05-1063">61</xref>), where the product appears to have been expressed from an incomplete cDNA (<xref rid="tII-ijmm-39-05-1063" ref-type="table">Table II</xref>).</p>
<p>The recording in genomic databases of protein sequences that are incomplete at their amino terminus may also cause failure to identify functionally relevant protein domain sequences (<xref rid="tII-ijmm-39-05-1063" ref-type="table">Table II</xref>); in particular, sequences located at the amino terminus of proteins may be signal peptide sequences directing delivery of the protein to its final destination (<xref ref-type="bibr" rid="b62-ijmm-39-05-1063">62</xref>,<xref ref-type="bibr" rid="b63-ijmm-39-05-1063">63</xref>) and may also affect its half-life (<xref ref-type="bibr" rid="b64-ijmm-39-05-1063">64</xref>).</p>
<p>In addition, alternative splicing at the 5&#x02032; terminus of genes may be underestimated, and the corresponding alternative protein products may not be predicted (<xref rid="tII-ijmm-39-05-1063" ref-type="table">Table II</xref>). The statement in the classic article by Okayama and Berg still holds true: 'indeed, it was comparison between cloned cDNAs and their genomic counterparts that uncovered the existence of intervening sequences and splicing' (<xref ref-type="bibr" rid="b23-ijmm-39-05-1063">23</xref>). Moreover, the design of a mutation screening aimed at identifying pathological variations in the coding sequences could be affected by incomplete knowledge of the CDS, a circumstance that could occasionally explain the failure to find expected mutations in candidate or established disease genes, and could possibly lead to inaccurate genotype/phenotype correlations (<xref rid="tII-ijmm-39-05-1063" ref-type="table">Table II</xref>). From a functional point of view, the new amino acid sequence could be responsible for new interactions. The possibility of designing molecules with pharmacological activity based on binding to proteins expressed as bait in a two-hybrid test from incomplete cDNAs (<xref ref-type="bibr" rid="b65-ijmm-39-05-1063">65</xref>) emphasizes the importance of knowing the actual primary structure of the protein. Finally, the presence of a truncated protein sequence in the genomic databases may also give rise to a chain of errors in the prediction of orthologs in other species. In particular, genome annotation pipelines will tend to propagate the truncated sequence to the predicted model proteins. For instance, the error in the determination of the highly similar murine DSCR1 ortholog (<xref ref-type="bibr" rid="b66-ijmm-39-05-1063">66</xref>) shows that a bias deriving from the original incomplete human data negatively affected the modeling of the murine DSCR1 product sequence.</p>
<p>Due to the complex structure of human loci (<xref ref-type="bibr" rid="b67-ijmm-39-05-1063">67</xref>&#x02013;<xref ref-type="bibr" rid="b70-ijmm-39-05-1063">70</xref>), errors in establishing an accurate cDNA sequence may also hamper the study of the genomic organization of a gene, given the tight connections between DNA and RNA (<xref rid="tII-ijmm-39-05-1063" ref-type="table">Table II</xref>). If a cDNA incomplete at its 5&#x02032; terminus is used to establish the genomic structure of a locus, this could cause failure to recognize genomic sequences as part of the locus (<xref ref-type="bibr" rid="b71-ijmm-39-05-1063">71</xref>). As a secondary consequence, classification of a genic region as intergenic may keep the 'search space' for novel genes artificially expanded (<xref ref-type="bibr" rid="b71-ijmm-39-05-1063">71</xref>). Due to the physical proximity of the gene promoter region and the corresponding mRNA 5&#x02032; region, a sequence supposed to be proximal to the transcription start site and annotated as promoter could actually be part of a longer mRNA, as was shown for <italic>TFF3</italic> (<xref ref-type="bibr" rid="b72-ijmm-39-05-1063">72</xref>,<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>). This issue may further increase the difficulty in identifying promoter sequences, which, unlike CDSs, do not have regular start and stop signals or characteristic cross-species conservation, and can even present with diverged sequences among distant species while being functionally conserved (<xref ref-type="bibr" rid="b73-ijmm-39-05-1063">73</xref>). On the other hand, an inexact delimitation between 5&#x02032;-UTR and CDS could lead to errors in the knowledge of the 5&#x02032;-UTR sequence itself and in the interpretation of its role in the control of translation (<xref ref-type="bibr" rid="b74-ijmm-39-05-1063">74</xref>).
Although this last class of consequences does not directly affect the prediction of the CDS, it should be considered a further incentive not to underestimate the relevance of this artifact, given the central role of the 5&#x02032; terminus in gene expression regulation pathways. Knowledge of the true mRNA end is also useful in designing probes specific for this region, which may be more variable between similar loci or isoforms from the same locus than the central, coding region. This is relevant to the possibility of extracting, from publicly available microarray datasets, quantitative reference measures for the expression values of the whole complement of genes of both normal (<xref ref-type="bibr" rid="b75-ijmm-39-05-1063">75</xref>) and pathologic (<xref ref-type="bibr" rid="b76-ijmm-39-05-1063">76</xref>) transcriptomes. Exact knowledge of the mRNA 5&#x02032; region also affects the choice of morpholino oligonucleotides used in knockdown experiments, in particular in zebrafish (<xref ref-type="bibr" rid="b77-ijmm-39-05-1063">77</xref>) (<xref rid="tII-ijmm-39-05-1063" ref-type="table">Table II</xref>).</p>
<p>The artifact may also be a source of errors in other types of genomic analysis, although in these cases the consequences are expected to be minor, as the alteration of calculations is likely to represent a small deviation and these analyses have no immediate medical application &#x0005B;e.g., estimation of codon usage at a genomic scale (<xref ref-type="bibr" rid="b78-ijmm-39-05-1063">78</xref>), although knowledge of the whole set of codons in a cDNA could affect the technology used to produce the translated product in a host (<xref ref-type="bibr" rid="b79-ijmm-39-05-1063">79</xref>)&#x0005D;.</p></sec>
<sec sec-type="other">
<title>5. Possible solutions for improving the knowledge of the 5&#x02032;-coding regions in mRNAs</title>
<p>Several methods have been described with the aim of determining the 5&#x02032; mRNA end more precisely, thus excluding that its CDS may be incompletely predicted. The first were devised in the 1990s and were based on experimental protocols exploiting the capability of dedicated techniques to identify the first bases transcribed from DNA or the first bases following the cap in mature mRNAs. These methods have been cited in the Introduction section and remain valid, although they are often labor-intensive and not routinely used.</p>
<p>A second group of methods is based on computational biology approaches, with the advantage of providing a first systematic screening that excludes a substantial number of genes as candidates for the 5&#x02032; end mRNA artifact. Given the availability of high-throughput results of an EST-based approach of this type (<xref ref-type="bibr" rid="b54-ijmm-39-05-1063">54</xref>), it is advisable to perform a simple first check against these results for a gene of interest before assuming that the predicted product is the one recorded in the current version of the databases. Continuous refinement of the human mRNA sequences over time has led to the current estimate of 259 nucleotides as the mean 5&#x02032;-UTR size (<xref ref-type="bibr" rid="b80-ijmm-39-05-1063">80</xref>), so there is a concrete possibility that extended protein-coding sequences could actually be hidden in longer 5&#x02032;-UTRs. Further developments in the computational analysis of high-throughput cDNA sequencing methods (RNA-Seq) should also provide a means to improve the characterization of whole sequences of human transcripts. Several studies have been performed to implement RNA-Seq methods for profiling mRNA 5&#x02032; ends in <italic>Drosophila melanogaster</italic> (<xref ref-type="bibr" rid="b81-ijmm-39-05-1063">81</xref>,<xref ref-type="bibr" rid="b82-ijmm-39-05-1063">82</xref>).</p>
<p>Finally, recent developments in proteomics research open the way for a different, mirror-image approach to the problem. Knowledge of protein sequences obtained by massive analysis of polypeptide nuclear magnetic resonance (NMR) or mass spectrometry (MS) spectra, in particular oriented to N-terminal sequencing (<xref ref-type="bibr" rid="b83-ijmm-39-05-1063">83</xref>,<xref ref-type="bibr" rid="b84-ijmm-39-05-1063">84</xref>), might be used for a reverse search for genomic sequences predicted to be translated into the corresponding identified protein sequences. This approach thus resembles the first protein-toward-DNA experimental flow, but on a genomic scale and largely based on computational methods.</p>
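<p>A reverse search of this kind can be illustrated, under strong simplifying assumptions, with a short Python sketch that translates a nucleotide sequence in its three forward reading frames and reports where an experimentally observed N-terminal peptide is encoded. The function names, the toy sequence and the peptide are hypothetical; only the forward strand and the standard genetic code are considered, and no published proteogenomics pipeline is reproduced here.</p>
<preformat>
```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate seq from the given frame offset (0, 1 or 2); '*' marks a stop."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons with ambiguous bases are rendered as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_peptide(seq, peptide):
    """Return (frame, nucleotide_offset) pairs where peptide is encoded.

    Only the three forward-strand frames are searched in this sketch.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        protein = translate(seq, frame)
        idx = protein.find(peptide)
        while idx != -1:
            hits.append((frame, frame + 3 * idx))
            idx = protein.find(peptide, idx + 1)
    return hits


# Toy case: the annotated start codon lies at nucleotide 11, but the observed
# N-terminal peptide maps to nucleotide 2, upstream of the annotated start.
hits = find_peptide("GCATGGCTAAAATGGGTTAA", "MAKMG")  # [(2, 2)]
```
</preformat>
<p>In this toy case, a peptide mapping upstream of the annotated start codon would point to a possibly extended CDS. A real analysis would also need to search the reverse strand and to handle splicing and post-translational processing of the N-terminus.</p>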
<p>In conclusion, we have presented evidence that current methods of genomics research are subject to a possible artifact regarding the exact determination of the mRNA 5&#x02032; region sequence, and have discussed the consequences that this may have on annotation, as well as on the experimental study of both genes and gene products. While there are several strategies to deal with this issue, the most important step appears to be bringing this possibility to the attention of the scientific community, so that it is taken into account when planning experiments in molecular biology and genetics.</p></sec></body>
<back>
<ack>
<title>Acknowledgments</title>
<p>M.C.'s fellowship has been co-funded by a donation from Fondazione Umano Progresso (Milano, Italy) and by a grant from Fondazione del Monte di Bologna e Ravenna (Bologna, Italy). M.C.P.'s fellowship has been co-funded by a donation from Fondazione Umano Progresso and by donations following the international fundraising initiative by Vittoria Aiello and Massimiliano Albanese (Washington, DC, USA) - donors contributing to this initiative are listed on the site: <ext-link xlink:href="http://www.massimilianoalbanese.net/ds-research/?lang=en" ext-link-type="uri">http://www.massimilianoalbanese.net/ds-research/?lang=en</ext-link>. The fellowship for A.P. has been mainly funded by the Department of Experimental, Diagnostic and Specialty Medicine (DIMES), University of Bologna (Bologna, Italy) and co-funded by the Fondazione Umano Progresso. We are grateful to Kirsten Welter for her expert revision of the manuscript.</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-ijmm-39-05-1063"><label>1</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Borsani</surname><given-names>G</given-names></name><name><surname>Ballabio</surname><given-names>A</given-names></name><name><surname>Banfi</surname><given-names>S</given-names></name></person-group><article-title>A practical guide to orient yourself in the labyrinth of genome databases</article-title><source>Hum Mol Genet</source><volume>7</volume><fpage>1641</fpage><lpage>1648</lpage><year>1998</year><pub-id pub-id-type="doi">10.1093/hmg/7.10.1641</pub-id><pub-id pub-id-type="pmid">9735386</pub-id></element-citation></ref>
<ref id="b2-ijmm-39-05-1063"><label>2</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pandey</surname><given-names>A</given-names></name><name><surname>Lewitter</surname><given-names>F</given-names></name></person-group><article-title>Nucleotide sequence databases: A gold mine for biologists</article-title><source>Trends Biochem Sci</source><volume>24</volume><fpage>276</fpage><lpage>280</lpage><year>1999</year><pub-id pub-id-type="doi">10.1016/S0968-0004(99)01400-0</pub-id><pub-id pub-id-type="pmid">10390617</pub-id></element-citation></ref>
<ref id="b3-ijmm-39-05-1063"><label>3</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Baxevanis</surname><given-names>AD</given-names></name><name><surname>Bateman</surname><given-names>A</given-names></name></person-group><article-title>The importance of biological databases in biological discovery</article-title><source>Curr Protoc Bioinformatics</source><volume>50</volume><fpage>1.1.1</fpage><lpage>1.1.8</lpage><year>2015</year><pub-id pub-id-type="doi">10.1002/0471250953.bi0101s50</pub-id></element-citation></ref>
<ref id="b4-ijmm-39-05-1063"><label>4</label><element-citation publication-type="book"><person-group person-group-type="editor"><name><surname>Tropp</surname><given-names>BE</given-names></name></person-group><source>Molecular Biology: Genes to Proteins</source><edition>3rd edition</edition><publisher-name>Jones &amp; Bartlett Publishers</publisher-name><publisher-loc>Sudbury, MA</publisher-loc><year>2008</year></element-citation></ref>
<ref id="b5-ijmm-39-05-1063"><label>5</label><element-citation publication-type="book"><person-group person-group-type="editor"><name><surname>Sambrook</surname><given-names>J</given-names></name><name><surname>Russell</surname><given-names>DW</given-names></name></person-group><source>Molecular Cloning: A Laboratory Manual</source><volume>2</volume><edition>3rd edition</edition><publisher-name>Cold Spring Harbor Laboratory Press</publisher-name><publisher-loc>Cold Spring Harbor, NY</publisher-loc><year>2001</year></element-citation></ref>
<ref id="b6-ijmm-39-05-1063"><label>6</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Casadei</surname><given-names>R</given-names></name><name><surname>Canaider</surname><given-names>S</given-names></name><name><surname>Lenzi</surname><given-names>L</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name><name><surname>D'Addabbo</surname><given-names>P</given-names></name><name><surname>Giannone</surname><given-names>S</given-names></name><name><surname>Carinci</surname><given-names>P</given-names></name><name><surname>Zannotti</surname><given-names>M</given-names></name></person-group><article-title>Cysteine and tyrosine-rich 1 (CYYR1), a novel unpredicted gene on human chromosome 21 (21q21.2), encodes a cysteine and tyrosine-rich protein and defines a new family of highly conserved vertebrate-specific genes</article-title><source>Gene</source><volume>290</volume><fpage>141</fpage><lpage>151</lpage><year>2002</year><pub-id pub-id-type="doi">10.1016/S0378-1119(02)00550-4</pub-id><pub-id pub-id-type="pmid">12062809</pub-id></element-citation></ref>
<ref id="b7-ijmm-39-05-1063"><label>7</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>J</given-names></name><name><surname>Lou</surname><given-names>X</given-names></name><name><surname>Shen</surname><given-names>H</given-names></name><name><surname>Zellmer</surname><given-names>L</given-names></name><name><surname>Sun</surname><given-names>Y</given-names></name><name><surname>Liu</surname><given-names>S</given-names></name><name><surname>Xu</surname><given-names>N</given-names></name><name><surname>Liao</surname><given-names>DJ</given-names></name></person-group><article-title>Isoforms of wild type proteins often appear as low molecular weight bands on SDS-PAGE</article-title><source>Biotechnol J</source><volume>9</volume><fpage>1044</fpage><lpage>1054</lpage><year>2014</year><pub-id pub-id-type="doi">10.1002/biot.201400072</pub-id><pub-id pub-id-type="pmid">24906056</pub-id></element-citation></ref>
<ref id="b8-ijmm-39-05-1063"><label>8</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Adams</surname><given-names>MD</given-names></name><name><surname>Kelley</surname><given-names>JM</given-names></name><name><surname>Gocayne</surname><given-names>JD</given-names></name><name><surname>Dubnick</surname><given-names>M</given-names></name><name><surname>Polymeropoulos</surname><given-names>MH</given-names></name><name><surname>Xiao</surname><given-names>H</given-names></name><name><surname>Merril</surname><given-names>CR</given-names></name><name><surname>Wu</surname><given-names>A</given-names></name><name><surname>Olde</surname><given-names>B</given-names></name><name><surname>Moreno</surname><given-names>RF</given-names></name><etal/></person-group><article-title>Complementary DNA sequencing: Expressed sequence tags and human genome project</article-title><source>Science</source><volume>252</volume><fpage>1651</fpage><lpage>1656</lpage><year>1991</year><pub-id pub-id-type="doi">10.1126/science.2047873</pub-id><pub-id pub-id-type="pmid">2047873</pub-id></element-citation></ref>
<ref id="b9-ijmm-39-05-1063"><label>9</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Boguski</surname><given-names>MS</given-names></name><name><surname>Lowe</surname><given-names>TM</given-names></name><name><surname>Tolstoshev</surname><given-names>CM</given-names></name></person-group><article-title>dbEST - database for 'expressed sequence tags'</article-title><source>Nat Genet</source><volume>4</volume><fpage>332</fpage><lpage>333</lpage><year>1993</year><pub-id pub-id-type="doi">10.1038/ng0893-332</pub-id><pub-id pub-id-type="pmid">8401577</pub-id></element-citation></ref>
<ref id="b10-ijmm-39-05-1063"><label>10</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nagaraj</surname><given-names>SH</given-names></name><name><surname>Gasser</surname><given-names>RB</given-names></name><name><surname>Ranganathan</surname><given-names>S</given-names></name></person-group><article-title>A hitchhiker's guide to expressed sequence tag (EST) analysis</article-title><source>Brief Bioinform</source><volume>8</volume><fpage>6</fpage><lpage>21</lpage><year>2007</year><pub-id pub-id-type="doi">10.1093/bib/bbl015</pub-id></element-citation></ref>
<ref id="b11-ijmm-39-05-1063"><label>11</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Parkinson</surname><given-names>J</given-names></name><name><surname>Blaxter</surname><given-names>M</given-names></name></person-group><article-title>Expressed sequence tags: An overview</article-title><source>Methods Mol Biol</source><volume>533</volume><fpage>1</fpage><lpage>12</lpage><year>2009</year><pub-id pub-id-type="doi">10.1007/978-1-60327-136-3_1</pub-id><pub-id pub-id-type="pmid">19277571</pub-id></element-citation></ref>
<ref id="b12-ijmm-39-05-1063"><label>12</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gill</surname><given-names>RW</given-names></name><name><surname>Sanseau</surname><given-names>P</given-names></name></person-group><article-title>Rapid in silico cloning of genes using expressed sequence tags (ESTs)</article-title><source>Biotechnol Annu Rev</source><volume>5</volume><fpage>25</fpage><lpage>44</lpage><year>2000</year><pub-id pub-id-type="doi">10.1016/S1387-2656(00)05031-6</pub-id><pub-id pub-id-type="pmid">10874996</pub-id></element-citation></ref>
<ref id="b13-ijmm-39-05-1063"><label>13</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Carulli</surname><given-names>JP</given-names></name><name><surname>Artinger</surname><given-names>M</given-names></name><name><surname>Swain</surname><given-names>PM</given-names></name><name><surname>Root</surname><given-names>CD</given-names></name><name><surname>Chee</surname><given-names>L</given-names></name><name><surname>Tulig</surname><given-names>C</given-names></name><name><surname>Guerin</surname><given-names>J</given-names></name><name><surname>Osborne</surname><given-names>M</given-names></name><name><surname>Stein</surname><given-names>G</given-names></name><name><surname>Lian</surname><given-names>J</given-names></name><etal/></person-group><article-title>High throughput analysis of differential gene expression</article-title><source>J Cell Biochem Suppl</source><volume>30&#x02013;31</volume><fpage>286</fpage><lpage>296</lpage><year>1998</year><pub-id pub-id-type="doi">10.1002/(SICI)1097-4644(1998)72:30/31+&lt;286::AID-JCB35&gt;3.0.CO;2-D</pub-id></element-citation></ref>
<ref id="b14-ijmm-39-05-1063"><label>14</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sorek</surname><given-names>R</given-names></name><name><surname>Shamir</surname><given-names>R</given-names></name><name><surname>Ast</surname><given-names>G</given-names></name></person-group><article-title>How prevalent is functional alternative splicing in the human genome?</article-title><source>Trends Genet</source><volume>20</volume><fpage>68</fpage><lpage>71</lpage><year>2004</year><pub-id pub-id-type="doi">10.1016/j.tig.2003.12.004</pub-id><pub-id pub-id-type="pmid">14746986</pub-id></element-citation></ref>
<ref id="b15-ijmm-39-05-1063"><label>15</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bonizzoni</surname><given-names>P</given-names></name><name><surname>Rizzi</surname><given-names>R</given-names></name><name><surname>Pesole</surname><given-names>G</given-names></name></person-group><article-title>Computational methods for alternative splicing prediction</article-title><source>Brief Funct Genomics Proteomics</source><volume>5</volume><fpage>46</fpage><lpage>51</lpage><year>2006</year><pub-id pub-id-type="doi">10.1093/bfgp/ell011</pub-id></element-citation></ref>
<ref id="b16-ijmm-39-05-1063"><label>16</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Brent</surname><given-names>MR</given-names></name></person-group><article-title>Genome annotation past, present, and future: How to define an ORF at each locus</article-title><source>Genome Res</source><volume>15</volume><fpage>1777</fpage><lpage>1786</lpage><year>2005</year><pub-id pub-id-type="doi">10.1101/gr.3866105</pub-id><pub-id pub-id-type="pmid">16339376</pub-id></element-citation></ref>
<ref id="b17-ijmm-39-05-1063"><label>17</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sanger</surname><given-names>F</given-names></name></person-group><article-title>La structure de l'insuline</article-title><source>Bull Soc Chim Biol (Paris)</source><volume>37</volume><fpage>23</fpage><lpage>35</lpage><year>1955</year><comment>In French</comment></element-citation></ref>
<ref id="b18-ijmm-39-05-1063"><label>18</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yanofsky</surname><given-names>C</given-names></name><name><surname>Carlton</surname><given-names>BC</given-names></name><name><surname>Guest</surname><given-names>JR</given-names></name><name><surname>Helinski</surname><given-names>DR</given-names></name><name><surname>Henning</surname><given-names>U</given-names></name></person-group><article-title>On the colinearity of gene structure and protein structure</article-title><source>Proc Natl Acad Sci USA</source><volume>51</volume><fpage>266</fpage><lpage>272</lpage><year>1964</year><pub-id pub-id-type="doi">10.1073/pnas.51.2.266</pub-id><pub-id pub-id-type="pmid">14124325</pub-id><pub-id pub-id-type="pmcid">300060</pub-id></element-citation></ref>
<ref id="b19-ijmm-39-05-1063"><label>19</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sanger</surname><given-names>F</given-names></name><name><surname>Nicklen</surname><given-names>S</given-names></name><name><surname>Coulson</surname><given-names>AR</given-names></name></person-group><article-title>DNA sequencing with chain-terminating inhibitors</article-title><source>Proc Natl Acad Sci USA</source><volume>74</volume><fpage>5463</fpage><lpage>5467</lpage><year>1977</year><pub-id pub-id-type="doi">10.1073/pnas.74.12.5463</pub-id><pub-id pub-id-type="pmid">271968</pub-id><pub-id pub-id-type="pmcid">431765</pub-id></element-citation></ref>
<ref id="b20-ijmm-39-05-1063"><label>20</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ruddle</surname><given-names>FH</given-names></name></person-group><article-title>The William Allan Memorial Award address: Reverse genetics and beyond</article-title><source>Am J Hum Genet</source><volume>36</volume><fpage>944</fpage><lpage>953</lpage><year>1984</year><pub-id pub-id-type="pmid">6594045</pub-id><pub-id pub-id-type="pmcid">1684509</pub-id></element-citation></ref>
<ref id="b21-ijmm-39-05-1063"><label>21</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kozak</surname><given-names>M</given-names></name></person-group><article-title>Pushing the limits of the scanning mechanism for initiation of translation</article-title><source>Gene</source><volume>299</volume><fpage>1</fpage><lpage>34</lpage><year>2002</year><pub-id pub-id-type="doi">10.1016/S0378-1119(02)01056-9</pub-id><pub-id pub-id-type="pmid">12459250</pub-id></element-citation></ref>
<ref id="b22-ijmm-39-05-1063"><label>22</label><element-citation publication-type="book"><person-group person-group-type="editor"><name><surname>Sambrook</surname><given-names>J</given-names></name><name><surname>Russell</surname><given-names>DW</given-names></name></person-group><article-title>Rapid amplification of 5&#x02032; cDNA ends</article-title><source>Molecular Cloning: A Laboratory Manual</source><volume>3</volume><edition>3rd edition</edition><publisher-name>Cold Spring Harbor Laboratory Press</publisher-name><publisher-loc>Cold Spring Harbor, NY</publisher-loc><fpage>8.54</fpage><lpage>8.60</lpage><year>2001</year></element-citation></ref>
<ref id="b23-ijmm-39-05-1063"><label>23</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Okayama</surname><given-names>H</given-names></name><name><surname>Berg</surname><given-names>P</given-names></name></person-group><article-title>High-efficiency cloning of full-length cDNA</article-title><source>Mol Cell Biol</source><volume>2</volume><fpage>161</fpage><lpage>170</lpage><year>1982</year><pub-id pub-id-type="doi">10.1128/MCB.2.2.161</pub-id><pub-id pub-id-type="pmid">6287227</pub-id><pub-id pub-id-type="pmcid">369769</pub-id></element-citation></ref>
<ref id="b24-ijmm-39-05-1063"><label>24</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Baralle</surname><given-names>F</given-names></name></person-group><article-title>Complete nucleotide sequence of the 5&#x02032; noncoding region of human alpha-and beta-globin mRNA</article-title><source>Cell</source><volume>12</volume><fpage>1085</fpage><lpage>1095</lpage><year>1977</year><pub-id pub-id-type="doi">10.1016/0092-8674(77)90171-4</pub-id><pub-id pub-id-type="pmid">597858</pub-id></element-citation></ref>
<ref id="b25-ijmm-39-05-1063"><label>25</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Proudfoot</surname><given-names>NJ</given-names></name></person-group><article-title>Complete 3&#x02032; noncoding region sequences of rabbit and human beta-globin messenger RNAs</article-title><source>Cell</source><volume>10</volume><fpage>559</fpage><lpage>570</lpage><year>1977</year><pub-id pub-id-type="doi">10.1016/0092-8674(77)90089-7</pub-id><pub-id pub-id-type="pmid">67897</pub-id></element-citation></ref>
<ref id="b26-ijmm-39-05-1063"><label>26</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Marotta</surname><given-names>CA</given-names></name><name><surname>Wilson</surname><given-names>JT</given-names></name><name><surname>Forget</surname><given-names>BG</given-names></name><name><surname>Weissman</surname><given-names>SM</given-names></name></person-group><article-title>Human beta-globin messenger RNA. III Nucleotide sequences derived from complementary DNA</article-title><source>J Biol Chem</source><volume>252</volume><fpage>5040</fpage><lpage>5053</lpage><year>1977</year><pub-id pub-id-type="pmid">68958</pub-id></element-citation></ref>
<ref id="b27-ijmm-39-05-1063"><label>27</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Efstratiadis</surname><given-names>A</given-names></name><name><surname>Kafatos</surname><given-names>FC</given-names></name><name><surname>Maniatis</surname><given-names>T</given-names></name></person-group><article-title>The primary structure of rabbit beta-globin mRNA as determined from cloned DNA</article-title><source>Cell</source><volume>10</volume><fpage>571</fpage><lpage>585</lpage><year>1977</year><pub-id pub-id-type="doi">10.1016/0092-8674(77)90090-3</pub-id><pub-id pub-id-type="pmid">558827</pub-id></element-citation></ref>
<ref id="b28-ijmm-39-05-1063"><label>28</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ullrich</surname><given-names>A</given-names></name><name><surname>Shine</surname><given-names>J</given-names></name><name><surname>Chirgwin</surname><given-names>J</given-names></name><name><surname>Pictet</surname><given-names>R</given-names></name><name><surname>Tischer</surname><given-names>E</given-names></name><name><surname>Rutter</surname><given-names>WJ</given-names></name><name><surname>Goodman</surname><given-names>HM</given-names></name></person-group><article-title>Rat insulin genes: Construction of plasmids containing the coding sequences</article-title><source>Science</source><volume>196</volume><fpage>1313</fpage><lpage>1319</lpage><year>1977</year><pub-id pub-id-type="doi">10.1126/science.325648</pub-id><pub-id pub-id-type="pmid">325648</pub-id></element-citation></ref>
<ref id="b29-ijmm-39-05-1063"><label>29</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Casadei</surname><given-names>R</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name><name><surname>D'Addabbo</surname><given-names>P</given-names></name><name><surname>Canaider</surname><given-names>S</given-names></name><name><surname>Lenzi</surname><given-names>L</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Giannone</surname><given-names>S</given-names></name><name><surname>Frabetti</surname><given-names>F</given-names></name><name><surname>Facchin</surname><given-names>F</given-names></name><name><surname>Carinci</surname><given-names>P</given-names></name><etal/></person-group><article-title>mRNA 5&#x02032; region sequence incompleteness: A potential source of systematic errors in translation initiation codon assignment in human mRNAs</article-title><source>Gene</source><volume>321</volume><fpage>185</fpage><lpage>193</lpage><year>2003</year><pub-id pub-id-type="doi">10.1016/S0378-1119(03)00835-7</pub-id><pub-id pub-id-type="pmid">14637006</pub-id></element-citation></ref>
<ref id="b30-ijmm-39-05-1063"><label>30</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Harbers</surname><given-names>M</given-names></name></person-group><article-title>The current status of cDNA cloning</article-title><source>Genomics</source><volume>91</volume><fpage>232</fpage><lpage>242</lpage><year>2008</year><pub-id pub-id-type="doi">10.1016/j.ygeno.2007.11.004</pub-id><pub-id pub-id-type="pmid">18222633</pub-id></element-citation></ref>
<ref id="b31-ijmm-39-05-1063"><label>31</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Kvam</surname><given-names>C</given-names></name><name><surname>Kitamura</surname><given-names>A</given-names></name><name><surname>Ohsumi</surname><given-names>T</given-names></name><name><surname>Okazaki</surname><given-names>Y</given-names></name><name><surname>Itoh</surname><given-names>M</given-names></name><name><surname>Kamiya</surname><given-names>M</given-names></name><name><surname>Shibata</surname><given-names>K</given-names></name><name><surname>Sasaki</surname><given-names>N</given-names></name><name><surname>Izawa</surname><given-names>M</given-names></name><etal/></person-group><article-title>High-efficiency full-length cDNA cloning by biotinylated CAP trapper</article-title><source>Genomics</source><volume>37</volume><fpage>327</fpage><lpage>336</lpage><year>1996</year><pub-id pub-id-type="doi">10.1006/geno.1996.0567</pub-id><pub-id pub-id-type="pmid">8938445</pub-id></element-citation></ref>
<ref id="b32-ijmm-39-05-1063"><label>32</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kodzius</surname><given-names>R</given-names></name><name><surname>Kojima</surname><given-names>M</given-names></name><name><surname>Nishiyori</surname><given-names>H</given-names></name><name><surname>Nakamura</surname><given-names>M</given-names></name><name><surname>Fukuda</surname><given-names>S</given-names></name><name><surname>Tagami</surname><given-names>M</given-names></name><name><surname>Sasaki</surname><given-names>D</given-names></name><name><surname>Imamura</surname><given-names>K</given-names></name><name><surname>Kai</surname><given-names>C</given-names></name><name><surname>Harbers</surname><given-names>M</given-names></name><etal/></person-group><article-title>CAGE: Cap analysis of gene expression</article-title><source>Nat Methods</source><volume>3</volume><fpage>211</fpage><lpage>222</lpage><year>2006</year><pub-id pub-id-type="doi">10.1038/nmeth0306-211</pub-id><pub-id pub-id-type="pmid">16489339</pub-id></element-citation></ref>
<ref id="b33-ijmm-39-05-1063"><label>33</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Frohman</surname><given-names>MA</given-names></name><name><surname>Dush</surname><given-names>MK</given-names></name><name><surname>Martin</surname><given-names>GR</given-names></name></person-group><article-title>Rapid production of full-length cDNAs from rare transcripts: Amplification using a single gene-specific oligonucleotide primer</article-title><source>Proc Natl Acad Sci USA</source><volume>85</volume><fpage>8998</fpage><lpage>9002</lpage><year>1988</year><pub-id pub-id-type="doi">10.1073/pnas.85.23.8998</pub-id><pub-id pub-id-type="pmid">2461560</pub-id><pub-id pub-id-type="pmcid">282649</pub-id></element-citation></ref>
<ref id="b34-ijmm-39-05-1063"><label>34</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Denoeud</surname><given-names>F</given-names></name><name><surname>Kapranov</surname><given-names>P</given-names></name><name><surname>Ucla</surname><given-names>C</given-names></name><name><surname>Frankish</surname><given-names>A</given-names></name><name><surname>Castelo</surname><given-names>R</given-names></name><name><surname>Drenkow</surname><given-names>J</given-names></name><name><surname>Lagarde</surname><given-names>J</given-names></name><name><surname>Alioto</surname><given-names>T</given-names></name><name><surname>Manzano</surname><given-names>C</given-names></name><name><surname>Chrast</surname><given-names>J</given-names></name><etal/></person-group><article-title>Prominent use of distal 5&#x02032; transcription start sites and discovery of a large number of additional exons in ENCODE regions</article-title><source>Genome Res</source><volume>17</volume><fpage>746</fpage><lpage>759</lpage><year>2007</year><pub-id pub-id-type="doi">10.1101/gr.5660607</pub-id><pub-id pub-id-type="pmid">17567994</pub-id><pub-id pub-id-type="pmcid">1891335</pub-id></element-citation></ref>
<ref id="b35-ijmm-39-05-1063"><label>35</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Suzuki</surname><given-names>Y</given-names></name><name><surname>Ishihara</surname><given-names>D</given-names></name><name><surname>Sasaki</surname><given-names>M</given-names></name><name><surname>Nakagawa</surname><given-names>H</given-names></name><name><surname>Hata</surname><given-names>H</given-names></name><name><surname>Tsunoda</surname><given-names>T</given-names></name><name><surname>Watanabe</surname><given-names>M</given-names></name><name><surname>Komatsu</surname><given-names>T</given-names></name><name><surname>Ota</surname><given-names>T</given-names></name><name><surname>Isogai</surname><given-names>T</given-names></name><etal/></person-group><article-title>Statistical analysis of the 5&#x02032; untranslated region of human mRNA using 'Oligo-Capped' cDNA libraries</article-title><source>Genomics</source><volume>64</volume><fpage>286</fpage><lpage>297</lpage><year>2000</year><pub-id pub-id-type="doi">10.1006/geno.2000.6076</pub-id><pub-id pub-id-type="pmid">10756096</pub-id></element-citation></ref>
<ref id="b36-ijmm-39-05-1063"><label>36</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Porcel</surname><given-names>BM</given-names></name><name><surname>Delfour</surname><given-names>O</given-names></name><name><surname>Castelli</surname><given-names>V</given-names></name><name><surname>De Berardinis</surname><given-names>V</given-names></name><name><surname>Friedlander</surname><given-names>L</given-names></name><name><surname>Cruaud</surname><given-names>C</given-names></name><name><surname>Ureta-Vidal</surname><given-names>A</given-names></name><name><surname>Scarpelli</surname><given-names>C</given-names></name><name><surname>Wincker</surname><given-names>P</given-names></name><name><surname>Sch&#x000E4;chter</surname><given-names>V</given-names></name><etal/></person-group><article-title>Numerous novel annotations of the human genome sequence supported by a 5&#x02032;-end-enriched cDNA collection</article-title><source>Genome Res</source><volume>14</volume><fpage>463</fpage><lpage>471</lpage><year>2004</year><pub-id pub-id-type="doi">10.1101/gr.1481104</pub-id><pub-id pub-id-type="pmid">14962985</pub-id><pub-id pub-id-type="pmcid">353234</pub-id></element-citation></ref>
<ref id="b37-ijmm-39-05-1063"><label>37</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Metzker</surname><given-names>ML</given-names></name></person-group><article-title>Sequencing technologies - the next generation</article-title><source>Nat Rev Genet</source><volume>11</volume><fpage>31</fpage><lpage>46</lpage><year>2010</year><pub-id pub-id-type="doi">10.1038/nrg2626</pub-id></element-citation></ref>
<ref id="b38-ijmm-39-05-1063"><label>38</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Ghaemmaghami</surname><given-names>S</given-names></name><name><surname>Newman</surname><given-names>JR</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><article-title>Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling</article-title><source>Science</source><volume>324</volume><fpage>218</fpage><lpage>223</lpage><year>2009</year><pub-id pub-id-type="doi">10.1126/science.1168978</pub-id><pub-id pub-id-type="pmid">19213877</pub-id><pub-id pub-id-type="pmcid">2746483</pub-id></element-citation></ref>
<ref id="b39-ijmm-39-05-1063"><label>39</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Lareau</surname><given-names>LF</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><article-title>Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes</article-title><source>Cell</source><volume>147</volume><fpage>789</fpage><lpage>802</lpage><year>2011</year><pub-id pub-id-type="doi">10.1016/j.cell.2011.10.002</pub-id><pub-id pub-id-type="pmid">22056041</pub-id><pub-id pub-id-type="pmcid">3225288</pub-id></element-citation></ref>
<ref id="b40-ijmm-39-05-1063"><label>40</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fritsch</surname><given-names>C</given-names></name><name><surname>Herrmann</surname><given-names>A</given-names></name><name><surname>Nothnagel</surname><given-names>M</given-names></name><name><surname>Szafranski</surname><given-names>K</given-names></name><name><surname>Huse</surname><given-names>K</given-names></name><name><surname>Schumann</surname><given-names>F</given-names></name><name><surname>Schreiber</surname><given-names>S</given-names></name><name><surname>Platzer</surname><given-names>M</given-names></name><name><surname>Krawczak</surname><given-names>M</given-names></name><name><surname>Hampe</surname><given-names>J</given-names></name><etal/></person-group><article-title>Genome-wide search for novel human uORFs and N-terminal protein extensions using ribosomal footprinting</article-title><source>Genome Res</source><volume>22</volume><fpage>2208</fpage><lpage>2218</lpage><year>2012</year><pub-id pub-id-type="doi">10.1101/gr.139568.112</pub-id><pub-id pub-id-type="pmid">22879431</pub-id><pub-id pub-id-type="pmcid">3483550</pub-id></element-citation></ref>
<ref id="b41-ijmm-39-05-1063"><label>41</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Van Damme</surname><given-names>P</given-names></name><name><surname>Gawron</surname><given-names>D</given-names></name><name><surname>Van Criekinge</surname><given-names>W</given-names></name><name><surname>Menschaert</surname><given-names>G</given-names></name></person-group><article-title>N-terminal proteomics and ribosome profiling provide a comprehensive view of the alternative translation initiation landscape in mice and men</article-title><source>Mol Cell Proteomics</source><volume>13</volume><fpage>1245</fpage><lpage>1261</lpage><year>2014</year><pub-id pub-id-type="doi">10.1074/mcp.M113.036442</pub-id><pub-id pub-id-type="pmid">24623590</pub-id><pub-id pub-id-type="pmcid">4014282</pub-id></element-citation></ref>
<ref id="b42-ijmm-39-05-1063"><label>42</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Iacono</surname><given-names>M</given-names></name><name><surname>Mignone</surname><given-names>F</given-names></name><name><surname>Pesole</surname><given-names>G</given-names></name></person-group><article-title>uAUG and uORFs in human and rodent 5&#x02032; untranslated mRNAs</article-title><source>Gene</source><volume>349</volume><fpage>97</fpage><lpage>105</lpage><year>2005</year><pub-id pub-id-type="doi">10.1016/j.gene.2004.11.041</pub-id><pub-id pub-id-type="pmid">15777708</pub-id></element-citation></ref>
<ref id="b43-ijmm-39-05-1063"><label>43</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Barbosa</surname><given-names>C</given-names></name><name><surname>Peixeiro</surname><given-names>I</given-names></name><name><surname>Rom&#x000E3;o</surname><given-names>L</given-names></name></person-group><article-title>Gene expression regulation by upstream open reading frames and human disease</article-title><source>PLoS Genet</source><volume>9</volume><fpage>e1003529</fpage><year>2013</year><pub-id pub-id-type="doi">10.1371/journal.pgen.1003529</pub-id><pub-id pub-id-type="pmid">23950723</pub-id><pub-id pub-id-type="pmcid">3738444</pub-id></element-citation></ref>
<ref id="b44-ijmm-39-05-1063"><label>44</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nishitani</surname><given-names>H</given-names></name><name><surname>Hirose</surname><given-names>E</given-names></name><name><surname>Uchimura</surname><given-names>Y</given-names></name><name><surname>Nakamura</surname><given-names>M</given-names></name><name><surname>Umeda</surname><given-names>M</given-names></name><name><surname>Nishii</surname><given-names>K</given-names></name><name><surname>Mori</surname><given-names>N</given-names></name><name><surname>Nishimoto</surname><given-names>T</given-names></name></person-group><article-title>Full-sized RanBPM cDNA encodes a protein possessing a long stretch of proline and glutamine within the N-terminal region, comprising a large protein complex</article-title><source>Gene</source><volume>272</volume><fpage>25</fpage><lpage>33</lpage><year>2001</year><pub-id pub-id-type="doi">10.1016/S0378-1119(01)00553-4</pub-id><pub-id pub-id-type="pmid">11470507</pub-id></element-citation></ref>
<ref id="b45-ijmm-39-05-1063"><label>45</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kobayashi</surname><given-names>A</given-names></name><name><surname>Ito</surname><given-names>E</given-names></name><name><surname>Toki</surname><given-names>T</given-names></name><name><surname>Kogame</surname><given-names>K</given-names></name><name><surname>Takahashi</surname><given-names>S</given-names></name><name><surname>Igarashi</surname><given-names>K</given-names></name><name><surname>Hayashi</surname><given-names>N</given-names></name><name><surname>Yamamoto</surname><given-names>M</given-names></name></person-group><article-title>Molecular cloning and functional characterization of a new Cap'n' collar family transcription factor Nrf3</article-title><source>J Biol Chem</source><volume>274</volume><fpage>6443</fpage><lpage>6452</lpage><year>1999</year><pub-id pub-id-type="doi">10.1074/jbc.274.10.6443</pub-id><pub-id pub-id-type="pmid">10037736</pub-id></element-citation></ref>
<ref id="b46-ijmm-39-05-1063"><label>46</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nomura</surname><given-names>N</given-names></name><name><surname>Nagase</surname><given-names>T</given-names></name><name><surname>Miyajima</surname><given-names>N</given-names></name><name><surname>Sazuka</surname><given-names>T</given-names></name><name><surname>Tanaka</surname><given-names>A</given-names></name><name><surname>Sato</surname><given-names>S</given-names></name><name><surname>Seki</surname><given-names>N</given-names></name><name><surname>Kawarabayasi</surname><given-names>Y</given-names></name><name><surname>Ishikawa</surname><given-names>K</given-names></name><name><surname>Tabata</surname><given-names>S</given-names></name></person-group><article-title>Prediction of the coding sequences of unidentified human genes. II. The coding sequences of 40 new genes (KIAA0041-KIAA0080) deduced by analysis of cDNA clones from human cell line KG-1</article-title><source>DNA Res</source><volume>1</volume><fpage>223</fpage><lpage>229</lpage><year>1994</year><pub-id pub-id-type="doi">10.1093/dnares/1.5.223</pub-id></element-citation></ref>
<ref id="b47-ijmm-39-05-1063"><label>47</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kingsley</surname><given-names>C</given-names></name><name><surname>Winoto</surname><given-names>A</given-names></name></person-group><article-title>Cloning of GT box-binding proteins: A novel Sp1 multigene family regulating T-cell receptor gene expression</article-title><source>Mol Cell Biol</source><volume>12</volume><fpage>4251</fpage><lpage>4261</lpage><year>1992</year><pub-id pub-id-type="doi">10.1128/MCB.12.10.4251</pub-id><pub-id pub-id-type="pmid">1341900</pub-id><pub-id pub-id-type="pmcid">360348</pub-id></element-citation></ref>
<ref id="b48-ijmm-39-05-1063"><label>48</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Strippoli</surname><given-names>P</given-names></name><name><surname>Pelleri</surname><given-names>MC</given-names></name><name><surname>Caracausi</surname><given-names>M</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Piovesan</surname><given-names>A</given-names></name><name><surname>Locatelli</surname><given-names>C</given-names></name><name><surname>Mimmi</surname><given-names>MC</given-names></name><name><surname>Berardi</surname><given-names>AC</given-names></name><name><surname>Ricotta</surname><given-names>D</given-names></name><name><surname>Radeghieri</surname><given-names>A</given-names></name><etal/></person-group><article-title>An integrated route to identifying new pathogenesis-based therapeutic approaches for trisomy 21 (Down Syndrome) following the thought of J&#x000E9;r&#x000F4;me Lejeune</article-title><source>Sci Postprint</source><volume>1</volume><fpage>e00010</fpage><year>2013</year><pub-id pub-id-type="doi">10.14340/spp.2013.12R0005</pub-id></element-citation></ref>
<ref id="b49-ijmm-39-05-1063"><label>49</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pelleri</surname><given-names>MC</given-names></name><name><surname>Cicchini</surname><given-names>E</given-names></name><name><surname>Locatelli</surname><given-names>C</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Caracausi</surname><given-names>M</given-names></name><name><surname>Piovesan</surname><given-names>A</given-names></name><name><surname>Rocca</surname><given-names>A</given-names></name><name><surname>Poletti</surname><given-names>G</given-names></name><name><surname>Seri</surname><given-names>M</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name><etal/></person-group><article-title>Systematic reanalysis of partial trisomy 21 cases with or without Down syndrome suggests a small region on 21q22.13 as critical to the phenotype</article-title><source>Hum Mol Genet</source><volume>25</volume><fpage>2525</fpage><lpage>2538</lpage><year>2016</year><pub-id pub-id-type="pmid">27106104</pub-id><pub-id pub-id-type="pmcid">5181629</pub-id></element-citation></ref>
<ref id="b50-ijmm-39-05-1063"><label>50</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hattori</surname><given-names>M</given-names></name><name><surname>Fujiyama</surname><given-names>A</given-names></name><name><surname>Taylor</surname><given-names>TD</given-names></name><name><surname>Watanabe</surname><given-names>H</given-names></name><name><surname>Yada</surname><given-names>T</given-names></name><name><surname>Park</surname><given-names>HS</given-names></name><name><surname>Toyoda</surname><given-names>A</given-names></name><name><surname>Ishii</surname><given-names>K</given-names></name><name><surname>Totoki</surname><given-names>Y</given-names></name><name><surname>Choi</surname><given-names>DK</given-names></name><etal/><collab>Chromosome 21 mapping and sequencing consortium</collab></person-group><article-title>The DNA sequence of human chromosome 21</article-title><source>Nature</source><volume>405</volume><fpage>311</fpage><lpage>319</lpage><year>2000</year><pub-id pub-id-type="doi">10.1038/35012518</pub-id><pub-id pub-id-type="pmid">10830953</pub-id></element-citation></ref>
<ref id="b51-ijmm-39-05-1063"><label>51</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Reymond</surname><given-names>A</given-names></name><name><surname>Camargo</surname><given-names>AA</given-names></name><name><surname>Deutsch</surname><given-names>S</given-names></name><name><surname>Stevenson</surname><given-names>BJ</given-names></name><name><surname>Parmigiani</surname><given-names>RB</given-names></name><name><surname>Ucla</surname><given-names>C</given-names></name><name><surname>Bettoni</surname><given-names>F</given-names></name><name><surname>Rossier</surname><given-names>C</given-names></name><name><surname>Lyle</surname><given-names>R</given-names></name><name><surname>Guipponi</surname><given-names>M</given-names></name><etal/></person-group><article-title>Nineteen additional unpredicted transcripts from human chromosome 21</article-title><source>Genomics</source><volume>79</volume><fpage>824</fpage><lpage>832</lpage><year>2002</year><pub-id pub-id-type="doi">10.1006/geno.2002.6781</pub-id><pub-id pub-id-type="pmid">12036297</pub-id></element-citation></ref>
<ref id="b52-ijmm-39-05-1063"><label>52</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pesole</surname><given-names>G</given-names></name><name><surname>Gissi</surname><given-names>C</given-names></name><name><surname>Grillo</surname><given-names>G</given-names></name><name><surname>Licciulli</surname><given-names>F</given-names></name><name><surname>Liuni</surname><given-names>S</given-names></name><name><surname>Saccone</surname><given-names>C</given-names></name></person-group><article-title>Analysis of oligonucleotide AUG start codon context in eukariotic mRNAs</article-title><source>Gene</source><volume>261</volume><fpage>85</fpage><lpage>91</lpage><year>2000</year><pub-id pub-id-type="doi">10.1016/S0378-1119(00)00471-6</pub-id></element-citation></ref>
<ref id="b53-ijmm-39-05-1063"><label>53</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Frabetti</surname><given-names>F</given-names></name><name><surname>Casadei</surname><given-names>R</given-names></name><name><surname>Lenzi</surname><given-names>L</given-names></name><name><surname>Canaider</surname><given-names>S</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Facchin</surname><given-names>F</given-names></name><name><surname>Carinci</surname><given-names>P</given-names></name><name><surname>Zannotti</surname><given-names>M</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name></person-group><article-title>Systematic analysis of mRNA 5&#x02032; coding sequence incompleteness in Danio rerio: An automated EST-based approach</article-title><source>Biol Direct</source><volume>2</volume><fpage>34</fpage><year>2007</year><pub-id pub-id-type="doi">10.1186/1745-6150-2-34</pub-id></element-citation></ref>
<ref id="b54-ijmm-39-05-1063"><label>54</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Casadei</surname><given-names>R</given-names></name><name><surname>Piovesan</surname><given-names>A</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Facchin</surname><given-names>F</given-names></name><name><surname>Pelleri</surname><given-names>MC</given-names></name><name><surname>Canaider</surname><given-names>S</given-names></name><name><surname>Bianconi</surname><given-names>E</given-names></name><name><surname>Frabetti</surname><given-names>F</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name></person-group><article-title>Genome-scale analysis of human mRNA 5&#x02032; coding sequences based on expressed sequence tag (EST) database</article-title><source>Genomics</source><volume>100</volume><fpage>125</fpage><lpage>130</lpage><year>2012</year><pub-id pub-id-type="doi">10.1016/j.ygeno.2012.05.012</pub-id><pub-id pub-id-type="pmid">22659028</pub-id></element-citation></ref>
<ref id="b55-ijmm-39-05-1063"><label>55</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Piovesan</surname><given-names>A</given-names></name><name><surname>Caracausi</surname><given-names>M</given-names></name><name><surname>Pelleri</surname><given-names>MC</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Martini</surname><given-names>S</given-names></name><name><surname>Bassani</surname><given-names>C</given-names></name><name><surname>Gurioli</surname><given-names>A</given-names></name><name><surname>Casadei</surname><given-names>R</given-names></name><name><surname>Sold&#x000E0;</surname><given-names>G</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name></person-group><article-title>Improving mRNA 5&#x02032; coding sequence determination in the mouse genome</article-title><source>Mamm Genome</source><volume>25</volume><fpage>149</fpage><lpage>159</lpage><year>2014</year><pub-id pub-id-type="doi">10.1007/s00335-013-9498-3</pub-id><pub-id pub-id-type="pmid">24504701</pub-id></element-citation></ref>
<ref id="b56-ijmm-39-05-1063"><label>56</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kochetov</surname><given-names>AV</given-names></name><name><surname>Sarai</surname><given-names>A</given-names></name><name><surname>Rogozin</surname><given-names>IB</given-names></name><name><surname>Shumny</surname><given-names>VK</given-names></name><name><surname>Kolchanov</surname><given-names>NA</given-names></name></person-group><article-title>The role of alternative translation start sites in the generation of human protein diversity</article-title><source>Mol Genet Genomics</source><volume>273</volume><fpage>491</fpage><lpage>496</lpage><year>2005</year><pub-id pub-id-type="doi">10.1007/s00438-005-1152-7</pub-id><pub-id pub-id-type="pmid">15959805</pub-id></element-citation></ref>
<ref id="b57-ijmm-39-05-1063"><label>57</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bazykin</surname><given-names>GA</given-names></name><name><surname>Kochetov</surname><given-names>AV</given-names></name></person-group><article-title>Alternative translation start sites are conserved in eukaryotic genomes</article-title><source>Nucleic Acids Res</source><volume>39</volume><fpage>567</fpage><lpage>577</lpage><year>2011</year><pub-id pub-id-type="doi">10.1093/nar/gkq806</pub-id><pub-id pub-id-type="pmcid">3025576</pub-id></element-citation></ref>
<ref id="b58-ijmm-39-05-1063"><label>58</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ivanov</surname><given-names>IP</given-names></name><name><surname>Firth</surname><given-names>AE</given-names></name><name><surname>Michel</surname><given-names>AM</given-names></name><name><surname>Atkins</surname><given-names>JF</given-names></name><name><surname>Baranov</surname><given-names>PV</given-names></name></person-group><article-title>Identification of evolutionarily conserved non-AUG-initiated N-terminal extensions in human coding sequences</article-title><source>Nucleic Acids Res</source><volume>39</volume><fpage>4220</fpage><lpage>4234</lpage><year>2011</year><pub-id pub-id-type="doi">10.1093/nar/gkr007</pub-id><pub-id pub-id-type="pmid">21266472</pub-id><pub-id pub-id-type="pmcid">3105428</pub-id></element-citation></ref>
<ref id="b59-ijmm-39-05-1063"><label>59</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Arakaki</surname><given-names>TL</given-names></name><name><surname>Pezza</surname><given-names>JA</given-names></name><name><surname>Cronin</surname><given-names>MA</given-names></name><name><surname>Hopkins</surname><given-names>CE</given-names></name><name><surname>Zimmer</surname><given-names>DB</given-names></name><name><surname>Tolan</surname><given-names>DR</given-names></name><name><surname>Allen</surname><given-names>KN</given-names></name></person-group><article-title>Structure of human brain fructose 1,6-(bis)phosphate aldolase: Linking isozyme structure with function</article-title><source>Protein Sci</source><volume>13</volume><fpage>3077</fpage><lpage>3084</lpage><year>2004</year><pub-id pub-id-type="doi">10.1110/ps.04915904</pub-id><pub-id pub-id-type="pmid">15537755</pub-id><pub-id pub-id-type="pmcid">2287316</pub-id></element-citation></ref>
<ref id="b60-ijmm-39-05-1063"><label>60</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lamour</surname><given-names>V</given-names></name><name><surname>Quevillon</surname><given-names>S</given-names></name><name><surname>Diriong</surname><given-names>S</given-names></name><name><surname>N'Guyen</surname><given-names>VC</given-names></name><name><surname>Lipinski</surname><given-names>M</given-names></name><name><surname>Mirande</surname><given-names>M</given-names></name></person-group><article-title>Evolution of the Glx-tRNA synthetase family: The glutaminyl enzyme as a case of horizontal gene transfer</article-title><source>Proc Natl Acad Sci USA</source><volume>91</volume><fpage>8670</fpage><lpage>8674</lpage><year>1994</year><pub-id pub-id-type="doi">10.1073/pnas.91.18.8670</pub-id><pub-id pub-id-type="pmid">8078941</pub-id><pub-id pub-id-type="pmcid">44668</pub-id></element-citation></ref>
<ref id="b61-ijmm-39-05-1063"><label>61</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hermann</surname><given-names>E</given-names></name><name><surname>Darcissac</surname><given-names>E</given-names></name><name><surname>Idziorek</surname><given-names>T</given-names></name><name><surname>Capron</surname><given-names>A</given-names></name><name><surname>Bahr</surname><given-names>GM</given-names></name></person-group><article-title>Recombinant interleukin-16 selectively modulates surface receptor expression and cytokine release in macrophages and dendritic cells</article-title><source>Immunology</source><volume>97</volume><fpage>241</fpage><lpage>248</lpage><year>1999</year><pub-id pub-id-type="doi">10.1046/j.1365-2567.1999.00786.x</pub-id><pub-id pub-id-type="pmid">10447738</pub-id><pub-id pub-id-type="pmcid">2326843</pub-id></element-citation></ref>
<ref id="b62-ijmm-39-05-1063"><label>62</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schatz</surname><given-names>G</given-names></name><name><surname>Dobberstein</surname><given-names>B</given-names></name></person-group><article-title>Common principles of protein translocation across membranes</article-title><source>Science</source><volume>271</volume><fpage>1519</fpage><lpage>1526</lpage><year>1996</year><pub-id pub-id-type="doi">10.1126/science.271.5255.1519</pub-id><pub-id pub-id-type="pmid">8599107</pub-id></element-citation></ref>
<ref id="b63-ijmm-39-05-1063"><label>63</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nakamura</surname><given-names>M</given-names></name><name><surname>Masuda</surname><given-names>H</given-names></name><name><surname>Horii</surname><given-names>J</given-names></name><name><surname>Kuma</surname><given-names>K</given-names></name><name><surname>Yokoyama</surname><given-names>N</given-names></name><name><surname>Ohba</surname><given-names>T</given-names></name><name><surname>Nishitani</surname><given-names>H</given-names></name><name><surname>Miyata</surname><given-names>T</given-names></name><name><surname>Tanaka</surname><given-names>M</given-names></name><name><surname>Nishimoto</surname><given-names>T</given-names></name></person-group><article-title>When overexpressed, a novel centrosomal protein, RanBPM, causes ectopic microtubule nucleation similar to gamma-tubulin</article-title><source>J Cell Biol</source><volume>143</volume><fpage>1041</fpage><lpage>1052</lpage><year>1998</year><pub-id pub-id-type="doi">10.1083/jcb.143.4.1041</pub-id><pub-id pub-id-type="pmid">9817760</pub-id><pub-id pub-id-type="pmcid">2132962</pub-id></element-citation></ref>
<ref id="b64-ijmm-39-05-1063"><label>64</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Varshavsky</surname><given-names>A</given-names></name></person-group><article-title>The N-end rule: Functions, mysteries, uses</article-title><source>Proc Natl Acad Sci USA</source><volume>93</volume><fpage>12142</fpage><lpage>12149</lpage><year>1996</year><pub-id pub-id-type="doi">10.1073/pnas.93.22.12142</pub-id><pub-id pub-id-type="pmid">8901547</pub-id><pub-id pub-id-type="pmcid">37957</pub-id></element-citation></ref>
<ref id="b65-ijmm-39-05-1063"><label>65</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rothermel</surname><given-names>B</given-names></name><name><surname>Vega</surname><given-names>RB</given-names></name><name><surname>Yang</surname><given-names>J</given-names></name><name><surname>Wu</surname><given-names>H</given-names></name><name><surname>Bassel-Duby</surname><given-names>R</given-names></name><name><surname>Williams</surname><given-names>RS</given-names></name></person-group><article-title>A protein encoded within the Down syndrome critical region is enriched in striated muscles and inhibits calcineurin signaling</article-title><source>J Biol Chem</source><volume>275</volume><fpage>8719</fpage><lpage>8725</lpage><year>2000</year><pub-id pub-id-type="doi">10.1074/jbc.275.12.8719</pub-id><pub-id pub-id-type="pmid">10722714</pub-id></element-citation></ref>
<ref id="b66-ijmm-39-05-1063"><label>66</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Strippoli</surname><given-names>P</given-names></name><name><surname>Petrini</surname><given-names>M</given-names></name><name><surname>Lenzi</surname><given-names>L</given-names></name><name><surname>Carinci</surname><given-names>P</given-names></name><name><surname>Zannotti</surname><given-names>M</given-names></name></person-group><article-title>The murine DSCR1-like (Down syndrome candidate region 1) gene family: Conserved synteny with the human orthologous genes</article-title><source>Gene</source><volume>257</volume><fpage>223</fpage><lpage>232</lpage><year>2000</year><pub-id pub-id-type="doi">10.1016/S0378-1119(00)00407-8</pub-id><pub-id pub-id-type="pmid">11080588</pub-id></element-citation></ref>
<ref id="b67-ijmm-39-05-1063"><label>67</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Frabetti</surname><given-names>F</given-names></name><name><surname>Huntsman</surname><given-names>SA</given-names></name><name><surname>Canaider</surname><given-names>S</given-names></name><name><surname>Casadei</surname><given-names>R</given-names></name><name><surname>Lenzi</surname><given-names>L</given-names></name><name><surname>Facchin</surname><given-names>F</given-names></name><name><surname>Carinci</surname><given-names>P</given-names></name><name><surname>Zannotti</surname><given-names>M</given-names></name><name><surname>Coppola</surname><given-names>D</given-names></name><etal/></person-group><article-title>Sequence, 'subtle' alternative splicing and expression of the CYYR1 (cysteine/tyrosine-rich 1) mRNA in human neuroendocrine tumors</article-title><source>BMC Cancer</source><volume>7</volume><fpage>66</fpage><year>2007</year><pub-id pub-id-type="doi">10.1186/1471-2407-7-66</pub-id></element-citation></ref>
<ref id="b68-ijmm-39-05-1063"><label>68</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Facchin</surname><given-names>F</given-names></name><name><surname>Canaider</surname><given-names>S</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Frabetti</surname><given-names>F</given-names></name><name><surname>Griffoni</surname><given-names>C</given-names></name><name><surname>Lenzi</surname><given-names>L</given-names></name><name><surname>Casadei</surname><given-names>R</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name></person-group><article-title>Identification and analysis of human RCAN3 (DSCR1L2) mRNA and protein isoforms</article-title><source>Gene</source><volume>407</volume><fpage>159</fpage><lpage>168</lpage><year>2008</year><pub-id pub-id-type="doi">10.1016/j.gene.2007.10.006</pub-id></element-citation></ref>
<ref id="b69-ijmm-39-05-1063"><label>69</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Facchin</surname><given-names>F</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Bianconi</surname><given-names>E</given-names></name><name><surname>Piva</surname><given-names>F</given-names></name><name><surname>Frabetti</surname><given-names>F</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name><name><surname>Casadei</surname><given-names>R</given-names></name><name><surname>Pelleri</surname><given-names>MC</given-names></name><name><surname>Piovesan</surname><given-names>A</given-names></name><name><surname>Canaider</surname><given-names>S</given-names></name></person-group><article-title>Complexity of bidirectional transcription and alternative splicing at human RCAN3 locus</article-title><source>PLoS One</source><volume>6</volume><fpage>e24508</fpage><year>2011</year><pub-id pub-id-type="doi">10.1371/journal.pone.0024508</pub-id><pub-id pub-id-type="pmid">21961037</pub-id><pub-id pub-id-type="pmcid">3178534</pub-id></element-citation></ref>
<ref id="b70-ijmm-39-05-1063"><label>70</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Casadei</surname><given-names>R</given-names></name><name><surname>Pelleri</surname><given-names>MC</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Facchin</surname><given-names>F</given-names></name><name><surname>Canaider</surname><given-names>S</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name><name><surname>Vian</surname><given-names>M</given-names></name><name><surname>Piovesan</surname><given-names>A</given-names></name><name><surname>Bianconi</surname><given-names>E</given-names></name><name><surname>Mariani</surname><given-names>E</given-names></name><etal/></person-group><article-title>Characterization of human gene locus CYYR1: A complex multi-transcript system</article-title><source>Mol Biol Rep</source><volume>41</volume><fpage>6025</fpage><lpage>6038</lpage><year>2014</year><pub-id pub-id-type="doi">10.1007/s11033-014-3480-3</pub-id><pub-id pub-id-type="pmid">24981926</pub-id></element-citation></ref>
<ref id="b71-ijmm-39-05-1063"><label>71</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nagase</surname><given-names>T</given-names></name><name><surname>Seki</surname><given-names>N</given-names></name><name><surname>Ishikawa</surname><given-names>K</given-names></name><name><surname>Tanaka</surname><given-names>A</given-names></name><name><surname>Nomura</surname><given-names>N</given-names></name></person-group><article-title>Prediction of the coding sequences of unidentified human genes. V. The coding sequences of 40 new genes (KIAA0161-KIAA0200) deduced by analysis of cDNA clones from human cell line KG-1</article-title><source>DNA Res</source><volume>3</volume><fpage>17</fpage><lpage>24</lpage><year>1996</year><pub-id pub-id-type="doi">10.1093/dnares/3.1.17</pub-id><pub-id pub-id-type="pmid">8724849</pub-id></element-citation></ref>
<ref id="b72-ijmm-39-05-1063"><label>72</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ribieras</surname><given-names>S</given-names></name><name><surname>Lef&#x000E8;bvre</surname><given-names>O</given-names></name><name><surname>Tomasetto</surname><given-names>C</given-names></name><name><surname>Rio</surname><given-names>MC</given-names></name></person-group><article-title>Mouse Trefoil factor genes: Genomic organization, sequences and methylation analyses</article-title><source>Gene</source><volume>266</volume><fpage>67</fpage><lpage>75</lpage><year>2001</year><pub-id pub-id-type="doi">10.1016/S0378-1119(01)00380-8</pub-id><pub-id pub-id-type="pmid">11290420</pub-id></element-citation></ref>
<ref id="b73-ijmm-39-05-1063"><label>73</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Doglio</surname><given-names>L</given-names></name><name><surname>Goode</surname><given-names>DK</given-names></name><name><surname>Pelleri</surname><given-names>MC</given-names></name><name><surname>Pauls</surname><given-names>S</given-names></name><name><surname>Frabetti</surname><given-names>F</given-names></name><name><surname>Shimeld</surname><given-names>SM</given-names></name><name><surname>Vavouri</surname><given-names>T</given-names></name><name><surname>Elgar</surname><given-names>G</given-names></name></person-group><article-title>Parallel evolution of chordate cis-regulatory code for development</article-title><source>PLoS Genet</source><volume>9</volume><fpage>e1003904</fpage><year>2013</year><pub-id pub-id-type="doi">10.1371/journal.pgen.1003904</pub-id><pub-id pub-id-type="pmid">24282393</pub-id><pub-id pub-id-type="pmcid">3836708</pub-id></element-citation></ref>
<ref id="b74-ijmm-39-05-1063"><label>74</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hinnebusch</surname><given-names>AG</given-names></name><name><surname>Ivanov</surname><given-names>IP</given-names></name><name><surname>Sonenberg</surname><given-names>N</given-names></name></person-group><article-title>Translational control by 5&#x02032;-untranslated regions of eukaryotic mRNAs</article-title><source>Science</source><volume>352</volume><fpage>1413</fpage><lpage>1416</lpage><year>2016</year><pub-id pub-id-type="doi">10.1126/science.aad9868</pub-id><pub-id pub-id-type="pmid">27313038</pub-id></element-citation></ref>
<ref id="b75-ijmm-39-05-1063"><label>75</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Caracausi</surname><given-names>M</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Pelleri</surname><given-names>MC</given-names></name><name><surname>Piovesan</surname><given-names>A</given-names></name><name><surname>Bruno</surname><given-names>S</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name></person-group><article-title>A quantitative transcriptome reference map of the normal human brain</article-title><source>Neurogenetics</source><volume>15</volume><fpage>267</fpage><lpage>287</lpage><year>2014</year><pub-id pub-id-type="doi">10.1007/s10048-014-0419-8</pub-id><pub-id pub-id-type="pmid">25185649</pub-id></element-citation></ref>
<ref id="b76-ijmm-39-05-1063"><label>76</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pelleri</surname><given-names>MC</given-names></name><name><surname>Piovesan</surname><given-names>A</given-names></name><name><surname>Caracausi</surname><given-names>M</given-names></name><name><surname>Berardi</surname><given-names>AC</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name></person-group><article-title>Integrated differential transcriptome maps of Acute Megakaryoblastic Leukemia (AMKL) in children with or without Down Syndrome (DS)</article-title><source>BMC Med Genomics</source><volume>7</volume><fpage>63</fpage><year>2014</year><pub-id pub-id-type="doi">10.1186/s12920-014-0063-z</pub-id><pub-id pub-id-type="pmid">25476127</pub-id><pub-id pub-id-type="pmcid">4304173</pub-id></element-citation></ref>
<ref id="b77-ijmm-39-05-1063"><label>77</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Manning</surname><given-names>AG</given-names></name><name><surname>Crawford</surname><given-names>BD</given-names></name><name><surname>Waskiewicz</surname><given-names>AJ</given-names></name><name><surname>Pilgrim</surname><given-names>DB</given-names></name></person-group><article-title>unc-119 homolog required for normal development of the zebrafish nervous system</article-title><source>Genesis</source><volume>40</volume><fpage>223</fpage><lpage>230</lpage><year>2004</year><pub-id pub-id-type="doi">10.1002/gene.20089</pub-id><pub-id pub-id-type="pmid">15593328</pub-id></element-citation></ref>
<ref id="b78-ijmm-39-05-1063"><label>78</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Piovesan</surname><given-names>A</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name><name><surname>Pelleri</surname><given-names>MC</given-names></name><name><surname>Strippoli</surname><given-names>P</given-names></name></person-group><article-title>Universal tight correlation of codon bias and pool of RNA codons (codonome): The genome is optimized to allow any distribution of gene expression values in the transcriptome from bacteria to humans</article-title><source>Genomics</source><volume>101</volume><fpage>282</fpage><lpage>289</lpage><year>2013</year><pub-id pub-id-type="doi">10.1016/j.ygeno.2013.02.009</pub-id><pub-id pub-id-type="pmid">23466472</pub-id></element-citation></ref>
<ref id="b79-ijmm-39-05-1063"><label>79</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Komar</surname><given-names>AA</given-names></name></person-group><article-title>The Yin and Yang of codon usage</article-title><source>Hum Mol Genet</source><volume>25</volume><issue>R2</issue><fpage>R77</fpage><lpage>R85</lpage><year>2016</year><pub-id pub-id-type="doi">10.1093/hmg/ddw207</pub-id><pub-id pub-id-type="pmid">27354349</pub-id></element-citation></ref>
<ref id="b80-ijmm-39-05-1063"><label>80</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Piovesan</surname><given-names>A</given-names></name><name><surname>Caracausi</surname><given-names>M</given-names></name><name><surname>Antonaros</surname><given-names>F</given-names></name><name><surname>Pelleri</surname><given-names>MC</given-names></name><name><surname>Vitale</surname><given-names>L</given-names></name></person-group><article-title>GeneBase 11: A tool to summarise data from NCBI gene datasets and its application to an update of human gene statistics</article-title><source>Database (Oxford)</source><volume>2016</volume><comment>pii: baw153</comment><year>2016</year><pub-id pub-id-type="doi">10.1093/database/baw153</pub-id></element-citation></ref>
<ref id="b81-ijmm-39-05-1063"><label>81</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ahsan</surname><given-names>B</given-names></name><name><surname>Saito</surname><given-names>TL</given-names></name><name><surname>Hashimoto</surname><given-names>S</given-names></name><name><surname>Muramatsu</surname><given-names>K</given-names></name><name><surname>Tsuda</surname><given-names>M</given-names></name><name><surname>Sasaki</surname><given-names>A</given-names></name><name><surname>Matsushima</surname><given-names>K</given-names></name><name><surname>Aigaki</surname><given-names>T</given-names></name><name><surname>Morishita</surname><given-names>S</given-names></name></person-group><article-title>MachiBase: A Drosophila melanogaster 5&#x02032;-end mRNA transcription database</article-title><source>Nucleic Acids Res</source><volume>37</volume><issue>Database</issue><fpage>D49</fpage><lpage>D53</lpage><year>2009</year><pub-id pub-id-type="doi">10.1093/nar/gkn694</pub-id></element-citation></ref>
<ref id="b82-ijmm-39-05-1063"><label>82</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Machida</surname><given-names>RJ</given-names></name><name><surname>Lin</surname><given-names>YY</given-names></name></person-group><article-title>Four methods of preparing mRNA 5&#x02032; end libraries using the Illumina sequencing platform</article-title><source>PLoS One</source><volume>9</volume><fpage>e101812</fpage><year>2014</year><pub-id pub-id-type="doi">10.1371/journal.pone.0101812</pub-id></element-citation></ref>
<ref id="b83-ijmm-39-05-1063"><label>83</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Helbig</surname><given-names>AO</given-names></name><name><surname>Gauci</surname><given-names>S</given-names></name><name><surname>Raijmakers</surname><given-names>R</given-names></name><name><surname>van Breukelen</surname><given-names>B</given-names></name><name><surname>Slijper</surname><given-names>M</given-names></name><name><surname>Mohammed</surname><given-names>S</given-names></name><name><surname>Heck</surname><given-names>AJ</given-names></name></person-group><article-title>Profiling of N-acetylated protein termini provides in-depth insights into the N-terminal nature of the proteome</article-title><source>Mol Cell Proteomics</source><volume>9</volume><fpage>928</fpage><lpage>939</lpage><year>2010</year><pub-id pub-id-type="doi">10.1074/mcp.M900463-MCP200</pub-id><pub-id pub-id-type="pmid">20061308</pub-id><pub-id pub-id-type="pmcid">2871424</pub-id></element-citation></ref>
<ref id="b84-ijmm-39-05-1063"><label>84</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Doucet</surname><given-names>A</given-names></name><name><surname>Overall</surname><given-names>CM</given-names></name></person-group><article-title>Amino-Terminal Oriented Mass Spectrometry of Substrates (ATOMS) N-terminal sequencing of proteins and proteolytic cleavage sites by quantitative mass spectrometry</article-title><source>Methods Enzymol</source><volume>501</volume><fpage>275</fpage><lpage>293</lpage><year>2011</year><pub-id pub-id-type="doi">10.1016/B978-0-12-385950-1.00013-4</pub-id><pub-id pub-id-type="pmid">22078539</pub-id></element-citation></ref></ref-list></back>
<floats-group>
<fig id="f1-ijmm-39-05-1063" position="float">
<label>Figure 1</label>
<caption>
<p>The 5&#x02032; end mRNA artifact. cDNA is typically synthesized by reverse transcriptase, which extends a primer annealed to the 3&#x02032; region of the mRNA. The natural processivity of the enzyme, its quality, the integrity of the RNA and the secondary structures formed by the 5&#x02032; region of the mRNA may hamper reverse transcriptase progression, so that first-strand cDNA synthesis fails to cover the full length of the mRNA template toward its 5&#x02032; end. This affects all subsequent experiments, including the assignment of the first AUG codon. ss, single-stranded; ds, double-stranded.</p></caption>
<graphic xlink:href="IJMM-39-05-1063-g00.tif"/></fig>
<fig id="f2-ijmm-39-05-1063" position="float">
<label>Figure 2</label>
<caption>
<p>Identification and correction of incomplete 5&#x02032; end regions. Candidate EST sequences for extending the known mRNA 5&#x02032; coding region are selected for the presence of an upstream in-frame AUG codon and the absence of any stop codon between the previously known and the newly determined AUG codons. The upstream in-frame AUG codon becomes the actual translation start codon, thus encoding a new Met and extending the predicted amino-terminal sequence of the mRNA product. EST, expressed sequence tag; Met, methionine.</p></caption>
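<p>The selection criterion described in the caption of this figure can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the pipeline used in the cited studies: the function name find_upstream_start, the 0-based coordinates, the toy sequence and the choice to report the most upstream qualifying AUG are all assumptions introduced here.</p>
<preformat>
```python
# DNA-alphabet stop codons (UAA, UAG, UGA in the mRNA).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_upstream_start(cdna, known_start):
    """Return the 0-based index of the most upstream in-frame ATG that is
    connected to the known start codon by an open reading frame, or None.

    cdna        -- EST-extended cDNA sequence, written 5' to 3'
    known_start -- 0-based index of the A of the previously annotated ATG
    Both names are hypothetical and used for illustration only.
    """
    seq = cdna.upper().replace("U", "T")  # accept RNA or DNA spelling
    best = None
    pos = known_start - 3
    # Walk upstream one codon at a time, staying in the known reading frame.
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # an in-frame stop closes the frame: stop searching
        if codon == "ATG":
            best = pos  # keep walking: a more upstream ATG may also qualify
        pos -= 3
    return best


# Toy sequence (assumption): CC ATG GCT GAA ATG AAA TAA, known ATG at index 11.
print(find_upstream_start("CCATGGCTGAAATGAAATAA", 11))  # prints 2
```
</preformat>
<p>In this toy case the number of codons added to the predicted amino terminus is (11 - 2) / 3 = 3 (Met, Ala, Glu). An out-of-frame ATG, or an in-frame ATG separated from the known start by a stop codon, is not reported.</p>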
<graphic xlink:href="IJMM-39-05-1063-g01.tif"/></fig>
<table-wrap id="tI-ijmm-39-05-1063" position="float">
<label>Table I</label>
<caption>
<p>Main published results of systematic searches for completeness of the mRNA 5&#x02032; CDS region.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="middle" align="left">Ref.</th>
<th valign="middle" align="left">Year</th>
<th valign="middle" align="center">Organism</th>
<th valign="middle" align="center">Method</th>
<th valign="middle" align="center">mRNAs analyzed</th>
<th valign="middle" align="center">Extended 5&#x02032; CDS<xref rid="tfn1-ijmm-39-05-1063" ref-type="table-fn">a</xref></th></tr></thead>
<tbody>
<tr>
<td valign="top" align="left">(<xref ref-type="bibr" rid="b35-ijmm-39-05-1063">35</xref>)</td>
<td valign="top" align="left">2000</td>
<td valign="top" align="left"><italic>H. sapiens</italic></td>
<td valign="top" align="left">Oligo-capping</td>
<td valign="top" align="right">954</td>
<td valign="top" align="right">68 (7.1%)</td></tr>
<tr>
<td valign="top" align="left">(<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>)</td>
<td valign="top" align="left">2003</td>
<td valign="top" align="left"><italic>H. sapiens</italic></td>
<td valign="top" align="left">Manual and automated sequence analysis</td>
<td valign="top" align="right">13,124</td>
<td valign="top" align="right">556 (4.2%)</td></tr>
<tr>
<td valign="top" align="left">(<xref ref-type="bibr" rid="b53-ijmm-39-05-1063">53</xref>)</td>
<td valign="top" align="left">2007</td>
<td valign="top" align="left"><italic>D. rerio</italic></td>
<td valign="top" align="left">Automated sequence analysis</td>
<td valign="top" align="right">8,528</td>
<td valign="top" align="right">285 (3.3%)</td></tr>
<tr>
<td valign="top" align="left">(<xref ref-type="bibr" rid="b39-ijmm-39-05-1063">39</xref>)</td>
<td valign="top" align="left">2011</td>
<td valign="top" align="left">Mouse embryonic stem cells</td>
<td valign="top" align="left">Ribosome footprinting profiling and support vector machine (SVM)-based machine learning strategy</td>
<td valign="top" align="right">4,994</td>
<td valign="top" align="right">570 (11.4%)</td></tr>
<tr>
<td valign="top" align="left">(<xref ref-type="bibr" rid="b54-ijmm-39-05-1063">54</xref>)</td>
<td valign="top" align="left">2012</td>
<td valign="top" align="left"><italic>H. sapiens</italic></td>
<td valign="top" align="left">Fully automated sequence analysis</td>
<td valign="top" align="right">18,665</td>
<td valign="top" align="right">477 (2.6%)</td></tr>
<tr>
<td valign="top" align="left">(<xref ref-type="bibr" rid="b40-ijmm-39-05-1063">40</xref>)</td>
<td valign="top" align="left">2012</td>
<td valign="top" align="left"><italic>H. sapiens</italic></td>
<td valign="top" align="left">Ribosome footprinting profiling and neural network prediction</td>
<td valign="top" align="right">5,062</td>
<td valign="top" align="right">6 AUG (0.1%) and 540 non-AUG (10.7%)</td></tr>
<tr>
<td valign="top" align="left">(<xref ref-type="bibr" rid="b55-ijmm-39-05-1063">55</xref>)</td>
<td valign="top" align="left">2014</td>
<td valign="top" align="left"><italic>M. musculus</italic></td>
<td valign="top" align="left">Fully automated sequence analysis</td>
<td valign="top" align="right">20,221</td>
<td valign="top" align="right">351 (1.7%)</td></tr>
<tr>
<td valign="top" align="left">(<xref ref-type="bibr" rid="b41-ijmm-39-05-1063">41</xref>)</td>
<td valign="top" align="left">2014</td>
<td valign="top" align="left"><italic>H. sapiens</italic></td>
<td valign="top" align="left">Ribosome footprinting profiling and manual analysis</td>
<td valign="top" align="right">1,255</td>
<td valign="top" align="right">17 (1.4%)</td></tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left"/>
<td valign="top" align="left"><italic>M. musculus</italic></td>
<td valign="top" align="left">Ribosome footprinting profiling and manual analysis</td>
<td valign="top" align="right">930</td>
<td valign="top" align="right">4 AUG (0.4%) and 13 non-AUG (1.4%)</td></tr></tbody></table>
<table-wrap-foot><fn id="tfn1-ijmm-39-05-1063">
<label>a</label>
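<p>Estimated values. CDS, coding sequence; <italic>H. sapiens</italic>, <italic>Homo sapiens</italic>; <italic>D. rerio</italic>, <italic>Danio rerio</italic> (zebrafish); <italic>M. musculus</italic>, <italic>Mus musculus</italic> (mouse).</p></fn></table-wrap-foot></table-wrap>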
<table-wrap id="tII-ijmm-39-05-1063" position="float">
<label>Table II</label>
<caption>
<p>Possible consequences of incomplete determination of the mRNA 5&#x02032; CDS region, illustrated by example genes (human unless otherwise indicated).</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="middle" align="left"/>
<th valign="middle" align="center">Symbol</th>
<th valign="middle" align="center">Ref.</th>
<th valign="middle" align="center">AAs<xref rid="tfn2-ijmm-39-05-1063" ref-type="table-fn">a</xref></th>
<th valign="middle" align="center">Ref. <xref rid="b2-ijmm-39-05-1063" ref-type="bibr">2</xref></th></tr></thead>
<tbody>
<tr>
<td valign="top" align="left">At protein level</td>
<td valign="top" align="center"/>
<td valign="top" align="center"/>
<td valign="top" align="center"/>
<td valign="top" align="center"/></tr>
<tr>
<td valign="top" align="left">&#x02003;Errors in determining the 3D protein structure</td>
<td valign="top" align="center">ALDOC</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b59-ijmm-39-05-1063">59</xref>)</td>
<td valign="top" align="center">87</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b54-ijmm-39-05-1063">54</xref>)</td></tr>
<tr>
<td valign="top" align="left">&#x02003;Prediction of an incomplete polypeptide</td>
<td valign="top" align="center">QARS</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b60-ijmm-39-05-1063">60</xref>)</td>
<td valign="top" align="center">18</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b54-ijmm-39-05-1063">54</xref>)</td></tr>
<tr>
<td valign="top" align="left">&#x02003;Production of an incomplete polypeptide</td>
<td valign="top" align="center">IL16</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b61-ijmm-39-05-1063">61</xref>)</td>
<td valign="top" align="center">47</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b54-ijmm-39-05-1063">54</xref>)</td></tr>
<tr>
<td valign="top" align="left">&#x02003;Lack of description of functional protein domains</td>
<td valign="top" align="center">SON</td>
<td valign="top" align="center"><ext-link xlink:href="http://www.ncbi.nlm.nih.gov/gene/6651" ext-link-type="uri">http://www.ncbi.nlm.nih.gov/gene/6651</ext-link></td>
<td valign="top" align="center">968</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>)</td></tr>
<tr>
<td valign="top" align="left">&#x02003;Errors in identifying protein localization</td>
<td valign="top" align="center">RANBP9/RanBPM</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b63-ijmm-39-05-1063">63</xref>)</td>
<td valign="top" align="center">230</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b44-ijmm-39-05-1063">44</xref>)</td></tr>
<tr>
<td valign="top" align="left">&#x02003;Failure to predict alternative polypeptides</td>
<td valign="top" align="center">UMOD</td>
<td valign="top" align="center"><ext-link xlink:href="http://www.ncbi.nlm.nih.gov/gene/7369" ext-link-type="uri">http://www.ncbi.nlm.nih.gov/gene/7369</ext-link></td>
<td valign="top" align="center">49 or 28</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b54-ijmm-39-05-1063">54</xref>)</td></tr>
<tr>
<td valign="top" align="left">&#x02003;Errors in identifying ortholog products</td>
<td valign="top" align="center">DSCR1.1</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b66-ijmm-39-05-1063">66</xref>)</td>
<td valign="top" align="center">55</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>)</td></tr>
<tr>
<td colspan="5" align="left" valign="bottom">
<hr/></td></tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">Symbol</td>
<td valign="top" align="center">Ref.</td>
<td valign="top" align="center">nts<xref rid="tfn2-ijmm-39-05-1063" ref-type="table-fn">a</xref></td>
<td valign="top" align="center">Ref. <xref rid="b2-ijmm-39-05-1063" ref-type="bibr">2</xref></td></tr>
<tr>
<td colspan="5" align="left" valign="bottom">
<hr/></td></tr>
<tr>
<td valign="top" align="left">At cDNA level</td>
<td valign="top" align="center"/>
<td valign="top" align="center"/>
<td valign="top" align="center"/>
<td valign="top" align="center"/></tr>
<tr>
<td valign="top" align="left">&#x02003;Failure to screen the complete CDS for mutations</td>
<td valign="top" align="center"><italic>ADAR</italic></td>
<td valign="top" align="center"><ext-link xlink:href="http://omim.org/entry/146920" ext-link-type="uri">http://omim.org/entry/146920</ext-link></td>
<td valign="top" align="center">48</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b54-ijmm-39-05-1063">54</xref>)</td></tr>
<tr>
<td valign="top" align="left">&#x02003;Incomplete cDNA in two-hybrid test for function</td>
<td valign="top" align="center"><italic>DSCR1</italic></td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b65-ijmm-39-05-1063">65</xref>)</td>
<td valign="top" align="center">55</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>)</td></tr>
<tr>
<td valign="top" align="left">&#x02003;Potential errors in designing morpholino oligos</td>
<td valign="top" align="center"><italic>unc-119.2</italic> (<italic>Danio rerio</italic>)</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b77-ijmm-39-05-1063">77</xref>)</td>
<td valign="top" align="center">58</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b53-ijmm-39-05-1063">53</xref>)</td></tr>
<tr>
<td valign="top" align="left">At gene structure level</td>
<td valign="top" align="center"/>
<td valign="top" align="center"/>
<td valign="top" align="center"/>
<td valign="top" align="center"/></tr>
<tr>
<td valign="top" align="left">&#x02003;Failure to identify the full extension of the gene/labeling of genic regions as intergenic space</td>
<td valign="top" align="center"><italic>DIP2A</italic></td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b71-ijmm-39-05-1063">71</xref>)</td>
<td valign="top" align="center">82,895</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>)</td></tr>
<tr>
<td valign="top" align="left">&#x02003;Failure to identify actual promoter regions</td>
<td valign="top" align="center"><italic>TFF3</italic></td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b72-ijmm-39-05-1063">72</xref>)</td>
<td valign="top" align="center">170</td>
<td valign="top" align="center">(<xref ref-type="bibr" rid="b29-ijmm-39-05-1063">29</xref>)</td></tr></tbody></table>
<table-wrap-foot><fn id="tfn2-ijmm-39-05-1063">
<label>a</label>
<p>AAs or nts added to the previously recorded protein or nucleic acid sequence, respectively, following the analysis cited as Ref. <xref rid="b2-ijmm-39-05-1063" ref-type="bibr">2</xref>. CDS, coding sequence; AAs, amino acids; nts, nucleotides; ALDOC, aldolase, fructose-bisphosphate C; QARS, glutaminyl-tRNA synthetase; IL16, interleukin 16; SON, SON DNA binding protein; RANBP9, RAN binding protein 9; UMOD, uromodulin; DSCR1, Down syndrome critical region 1; <italic>ADAR</italic>, adenosine deaminase, RNA specific; <italic>DIP2A</italic>, disco interacting protein 2 homolog A; <italic>TFF3</italic>, trefoil factor 3.</p></fn></table-wrap-foot></table-wrap></floats-group></article>
