<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="en" article-type="review-article">
<?release-delay 0|0?>
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">WASJ</journal-id>
<journal-title-group>
<journal-title>World Academy of Sciences Journal</journal-title>
</journal-title-group>
<issn pub-type="ppub">2632-2900</issn>
<issn pub-type="epub">2632-2919</issn>
<publisher>
<publisher-name>D.A. Spandidos</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">WASJ-7-2-00315</article-id>
<article-id pub-id-type="doi">10.3892/wasj.2025.315</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Review</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Single‑cell RNA sequencing data dimensionality reduction (Review)</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Zogopoulos</surname><given-names>Vasileios L.</given-names></name>
<xref rid="af1-WASJ-7-2-00315" ref-type="aff">1</xref>
<xref rid="af2-WASJ-7-2-00315" ref-type="aff">2</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Tsotra</surname><given-names>Ioanna</given-names></name>
<xref rid="af1-WASJ-7-2-00315" ref-type="aff">1</xref>
<xref rid="af2-WASJ-7-2-00315" ref-type="aff">2</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Spandidos</surname><given-names>Demetrios A.</given-names></name>
<xref rid="af3-WASJ-7-2-00315" ref-type="aff">3</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Iconomidou</surname><given-names>Vassiliki A.</given-names></name>
<xref rid="af2-WASJ-7-2-00315" ref-type="aff">2</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Michalopoulos</surname><given-names>Ioannis</given-names></name>
<xref rid="af1-WASJ-7-2-00315" ref-type="aff">1</xref>
<xref rid="c1-WASJ-7-2-00315" ref-type="corresp"/>
</contrib>
</contrib-group>
<aff id="af1-WASJ-7-2-00315"><label>1</label>Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece</aff>
<aff id="af2-WASJ-7-2-00315"><label>2</label>Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, 15701 Athens, Greece</aff>
<aff id="af3-WASJ-7-2-00315"><label>3</label>Laboratory of Clinical Virology, Medical School, University of Crete, 71003 Heraklion, Greece</aff>
<author-notes>
<corresp id="c1-WASJ-7-2-00315"><italic>Correspondence to:</italic> Dr Ioannis Michalopoulos, Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, 11527 Athens, Greece <email>imichalop@bioacademy.gr</email></corresp>
<fn><p><italic>Abbreviations:</italic> GAN, generative adversarial network; PC, principal component; PCA, principal component analysis; scRNA-Seq, single-cell RNA sequencing; t-SNE, t-distributed stochastic neighbour embedding; UMAP, uniform manifold approximation and projection; UMI, unique molecular identifier; VAE, variational autoencoder</p></fn>
</author-notes>
<pub-date pub-type="collection">
<season>Mar-Apr</season>
<year>2025</year></pub-date>
<pub-date pub-type="epub">
<day>20</day>
<month>01</month>
<year>2025</year></pub-date>
<volume>7</volume>
<issue>2</issue>
<elocation-id>27</elocation-id>
<history>
<date date-type="received">
<day>21</day>
<month>11</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>15</day>
<month>01</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright: © 2025 Zogopoulos et al.</copyright-statement>
<copyright-year>2025</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (World Academy of Sciences Journal) and either DOI or URL of the article must be cited.</license-p></license>
</permissions>
<abstract>
<p>Single-cell RNA sequencing (scRNA-Seq) provides detailed insight into gene expression at the individual cell level, revealing hidden cell diversity. However, scRNA-Seq data pose challenges due to their high dimensionality and sparsity. High dimensionality stems from analysing numerous cells and genes, while sparsity arises from zero counts in gene expression data, known as dropout events. This necessitates robust methods for processing scRNA-Seq gene counts to allow meaningful interpretation. Dimensionality reduction techniques, such as principal component analysis, transform gene count data into lower-dimensional spaces that retain biological information, aiding downstream analyses, while dimensionality reduction-based visualisation methods, such as t-distributed stochastic neighbour embedding and uniform manifold approximation and projection, are used for cell or gene clustering. Deep learning techniques, such as variational autoencoders and generative adversarial networks, compress data and generate synthetic gene expression profiles, augmenting datasets and improving their utility in biomedical research. In recent years, interest in scRNA-Seq dimensionality reduction has markedly increased, leading not only to the development of a multitude of methods, but also to the integration of these approaches into scRNA-Seq data processing pipelines. The present review aimed to list and explain, in layman's terms, the currently popular dimensionality reduction methods, and to cover their advancements and software package implementations.</p>
</abstract>
<kwd-group>
<kwd>scRNA-Seq</kwd>
<kwd>dimensionality reduction</kwd>
<kwd>VAE</kwd>
<kwd>GAN</kwd>
<kwd>UMAP</kwd>
<kwd>t-SNE</kwd>
<kwd>PCA</kwd>
</kwd-group>
<funding-group>
<funding-statement><bold>Funding:</bold> No funding was received.</funding-statement>
</funding-group>
</article-meta>
</front>
<body>
<sec>
<title>1. Single-cell RNA-Seq</title>
<p>The transcriptome is the set of all RNA transcripts of a cell/tissue of an organism, as well as their quantity (<xref rid="b1-WASJ-7-2-00315" ref-type="bibr">1</xref>). The two main transcriptomic technologies used to obtain gene expression data are microarrays (<xref rid="b2-WASJ-7-2-00315" ref-type="bibr">2</xref>) and RNA sequencing (RNA-Seq) (<xref rid="b1-WASJ-7-2-00315" ref-type="bibr">1</xref>). The latter can be divided into bulk and single-cell RNA-Seq (scRNA-Seq). Bulk RNA-Seq, as the first iteration of this technology, uses the total mRNA extracted from a tissue, providing an average expression for each gene across the variety of cells included in a sample. On the other hand, scRNA-Seq is an emerging RNA-Seq technology which investigates the transcriptome of single cells (<xref rid="b3-WASJ-7-2-00315" ref-type="bibr">3</xref>). Despite the large number of different sequencing platforms, the main experimental workflow of scRNA-Seq includes the following steps: i) Single-cell isolation from the tissue of interest; ii) lysis of cells and RNA isolation; iii) reverse transcription of the mRNA and amplification through PCR; and iv) library preparation and sequencing (<xref rid="b4-WASJ-7-2-00315" ref-type="bibr">4</xref>). Independent of the sequencing platform used, the final output is a FASTQ file, which constitutes the scRNA-Seq raw data, containing the nucleotide sequence, as well as a PHRED quality score for each base (<xref rid="b5-WASJ-7-2-00315" ref-type="bibr">5</xref>). FASTQ file generation is followed by computational pre-processing of the files, resulting in a gene expression matrix, usually in the form of a gene read count matrix or, in the case of droplet-based platforms (e.g., 10x Genomics Chromium), a unique molecular identifier (UMI) (<xref rid="b6-WASJ-7-2-00315" ref-type="bibr">6</xref>) matrix; UMIs were introduced to account for PCR amplification bias and ensure accurate gene expression quantification.
The pipeline for the mapping of reads to the reference genome is in principle the same as in bulk RNA-Seq, including the following basic steps: i) Quality control and adapter sequence removal; ii) alignment of reads to the reference genome; iii) feature count; and iv) normalisation (<xref rid="b7-WASJ-7-2-00315" ref-type="bibr">7</xref>,<xref rid="b8-WASJ-7-2-00315" ref-type="bibr">8</xref>). However, in the case of single-cell data, further pre-processing steps, performed by specialised software, are included to account for the intricacies of single-cell sequencing. These steps include the identification of low-quality cells, count transformation for UMI datasets, the identification of highly variable features (genes), dimensionality reduction, cell clustering, etc. (<xref rid="b9-WASJ-7-2-00315" ref-type="bibr">9</xref>). Pipelines for the pre-processing of scRNA-Seq data, such as Cell Ranger (<xref rid="b10-WASJ-7-2-00315" ref-type="bibr">10</xref>) for 10x Genomics-based data, have already been established in the scientific community.</p>
<p>scRNA-Seq allows for the high-resolution study of gene expression in a cell-specific manner. However, scRNA-Seq gene count data are characterised by high dimensionality, due to the high number of cells that are isolated from an extracted tissue and the high number of genes (both coding and non-coding) that are studied (<xref rid="b11-WASJ-7-2-00315" ref-type="bibr">11</xref>). Furthermore, gene expression levels derived from scRNA-Seq demonstrate high sparsity due to the large number of zero counts (known as ‘dropout events’) for genes that are truly expressed in other cells of the same type. Dropout events may be attributed to the low levels of mRNA which are extracted from each cell, the stochasticity of gene expression and the cell-specific expression of certain genes (<xref rid="b12-WASJ-7-2-00315" ref-type="bibr">12</xref>). In order to deal with these two major drawbacks of single-cell data, statistical and artificial intelligence methods of dimensionality reduction and imputation have been developed. Furthermore, certain dimensionality reduction methods also cater for the imputation of zero values (<xref rid="b13-WASJ-7-2-00315" ref-type="bibr">13</xref>). Nevertheless, the sparsity inherent in scRNA-Seq data can be addressed using dimensionality reduction alone, as the compression to a low-dimensional space combines expression data across the various cells and naturally deals with data redundancy (<xref rid="b14-WASJ-7-2-00315" ref-type="bibr">14</xref>). The present review mainly focuses on the most commonly used methods for performing dimensionality reduction on scRNA-Seq gene count data.</p>
</sec>
<sec>
<title>2. Dimensionality reduction</title>
<p>In the context of scRNA-Seq data, each cell may be represented as a data point in a Euclidean space with as many dimensions as the number of genes in the dataset and the coordinates of the data point are the expressions of the genes in the cell. Vice versa, each gene may also be depicted as a data point in a high-dimensional space, whose dimensions are as many as the cell number, and the point coordinates are the gene expression levels in each cell. Consequently, scRNA-Seq count data, albeit represented as a two-dimensional text file with columns (cells) and rows (genes), are actually multidimensional.</p>
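<p>As a minimal sketch of these two views, the following Python example reads a small gene-by-cell table either as cells in gene space or as genes in cell space. The count values and the helper names <monospace>transpose</monospace> and <monospace>euclidean</monospace> are hypothetical and serve only as an illustration.</p>
<preformat preformat-type="code">
```python
import math

# Toy gene-by-cell count table: rows are genes, columns are cells (made-up values).
counts = [
    [5, 0, 4],  # gene A
    [0, 3, 0],  # gene B
    [2, 2, 1],  # gene C
    [0, 7, 0],  # gene D
]


def transpose(matrix):
    """Swap rows and columns, turning a gene-by-cell table into cell-by-gene."""
    return [list(column) for column in zip(*matrix)]


def euclidean(point_a, point_b):
    """Euclidean distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))


# Each cell is a point with one coordinate per gene (3 points, 4 dimensions).
cells_as_points = transpose(counts)
# Each gene is a point with one coordinate per cell (4 points, 3 dimensions).
genes_as_points = counts

print(len(cells_as_points), len(cells_as_points[0]))
print(len(genes_as_points), len(genes_as_points[0]))
print(euclidean(cells_as_points[0], cells_as_points[2]))
```
</preformat>
<p>In a real dataset the same table would have thousands of rows and columns, so each cell would be a point in a space with thousands of dimensions.</p>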
<p>Dimensionality reduction refers to the transformation of high-dimensional data to lower dimensions, reducing their size while keeping most of the information present in the original data (<xref rid="b15-WASJ-7-2-00315" ref-type="bibr">15</xref>). As the amount of computational resources required to run any algorithm (e.g., for machine learning) depends on the size of the input data, reducing their dimensions results in lower memory requirements and shorter execution times (<xref rid="b16-WASJ-7-2-00315" ref-type="bibr">16</xref>).</p>
<p>There are two approaches for dimensionality reduction: Feature selection and feature extraction, where features refer to the dataset dimensions (genes or samples). In feature selection, a certain number of dimensions that provide the most significant information are selected, while the remainder are discarded. Feature extraction focuses on creating a new set of dimensions by combining the original dimensions (<xref rid="b15-WASJ-7-2-00315" ref-type="bibr">15</xref>,<xref rid="b17-WASJ-7-2-00315" ref-type="bibr">17</xref>).</p>
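<p>The difference between the two approaches can be sketched in a few lines of Python. In this illustration, feature selection is assumed to rank genes by variance, which is only one of several possible criteria, and feature extraction uses hand-picked weights; the data, the function names and the weights are all hypothetical.</p>
<preformat preformat-type="code">
```python
from statistics import pvariance

# Toy cell-by-gene matrix: rows are cells, columns are genes (made-up values).
cells = [
    [5.0, 0.0, 2.0, 1.0],
    [0.0, 3.0, 2.0, 1.0],
    [4.0, 0.0, 1.0, 1.0],
    [0.0, 4.0, 2.0, 1.0],
]


def select_features(matrix, n_keep):
    """Feature selection: keep the n_keep original columns with the highest variance."""
    n_cols = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [[row[j] for j in kept] for row in matrix]


def extract_features(matrix, weights):
    """Feature extraction: every new dimension is a weighted sum of all original columns."""
    return [
        [sum(w * x for w, x in zip(weight_row, row)) for weight_row in weights]
        for row in matrix
    ]


kept_genes, selected = select_features(cells, n_keep=2)
# One hypothetical latent gene contrasting gene 0 with gene 1.
extracted = extract_features(cells, [[1.0, -1.0, 0.0, 0.0]])

print(kept_genes)
print(selected)
print(extracted)
```
</preformat>
<p>Selection returns a subset of the original genes unchanged, whereas extraction returns new dimensions that do not correspond to any single measured gene.</p>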
<p>As high dimensionality in scRNA-Seq is attributed to both the samples and genes, dimensionality reduction can be performed for either of the two, usually through feature extraction. In this case, the reduction of the dimensionality of genes in scRNA-Seq data creates a smaller set of latent genes, enabling the efficient clustering of cells, and the subsequent identification of cell types, a step which constitutes an essential part of most scRNA-Seq analyses (<xref rid="b18-WASJ-7-2-00315" ref-type="bibr">18</xref>). On the other hand, reducing the dimensionality of cells, through the creation of latent samples that contain most of the biological information of the original cells (<xref rid="f1-WASJ-7-2-00315" ref-type="fig">Fig. 1</xref>), facilitates dataset integration for differential gene expression analysis (<xref rid="b19-WASJ-7-2-00315" ref-type="bibr">19</xref>). Dimensionality reduction has been established as an integral part of the scRNA-Seq data processing pipeline, bringing the data to a more manageable form before they are used in further downstream analysis or data visualisation (<xref rid="b20-WASJ-7-2-00315" ref-type="bibr">20</xref>) (<xref rid="f2-WASJ-7-2-00315" ref-type="fig">Fig. 2</xref>).</p>
</sec>
<sec>
<title>3. Common dimensionality reduction techniques in single-cell RNA-Seq</title>
<sec>
<title/>
<sec>
<title>Principal component (PC) analysis (PCA)</title>
<p>PCA is a statistical method used to reduce high-dimensional data (such as scRNA-Seq data) into lower dimensions, while retaining most of the original data information (<xref rid="b21-WASJ-7-2-00315" ref-type="bibr">21</xref>). PCA is an orthogonal linear transformation of the data points of the original dataset (<xref rid="b22-WASJ-7-2-00315" ref-type="bibr">22</xref>), creating new variables known as PCs that are mutually uncorrelated, with each successive PC capturing a decreasing proportion of the total variance of the original dataset (<xref rid="b23-WASJ-7-2-00315" ref-type="bibr">23</xref>). There are several approaches for determining the number of PCs that need to be kept in order to retain most of the variability of the original dataset, while excluding variability that is caused by noise. One of the most commonly used methods is keeping the top PCs that together explain an arbitrarily selected percentage of variability, although this may include a large number of PCs that explain variability attributable to noise. Alternatively, the PCs and the variability of the dataset they explain can be plotted and the top ones can be selected using the ‘elbow’ method (<xref rid="b24-WASJ-7-2-00315" ref-type="bibr">24</xref>); however, in many cases, the ‘elbow’ may not be easily defined. In both cases, the remainder of the PCs are discarded, thus efficiently reducing the dataset dimensions (<xref rid="b25-WASJ-7-2-00315" ref-type="bibr">25</xref>).</p>
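<p>The percentage-based rule can be sketched as follows. The function name <monospace>n_components_for</monospace>, the explained-variance proportions and the thresholds are hypothetical values chosen for illustration; they do not come from a real dataset.</p>
<preformat preformat-type="code">
```python
def n_components_for(explained_ratio, threshold):
    """Smallest number of leading PCs whose cumulative explained variance reaches threshold."""
    cumulative = 0.0
    for count, ratio in enumerate(explained_ratio, start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return count
    # The threshold is never reached, so every PC is kept.
    return len(explained_ratio)


# Hypothetical proportions of variance explained by five PCs, in decreasing order.
explained_ratio = [0.5, 0.25, 0.125, 0.0625, 0.0625]

print(n_components_for(explained_ratio, threshold=0.875))
print(n_components_for(explained_ratio, threshold=0.9))
```
</preformat>
<p>Raising the threshold only slightly can pull in additional PCs that each explain little variance, which is the drawback of an arbitrarily chosen percentage noted above.</p>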
<p>When cells in scRNA-Seq data are treated as data points, PCs are linear combinations of genes, known as latent genes (<xref rid="b26-WASJ-7-2-00315" ref-type="bibr">26</xref>). As scRNA-Seq data provide no prior information about the identity of each cell, PCA, as an unsupervised method, may capture the linear associations present in the scRNA-Seq gene expression levels, producing a low-dimensional dataset that has the same number of cells as originally studied and a smaller number of latent genes than the original dataset, while retaining most of its variance (<xref rid="b20-WASJ-7-2-00315" ref-type="bibr">20</xref>). The produced low-dimensional gene expression matrix is commonly used as input to visualisation algorithms or for additional analyses.</p>
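<p>A dependency-free sketch of this idea is given below. It computes only the first PC of a made-up cell-by-gene matrix by power iteration on the gene covariance matrix; this is a deliberately simple stand-in chosen for illustration, not the algorithm used by any particular scRNA-Seq package, and all names and values are hypothetical.</p>
<preformat preformat-type="code">
```python
import math

# Toy cell-by-gene matrix: rows are cells, columns are genes (made-up values).
cells = [
    [2.0, 1.0, 0.0],
    [4.0, 2.0, 0.0],
    [6.0, 3.0, 1.0],
    [8.0, 4.0, 1.0],
]


def centre_columns(matrix):
    """Subtract each column (gene) mean so that every gene has mean zero."""
    n_rows = len(matrix)
    means = [sum(column) / n_rows for column in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def covariance(centred):
    """Sample covariance matrix between the columns of a centred matrix."""
    n_rows, n_cols = len(centred), len(centred[0])
    return [
        [sum(row[i] * row[j] for row in centred) / (n_rows - 1) for j in range(n_cols)]
        for i in range(n_cols)
    ]


def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def first_pc(cov, n_iter=200):
    """Leading eigenvector (gene loadings) and eigenvalue (variance) by power iteration."""
    n = len(cov)
    vector = [1.0 / math.sqrt(n)] * n
    for _ in range(n_iter):
        product = mat_vec(cov, vector)
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    eigenvalue = sum(v * p for v, p in zip(vector, mat_vec(cov, vector)))
    return vector, eigenvalue


centred = centre_columns(cells)
cov = covariance(centred)
loadings, variance_pc1 = first_pc(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = variance_pc1 / total_variance
# One latent-gene value per cell: the same number of cells, fewer dimensions.
scores = [sum(x * w for x, w in zip(row, loadings)) for row in centred]

print(round(explained_ratio, 3))
print([round(s, 3) for s in scores])
```
</preformat>
<p>The loadings are the weights of the linear combination of genes, and the scores place every original cell on the resulting latent gene.</p>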
</sec>
<sec>
<title>Visualisation methods in lower dimensions</title>
<p>To visualise high-dimensional data in a comprehensible form, data first need to undergo dimensionality reduction and then be mapped into two dimensions if a plot is drawn (<xref rid="b20-WASJ-7-2-00315" ref-type="bibr">20</xref>). Alternatively, if 3D-visualisation software is used, data need to be mapped into three dimensions. For scRNA-Seq data, there are two major methods for dimensionality reduction into two or three dimensions, and subsequent visualisation: t-distributed stochastic neighbour embedding (t-SNE) and uniform manifold approximation and projection (UMAP). t-SNE (<xref rid="b14-WASJ-7-2-00315" ref-type="bibr">14</xref>) was created as an improvement to the SNE method (<xref rid="b27-WASJ-7-2-00315" ref-type="bibr">27</xref>), which uses a Gaussian distribution to determine the similarity of the low-dimensional points and determines the low-dimensional representation through a loss function. t-SNE uses a Student-t distribution and an improved loss function, ultimately offering better spread of the data points and faster run time, respectively. UMAP (<xref rid="b28-WASJ-7-2-00315" ref-type="bibr">28</xref>) constructs a weighted k-nearest-neighbour graph and subsequently computes a lower-dimensional layout of it. UMAP is more recent and was developed as an alternative to t-SNE, having an even lower execution time, while claiming to better preserve the global structure of the data, i.e., the overall arrangement of the clusters.</p>
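<p>The effect of replacing the Gaussian with a Student-t distribution can be illustrated with the two unnormalised similarity kernels. The sketch below assumes a Gaussian bandwidth of 1 and a Student-t distribution with one degree of freedom; the distances and function names are hypothetical, and the example does not perform an actual embedding.</p>
<preformat preformat-type="code">
```python
import math


def gaussian_similarity(distance, sigma=1.0):
    """Unnormalised Gaussian kernel, as used by SNE for pairwise similarities."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def student_t_similarity(distance):
    """Unnormalised Student-t kernel (one degree of freedom) used by t-SNE in the map."""
    return 1.0 / (1.0 + distance ** 2)


def normalise(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical distances from one reference point to three neighbours.
distances = [1.0, 2.0, 4.0]
p_gauss = normalise([gaussian_similarity(d) for d in distances])
q_student = normalise([student_t_similarity(d) for d in distances])

print([round(p, 4) for p in p_gauss])
print([round(q, 4) for q in q_student])
print(round(kl_divergence(p_gauss, q_student), 4))
```
</preformat>
<p>The Student-t kernel assigns far more similarity to distant points than the Gaussian kernel does, so moderately dissimilar points can be placed further apart in the low-dimensional map, which relates to the better spread of data points mentioned above.</p>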
<p>t-SNE and UMAP, as non-linear methods, are commonly used in scRNA-Seq analysis pipelines to visualise cells, as they are able to capture the non-linear relationships of the data. Cells (as data points) with similar expression patterns are grouped closer to each other in the low-dimensional space. Subsequently, by colour-coding each cell using given annotations, e.g., cell-type, tissue, etc., it is possible to define novel cell sub-populations with distinct expression patterns, through visual exploration (<xref rid="b29-WASJ-7-2-00315" ref-type="bibr">29</xref>). In a similar manner, genes may also be visualised. In this case, users are able to discover groups of co-expressed genes with similar expression patterns (<xref rid="b30-WASJ-7-2-00315" ref-type="bibr">30</xref>), although thorough gene annotations are necessary to define the biologically-connected gene clusters.</p>
<p>Both t-SNE and UMAP are able to preserve the global, as well as the local structure of data when proper data initialisation is used, PCA being one of the options for this step (<xref rid="b31-WASJ-7-2-00315" ref-type="bibr">31</xref>), and they also have similar execution times after parameter tuning. Thus, when visualising scRNA-Seq data, it is recommended to first apply a different dimensionality reduction approach as a pre-processing step, then to try both methods and determine which plot better depicts the organisation of the cell clusters.</p>
<p>PCA, t-SNE and UMAP are already established techniques and integral parts of the pre-processing and visualisation of scRNA-Seq data (<xref rid="b29-WASJ-7-2-00315" ref-type="bibr">29</xref>) and are also included in major processing pipelines and software, such as Seurat (<xref rid="b32-WASJ-7-2-00315" ref-type="bibr">32</xref>) and Cell Ranger (<xref rid="b10-WASJ-7-2-00315" ref-type="bibr">10</xref>). Thus, these methods are used in the majority of scientific studies that include scRNA-Seq data analysis. Nevertheless, the increasing diversity and dimensionality of scRNA-Seq data have necessitated the use of more advanced techniques for their efficient analysis.</p>
</sec>
</sec>
</sec>
<sec>
<title>4. Deep learning-based dimensionality reduction methods</title>
<sec>
<title/>
<sec>
<title>Autoencoders</title>
<p>The advancement of neural networks using multiple hidden layers, coupled with increased computing power, has led to the evolution of machine learning to deep learning (<xref rid="b33-WASJ-7-2-00315" ref-type="bibr">33</xref>). The ability of deep learning-based methods to be trained and learn the distribution of the input data has proven valuable for the construction of tools that deal with the high dimensionality and sparsity of scRNA-Seq data. One such tool is scvis (<xref rid="b34-WASJ-7-2-00315" ref-type="bibr">34</xref>), an autoencoder-based method for the dimensionality reduction and subsequent visualisation of scRNA-Seq data. Autoencoders are an archetypal deep-learning technique consisting of two neural networks with hidden layers: One encoder network and one decoder network. Autoencoders are trained to learn compressed representations of input data (<xref rid="b35-WASJ-7-2-00315" ref-type="bibr">35</xref>). At first glance, scvis is similar in functionality to t-SNE and UMAP, as it is mainly used for the visualisation of cells and detection of new cell subtypes. However, scvis can detect both linear and non-linear associations in the data and has been shown to possess improved performance, achieving similar or better grouping of data points, while also scaling better with larger datasets (<xref rid="b34-WASJ-7-2-00315" ref-type="bibr">34</xref>). Nevertheless, data initialisation is equally necessary in the case of scvis, to preserve both the global and local structure of the original data.</p>
<p>Another autoencoder-based technique is deep count autoencoder (DCA) (<xref rid="b36-WASJ-7-2-00315" ref-type="bibr">36</xref>). As opposed to scvis, DCA is used for the denoising of scRNA-Seq data, which refers to the efficient imputation of data, while also aiming to improve the expression estimation of all gene counts (<xref rid="b37-WASJ-7-2-00315" ref-type="bibr">37</xref>). DCA exhibits better performance compared to commonly used imputation techniques, such as SAVER (<xref rid="b38-WASJ-7-2-00315" ref-type="bibr">38</xref>) and scImpute (<xref rid="b39-WASJ-7-2-00315" ref-type="bibr">39</xref>), showcasing the application of autoencoders for performing simultaneous dimensionality reduction and imputation. The rapid advancement of deep learning has enabled further improvements in neural networks, in the form of variational autoencoders (VAEs) and generative adversarial networks (GANs), which have skyrocketed in popularity.</p>
</sec>
<sec>
<title>VAEs</title>
<p>VAEs (<xref rid="b40-WASJ-7-2-00315" ref-type="bibr">40</xref>) represent a paradigm shift in the field of deep learning, particularly in their application to complex, high-dimensional datasets. At their core, VAEs are an advancement of traditional autoencoders (<xref rid="b35-WASJ-7-2-00315" ref-type="bibr">35</xref>), although VAEs diverge significantly by incorporating a probabilistic framework. This framework involves the encoder network mapping input data not to a deterministic point, but to a probability distribution within a latent space. Consequently, the decoder network reconstructs the input data by sampling from this latent distribution. This probabilistic approach is underpinned by the principles of variational inference, enabling the approximation of complex data distributions. The incorporation of stochasticity in the encoding process allows VAEs to generate new data samples by sampling from the learned latent space distribution.</p>
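<p>The sampling step described above can be sketched as follows, assuming the common choice of a diagonal Gaussian latent distribution and a standard normal prior, which the text does not specify. The encoder outputs used here (<monospace>mean</monospace> and <monospace>log_var</monospace>) are hypothetical numbers rather than the output of a trained network.</p>
<preformat preformat-type="code">
```python
import math
import random


def sample_latent(mean, log_var, rng):
    """Reparameterisation trick: z = mean + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """KL divergence between a diagonal Gaussian N(mean, sigma^2) and the prior N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var))


# Hypothetical encoder output for one cell in a two-dimensional latent space.
mean = [0.8, -1.2]
log_var = [-2.0, -1.0]

rng = random.Random(0)
# Two draws for the same cell differ, because the encoding is a distribution.
print(sample_latent(mean, log_var, rng))
print(sample_latent(mean, log_var, rng))
print(kl_to_standard_normal(mean, log_var))
```
</preformat>
<p>During training, a term of this kind is typically added to the reconstruction error, so that the latent distributions stay close to the prior and new samples can be generated by drawing from it.</p>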
<p>VAEs have proven to be an effective tool for reducing scRNA-Seq data dimensionality, while retaining the biological properties of the original dataset (<xref rid="b41-WASJ-7-2-00315" ref-type="bibr">41</xref>). VAEs not only compress gene expression data into a more manageable latent space, considering that such datasets can contain data from &gt;100,000 cells, but they also capture the biological variance across cells, while mitigating the impact of the inherent noise and sparsity of scRNA-Seq data (<xref rid="b42-WASJ-7-2-00315" ref-type="bibr">42</xref>). The sampling of the probabilistic latent space in VAEs yields different datasets each time, yet properly trained models tend to produce results that exhibit minimal variance among them (<xref rid="b40-WASJ-7-2-00315" ref-type="bibr">40</xref>). Furthermore, utilising non-linear transformations for producing a low-dimensional latent space through the training on non-linear mappings of high-dimensional data could improve data clustering (<xref rid="b43-WASJ-7-2-00315" ref-type="bibr">43</xref>). Thus, the low-dimensional gene expression data generated by trained VAEs can facilitate downstream analyses (<xref rid="b44-WASJ-7-2-00315" ref-type="bibr">44</xref>), such as cell clustering, gene co-expression or regulatory network inference, or protein-protein association network construction. Such applications of VAEs have been developed, including DiffVAE (<xref rid="b45-WASJ-7-2-00315" ref-type="bibr">45</xref>) for modelling cell differentiation, BEENE (<xref rid="b46-WASJ-7-2-00315" ref-type="bibr">46</xref>) for improved batch correction, β-TCVAE (<xref rid="b47-WASJ-7-2-00315" ref-type="bibr">47</xref>), which was used for data integration in single-cell GTEx (<xref rid="b48-WASJ-7-2-00315" ref-type="bibr">48</xref>), and FAVA (<xref rid="b49-WASJ-7-2-00315" ref-type="bibr">49</xref>) for the inference of high-quality protein-protein association networks.
The newest version of STRING used FAVA for the computation of the co-expression scores, as the results of this method outperformed the previous ones, being able to capture both linear and non-linear associations of the scRNA-Seq data (<xref rid="b50-WASJ-7-2-00315" ref-type="bibr">50</xref>).</p>
</sec>
<sec>
<title>GANs</title>
<p>GANs (<xref rid="b51-WASJ-7-2-00315" ref-type="bibr">51</xref>) are a class of deep learning algorithms that have garnered significant attention for their ability to generate high-quality, synthetic data samples. A GAN consists of two neural networks, the generator and the discriminator, engaged in a continuous adversarial process. The generator attempts to produce data samples indistinguishable from real data, while the discriminator strives to differentiate between the generator's synthetic data and actual data. This adversarial training encourages the generator to produce increasingly realistic samples, adjusting its parameters to produce data that better model the complex distribution of the input data.</p>
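<p>The opposing objectives of the two networks can be sketched with the binary cross-entropy losses of the original GAN formulation, using the widely used non-saturating variant for the generator. The discriminator scores below are hypothetical probabilities rather than the output of trained networks, and the function names are illustrative.</p>
<preformat preformat-type="code">
```python
import math

EPSILON = 1e-12  # keeps probabilities away from 0 and 1 so that log() stays finite


def bce(prediction, target):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    p = min(max(prediction, EPSILON), 1.0 - EPSILON)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(scores_real, scores_fake):
    """The discriminator wants real samples scored near 1 and synthetic ones near 0."""
    real_term = sum(bce(s, 1.0) for s in scores_real) / len(scores_real)
    fake_term = sum(bce(s, 0.0) for s in scores_fake) / len(scores_fake)
    return real_term + fake_term


def generator_loss(scores_fake):
    """Non-saturating generator loss: the generator wants its samples scored near 1."""
    return sum(bce(s, 1.0) for s in scores_fake) / len(scores_fake)


# Hypothetical discriminator outputs (probability that a sample is real).
scores_real = [0.9, 0.8]
scores_fake = [0.2, 0.1]

print(discriminator_loss(scores_real, scores_fake))
print(generator_loss(scores_fake))
```
</preformat>
<p>When the discriminator separates the two sets well, its own loss is low and the generator loss is high, which is the signal that drives the generator towards more realistic samples.</p>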
<p>In the context of the analysis of scRNA-Seq data, GANs are particularly valuable as they can learn to capture and reproduce the intricate structures and patterns inherent in such data. Instead of performing dimensionality reduction in a direct way, i.e., by performing feature extraction on the genes or samples, GANs generate new datasets of a desired number of dimensions, thus indirectly reducing the dimensionality of the original dataset. GANs can be employed to learn the complex distribution of scRNA-Seq data, and once trained, GANs may generate synthetic, yet biologically plausible, single-cell gene expression profiles (<xref rid="b52-WASJ-7-2-00315" ref-type="bibr">52</xref>). These ‘fabricated’ datasets can be used to augment the original dataset as input to other algorithms in cases where data scarcity prevents the easy procurement of training datasets, or be utilised in place of a high-dimensional dataset, either by providing a similar amount of biological information or by imitating data derived from specific biological conditions (<xref rid="b53-WASJ-7-2-00315" ref-type="bibr">53</xref>). GANs outperform the usual methods for synthetic scRNA-Seq dataset generation when their output is used to construct gene regulatory networks, as GANs can more efficiently generate realistic datasets, thus allowing downstream network creation algorithms that perform well on synthetic datasets to generalise well on real data (<xref rid="b54-WASJ-7-2-00315" ref-type="bibr">54</xref>). Applications of GANs in scRNA-Seq data include cscGAN (<xref rid="b55-WASJ-7-2-00315" ref-type="bibr">55</xref>) and LSH-GAN (<xref rid="b56-WASJ-7-2-00315" ref-type="bibr">56</xref>), used for dataset generation. Certain methods, such as AGImpute (<xref rid="b57-WASJ-7-2-00315" ref-type="bibr">57</xref>), combine both autoencoders and GANs in their approach, in this case, to perform cell-type aware imputation of scRNA-Seq data.</p>
</sec>
</sec>
</sec>
<sec>
<title>5. Comparison between dimensionality reduction methods</title>
<p>Even though a variety of options for dimensionality reduction of scRNA-Seq data were described, each one has specific use-cases, as well as certain advantages and disadvantages (<xref rid="tI-WASJ-7-2-00315" ref-type="table">Table I</xref>).</p>
<p>Dimensionality reduction techniques such as PCA, t-SNE and UMAP have become established in the scientific community as integral scRNA-Seq analysis steps, thanks to their fast execution times and comparatively low need for computational resources, particularly in the case of PCA. However, in recent years, their application has been limited to either data pre-processing (PCA) or the visualisation of cells (t-SNE and UMAP). Furthermore, t-SNE and UMAP have been shown to require substantial computational resources with larger input data, while also requiring data initialisation and proper parameter tuning to produce similar plots (<xref rid="b31-WASJ-7-2-00315" ref-type="bibr">31</xref>,<xref rid="b58-WASJ-7-2-00315" ref-type="bibr">58</xref>).</p>
<p>In comparison, deep learning-based dimensionality reduction techniques have recently been at the centre of attention, owing mostly to their ability to be trained on the input dataset, and have been made more accessible through the development of deep learning packages such as Keras (<xref rid="b59-WASJ-7-2-00315" ref-type="bibr">59</xref>) and TensorFlow (<xref rid="b60-WASJ-7-2-00315" ref-type="bibr">60</xref>). Deep-learning techniques are valued for their ability to capture both linear and non-linear relationships of the input data, compared to PCA, which is a linear method, and t-SNE/UMAP, which are non-linear methods. However, deep-learning methods require far more computational resources than their statistical or machine learning counterparts, often relying on multiple graphics processing units for optimal execution (<xref rid="b61-WASJ-7-2-00315" ref-type="bibr">61</xref>), which renders them less friendly to the average user. Furthermore, advanced knowledge of deep learning is necessary for the construction of optimised VAEs and GANs, including the selection of appropriate training and validation sets. If these networks are not trained properly, e.g., having a small validation dataset or unbalanced input data, they may overfit and, thus, not produce impartial data (<xref rid="b62-WASJ-7-2-00315" ref-type="bibr">62</xref>). Thus, ample research and evaluation by the scientific community are still necessary to integrate these techniques into popular data analysis pipelines.</p>
</sec>
<sec>
<title>6. Conclusions and future perspectives</title>
<p>The advent of scRNA-Seq has enabled the study of gene expression with unprecedented definition and cell-specificity. However, the high sparsity and high dimensionality of scRNA-Seq data require the use of strategies to bring them to a comprehensible state and extract meaningful biological information. Dimensionality reduction techniques, such as PCA, t-SNE and UMAP, help in the visualisation of such data, being indispensable tools in their visual examination. More advanced techniques, such as VAEs and GANs, bring the data to lower dimensions, while retaining the original biological information. This facilitates their usage for downstream analyses, e.g., the identification of co-expressed genes or cell subtypes, as well as their role in creating synthetic scRNA-Seq datasets, to be used for the training or evaluation of more complex algorithms. The overall volume of scRNA-Seq datasets, in conjunction with readily available software packages which implement such methods, has allowed for a massive influx of research articles based on scRNA-Seq analyses in recent years. The future advancement of deep learning will further improve the speed and fidelity of the analyses based on dimensionality reduction.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>The authors are indebted to Professor Nikolaos Drakoulis (Department of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece) for inviting them to present this work at the 4th International Congress on Pharmacogenomics and Personalized Diagnosis and Therapy.</p>
</ack>
<sec sec-type="data-availability">
<title>Availability of data and materials</title>
<p>Not applicable.</p>
</sec>
<sec>
<title>Authors' contributions</title>
<p>VLZ performed the literature review, wrote the original draft of the manuscript, and reviewed and edited the manuscript. IT, DAS and VAI wrote, reviewed and edited the manuscript. IM conceptualized and supervised the study, was involved in the writing of the original draft of the manuscript, and also wrote, reviewed and edited the final manuscript. All authors have read and approved the final version of the manuscript. Data authentication is not applicable.</p>
</sec>
<sec>
<title>Ethics approval and consent to participate</title>
<p>Not applicable.</p>
</sec>
<sec>
<title>Patient consent for publication</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Competing interests</title>
<p>DAS is the Managing Editor of the journal, but had no personal involvement in the reviewing process, or any influence in terms of adjudicating on the final decision, for this article.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="b1-WASJ-7-2-00315"><label>1</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>Z</given-names></name><name><surname>Gerstein</surname><given-names>M</given-names></name><name><surname>Snyder</surname><given-names>M</given-names></name></person-group><article-title>RNA-Seq: A revolutionary tool for transcriptomics</article-title><source>Nat Rev Genet</source><volume>10</volume><fpage>57</fpage><lpage>63</lpage><year>2009</year><pub-id pub-id-type="pmid">19015660</pub-id><pub-id pub-id-type="doi">10.1038/nrg2484</pub-id></element-citation></ref>
<ref id="b2-WASJ-7-2-00315"><label>2</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schena</surname><given-names>M</given-names></name><name><surname>Shalon</surname><given-names>D</given-names></name><name><surname>Davis</surname><given-names>RW</given-names></name><name><surname>Brown</surname><given-names>PO</given-names></name></person-group><article-title>Quantitative monitoring of gene expression patterns with a complementary DNA microarray</article-title><source>Science</source><volume>270</volume><fpage>467</fpage><lpage>470</lpage><year>1995</year><pub-id pub-id-type="pmid">7569999</pub-id><pub-id pub-id-type="doi">10.1126/science.270.5235.467</pub-id></element-citation></ref>
<ref id="b3-WASJ-7-2-00315"><label>3</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname><given-names>F</given-names></name><name><surname>Barbacioru</surname><given-names>C</given-names></name><name><surname>Wang</surname><given-names>Y</given-names></name><name><surname>Nordman</surname><given-names>E</given-names></name><name><surname>Lee</surname><given-names>C</given-names></name><name><surname>Xu</surname><given-names>N</given-names></name><name><surname>Wang</surname><given-names>X</given-names></name><name><surname>Bodeau</surname><given-names>J</given-names></name><name><surname>Tuch</surname><given-names>BB</given-names></name><name><surname>Siddiqui</surname><given-names>A</given-names></name><etal/></person-group><article-title>mRNA-Seq whole-transcriptome analysis of a single cell</article-title><source>Nat Methods</source><volume>6</volume><fpage>377</fpage><lpage>382</lpage><year>2009</year><pub-id pub-id-type="pmid">19349980</pub-id><pub-id pub-id-type="doi">10.1038/nmeth.1315</pub-id></element-citation></ref>
<ref id="b4-WASJ-7-2-00315"><label>4</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Haque</surname><given-names>A</given-names></name><name><surname>Engel</surname><given-names>J</given-names></name><name><surname>Teichmann</surname><given-names>SA</given-names></name><name><surname>Lonnberg</surname><given-names>T</given-names></name></person-group><article-title>A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications</article-title><source>Genome Med</source><volume>9</volume><issue>75</issue><year>2017</year><pub-id pub-id-type="pmid">28821273</pub-id><pub-id pub-id-type="doi">10.1186/s13073-017-0467-4</pub-id></element-citation></ref>
<ref id="b5-WASJ-7-2-00315"><label>5</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cock</surname><given-names>PJ</given-names></name><name><surname>Fields</surname><given-names>CJ</given-names></name><name><surname>Goto</surname><given-names>N</given-names></name><name><surname>Heuer</surname><given-names>ML</given-names></name><name><surname>Rice</surname><given-names>PM</given-names></name></person-group><article-title>The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants</article-title><source>Nucleic Acids Res</source><volume>38</volume><fpage>1767</fpage><lpage>1771</lpage><year>2010</year><pub-id pub-id-type="pmid">20015970</pub-id><pub-id pub-id-type="doi">10.1093/nar/gkp1137</pub-id></element-citation></ref>
<ref id="b6-WASJ-7-2-00315"><label>6</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kivioja</surname><given-names>T</given-names></name><name><surname>Vaharautio</surname><given-names>A</given-names></name><name><surname>Karlsson</surname><given-names>K</given-names></name><name><surname>Bonke</surname><given-names>M</given-names></name><name><surname>Enge</surname><given-names>M</given-names></name><name><surname>Linnarsson</surname><given-names>S</given-names></name><name><surname>Taipale</surname><given-names>J</given-names></name></person-group><article-title>Counting absolute numbers of molecules using unique molecular identifiers</article-title><source>Nat Methods</source><volume>9</volume><fpage>72</fpage><lpage>74</lpage><year>2011</year><pub-id pub-id-type="pmid">22101854</pub-id><pub-id pub-id-type="doi">10.1038/nmeth.1778</pub-id></element-citation></ref>
<ref id="b7-WASJ-7-2-00315"><label>7</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Satija</surname><given-names>R</given-names></name><name><surname>Farrell</surname><given-names>JA</given-names></name><name><surname>Gennert</surname><given-names>D</given-names></name><name><surname>Schier</surname><given-names>AF</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name></person-group><article-title>Spatial reconstruction of single-cell gene expression data</article-title><source>Nat Biotechnol</source><volume>33</volume><fpage>495</fpage><lpage>502</lpage><year>2015</year><pub-id pub-id-type="pmid">25867923</pub-id><pub-id pub-id-type="doi">10.1038/nbt.3192</pub-id></element-citation></ref>
<ref id="b8-WASJ-7-2-00315"><label>8</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zogopoulos</surname><given-names>VL</given-names></name><name><surname>Saxami</surname><given-names>G</given-names></name><name><surname>Malatras</surname><given-names>A</given-names></name><name><surname>Papadopoulos</surname><given-names>K</given-names></name><name><surname>Tsotra</surname><given-names>I</given-names></name><name><surname>Iconomidou</surname><given-names>VA</given-names></name><name><surname>Michalopoulos</surname><given-names>I</given-names></name></person-group><article-title>Approaches in gene coexpression analysis in eukaryotes</article-title><source>Biology (Basel)</source><volume>11</volume><issue>1019</issue><year>2022</year><pub-id pub-id-type="pmid">36101400</pub-id><pub-id pub-id-type="doi">10.3390/biology11071019</pub-id></element-citation></ref>
<ref id="b9-WASJ-7-2-00315"><label>9</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ilicic</surname><given-names>T</given-names></name><name><surname>Kim</surname><given-names>JK</given-names></name><name><surname>Kolodziejczyk</surname><given-names>AA</given-names></name><name><surname>Bagger</surname><given-names>FO</given-names></name><name><surname>McCarthy</surname><given-names>DJ</given-names></name><name><surname>Marioni</surname><given-names>JC</given-names></name><name><surname>Teichmann</surname><given-names>SA</given-names></name></person-group><article-title>Classification of low quality cells from single-cell RNA-seq data</article-title><source>Genome Biol</source><volume>17</volume><issue>29</issue><year>2016</year><pub-id pub-id-type="pmid">26887813</pub-id><pub-id pub-id-type="doi">10.1186/s13059-016-0888-1</pub-id></element-citation></ref>
<ref id="b10-WASJ-7-2-00315"><label>10</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zheng</surname><given-names>GX</given-names></name><name><surname>Terry</surname><given-names>JM</given-names></name><name><surname>Belgrader</surname><given-names>P</given-names></name><name><surname>Ryvkin</surname><given-names>P</given-names></name><name><surname>Bent</surname><given-names>ZW</given-names></name><name><surname>Wilson</surname><given-names>R</given-names></name><name><surname>Ziraldo</surname><given-names>SB</given-names></name><name><surname>Wheeler</surname><given-names>TD</given-names></name><name><surname>McDermott</surname><given-names>GP</given-names></name><name><surname>Zhu</surname><given-names>J</given-names></name><etal/></person-group><article-title>Massively parallel digital transcriptional profiling of single cells</article-title><source>Nat Commun</source><volume>8</volume><issue>14049</issue><year>2017</year><pub-id pub-id-type="pmid">28091601</pub-id><pub-id pub-id-type="doi">10.1038/ncomms14049</pub-id></element-citation></ref>
<ref id="b11-WASJ-7-2-00315"><label>11</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname><given-names>Y</given-names></name><name><surname>Zhang</surname><given-names>K</given-names></name></person-group><article-title>Tools for the analysis of high-dimensional single-cell RNA sequencing data</article-title><source>Nat Rev Nephrol</source><volume>16</volume><fpage>408</fpage><lpage>421</lpage><year>2020</year><pub-id pub-id-type="pmid">32221477</pub-id><pub-id pub-id-type="doi">10.1038/s41581-020-0262-0</pub-id></element-citation></ref>
<ref id="b12-WASJ-7-2-00315"><label>12</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Qiu</surname><given-names>P</given-names></name></person-group><article-title>Embracing the dropouts in single-cell RNA-seq analysis</article-title><source>Nat Commun</source><volume>11</volume><issue>1169</issue><year>2020</year><pub-id pub-id-type="pmid">32127540</pub-id><pub-id pub-id-type="doi">10.1038/s41467-020-14976-9</pub-id></element-citation></ref>
<ref id="b13-WASJ-7-2-00315"><label>13</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Imoto</surname><given-names>Y</given-names></name><name><surname>Nakamura</surname><given-names>T</given-names></name><name><surname>Escolar</surname><given-names>EG</given-names></name><name><surname>Yoshiwaki</surname><given-names>M</given-names></name><name><surname>Kojima</surname><given-names>Y</given-names></name><name><surname>Yabuta</surname><given-names>Y</given-names></name><name><surname>Katou</surname><given-names>Y</given-names></name><name><surname>Yamamoto</surname><given-names>T</given-names></name><name><surname>Hiraoka</surname><given-names>Y</given-names></name><name><surname>Saitou</surname><given-names>M</given-names></name></person-group><article-title>Resolution of the curse of dimensionality in single-cell RNA sequencing data analysis</article-title><source>Life Sci Alliance</source><volume>5</volume><issue>e202201591</issue><year>2022</year><pub-id pub-id-type="pmid">35944930</pub-id><pub-id pub-id-type="doi">10.26508/lsa.202201591</pub-id></element-citation></ref>
<ref id="b14-WASJ-7-2-00315"><label>14</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Van der Maaten</surname><given-names>L</given-names></name><name><surname>Hinton</surname><given-names>G</given-names></name></person-group><article-title>Visualizing data using t-SNE</article-title><source>J Mach Learn Res</source><volume>9</volume><year>2008</year></element-citation></ref>
<ref id="b15-WASJ-7-2-00315"><label>15</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nanga</surname><given-names>S</given-names></name><name><surname>Bawah</surname><given-names>AT</given-names></name><name><surname>Acquaye</surname><given-names>BA</given-names></name><name><surname>Billa</surname><given-names>MI</given-names></name><name><surname>Baeta</surname><given-names>FD</given-names></name><name><surname>Odai</surname><given-names>NA</given-names></name><name><surname>Obeng</surname><given-names>SK</given-names></name><name><surname>Nsiah</surname><given-names>AD</given-names></name></person-group><article-title>Review of dimension reduction methods</article-title><source>J Data Anal Inform Process</source><volume>9</volume><fpage>189</fpage><lpage>231</lpage><year>2021</year></element-citation></ref>
<ref id="b16-WASJ-7-2-00315"><label>16</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sarker</surname><given-names>IH</given-names></name></person-group><article-title>Machine learning: Algorithms, Real-world applications and research directions</article-title><source>SN Comput Sci</source><volume>2</volume><issue>160</issue><year>2021</year><pub-id pub-id-type="pmid">33778771</pub-id><pub-id pub-id-type="doi">10.1007/s42979-021-00592-x</pub-id></element-citation></ref>
<ref id="b17-WASJ-7-2-00315"><label>17</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Alpaydin</surname><given-names>E</given-names></name></person-group><comment>Introduction to Machine Learning. MIT Press, Cambridge, Massachusetts, London, England, 2020.</comment></element-citation></ref>
<ref id="b18-WASJ-7-2-00315"><label>18</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Okada</surname><given-names>H</given-names></name><name><surname>Chung</surname><given-names>UI</given-names></name><name><surname>Hojo</surname><given-names>H</given-names></name></person-group><article-title>Practical compass of Single-cell RNA-Seq Analysis</article-title><source>Curr Osteoporos Rep</source><volume>22</volume><fpage>433</fpage><lpage>440</lpage><year>2024</year><pub-id pub-id-type="pmid">38019344</pub-id><pub-id pub-id-type="doi">10.1007/s11914-023-00840-4</pub-id></element-citation></ref>
<ref id="b19-WASJ-7-2-00315"><label>19</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Arora</surname><given-names>JK</given-names></name><name><surname>Opasawatchai</surname><given-names>A</given-names></name><name><surname>Poonpanichakul</surname><given-names>T</given-names></name><name><surname>Jiravejchakul</surname><given-names>N</given-names></name><name><surname>Sungnak</surname><given-names>W</given-names></name><name><surname>Thailand</surname><given-names>D</given-names></name><name><surname>Matangkasombut</surname><given-names>O</given-names></name><name><surname>Teichmann</surname><given-names>SA</given-names></name><name><surname>Matangkasombut</surname><given-names>P</given-names></name><name><surname>Charoensawan</surname><given-names>V</given-names></name></person-group><article-title>Single-cell temporal analysis of natural dengue infection reveals skin-homing lymphocyte expansion one day before defervescence</article-title><source>iScience</source><volume>25</volume><issue>104034</issue><year>2022</year><pub-id pub-id-type="pmid">35345453</pub-id><pub-id pub-id-type="doi">10.1016/j.isci.2022.104034</pub-id></element-citation></ref>
<ref id="b20-WASJ-7-2-00315"><label>20</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Linderman</surname><given-names>GC</given-names></name></person-group><article-title>Dimensionality reduction of Single-cell RNA-Seq data</article-title><source>Methods Mol Biol</source><volume>2284</volume><fpage>331</fpage><lpage>342</lpage><year>2021</year><pub-id pub-id-type="pmid">33835451</pub-id><pub-id pub-id-type="doi">10.1007/978-1-0716-1307-8_18</pub-id></element-citation></ref>
<ref id="b21-WASJ-7-2-00315"><label>21</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pearson</surname><given-names>K</given-names></name></person-group><article-title>LIII. On lines and planes of closest fit to systems of points in space</article-title><source>Lond Edinb Dubl Phil Mag</source><volume>2</volume><fpage>559</fpage><lpage>572</lpage><year>1901</year></element-citation></ref>
<ref id="b22-WASJ-7-2-00315"><label>22</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jolliffe</surname><given-names>IT</given-names></name></person-group><comment>Principal Component Analysis. Springer, New York, NY, 2002.</comment></element-citation></ref>
<ref id="b23-WASJ-7-2-00315"><label>23</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jolliffe</surname><given-names>IT</given-names></name><name><surname>Cadima</surname><given-names>J</given-names></name></person-group><article-title>Principal component analysis: A review and recent developments</article-title><source>Philos Trans A Math Phys Eng Sci</source><volume>374</volume><issue>20150202</issue><year>2016</year><pub-id pub-id-type="pmid">26953178</pub-id><pub-id pub-id-type="doi">10.1098/rsta.2015.0202</pub-id></element-citation></ref>
<ref id="b24-WASJ-7-2-00315"><label>24</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Thorndike</surname><given-names>RL</given-names></name></person-group><article-title>Who belongs in the family?</article-title><source>Psychometrika</source><volume>18</volume><fpage>267</fpage><lpage>276</lpage><year>1953</year></element-citation></ref>
<ref id="b25-WASJ-7-2-00315"><label>25</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tsuyuzaki</surname><given-names>K</given-names></name><name><surname>Sato</surname><given-names>H</given-names></name><name><surname>Sato</surname><given-names>K</given-names></name><name><surname>Nikaido</surname><given-names>I</given-names></name></person-group><article-title>Benchmarking principal component analysis for large-scale single-cell RNA-sequencing</article-title><source>Genome Biol</source><volume>21</volume><issue>9</issue><year>2020</year><pub-id pub-id-type="pmid">31955711</pub-id><pub-id pub-id-type="doi">10.1186/s13059-019-1900-3</pub-id></element-citation></ref>
<ref id="b26-WASJ-7-2-00315"><label>26</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname><given-names>S</given-names></name><name><surname>Dai</surname><given-names>Y</given-names></name></person-group><article-title>Principal component analysis based methods in bioinformatics studies</article-title><source>Brief Bioinform</source><volume>12</volume><fpage>714</fpage><lpage>722</lpage><year>2011</year><pub-id pub-id-type="pmid">21242203</pub-id><pub-id pub-id-type="doi">10.1093/bib/bbq090</pub-id></element-citation></ref>
<ref id="b27-WASJ-7-2-00315"><label>27</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hinton</surname><given-names>GE</given-names></name><name><surname>Roweis</surname><given-names>S</given-names></name></person-group><comment>Stochastic Neighbor Embedding. In: Advances in Neural Information Processing Systems. Becker S, Thrun S and Obermayer K (eds.) MIT Press, Cambridge, MA, pp857-864, 2003.</comment></element-citation></ref>
<ref id="b28-WASJ-7-2-00315"><label>28</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>McInnes</surname><given-names>L</given-names></name><name><surname>Healy</surname><given-names>J</given-names></name><name><surname>Melville</surname><given-names>J</given-names></name></person-group><comment>UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv: 1802.03426, 2018.</comment></element-citation></ref>
<ref id="b29-WASJ-7-2-00315"><label>29</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Slovin</surname><given-names>S</given-names></name><name><surname>Carissimo</surname><given-names>A</given-names></name><name><surname>Panariello</surname><given-names>F</given-names></name><name><surname>Grimaldi</surname><given-names>A</given-names></name><name><surname>Bouche</surname><given-names>V</given-names></name><name><surname>Gambardella</surname><given-names>G</given-names></name><name><surname>Cacchiarelli</surname><given-names>D</given-names></name></person-group><article-title>Single-cell RNA sequencing analysis: A Step-by-Step overview</article-title><source>Methods Mol Biol</source><volume>2284</volume><fpage>343</fpage><lpage>365</lpage><year>2021</year><pub-id pub-id-type="pmid">33835452</pub-id><pub-id pub-id-type="doi">10.1007/978-1-0716-1307-8_19</pub-id></element-citation></ref>
<ref id="b30-WASJ-7-2-00315"><label>30</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lachmann</surname><given-names>A</given-names></name><name><surname>Torre</surname><given-names>D</given-names></name><name><surname>Keenan</surname><given-names>AB</given-names></name><name><surname>Jagodnik</surname><given-names>KM</given-names></name><name><surname>Lee</surname><given-names>HJ</given-names></name><name><surname>Wang</surname><given-names>L</given-names></name><name><surname>Silverstein</surname><given-names>MC</given-names></name><name><surname>Ma'ayan</surname><given-names>A</given-names></name></person-group><article-title>Massive mining of publicly available RNA-seq data from human and mouse</article-title><source>Nat Commun</source><volume>9</volume><issue>1366</issue><year>2018</year><pub-id pub-id-type="pmid">29636450</pub-id><pub-id pub-id-type="doi">10.1038/s41467-018-03751-6</pub-id></element-citation></ref>
<ref id="b31-WASJ-7-2-00315"><label>31</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kobak</surname><given-names>D</given-names></name><name><surname>Linderman</surname><given-names>GC</given-names></name></person-group><article-title>Initialization is critical for preserving global data structure in both t-SNE and UMAP</article-title><source>Nat Biotechnol</source><volume>39</volume><fpage>156</fpage><lpage>157</lpage><year>2021</year><pub-id pub-id-type="pmid">33526945</pub-id><pub-id pub-id-type="doi">10.1038/s41587-020-00809-z</pub-id></element-citation></ref>
<ref id="b32-WASJ-7-2-00315"><label>32</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hao</surname><given-names>Y</given-names></name><name><surname>Stuart</surname><given-names>T</given-names></name><name><surname>Kowalski</surname><given-names>MH</given-names></name><name><surname>Choudhary</surname><given-names>S</given-names></name><name><surname>Hoffman</surname><given-names>P</given-names></name><name><surname>Hartman</surname><given-names>A</given-names></name><name><surname>Srivastava</surname><given-names>A</given-names></name><name><surname>Molla</surname><given-names>G</given-names></name><name><surname>Madad</surname><given-names>S</given-names></name><name><surname>Fernandez-Granda</surname><given-names>C</given-names></name><name><surname>Satija</surname><given-names>R</given-names></name></person-group><article-title>Dictionary learning for integrative, multimodal and scalable single-cell analysis</article-title><source>Nat Biotechnol</source><volume>42</volume><fpage>293</fpage><lpage>304</lpage><year>2024</year><pub-id pub-id-type="pmid">37231261</pub-id><pub-id pub-id-type="doi">10.1038/s41587-023-01767-y</pub-id></element-citation></ref>
<ref id="b33-WASJ-7-2-00315"><label>33</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Goodfellow</surname><given-names>I</given-names></name><name><surname>Bengio</surname><given-names>Y</given-names></name><name><surname>Courville</surname><given-names>A</given-names></name></person-group><comment>Deep Learning. An MIT Press book. <ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://www.deeplearningbook.org/">https://www.deeplearningbook.org/</ext-link>.</comment></element-citation></ref>
<ref id="b34-WASJ-7-2-00315"><label>34</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ding</surname><given-names>J</given-names></name><name><surname>Condon</surname><given-names>A</given-names></name><name><surname>Shah</surname><given-names>SP</given-names></name></person-group><article-title>Interpretable dimensionality reduction of single cell transcriptome data with deep generative models</article-title><source>Nat Commun</source><volume>9</volume><issue>2002</issue><year>2018</year><pub-id pub-id-type="pmid">29784946</pub-id><pub-id pub-id-type="doi">10.1038/s41467-018-04368-5</pub-id></element-citation></ref>
<ref id="b35-WASJ-7-2-00315"><label>35</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kramer</surname><given-names>MA</given-names></name></person-group><article-title>Nonlinear principal component analysis using autoassociative neural networks</article-title><source>AIChE J</source><volume>37</volume><fpage>233</fpage><lpage>243</lpage><year>1991</year></element-citation></ref>
<ref id="b36-WASJ-7-2-00315"><label>36</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Eraslan</surname><given-names>G</given-names></name><name><surname>Simon</surname><given-names>LM</given-names></name><name><surname>Mircea</surname><given-names>M</given-names></name><name><surname>Mueller</surname><given-names>NS</given-names></name><name><surname>Theis</surname><given-names>FJ</given-names></name></person-group><article-title>Single-cell RNA-seq denoising using a deep count autoencoder</article-title><source>Nat Commun</source><volume>10</volume><issue>390</issue><year>2019</year><pub-id pub-id-type="pmid">30674886</pub-id><pub-id pub-id-type="doi">10.1038/s41467-018-07931-2</pub-id></element-citation></ref>
<ref id="b37-WASJ-7-2-00315"><label>37</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Agarwal</surname><given-names>D</given-names></name><name><surname>Wang</surname><given-names>J</given-names></name><name><surname>Zhang</surname><given-names>NR</given-names></name></person-group><article-title>Data denoising and Post-denoising corrections in single cell RNA sequencing</article-title><source>Statistical Science</source><volume>35</volume><fpage>112</fpage><lpage>128</lpage><year>2020</year></element-citation></ref>
<ref id="b38-WASJ-7-2-00315"><label>38</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname><given-names>M</given-names></name><name><surname>Wang</surname><given-names>J</given-names></name><name><surname>Torre</surname><given-names>E</given-names></name><name><surname>Dueck</surname><given-names>H</given-names></name><name><surname>Shaffer</surname><given-names>S</given-names></name><name><surname>Bonasio</surname><given-names>R</given-names></name><name><surname>Murray</surname><given-names>JI</given-names></name><name><surname>Raj</surname><given-names>A</given-names></name><name><surname>Li</surname><given-names>M</given-names></name><name><surname>Zhang</surname><given-names>NR</given-names></name></person-group><article-title>SAVER: Gene expression recovery for single-cell RNA sequencing</article-title><source>Nat Methods</source><volume>15</volume><fpage>539</fpage><lpage>542</lpage><year>2018</year><pub-id pub-id-type="pmid">29941873</pub-id><pub-id pub-id-type="doi">10.1038/s41592-018-0033-z</pub-id></element-citation></ref>
<ref id="b39-WASJ-7-2-00315"><label>39</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Li</surname><given-names>WV</given-names></name><name><surname>Li</surname><given-names>JJ</given-names></name></person-group><article-title>An accurate and robust imputation method scImpute for single-cell RNA-seq data</article-title><source>Nat Commun</source><volume>9</volume><issue>997</issue><year>2018</year><pub-id pub-id-type="pmid">29520097</pub-id><pub-id pub-id-type="doi">10.1038/s41467-018-03405-7</pub-id></element-citation></ref>
<ref id="b40-WASJ-7-2-00315"><label>40</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kingma</surname><given-names>DP</given-names></name><name><surname>Welling</surname><given-names>M</given-names></name></person-group><comment>Auto-encoding variational bayes. arXiv, 2013.</comment></element-citation></ref>
<ref id="b41-WASJ-7-2-00315"><label>41</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gronbech</surname><given-names>CH</given-names></name><name><surname>Vording</surname><given-names>MF</given-names></name><name><surname>Timshel</surname><given-names>PN</given-names></name><name><surname>Sonderby</surname><given-names>CK</given-names></name><name><surname>Pers</surname><given-names>TH</given-names></name><name><surname>Winther</surname><given-names>O</given-names></name></person-group><article-title>scVAE: Variational auto-encoders for single-cell gene expression data</article-title><source>Bioinformatics</source><volume>36</volume><fpage>4415</fpage><lpage>4422</lpage><year>2020</year><pub-id pub-id-type="pmid">32415966</pub-id><pub-id pub-id-type="doi">10.1093/bioinformatics/btaa293</pub-id></element-citation></ref>
<ref id="b42-WASJ-7-2-00315"><label>42</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pan</surname><given-names>W</given-names></name><name><surname>Long</surname><given-names>F</given-names></name><name><surname>Pan</surname><given-names>J</given-names></name></person-group><article-title>ScInfoVAE: Interpretable dimensional reduction of single cell transcription data with variational autoencoders and extended mutual information regularization</article-title><source>BioData Min</source><volume>16</volume><issue>17</issue><year>2023</year><pub-id pub-id-type="pmid">37301826</pub-id><pub-id pub-id-type="doi">10.1186/s13040-023-00333-1</pub-id></element-citation></ref>
<ref id="b43-WASJ-7-2-00315"><label>43</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hinton</surname><given-names>GE</given-names></name><name><surname>Salakhutdinov</surname><given-names>RR</given-names></name></person-group><article-title>Reducing the dimensionality of data with neural networks</article-title><source>Science</source><volume>313</volume><fpage>504</fpage><lpage>507</lpage><year>2006</year><pub-id pub-id-type="pmid">16873662</pub-id><pub-id pub-id-type="doi">10.1126/science.1127647</pub-id></element-citation></ref>
<ref id="b44-WASJ-7-2-00315"><label>44</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Erfanian</surname><given-names>N</given-names></name><name><surname>Heydari</surname><given-names>AA</given-names></name><name><surname>Feriz</surname><given-names>AM</given-names></name><name><surname>Ianez</surname><given-names>P</given-names></name><name><surname>Derakhshani</surname><given-names>A</given-names></name><name><surname>Ghasemigol</surname><given-names>M</given-names></name><name><surname>Farahpour</surname><given-names>M</given-names></name><name><surname>Razavi</surname><given-names>SM</given-names></name><name><surname>Nasseri</surname><given-names>S</given-names></name><name><surname>Safarpour</surname><given-names>H</given-names></name><name><surname>Sahebkar</surname><given-names>A</given-names></name></person-group><article-title>Deep learning applications in single-cell genomics and transcriptomics data analysis</article-title><source>Biomed Pharmacother</source><volume>165</volume><issue>115077</issue><year>2023</year><pub-id pub-id-type="pmid">37393865</pub-id><pub-id pub-id-type="doi">10.1016/j.biopha.2023.115077</pub-id></element-citation></ref>
<ref id="b45-WASJ-7-2-00315"><label>45</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bica</surname><given-names>I</given-names></name><name><surname>Andres-Terre</surname><given-names>H</given-names></name><name><surname>Cvejic</surname><given-names>A</given-names></name><name><surname>Lio</surname><given-names>P</given-names></name></person-group><article-title>Unsupervised generative and graph representation learning for modelling cell differentiation</article-title><source>Sci Rep</source><volume>10</volume><issue>9790</issue><year>2020</year><pub-id pub-id-type="pmid">32555334</pub-id><pub-id pub-id-type="doi">10.1038/s41598-020-66166-8</pub-id></element-citation></ref>
<ref id="b46-WASJ-7-2-00315"><label>46</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rahman</surname><given-names>MA</given-names></name><name><surname>Tutul</surname><given-names>AA</given-names></name><name><surname>Sharmin</surname><given-names>M</given-names></name><name><surname>Bayzid</surname><given-names>MS</given-names></name></person-group><article-title>BEENE: Deep learning-based nonlinear embedding improves batch effect estimation</article-title><source>Bioinformatics</source><volume>39</volume><issue>btad479</issue><year>2023</year><pub-id pub-id-type="pmid">37561107</pub-id><pub-id pub-id-type="doi">10.1093/bioinformatics/btad479</pub-id></element-citation></ref>
<ref id="b47-WASJ-7-2-00315"><label>47</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>RTQ</given-names></name><name><surname>Li</surname><given-names>X</given-names></name><name><surname>Grosse</surname><given-names>R</given-names></name><name><surname>Duvenaud</surname><given-names>D</given-names></name></person-group><comment>Isolating sources of disentanglement in VAEs. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates Inc., Montréal, Canada, pp2615-2625, 2018.</comment></element-citation></ref>
<ref id="b48-WASJ-7-2-00315"><label>48</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Eraslan</surname><given-names>G</given-names></name><name><surname>Drokhlyansky</surname><given-names>E</given-names></name><name><surname>Anand</surname><given-names>S</given-names></name><name><surname>Fiskin</surname><given-names>E</given-names></name><name><surname>Subramanian</surname><given-names>A</given-names></name><name><surname>Slyper</surname><given-names>M</given-names></name><name><surname>Wang</surname><given-names>J</given-names></name><name><surname>Van Wittenberghe</surname><given-names>N</given-names></name><name><surname>Rouhana</surname><given-names>JM</given-names></name><name><surname>Waldman</surname><given-names>J</given-names></name><etal/></person-group><article-title>Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function</article-title><source>Science</source><volume>376</volume><issue>eabl4290</issue><year>2022</year><pub-id pub-id-type="pmid">35549429</pub-id><pub-id pub-id-type="doi">10.1126/science.abl4290</pub-id></element-citation></ref>
<ref id="b49-WASJ-7-2-00315"><label>49</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Koutrouli</surname><given-names>M</given-names></name><name><surname>Nastou</surname><given-names>K</given-names></name><name><surname>Piera Lindez</surname><given-names>P</given-names></name><name><surname>Bouwmeester</surname><given-names>R</given-names></name><name><surname>Rasmussen</surname><given-names>S</given-names></name><name><surname>Martens</surname><given-names>L</given-names></name><name><surname>Jensen</surname><given-names>LJ</given-names></name></person-group><article-title>FAVA: High-quality functional association networks inferred from scRNA-seq and proteomics data</article-title><source>Bioinformatics</source><volume>40</volume><issue>btae010</issue><year>2024</year><pub-id pub-id-type="pmid">38192003</pub-id><pub-id pub-id-type="doi">10.1093/bioinformatics/btae010</pub-id></element-citation></ref>
<ref id="b50-WASJ-7-2-00315"><label>50</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Szklarczyk</surname><given-names>D</given-names></name><name><surname>Kirsch</surname><given-names>R</given-names></name><name><surname>Koutrouli</surname><given-names>M</given-names></name><name><surname>Nastou</surname><given-names>K</given-names></name><name><surname>Mehryary</surname><given-names>F</given-names></name><name><surname>Hachilif</surname><given-names>R</given-names></name><name><surname>Gable</surname><given-names>AL</given-names></name><name><surname>Fang</surname><given-names>T</given-names></name><name><surname>Doncheva</surname><given-names>NT</given-names></name><name><surname>Pyysalo</surname><given-names>S</given-names></name><etal/></person-group><article-title>The STRING database in 2023: Protein-protein association networks and functional enrichment analyses for any sequenced genome of interest</article-title><source>Nucleic Acids Res</source><volume>51</volume><fpage>D638</fpage><lpage>D646</lpage><year>2023</year><pub-id pub-id-type="pmid">36370105</pub-id><pub-id pub-id-type="doi">10.1093/nar/gkac1000</pub-id></element-citation></ref>
<ref id="b51-WASJ-7-2-00315"><label>51</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Goodfellow</surname><given-names>IJ</given-names></name><name><surname>Pouget-Abadie</surname><given-names>J</given-names></name><name><surname>Mirza</surname><given-names>M</given-names></name><name><surname>Xu</surname><given-names>B</given-names></name><name><surname>Warde-Farley</surname><given-names>D</given-names></name><name><surname>Ozair</surname><given-names>S</given-names></name><name><surname>Courville</surname><given-names>A</given-names></name><name><surname>Bengio</surname><given-names>Y</given-names></name></person-group><comment>Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2. MIT Press, Montreal, Canada, pp2672-2680, 2014.</comment></element-citation></ref>
<ref id="b52-WASJ-7-2-00315"><label>52</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lan</surname><given-names>L</given-names></name><name><surname>You</surname><given-names>L</given-names></name><name><surname>Zhang</surname><given-names>Z</given-names></name><name><surname>Fan</surname><given-names>Z</given-names></name><name><surname>Zhao</surname><given-names>W</given-names></name><name><surname>Zeng</surname><given-names>N</given-names></name><name><surname>Chen</surname><given-names>Y</given-names></name><name><surname>Zhou</surname><given-names>X</given-names></name></person-group><article-title>Generative Adversarial Networks and Its Applications in Biomedical Informatics</article-title><source>Front Public Health</source><volume>8</volume><issue>164</issue><year>2020</year><pub-id pub-id-type="pmid">32478029</pub-id><pub-id pub-id-type="doi">10.3389/fpubh.2020.00164</pub-id></element-citation></ref>
<ref id="b53-WASJ-7-2-00315"><label>53</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lacan</surname><given-names>A</given-names></name><name><surname>Sebag</surname><given-names>M</given-names></name><name><surname>Hanczar</surname><given-names>B</given-names></name></person-group><article-title>GAN-based data augmentation for transcriptomics: Survey and comparative assessment</article-title><source>Bioinformatics</source><volume>39</volume><fpage>i111</fpage><lpage>i120</lpage><year>2023</year><pub-id pub-id-type="pmid">37387181</pub-id><pub-id pub-id-type="doi">10.1093/bioinformatics/btad239</pub-id></element-citation></ref>
<ref id="b54-WASJ-7-2-00315"><label>54</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vinas</surname><given-names>R</given-names></name><name><surname>Andres-Terre</surname><given-names>H</given-names></name><name><surname>Lio</surname><given-names>P</given-names></name><name><surname>Bryson</surname><given-names>K</given-names></name></person-group><article-title>Adversarial generation of gene expression data</article-title><source>Bioinformatics</source><volume>38</volume><fpage>730</fpage><lpage>737</lpage><year>2022</year><pub-id pub-id-type="pmid">33471074</pub-id><pub-id pub-id-type="doi">10.1093/bioinformatics/btab035</pub-id></element-citation></ref>
<ref id="b55-WASJ-7-2-00315"><label>55</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Marouf</surname><given-names>M</given-names></name><name><surname>Machart</surname><given-names>P</given-names></name><name><surname>Bansal</surname><given-names>V</given-names></name><name><surname>Kilian</surname><given-names>C</given-names></name><name><surname>Magruder</surname><given-names>DS</given-names></name><name><surname>Krebs</surname><given-names>CF</given-names></name><name><surname>Bonn</surname><given-names>S</given-names></name></person-group><article-title>Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks</article-title><source>Nat Commun</source><volume>11</volume><issue>166</issue><year>2020</year><pub-id pub-id-type="pmid">31919373</pub-id><pub-id pub-id-type="doi">10.1038/s41467-019-14018-z</pub-id></element-citation></ref>
<ref id="b56-WASJ-7-2-00315"><label>56</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lall</surname><given-names>S</given-names></name><name><surname>Ray</surname><given-names>S</given-names></name><name><surname>Bandyopadhyay</surname><given-names>S</given-names></name></person-group><article-title>LSH-GAN enables in-silico generation of cells for small sample high dimensional scRNA-seq data</article-title><source>Commun Biol</source><volume>5</volume><issue>577</issue><year>2022</year><pub-id pub-id-type="pmid">35688990</pub-id><pub-id pub-id-type="doi">10.1038/s42003-022-03473-y</pub-id></element-citation></ref>
<ref id="b57-WASJ-7-2-00315"><label>57</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname><given-names>X</given-names></name><name><surname>Meng</surname><given-names>S</given-names></name><name><surname>Li</surname><given-names>G</given-names></name><name><surname>Wang</surname><given-names>J</given-names></name><name><surname>Peng</surname><given-names>X</given-names></name></person-group><article-title>AGImpute: Imputation of scRNA-seq data based on a hybrid GAN with dropouts identification</article-title><source>Bioinformatics</source><volume>40</volume><issue>btae068</issue><year>2024</year><pub-id pub-id-type="pmid">38317025</pub-id><pub-id pub-id-type="doi">10.1093/bioinformatics/btae068</pub-id></element-citation></ref>
<ref id="b58-WASJ-7-2-00315"><label>58</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chari</surname><given-names>T</given-names></name><name><surname>Pachter</surname><given-names>L</given-names></name></person-group><article-title>The specious art of single-cell genomics</article-title><source>PLoS Comput Biol</source><volume>19</volume><issue>e1011288</issue><year>2023</year><pub-id pub-id-type="pmid">37590228</pub-id><pub-id pub-id-type="doi">10.1371/journal.pcbi.1011288</pub-id></element-citation></ref>
<ref id="b59-WASJ-7-2-00315"><label>59</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chollet</surname><given-names>F</given-names></name></person-group><comment>Keras. <ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/fchollet/keras">https://github.com/fchollet/keras</ext-link>; <ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://keras.io">https://keras.io</ext-link>.</comment></element-citation></ref>
<ref id="b60-WASJ-7-2-00315"><label>60</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Abadi</surname><given-names>M</given-names></name><name><surname>Agarwal</surname><given-names>A</given-names></name><name><surname>Barham</surname><given-names>P</given-names></name><name><surname>Brevdo</surname><given-names>E</given-names></name><name><surname>Chen</surname><given-names>Z</given-names></name><name><surname>Citro</surname><given-names>C</given-names></name><name><surname>Corrado</surname><given-names>GS</given-names></name><name><surname>Davis</surname><given-names>A</given-names></name><name><surname>Dean</surname><given-names>J</given-names></name><name><surname>Devin</surname><given-names>M</given-names></name><etal/></person-group><comment>TensorFlow: Large-scale machine learning on heterogeneous distributed systems. Distributed Parallel Cluster Computing: 16 Mar, 2016.</comment></element-citation></ref>
<ref id="b61-WASJ-7-2-00315"><label>61</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mittal</surname><given-names>S</given-names></name><name><surname>Vaishay</surname><given-names>S</given-names></name></person-group><article-title>A survey of techniques for optimizing deep learning on GPUs</article-title><source>J Systems Architecture</source><volume>99</volume><issue>101635</issue><year>2019</year></element-citation></ref>
<ref id="b62-WASJ-7-2-00315"><label>62</label><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname><given-names>J</given-names></name><name><surname>Park</surname><given-names>H</given-names></name></person-group><article-title>Limited discriminator GAN using explainable AI model for overfitting problem</article-title><source>ICT Express</source><volume>9</volume><fpage>241</fpage><lpage>246</lpage><year>2023</year></element-citation></ref>
</ref-list>
</back>
<floats-group>
<fig id="f1-WASJ-7-2-00315" position="float">
<label>Figure 1</label>
<caption><p>An example of feature extraction-based dimensionality reduction of scRNA-Seq gene count data. (A) The original input scRNA-Seq count data. In this case, the full non-normalised scRNA-Seq dataset obtained from the GTEx database is depicted, containing 17,625 genes and 209,216 cells (<xref rid="b48-WASJ-7-2-00315" ref-type="bibr">48</xref>). The dataset is characterised not only by high dimensionality but also by numerous zero gene counts. (B) The same GTEx scRNA-Seq data after dimensionality reduction at the level of cells. The cells are replaced by a far smaller set of 300 latent samples that retain the variance of the original sample set, while the number of genes stays the same. The gene counts have been replaced by gene expression values that retain the biologically relevant information of the original data, while also filling in the zero values. scRNA-Seq, single-cell RNA sequencing; GTEx database, Genotype-Tissue Expression database.</p></caption>
<graphic xlink:href="wasj-07-02-00315-g00.tif"/>
</fig>
<fig id="f2-WASJ-7-2-00315" position="float">
<label>Figure 2</label>
<caption><p>Flowchart of a simplified scRNA-Seq pre-processing workflow and subsequent dimensionality reduction analyses. Starting from raw sequencing data, pre-processing steps (quality control, alignment, and gene counting) generate a high-dimensional gene expression matrix. Dimensionality reduction methods, such as PCA, UMAP, t-SNE, and advanced deep learning approaches (e.g., VAEs, GANs), address data sparsity and complexity, facilitating visualisation and downstream analyses. These techniques enable the extraction and preservation of critical biological information, forming the basis for deeper biological inferences. scRNA-Seq, single-cell RNA sequencing; PCA, principal component analysis; UMAP, uniform manifold approximation and projection; t-SNE, t-distributed stochastic neighbour embedding; VAEs, variational autoencoders; GANs, generative adversarial networks.</p></caption>
<graphic xlink:href="wasj-07-02-00315-g01.tif"/>
</fig>
<table-wrap id="tI-WASJ-7-2-00315" position="float">
<label>Table I</label>
<caption><p>Comparison of dimensionality reduction techniques for scRNA-Seq data.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="middle">Technique</th>
<th align="center" valign="middle">Description</th>
<th align="center" valign="middle">Rationale</th>
<th align="center" valign="middle">Advantages</th>
<th align="center" valign="middle">Disadvantages</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle">PCA</td>
<td align="left" valign="middle">Linear transformation creating new variables (principal components) to retain most variance in the data</td>
<td align="left" valign="middle">Reduces the dimensions of scRNA-Seq data while retaining meaningful variance</td>
<td align="left" valign="middle">• Retains most variability • Simple and widely used • Fast execution</td>
<td align="left" valign="middle">• Limited to linear associations • Sensitive to noise in data</td>
</tr>
<tr>
<td align="left" valign="middle">t-SNE</td>
<td align="left" valign="middle">Non-linear method using the Student's t-distribution to visualise data in 2D/3D by capturing relationships among data points</td>
<td align="left" valign="middle">Maps scRNA-Seq data into a comprehensible visual format</td>
<td align="left" valign="middle">• Captures non-linear relationships • Effective for visualising clusters</td>
<td align="left" valign="middle">• May be computationally expensive with large datasets • Can fail to preserve global structure without informative (data-driven) initialisation</td>
</tr>
<tr>
<td align="left" valign="middle">UMAP</td>
<td align="left" valign="middle">Non-linear method that constructs a graph of data points and optimises a low-dimensional representation</td>
<td align="left" valign="middle">Alternative to t-SNE, focusing on speed and better global structure representation</td>
<td align="left" valign="middle">• Faster than t-SNE • Better global structure retention • Flexible parameter tuning</td>
<td align="left" valign="middle">• Requires careful tuning • Interpretation may vary with parameters</td>
</tr>
<tr>
<td align="left" valign="middle">scvis</td>
<td align="left" valign="middle">Deep learning model using autoencoders for data visualisation</td>
<td align="left" valign="middle">Deep-learning alternative to t-SNE and UMAP</td>
<td align="left" valign="middle">• Handles both linear and non-linear relationships • Scales well to large datasets</td>
<td align="left" valign="middle">• Requires substantial computational resources • Performance depends on architecture and training</td>
</tr>
<tr>
<td align="left" valign="middle">DCA</td>
<td align="left" valign="middle">Application of autoencoders which focuses on denoising scRNA-Seq data</td>
<td align="left" valign="middle">Reduces noise and imputes scRNA-Seq data</td>
<td align="left" valign="middle">• Improves data quality by denoising • Better imputation performance than traditional methods</td>
<td align="left" valign="middle">• Relies heavily on initial parameter selection • Computationally intensive for very large datasets</td>
</tr>
<tr>
<td align="left" valign="middle">VAEs</td>
<td align="left" valign="middle">Probabilistic version of autoencoders that maps data to distributions in a latent space and reconstructs data by sampling from these distributions</td>
<td align="left" valign="middle">Generates a low-dimensional dataset that retains the biological information of input scRNA-Seq</td>
<td align="left" valign="middle">• Captures both linear and non-linear patterns • Effective for downstream analyses</td>
<td align="left" valign="middle">• Requires expertise in probabilistic modelling • Models can be complex to train effectively</td>
</tr>
<tr>
<td align="left" valign="middle">GANs</td>
<td align="left" valign="middle">Two neural networks (generator and discriminator) adversarially trained to create realistic synthetic data</td>
<td align="left" valign="middle">Generates biologically plausible data by learning input scRNA-Seq data distributions</td>
<td align="left" valign="middle">• Generates realistic synthetic datasets • Useful for data augmentation • Handles complex distributions effectively</td>
<td align="left" valign="middle">• Training is challenging and requires significant computational resources • High risk of generating artefacts or overfitting of the discriminator</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>scRNA-Seq, single-cell RNA sequencing; PCA, principal component analysis; DCA, deep count autoencoder; UMAP, uniform manifold approximation and projection; t-SNE, t-distributed stochastic neighbour embedding; VAEs, variational autoencoders; GANs, generative adversarial networks.</p></fn>
</table-wrap-foot>
</table-wrap>
</floats-group>
</article>
