Speeding up UMAP plots for single cell gene expression analysis
R
data visualization
bioinformatics
Published
October 21, 2025
Analyzing single cell data often requires visualizing thousands to millions of data points on a graph. Current R plotting functions such as Seurat::DimPlot are limited by long plotting times, which impedes efficient exploratory analysis.
For example, this is how long it takes to visualize 10 genes on a 14,000-cell single cell RNAseq (scRNAseq) dataset.
xy <- FetchData(seu, vars = c("umap_1", "umap_2", "seurat_annotations", rownames(seu)))
bnch |> select(expression, min, median, n_itr)
# A tibble: 1 × 3
expression min median
<bch:expr> <bch:tm> <bch:tm>
1 Seurat, not sampled 5.31s 6.32s
It takes about 6.3 seconds (median) to plot 10 features for a 14,000-cell dataset. This dataset is on the smaller side: single cell datasets often reach hundreds of thousands of cells, so plotting speed is a significant hindrance to single cell analysis.
Sampling to speed up plotting
Plotting tens to hundreds of thousands of cells is likely unnecessary. We can explore whether plotting a sample of the dataset is sufficient to give a faithful representation of the entire dataset while improving speed.
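As a minimal sketch of the sampling step, assuming a plain data frame of per-cell UMAP coordinates: the `cells` data frame below is simulated and hypothetical, standing in for coordinates fetched from the Seurat object. It draws 10% of the cells without replacement.

```r
# Simulated stand-in for the post's data: 14,000 cells with UMAP coordinates.
# (`cells` and its columns are hypothetical, not taken from the Seurat object.)
set.seed(1)
n_cells <- 14000
cells <- data.frame(
  umap_1 = rnorm(n_cells),
  umap_2 = rnorm(n_cells)
)

# Draw 10% of the row indices without replacement.
keep <- sample.int(n_cells, size = round(0.10 * n_cells))
cells_sampled <- cells[keep, ]

nrow(cells_sampled)  # 1400
```

The benchmark that follows compares Seurat plotting on the full dataset against a 10% sample (n = 1,400).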
# A tibble: 2 × 3
expression min median
<bch:expr> <bch:tm> <bch:tm>
1 Seurat, not sampled 5.31s 6.32s
2 Seurat, sampled 10% (n=1,400) 3.61s 3.8s
Sampling improves speed somewhat (median 3.8 s versus 6.32 s), but plotting is still slow. Seurat::FeaturePlot may have some internal steps that are slow. Let’s try a naive solution:
# A tibble: 1 × 3
expression min median
<bch:expr> <bch:tm> <bch:tm>
1 naive 1.24s 1.25s
The naive plot takes 1.25 seconds, about 5.05x faster than the unsampled Seurat plot.
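The naive plotting code is not shown here. One way such a plot could be built, as a sketch rather than the post's actual implementation, is to reshape the expression data to long format and draw one panel per gene with a single shared color scale. The data below is simulated, the gene list is only illustrative, and ggplot2 with `facet_wrap` is an assumed choice of plotting library.

```r
# Simulated stand-in for `xy`: UMAP coordinates plus one column per gene.
set.seed(1)
n_cells <- 1000
genes <- c("CD4", "FOXP3", "MS4A1")  # illustrative feature list
xy <- data.frame(umap_1 = rnorm(n_cells), umap_2 = rnorm(n_cells))
for (g in genes) xy[[g]] <- rpois(n_cells, lambda = 0.5)

# Wide -> long: one row per (cell, gene) pair.
long <- data.frame(
  umap_1 = rep(xy$umap_1, times = length(genes)),
  umap_2 = rep(xy$umap_2, times = length(genes)),
  gene = rep(genes, each = n_cells),
  expression = unlist(xy[genes], use.names = FALSE)
)

# One scatter panel per gene, all sharing a single color scale.
# Built only if ggplot2 is installed; call print(p) to draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(umap_1, umap_2, color = expression)) +
    geom_point(size = 0.5) +
    facet_wrap(~gene)
}
```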
But there are drawbacks to this naive solution. Notably, it’s missing some of the features that Seurat smartly incorporates:
Point sizing based on the number of points. When plotting larger datasets, the optimal point size is smaller to avoid overplotting.
Sparsely expressed and lowly expressed genes, e.g. FOXP3 and CD4, are hard to see. This is caused by the combined color scale, which maps low/high expression to one common color scale across all genes. Seurat maintains an independent color scale for each gene.
Let’s see if we can address these shortcomings without trading off speed.
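For the color scale issue, one possible approach (an assumption, not necessarily the one used in this post) is to rescale each gene's expression to [0, 1] before plotting, so that a single shared color scale behaves like an independent scale per gene. The helper `rescale01` and the toy values are hypothetical.

```r
# Rescale a numeric vector to [0, 1]; a constant vector maps to all zeros.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy long-format data: a lowly expressed gene and a highly expressed one.
long <- data.frame(
  gene = rep(c("CD4", "ACTB"), each = 4),
  expression = c(0, 0, 1, 2, 10, 50, 80, 100)
)

# Apply the rescaling within each gene.
long$scaled <- ave(long$expression, long$gene, FUN = rescale01)
```

After rescaling, the top-expressing cells of each gene map to the same color, so a lowly expressed gene is no longer washed out by a highly expressed one.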
Point sizing
Seurat::FeaturePlot uses a simple formula to calculate point size in relation to the number of cells. But it doesn’t account for multiple features being visualized at once.
Here we scale the point size by the total number of cells × the total number of features instead.
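A rough sketch of the idea: the per-plot heuristic below, including the constant 1583, is assumed from Seurat's internal point-size helper and should be checked against the package source. The function names are hypothetical, and the multi-feature version is only one plausible reading of the adjustment described above.

```r
# Per-plot heuristic: fewer cells -> larger points, capped at 1.
# The constant 1583 is an assumption about Seurat's internal helper.
auto_point_size <- function(n_cells) {
  min(1583 / n_cells, 1)
}

# Hypothetical adjustment: size by the total number of points drawn
# across all feature panels (cells x features).
multi_feature_point_size <- function(n_cells, n_features) {
  min(1583 / (n_cells * n_features), 1)
}

auto_point_size(14000)               # about 0.113
multi_feature_point_size(14000, 10)  # about 0.0113
```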
We implemented conditional sampling based on the number of cells expressing each gene and the desired sample size. All cells are retained for genes expressed in fewer than half the desired sample size (1400 / 2 = 700 cells). This results in a visualization where highly expressed genes are sampled proportionally and sparsely expressed genes are retained in full. This can be helpful for identifying cells that express these lowly expressed genes.
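The rule can be sketched per gene as follows. The function name `sample_expressing_cells` is hypothetical, and the exact proportional rule for the non-sparse genes is an assumption (here: sample expressing cells at the overall rate, target size divided by total cells). The simulated expression vectors only illustrate the two branches.

```r
# Return the indices of the cells to plot for one gene.
# `expr` is that gene's expression across all cells.
sample_expressing_cells <- function(expr, target = 1400) {
  expressing <- which(expr > 0)
  # Sparse gene: fewer expressing cells than half the target, keep them all.
  if (length(expressing) < target / 2) return(expressing)
  # Otherwise sample at the overall rate target / n_cells (assumed rule).
  n_keep <- round(length(expressing) * target / length(expr))
  expressing[sample.int(length(expressing), n_keep)]
}

set.seed(1)
n_cells <- 14000
sparse <- c(rep(1, 300), rep(0, n_cells - 300))    # 300 expressing cells
dense <- c(rep(1, 7000), rep(0, n_cells - 7000))   # 7,000 expressing cells

length(sample_expressing_cells(sparse))  # 300: below 700, all kept
length(sample_expressing_cells(dense))   # 700: 10% of 7,000
```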
Here is the exact number of cells sampled for each gene:
Finally, let’s compare our solution with Seurat::FeaturePlot over 10 genes for 14,000 cells:
bnch_final <- bench::mark(
  `Seurat, not sampled` = Seurat::FeaturePlot(seu, features = features) |> plot(),
  `Custom solution` = naive_plot3(seu, features) |> plot(),
  memory = FALSE,
  check = FALSE,
  iterations = 4
) |>
  select(expression, min, median, n_itr)
We improved the speed by 5.05 times for a 14,000 cell dataset. I expect relative performance to be even greater for larger single cell datasets, since our sampling approach plots the same number of points regardless of dataset size.
We accomplish this while improving the ability to detect lowly expressed / sparse genes. See the results for yourself:
The drawback of this increased sensitivity is more noise. Especially for the more highly expressed genes, many lowly expressing cells appear highlighted, which might be distracting.