Multicore functions / parallel implementations plus speed optimized & utility functions for Seurat 2 & 3
Multicore functions & implementations for Seurat using the doMC / foreach packages.
Implementations are either my own or collected from the web.
This repository now serves 4 main purposes:
- Multicore read/write/save/load/compress functions (Seurat3.Multicore.Read.Write.R)
- Multicore implementation of single-core functions via foreach / dopar (Seurat.Multicore.Examples.R)
- Legacy functionality for Seurat v2.x (Seurat2.Multicore.Functions.R)
- Other functionalities (Seurat3.Multicore.Generic.Functions.R, Seurat.Functions.other.R)
Notice: most of the non-multicore functionalities were migrated to Seurat.utils.
Use case
Some Seurat functions can be fairly slow when run on a single core. To speed them up, you can use all cores of your computer.
- Seurat 2.x has very limited multicore functionality (ScaleData, Jackstraw).
- Seurat 3.0 has implemented multiple functions using future.
- Functions here use foreach-based parallel implementations/templates, which are mostly complementary to Seurat's own implementations.
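As a rough illustration of the foreach / doMC pattern, here is a minimal sketch. It assumes the foreach and doMC packages are installed and a Unix-like system (doMC relies on forking). The function log_normalize() and the toy matrices are hypothetical stand-ins for a per-sample step such as NormalizeData; they are not part of this repository.

```r
library(foreach)
library(doMC)  # fork-based backend; not available on Windows

registerDoMC(cores = 2)  # assumption: 2 workers, adjust to your machine

# Hypothetical stand-in for a per-sample step such as NormalizeData():
# scale each column (cell) to 10,000 counts, then log1p-transform.
log_normalize <- function(counts, scale.factor = 1e4) {
  log1p(t(t(counts) / colSums(counts)) * scale.factor)
}

# Toy input: a named list of small count matrices (genes x cells).
set.seed(1)
ls.counts <- lapply(1:4, function(i) {
  matrix(rpois(50, lambda = 5) + 1, nrow = 10)
})
names(ls.counts) <- paste0("sample", 1:4)

# One list element per iteration; foreach returns a list in input order.
ls.norm <- foreach(m = ls.counts) %dopar% log_normalize(m)
names(ls.norm) <- names(ls.counts)
```

The same loop shape should carry over to a list of Seurat objects, with the body replaced by the single-core function you want to parallelize.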
Tested on OS X; still in development.
Notice
‘Future’- and ‘doMC (foreach)’-based parallelisation seem to collide in one case: if you load future before using NormalizeData inside a foreach loop, it fails (Error writing to connection).
Solution: do not load and set up future before NormalizeData.
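# After NormalizeData
library(future)
plan("multiprocess", workers = 6) # recent versions of future deprecate "multiprocess" in favor of "multisession" / "multicore"
# future.globals.maxSize is given in bytes; this sets the limit to 4000 MiB (about 4 GB):
options(future.globals.maxSize = 4000 * 1024^2)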
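Since future.globals.maxSize is specified in bytes, it can help to compute the value explicitly. The sketch below uses a hypothetical helper, gib_to_bytes(), which is not part of this repository or of the future package; it only needs base R.

```r
# Hypothetical helper (not part of this repository): convert GiB to bytes.
gib_to_bytes <- function(gib) gib * 1024^3

# Allow up to 2 GiB of exported globals per future.
options(future.globals.maxSize = gib_to_bytes(2))
getOption("future.globals.maxSize")

# For comparison, 4000 * 1024^2 bytes is 4000 MiB, i.e. roughly 3.9 GiB.
4000 * 1024^2 / 1024^3
```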
Install
- Download (clone) this repo locally,
- change the file path in each .R script (to where you keep them on your computer), and run source("~/path/to/Seurat3.Multicore.Load.R") (make sure you also change the paths inside this file).
Alternative: directly source each .R script from the web, e.g.:
source("https://raw.githubusercontent.com/vertesy/Seurat.multicore/master/Seurat3.Multicore.Generic.Functions.R")
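If you want to source several scripts from the web in one go, something like the following sketch may work. It assumes the other script names listed in this README sit at the same path as the example URL above, which is not verified here; source_all() is a hypothetical helper, and the call is guarded so that nothing is downloaded in non-interactive runs.

```r
# Assumption: all scripts live under the same raw.githubusercontent.com path.
base.url <- "https://raw.githubusercontent.com/vertesy/Seurat.multicore/master/"
scripts <- c(
  "Seurat3.Multicore.Read.Write.R",
  "Seurat3.Multicore.Generic.Functions.R"
)
script.urls <- paste0(base.url, scripts)

# Hypothetical helper: source each URL, reporting failures instead of stopping.
source_all <- function(urls) {
  for (u in urls) {
    tryCatch(
      source(u),
      error = function(e) message("Could not source ", u, ": ", conditionMessage(e))
    )
  }
  invisible(urls)
}

# Only fetch and run remote code in an interactive session.
if (interactive()) source_all(script.urls)
```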
Content
!! Notice: most of the non-multicore functionalities were migrated to https://github.com/vertesy/Seurat.utils
!! Notice: Consequently, the content of this repository has changed.
- Seurat3.Multicore.Read.Write.R:
  - Multi-core / parallelized read/write/save/load/compress functions
- Seurat.Multicore.Examples.R:
  - Single-core functions wrapped in multi-core / parallelized foreach loops
- Seurat2.Multicore.Functions.R:
  - Legacy functionality for Seurat v2.x
- Seurat3.Multicore.Generic.Functions.R:
  - Multicore
- Seurat.Functions.other.R:
  - Other functionalities
Implementations
1. Parallel Implementation
- FindAllMarkers.multicore
- rrRDS: read a list of objects
- rreadRDS: optionally parallel decompression of saved object by pigz_pipe
- sssRDS: save a list of objects
- ssaveRDS: parallel compression of saved objects by pigz_pipe
- snappy_pipe: fast single-core compression with a low compression rate
- pigz_pipe
2. Parallel templates with foreach
- CreateSeuratObject
- saveRDS
- readRDS
- METADATA
- FilterCells; subset (in v3)
- NormalizeData
3. Parallel Implementation by Seurat (3.1)
- NormalizeData
- Jackstraw (from v2)
- ScaleData (from v2)
- FindMarkers
- FindIntegrationAnchors
- FindClusters
4. No Parallel Implementation
- RunTSNE
- RunUMAP
5. Other functions implemented / collected here
Functions in main script
- parallel.computing.by.future(): Run gc(), load multi-session computing and extend memory limits.
- seu.Make.Cl.Label.per.cell(): Take a named vector (e.g. values = “gene names”, names = clusterID) and a vector of cell IDs, and make a vector of “GeneName.ClusterID”.
- add.Cl.Label.2.Metadata(): Add a metadata column.
- umapNamedClusters(): Plot and save a UMAP based on a metadata column.
- clip10Xcellname(): Clip all suffixes after the underscore (10X adds it per chip lane, Seurat adds it during integration).
- make10Xcellname(): Add a suffix.
- read10x(): Read 10x data from gzipped files, using features.tsv.
- FindAllMarkers.multicore(): Multicore version of FindAllMarkers.
- gene.name.check(): Check gene names in a Seurat object for naming conventions (e.g. mitochondrial reads have - or .). Use for reading .mtx and writing .rds files.
- check.genes(): Check if genes exist in your dataset.
- fixZeroIndexing.seurat(): Convert zero-based Seurat cluster IDs to 1-based indexing.
- getMetadataColumn() <- mmeta: Get a metadata column as a named vector.
- getCellIDs.from.meta(): Get cell IDs from a metadata column, matching a list of values (using %in%).
- seu.add.meta.from.table(): Add to obj@metadata from an external table.
- seu.PC.var.explained(): Determine the percent of variation associated with each PC.
- seu.plot.PC.var.explained(): Plot the percent of variation associated with each PC.
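To make a few of these descriptions concrete, here is a base-R sketch of what the cell-name, indexing and metadata helpers might do. The functions below are hypothetical re-implementations written for illustration; the repository's own functions may differ in arguments and behavior.

```r
# Hypothetical re-implementations for illustration only.

# Drop everything from the first underscore onward (cf. clip10Xcellname).
clip_cellname <- function(cell.names) sub("_.*$", "", cell.names)

# Append a suffix to each cell name (cf. make10Xcellname).
add_suffix <- function(cell.names, suffix = "_1") paste0(cell.names, suffix)

# Shift zero-based cluster IDs to 1-based (cf. fixZeroIndexing.seurat).
to_one_based <- function(clusters) {
  ids <- as.integer(as.character(clusters)) + 1L
  factor(ids, levels = sort(unique(ids)))
}

# Cell IDs whose metadata value is in `values` (cf. getCellIDs.from.meta).
cells_matching <- function(meta, column, values) {
  rownames(meta)[meta[[column]] %in% values]
}

ids <- c("AAACCTGAGAAACCAT_1", "AAACCTGAGAAACCGC_2_3")
clipped <- clip_cellname(ids)
relabeled <- add_suffix(clipped, suffix = "_1")

# Toy metadata table with cell IDs as row names.
meta <- data.frame(cluster = factor(c("0", "2", "1", "0")),
                   row.names = c("c1", "c2", "c3", "c4"))
meta$cluster1 <- to_one_based(meta$cluster)
selected <- cells_matching(meta, "cluster", c("0", "1"))
```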
Functions in Saving.and.loading.R
- isave.RDS(): Faster saving of the workspace, with compression outside R, where it can run in the background. Seemingly quite CPU hungry, and the compression is not very efficient.
- isave.RDS.pigz(): Faster saving of the workspace, with compression outside R, where it can run in the background. Seemingly quite CPU hungry, and the compression is not very efficient.
- isave.image(): Faster saving of the workspace, with compression outside R, where it can run in the background. Seemingly quite CPU hungry, and the compression is not very efficient.
- subsetSeuObj.and.Save(): Subset a compressed Seurat object and save it in the working directory.
- seuSaveRds(): Save a compressed Seurat object, with parallel gzip by pigz.
- sampleNpc(): Sample N % of a dataframe (obj@metadata), and return the cell IDs.
- rrRDS(): Load a list of RDS files with parallel ungzip by pigz.
- sssRDS(): Save multiple objects into a list of RDS files using parallel gzip by pigz (optional).
- ssaveRDS(): Save an object with parallel gzip by pigz.
- rreadRDS(): Read an object with parallel ungzip by pigz.
- snappy_pipe(): Alternative, fast compression. Low compression rate, lightning fast.
- pigz_pipe(): Alternative: normal gzip output (and compression rate), but compression is faster by roughly the number of cores.
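The general idea behind piping saveRDS through an external compressor can be sketched as follows. This is a minimal sketch, not the repository's implementation: save_rds_piped() and read_rds_piped() are hypothetical names, a Unix-like shell is assumed, and the code falls back to plain gzip when pigz is not found on the PATH.

```r
# Use pigz if it is on the PATH; otherwise fall back to single-core gzip.
compressor <- if (nzchar(Sys.which("pigz"))) "pigz" else "gzip"

# Hypothetical helper: serialize uncompressed and let the external tool compress.
save_rds_piped <- function(object, file, cmd = compressor) {
  con <- pipe(paste(cmd, "-c >", shQuote(file)), open = "wb")
  on.exit(close(con))  # closing the pipe waits for the compressor to finish
  saveRDS(object, file = con)
  invisible(file)
}

# Hypothetical helper: decompress with the external tool and read from the pipe.
read_rds_piped <- function(file, cmd = compressor) {
  con <- pipe(paste(cmd, "-dc", shQuote(file)), open = "rb")
  on.exit(close(con))
  readRDS(con)
}

obj <- list(a = 1:10, b = letters)
f <- tempfile(fileext = ".rds.gz")
save_rds_piped(obj, f)
restored <- read_rds_piped(f)
```

Because the output is ordinary gzip, a file written this way should also be readable with a plain readRDS(f).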
Functions in Seurat3.plotting.Functions.R
- umapHiLightSel(): Highlight a set of cells based on clusterIDs provided.
- qUMAP(): Quick umaps.
- multiFeaturePlot.A4(): Save multiple FeaturePlots from a list of genes on A4 jpeg.
- multiFeatureHeatmap.A4(): Save multiple FeatureHeatmaps from a list of genes on A4 jpeg.
- plot.UMAP.tSNE.sidebyside(): Plot a UMAP and tSNE side by side.
- sgCellFractionsBarplot.Mseq(): Cell Fractions Barplot for MULTI-seq. sg stands for “seurat ggplot”.
- ssgCellFractionsBarplot.CORE(): Cell Fractions Barplots, basic. sg stands for “seurat ggplot”.
- sgCellFractionsBarplot(): Cell Fractions Barplots. sg stands for “seurat ggplot”.
- ww.variable.exists.and.true(): Check if a variable exists and its value is TRUE.
- save2umaps.A4(): Save 2 umaps on A4.
- save4umaps.A4(): Save 4 umaps on A4.
Other functions
- multiFeaturePlot.A4
  - A multi-core implementation (of generating plots) did not work: it kept hanging at n*100% CPU use.
- multiFeatureHeatmap.A4
- LabelPoint
- LabelUR
- LabelUL
- LabelBR
- LabelBL
- read10x