
scran, in Python
Overview
The scranpy package provides Python bindings to the single-cell analysis methods in the libscran C++ libraries.
It performs the standard steps in a typical single-cell analysis including quality control, normalization, feature selection, dimensionality reduction, clustering and marker detection.
This package is effectively a mirror of its counterparts in JavaScript (scran.js) and R (scrapper), which are based on the same underlying C++ libraries and concepts.
Quick start
Let's fetch a dataset from the scrnaseq package:
import scrnaseq
sce = scrnaseq.fetch_dataset("zeisel-brain-2015", "2023-12-14", realize_assays=True)
print(sce)
Then we call scranpy's analyze() function, with some additional information about the mitochondrial subset for quality control purposes.
import scranpy
results = scranpy.analyze(
sce,
rna_subsets = {
"mito": [name.startswith("mt-") for name in sce.get_row_names()]
}
)
This will perform all of the usual steps for a routine single-cell analysis,
as described in Bioconductor's Orchestrating single cell analysis book.
It returns an object containing clusters, t-SNEs, UMAPs, marker genes, and so on:
print(results.clusters)
print(results.tsne)
print(results.umap)
first_markers = results.rna_markers.to_biocframes(summaries=["median"])[0]
first_markers.set_row_names(results.rna_row_names, in_place=True)
print(first_markers)
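The `summaries=["median"]` argument chooses how the effect sizes from each pairwise comparison between clusters are collapsed into one value per gene. The pure-Python sketch below shows that idea under some assumptions. It uses Cohen's d as the only effect size, stores log-expression values as plain lists (one list per gene), and computes two summaries: the median effect size and the `min_rank` summary that is used later in this README. scranpy computes its effect sizes and summaries in C++, and its exact definitions may differ.

```python
import statistics

def cohens_d(x, y):
    """Difference in means divided by a pooled standard deviation.

    The pooled SD used here is the square root of the mean of the two sample
    variances; this is one common variant, assumed for illustration.
    """
    sd = ((statistics.variance(x) + statistics.variance(y)) / 2) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / sd

def summarize_markers(expr, groups, target):
    """Compare `target` against every other group, then summarize per gene."""
    others = [g for g in sorted(set(groups)) if g != target]
    per_gene = []  # per_gene[i][j] = effect size of gene i in comparison j
    for values in expr:
        in_target = [v for v, g in zip(values, groups) if g == target]
        effects = []
        for other in others:
            in_other = [v for v, g in zip(values, groups) if g == other]
            effects.append(cohens_d(in_target, in_other))
        per_gene.append(effects)

    # Summary 1: median effect size across all pairwise comparisons.
    medians = [statistics.median(e) for e in per_gene]

    # Summary 2: min-rank, the best rank a gene achieves in any single
    # comparison (rank 1 = largest effect size in that comparison).
    min_rank = [len(expr)] * len(expr)
    for j in range(len(others)):
        order = sorted(range(len(expr)), key=lambda i: -per_gene[i][j])
        for rank, i in enumerate(order, start=1):
            min_rank[i] = min(min_rank[i], rank)
    return medians, min_rank

# Toy data: gene 0 is up in group "a", gene 1 is up in group "b".
expr = [
    [5.0, 6.0, 5.5, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9],
    [1.0, 1.1, 0.9, 4.0, 4.5, 5.0, 1.0, 1.2, 0.8],
]
groups = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
medians, min_rank = summarize_markers(expr, groups, target="a")
print(min_rank)  # [1, 2]
```

Ranking genes by `min_rank` favors genes that strongly separate the target cluster from at least one other cluster, while the median favors genes that are consistently upregulated against most clusters.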
Users can also convert the results into a SingleCellExperiment for easier manipulation:
print(results.to_singlecellexperiment())
Check out the reference documentation for more details.
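The `rna_subsets` argument feeds into the per-cell quality control metrics. As a pure-Python illustration (not scranpy's C++ implementation), assuming a small dense genes-by-cells count matrix stored as nested lists and a hypothetical helper named `rna_qc_metrics()`, the usual metrics are each cell's total count, its number of detected genes, and the proportion of its counts that fall in each subset:

```python
def rna_qc_metrics(counts, subsets):
    """Per-cell QC metrics from a genes x cells matrix (list of rows).

    `subsets` maps a subset name to a list of booleans, one per gene,
    mirroring the `rna_subsets` argument used above.
    """
    ncells = len(counts[0])
    sums = [0] * ncells
    detected = [0] * ncells
    subset_sums = {name: [0] * ncells for name in subsets}
    for g, row in enumerate(counts):
        for c, value in enumerate(row):
            sums[c] += value
            if value > 0:
                detected[c] += 1
            for name, mask in subsets.items():
                if mask[g]:
                    subset_sums[name][c] += value
    # Proportion of each cell's counts assigned to each subset.
    proportions = {
        name: [s / t if t > 0 else 0.0 for s, t in zip(vals, sums)]
        for name, vals in subset_sums.items()
    }
    return sums, detected, proportions

# Toy data: 4 genes (two mitochondrial) x 3 cells.
gene_names = ["Actb", "mt-Co1", "Gapdh", "mt-Nd1"]
counts = [
    [10, 0, 5],
    [5, 20, 0],
    [0, 3, 5],
    [5, 7, 0],
]
is_mito = [name.startswith("mt-") for name in gene_names]
sums, detected, props = rna_qc_metrics(counts, {"mito": is_mito})
print(sums)           # [20, 30, 10]
print(detected)       # [3, 3, 2]
print(props["mito"])  # [0.5, 0.9, 0.0]
```

A cell like the second one, with most of its counts coming from mitochondrial genes, is a typical candidate for removal during quality control.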
Multiple batches
To demonstrate, let's grab two pancreas datasets from the scrnaseq package.
Each dataset represents a separate batch of cells generated in different studies.
import scrnaseq
gsce = scrnaseq.fetch_dataset("grun-pancreas-2016", "2023-12-14", realize_assays=True)
msce = scrnaseq.fetch_dataset("muraro-pancreas-2016", "2023-12-19", realize_assays=True)
They don't have the same features, so we'll just take the intersection of their row names before combining them into a single SingleCellExperiment object:
import biocutils
common = biocutils.intersect(gsce.get_row_names(), msce.get_row_names())
combined = biocutils.relaxed_combine_columns(
gsce[biocutils.match(common, gsce.get_row_names()), :],
msce[biocutils.match(common, msce.get_row_names()), :]
)
print(combined)
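The intersect/match combination in the previous code aligns both datasets to the same row order before their columns are combined. The logic can be sketched in plain Python. The helpers below are hypothetical stand-ins for `biocutils.intersect()` and `biocutils.match()`, the gene names are toy values, and the sketch assumes that row names are unique within each dataset and that the intersection keeps the order of the first dataset:

```python
def intersect_names(first, second):
    """Names present in both lists, in the order they appear in `first`."""
    in_second = set(second)
    seen = set()
    out = []
    for name in first:
        if name in in_second and name not in seen:
            seen.add(name)
            out.append(name)
    return out

def match_names(query, reference):
    """For each name in `query`, its index in `reference` (assumes unique names)."""
    position = {name: i for i, name in enumerate(reference)}
    return [position[name] for name in query]

grun_rows = ["INS", "GCG", "SST", "PPY"]
muraro_rows = ["PPY", "KRT19", "INS", "GCG"]
common = intersect_names(grun_rows, muraro_rows)
print(common)                            # ['INS', 'GCG', 'PPY']
print(match_names(common, grun_rows))    # [0, 1, 3]
print(match_names(common, muraro_rows))  # [2, 3, 0]
```

Subsetting each dataset by its own index list puts the shared genes in the same order in both, which is what makes it safe to stack their columns.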
We can now perform a batch-aware analysis, where the blocking factor is also used in relevant functions to avoid problems with batch effects.
import scranpy
block = ["grun"] * gsce.shape[1] + ["muraro"] * msce.shape[1]
results = scranpy.analyze(combined, block=block)
This yields mostly the same set of results as before, but with an extra MNN-corrected embedding for clustering, visualization, etc.
print(results.mnn_corrected.corrected)
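MNN correction is built on mutual nearest neighbors: pairs of cells from different batches where each cell is among the other's k nearest neighbors. The sketch below shows how such pairs can be identified, using brute-force Euclidean distances on toy 2-D coordinates. It then applies a single global shift averaged over the pairs. That shift is a deliberate simplification for illustration only; scranpy's actual correction (performed in C++ on the PCs) is more involved and is not reproduced here.

```python
import math

def knn(query, reference, k):
    """Indices of the k nearest `reference` points for each `query` point."""
    result = []
    for q in query:
        order = sorted(range(len(reference)), key=lambda i: math.dist(q, reference[i]))
        result.append(set(order[:k]))
    return result

def find_mnn_pairs(batch1, batch2, k):
    """Pairs (i, j) where cell i of batch1 and cell j of batch2 are mutual neighbors."""
    nn_1to2 = knn(batch1, batch2, k)
    nn_2to1 = knn(batch2, batch1, k)
    return sorted((i, j) for i in range(len(batch1)) for j in nn_1to2[i] if i in nn_2to1[j])

def average_correction(batch1, batch2, pairs):
    """Mean (batch1 - batch2) difference over MNN pairs; a global shift, for illustration."""
    ndim = len(batch1[0])
    return tuple(
        sum(batch1[i][d] - batch2[j][d] for i, j in pairs) / len(pairs)
        for d in range(ndim)
    )

# Two toy batches containing the same two populations, with batch2 shifted by about +1 in x.
batch1 = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0)]
batch2 = [(1.0, 0.1), (11.0, 10.0), (10.8, 10.3)]
pairs = find_mnn_pairs(batch1, batch2, k=1)
print(pairs)  # [(1, 0), (2, 2)]
shift = average_correction(batch1, batch2, pairs)
corrected2 = [tuple(x + s for x, s in zip(cell, shift)) for cell in batch2]
```

Because MNN pairs only form between cells that are close in both directions, they tend to link matching populations across batches, which is what makes them a useful anchor for estimating the batch effect.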
Multiple modalities
Let's grab a 10X Genomics immune profiling dataset (see here),
which contains count data for the entire transcriptome and targeted proteins:
import singlecellexperiment
sce = singlecellexperiment.read_tenx_h5("immune_3.0.0-tenx.h5", realize_assays=True)
sce.set_row_names(sce.get_row_data().get_column("id"), in_place=True)
We split it into genes and ADTs:
feattypes = sce.get_row_data().get_column("feature_type")
gene_data = sce[[x == "Gene Expression" for x in feattypes],:]
adt_data = sce[[x == "Antibody Capture" for x in feattypes],:]
And now we can run the analysis:
import scranpy
results = scranpy.analyze(
gene_data,
adt_x = adt_data,
rna_subsets = {
"mito": [n.startswith("MT-") for n in gene_data.get_row_data().get_column("name")]
},
adt_subsets = {
"igg": [n.startswith("IgG") for n in adt_data.get_row_data().get_column("name")]
}
)
This returns ADT-specific results in the relevant fields, as well as a set of combined PCs for use in clustering, visualization, etc.
print(results.adt_size_factors)
print(results.combined_pca.combined)
second_markers = results.adt_markers.to_biocframes(summaries=["min_rank"])[1]
second_markers.set_row_names(results.adt_row_names, in_place=True)
print(second_markers)
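The combined PCs merge the RNA and ADT embeddings into a single matrix so that both modalities contribute to downstream steps. A minimal sketch of one way to do this follows. It assumes each modality's embedding is rescaled to unit total variance and the coordinates are then concatenated per cell. scranpy's own weighting scheme may differ, so treat this as an illustration of the general idea rather than its method.

```python
import statistics

def total_variance(embedding):
    """Sum of per-dimension variances; `embedding` is a list of per-cell tuples."""
    ndim = len(embedding[0])
    return sum(statistics.pvariance([cell[d] for cell in embedding]) for d in range(ndim))

def combine_embeddings(embeddings):
    """Scale each embedding to unit total variance, then concatenate per cell."""
    scales = [1 / total_variance(emb) ** 0.5 for emb in embeddings]
    combined = []
    for c in range(len(embeddings[0])):
        row = []
        for emb, s in zip(embeddings, scales):
            row.extend(x * s for x in emb[c])
        combined.append(tuple(row))
    return combined

rna_pcs = [(10.0, 2.0), (-10.0, -2.0), (0.0, 0.0)]  # toy RNA PCs with a large spread
adt_pcs = [(0.1,), (-0.1,), (0.0,)]                 # toy ADT PCs with a small spread
combined = combine_embeddings([rna_pcs, adt_pcs])
```

Without some rescaling, the modality with the larger spread (here the toy RNA PCs) would dominate distances between cells and effectively drown out the other modality.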
Customizing the analysis
Most parameters can be changed by modifying the relevant arguments in analyze().
For example:
import scrnaseq
sce = scrnaseq.fetch_dataset("zeisel-brain-2015", "2023-12-14", realize_assays=True)
is_mito = [name.startswith("mt-") for name in sce.get_row_names()]
import scranpy
results = scranpy.analyze(
sce,
rna_subsets = {
"mito": is_mito
},
build_snn_graph_options = {
"num_neighbors": 10
},
cluster_graph_options = {
"multilevel_resolution": 2
},
run_pca_options = {
"number": 15
},
run_tsne_options = {
"perplexity": 25
},
run_umap_options = {
"min_dist": 0.05
}
)
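The `num_neighbors` option in `build_snn_graph_options` controls how many nearest neighbors each cell contributes to the shared nearest-neighbor (SNN) graph used for clustering. The sketch below is simplified and makes several assumptions. It weights each edge by the number of shared neighbors (scranpy may use a different, rank-based weighting by default), it uses a brute-force neighbor search on toy 2-D coordinates, and it includes each cell in its own neighbor set.

```python
import math
from itertools import combinations

def nearest_neighbors(points, k):
    """For each point, the set of its k nearest other points (brute-force Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors

def snn_edges(points, k):
    """Edge weight between two cells = number of shared members of their neighbor sets.

    Each cell's own index is added to its set (an assumption of this sketch),
    so cells that are each other's neighbors also get credit for that.
    """
    sets = [nn | {i} for i, nn in enumerate(nearest_neighbors(points, k))]
    edges = {}
    for i, j in combinations(range(len(points)), 2):
        shared = len(sets[i] & sets[j])
        if shared > 0:
            edges[(i, j)] = shared
    return edges

# Two well-separated toy populations of three cells each.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
edges = snn_edges(cells, k=2)
```

Here no edges connect the two populations, so a graph-based clustering step would separate them. Larger values of k generally add more edges between cells and yield fewer, broader clusters.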
For finer control, users can call each step individually via lower-level functions.
A typical RNA analysis might be implemented as:
import delayedarray
counts = sce.assay(0)
# Per-cell QC metrics, using the same mitochondrial subset as in analyze().
qcmetrics = scranpy.compute_rna_qc_metrics(counts, subsets={"mito": is_mito})
thresholds = scranpy.suggest_rna_qc_thresholds(qcmetrics)
keep = scranpy.filter_rna_qc_metrics(thresholds, qcmetrics)
# Drop low-quality cells, then compute centered size factors from their total counts.
filtered = delayedarray.DelayedArray(counts)[:, keep]
sf = scranpy.center_size_factors(qcmetrics.sum[keep])
normalized = scranpy.normalize_counts(filtered, sf)
# Feature selection, then PCA on the highly variable genes.
vardf = scranpy.model_gene_variances(normalized)
hvgs = scranpy.choose_highly_variable_genes(vardf.residual)
pca = scranpy.run_pca(normalized[hvgs, :])
# Neighbor-based steps (e.g., graph-based clustering and visualization), then markers per cluster.
nn_out = scranpy.run_all_neighbor_steps(pca.components)
clusters = nn_out.cluster_graph.membership
markers = scranpy.score_markers(normalized, groups=clusters)
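To show what the filtering and normalization steps in the lower-level example compute, here is a plain-Python approximation under stated assumptions rather than scranpy's implementation. It assumes that low-quality cells are those more than 3 MADs below the median of the log-total count (a common convention; scranpy's defaults and its handling of the other metrics may differ). It also assumes that size factors are the library sizes centered to a mean of 1, and that normalized values are log2(count / size factor + 1). All function names are hypothetical.

```python
import math
import statistics

def low_outlier_threshold(values, nmads=3.0):
    """Lower threshold: median minus `nmads` scaled MADs, computed on the log scale."""
    logged = [math.log(v) for v in values]
    med = statistics.median(logged)
    # 1.4826 makes the MAD consistent with the standard deviation of a normal distribution.
    mad = 1.4826 * statistics.median(abs(x - med) for x in logged)
    return math.exp(med - nmads * mad)

def center_size_factors(totals):
    """Scale library sizes so that the resulting size factors have a mean of 1."""
    mean_total = statistics.mean(totals)
    return [t / mean_total for t in totals]

def log_normalize(counts, size_factors, pseudo=1.0):
    """log2(count / size factor + pseudo) for a genes x cells matrix (list of rows)."""
    return [[math.log2(x / sf + pseudo) for x, sf in zip(row, size_factors)] for row in counts]

totals = [1000, 1100, 950, 1050, 20]  # per-cell total counts; the last cell is nearly empty
keep = [t >= low_outlier_threshold(totals) for t in totals]
print(keep)  # [True, True, True, True, False]

counts = [[0, 2, 4, 1, 0], [10, 11, 9, 12, 1]]  # toy genes x cells matrix for the same cells
kept_counts = [[x for x, k in zip(row, keep) if k] for row in counts]
sf = center_size_factors([t for t, k in zip(totals, keep) if k])
normalized = log_normalize(kept_counts, sf)
```

Centering the size factors keeps the normalized values on roughly the same scale as the original counts, so the pseudo-count of 1 has a comparable effect across datasets.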
Check out analyze.py for more details.