Security News
Fluent Assertions Faces Backlash After Abandoning Open Source Licensing
Fluent Assertions is facing backlash after dropping the Apache license for a commercial model, leaving users blindsided and questioning contributor rights.
The scranpy package provides Python bindings to the single-cell analysis methods in the libscran C++ libraries. It performs the standard steps in a typical single-cell analysis including quality control, normalization, feature selection, dimensionality reduction, clustering and marker detection. This package is effectively a mirror of its counterparts in Javascript (scran.js) and R (scrapper), which are based on the same underlying C++ libraries and concepts.
Let's fetch a dataset from the scrnaseq package:
import scrnaseq
sce = scrnaseq.fetch_dataset("zeisel-brain-2015", "2023-12-14", realize_assays=True)
print(sce)
## class: SingleCellExperiment
## dimensions: (20006, 3005)
## assays(1): ['counts']
## row_data columns(1): ['featureType']
## row_names(20006): ['Tspan12', 'Tshz1', 'Fnbp1l', ..., 'mt-Rnr2', 'mt-Rnr1', 'mt-Nd4l']
## column_data columns(9): ['tissue', 'group #', 'total mRNA mol', 'well', 'sex', 'age', 'diameter', 'level1class', 'level2class']
## column_names(3005): ['1772071015_C02', '1772071017_G12', '1772071017_A05', ..., '1772063068_D01', '1772066098_A12', '1772058148_F03']
## main_experiment_name: gene
## reduced_dims(0): []
## alternative_experiments(2): ['repeat', 'ERCC']
## row_pairs(0): []
## column_pairs(0): []
## metadata(0):
Then we call scranpy's analyze()
functions, with some additional information about the mitochondrial subset for quality control purposes.
import scranpy
results = scranpy.analyze(
sce,
rna_subsets = {
"mito": [name.startswith("mt-") for name in sce.get_row_names()]
}
)
This will perform all of the usual steps for a routine single-cell analysis, as described in Bioconductor's Orchestrating single cell analysis book. It returns an object containing clusters, t-SNEs, UMAPs, marker genes, and so on:
print(results.clusters)
## [ 0 0 0 ... 6 6 13]
print(results.tsne)
## [[ 6.24189264 6.12559936 5.41776875 ... 20.07822751 18.25022123
## 14.78338538]
## [-28.82249121 -28.18510674 -28.92849565 ... 7.73694327 3.70750309
## 7.13103324]]
print(results.umap)
## [[ 9.84396648 9.73148155 9.83376408 ... -6.64551735 -5.74155378
## -4.41887522]
## [-1.26350224 -1.16540933 -1.13979638 ... -5.63315582 -4.83151293
## -6.02484226]]
first_markers = results.rna_markers.to_biocframes(summaries=["median"])[0]
first_markers.set_row_names(results.rna_row_names, in_place=True)
print(first_markers)
## BiocFrame with 20006 rows and 6 columns
## mean detected cohens_d_median auc_median delta_mean_median delta_detected_median
## <ndarray[float64]> <ndarray[float64]> <ndarray[float64]> <ndarray[float64]> <ndarray[float64]> <ndarray[float64]>
## Tspan12 0.35759151503119846 0.3157894736842105 0.31138667468315545 0.5989624247185128 0.31138667468315545 0.31138667468315545
## Tshz1 0.5997779968274406 0.41776315789473684 0.36865087228075244 0.6031352215283973 0.36865087228075244 0.36865087228075244
## Fnbp1l 1.1660581606996154 0.6875 0.7644031115934953 0.6905056759545924 0.7644031115934953 0.7644031115934953
## ... ... ... ... ... ...
## mt-Rnr2 6.966227511583628 0.9967105263157895 -0.7666238430581961 0.2928277982073087 -0.7666238430581961 -0.7666238430581961
## mt-Rnr1 4.914541788016454 0.9769736842105263 -0.4847704371628273 0.3834708208344696 -0.4847704371628273 -0.4847704371628273
## mt-Nd4l 3.2901199968427246 0.9342105263157894 -0.5903983282435646 0.30724666142969365 -0.5903983282435646 -0.5903983282435646
Users can also convert the results into a SingleCellExperiment
for easier manipulation:
print(results.to_singlecellexperiment())
## class: SingleCellExperiment
## dimensions: (20006, 2874)
## assays(2): ['filtered', 'normalized']
## row_data columns(5): ['mean', 'variance', 'fitted', 'residual', 'is_highly_variable']
## row_names(20006): ['Tspan12', 'Tshz1', 'Fnbp1l', ..., 'mt-Rnr2', 'mt-Rnr1', 'mt-Nd4l']
## column_data columns(5): ['sum', 'detected', 'subset_proportion_mito', 'size_factors', 'clusters']
## column_names(2874): ['1772071015_C02', '1772071017_G12', '1772071017_A05', ..., '1772066097_D04', '1772063068_D01', '1772066098_A12']
## main_experiment_name:
## reduced_dims(3): ['pca', 'tsne', 'umap']
## alternative_experiments(0): []
## row_pairs(0): []
## column_pairs(0): []
## metadata(0):
Check out the reference documentation for more details.
To demonstrate, let's grab two pancreas datasets from the scrnaseq package. Each dataset represents a separate batch of cells generated in different studies.
import scrnaseq
gsce = scrnaseq.fetch_dataset("grun-pancreas-2016", "2023-12-14", realize_assays=True)
msce = scrnaseq.fetch_dataset("muraro-pancreas-2016", "2023-12-19", realize_assays=True)
They don't have the same features, so we'll just take the intersection of their row names before combining them into a single SingleCellExperiment
object:
import biocutils
common = biocutils.intersect(gsce.get_row_names(), msce.get_row_names())
combined = biocutils.relaxed_combine_columns(
gsce[biocutils.match(common, gsce.get_row_names()), :],
msce[biocutils.match(common, msce.get_row_names()), :]
)
print(combined)
## class: SingleCellExperiment
## dimensions: (18499, 4800)
## assays(1): ['counts']
## row_data columns(2): ['symbol', 'chr']
## row_names(18499): ['A1BG-AS1__chr19', 'A1BG__chr19', 'A1CF__chr10', ..., 'ZYX__chr7', 'ZZEF1__chr17', 'ZZZ3__chr1']
## column_data columns(4): ['donor', 'sample', 'label', 'plate']
## column_names(4800): ['D2ex_1', 'D2ex_2', 'D2ex_3', ..., 'D30-8_94', 'D30-8_95', 'D30-8_96']
## main_experiment_name: endogenous
## reduced_dims(0): []
## alternative_experiments(0): []
## row_pairs(0): []
## column_pairs(0): []
## metadata(0):
We can now perform a batch-aware analysis, where the blocking factor is also used in relevant functions to avoid problems with batch effects.
import scranpy
block = ["grun"] * gsce.shape[1] + ["muraro"] * msce.shape[1]
results = scranpy.analyze(combined, block=block) # no mitochondrial genes in this case...
This yields mostly the same set of results as before, but with an extra MNN-corrected embedding for clustering, visualization, etc.
results.mnn_corrected.corrected
## array([[-1.87690275e+01, -2.20133721e+01, -2.01364711e+01, ...,
## 1.60988874e+01, -2.10494187e+01, -9.41325421e+00],
## [ 9.95069366e+00, 1.12168142e+01, 1.40745981e+01, ...,
## -5.63689417e+00, -1.46003926e+01, -4.02325382e+00],
## [ 1.17917046e+01, 8.40756681e+00, 1.24557851e+01, ...,
## 3.65281722e+00, -1.13280613e+01, -1.12939448e+01],
## ...,
## [-4.20177077e+00, 3.64443391e-01, 1.13834851e+00, ...,
## 1.43898885e-02, -2.24228270e+00, -5.89749453e-01],
## [-2.49456306e+00, 6.82624452e-01, 2.30363317e+00, ...,
## 1.09145269e+00, 3.17776365e+00, 8.27058276e-01],
## [-2.03562222e+00, 2.04701389e+00, 5.64774034e-01, ...,
## 4.31078606e-01, -4.02375136e-01, 8.52493315e-01]],
## shape=(25, 3984))
Let's grab a 10X Genomics immune profiling dataset (see here), which contains count data for the entire transcriptome and targeted proteins:
import singlecellexperiment
sce = singlecellexperiment.read_tenx_h5("immune_3.0.0-tenx.h5", realize_assays=True)
sce.set_row_names(sce.get_row_data().get_column("id"), in_place=True)
## class: SingleCellExperiment
## dimensions: (33555, 8258)
## assays(1): ['counts']
## row_data columns(7): ['feature_type', 'genome', 'id', 'name', 'pattern', 'read', 'sequence']
## row_names(33555): ['ENSG00000243485', 'ENSG00000237613', 'ENSG00000186092', ..., 'IgG2b', 'CD127', 'CD15']
## column_data columns(1): ['barcodes']
## column_names(0):
## main_experiment_name:
## reduced_dims(0): []
## alternative_experiments(0): []
## row_pairs(0): []
## column_pairs(0): []
## metadata(0):
We split it to genes and ADTs:
feattypes = sce.get_row_data().get_column("feature_type")
gene_data = sce[[x == "Gene Expression" for x in feattypes],:]
adt_data = sce[[x == "Antibody Capture" for x in feattypes],:]
And now we can run the analysis:
import scranpy
results = scranpy.analyze(
gene_data,
adt_x = adt_data,
rna_subsets = {
"mito": [n.startswith("MT-") for n in gene_data.get_row_data().get_column("name")]
},
adt_subsets = {
"igg": [n.startswith("IgG") for n in adt_data.get_row_data().get_column("name")]
}
)
This returns ADT-specific results in the relevant fields, as well as a set of combined PCs for use in clustering, visualization, etc.
print(results.adt_size_factors)
## [0.79359408 0.79410332 0.89536413 ... 0.79207839 0.66492723 0.76847637]
print(results.combined_pca.combined)
## [[-9.97603155e+00 -1.04045057e+01 -1.26408576e+01 ... -1.29361354e+01
## -1.09887392e+01 -1.08070608e+01]
## [ 7.47726554e+00 6.77629373e+00 1.78091509e+00 ... 2.22256539e+00
## 5.96667219e+00 7.10437993e+00]
## [-2.63898263e+00 -1.24485522e+00 5.51002546e+00 ... 5.21037673e+00
## -5.54233035e+00 -3.38828724e+00]
## ...
## [-2.04699441e-01 -4.38991650e-01 -2.87170731e+00 ... 2.36527395e+00
## 7.05969255e-01 -2.46180209e-01]
## [ 4.75688909e-01 -1.54557081e-01 -1.30053159e+00 ... 2.81492567e+00
## 1.21607502e+00 -3.12194853e-01]
## [ 8.56575012e-02 8.74924626e-03 -7.17362957e-04 ... 1.65769854e-01
## 1.73927253e-01 5.04057044e-02]]
second_markers = results.adt_markers.to_biocframes(summaries=["min_rank"])[1]
second_markers.set_row_names(results.adt_row_names, in_place=True)
print(second_markers)
## BiocFrame with 17 rows and 6 columns
## mean detected cohens_d_min_rank auc_min_rank delta_mean_min_rank delta_detected_min_rank
## <ndarray[float64]> <ndarray[float64]> <ndarray[uint32]> <ndarray[uint32]> <ndarray[uint32]> <ndarray[uint32]>
## CD3 11.04397358391642 1.0 1 1 1 1
## CD19 4.072383130863625 1.0 4 4 4 4
## CD45RA 10.481785289114054 1.0 1 1 1 1
## ... ... ... ... ... ...
## IgG2b 2.8690172565558263 0.9911190053285968 6 5 6 6
## CD127 6.258223461814724 1.0 2 2 2 2
## CD15 5.366264191077669 1.0 4 4 4 4
Most parameters can be changed by modifying the relevant arguments in analyze()
.
For example:
import scrnaseq
sce = scrnaseq.fetch_dataset("zeisel-brain-2015", "2023-12-14", realize_assays=True)
is_mito = [name.startswith("mt-") for name in sce.get_row_names()]
import scranpy
results = scranpy.analyze(
sce,
rna_subsets = {
"mito": is_mito
},
build_snn_graph_options = {
"num_neighbors": 10
},
cluster_graph_options = {
"multilevel_resolution": 2
},
run_pca_options = {
"number": 15
},
run_tsne_options = {
"perplexity": 25
},
run_umap_options = {
"min_dist": 0.05
}
)
For finer control, users can call each step individually via lower-level functions. A typical RNA analysis might be implemented as:
counts = sce.assay(0)
qcmetrics = scranpy.compute_rna_qc_metrics(counts, subsets=is_mito)
thresholds = scranpy.suggest_rna_qc_thresholds(qcmetrics)
filter = scranpy.filter_rna_qc_metrics(thresholds, metrics)
import delayedarray # avoid an actual copy of the matrix.
filtered = delayedarray.DelayedArray(rna_x)[:,filter]
sf = scranpy.center_size_factors(qcmetrics.sum[filter])
normalized = scranpy.normalize_counts(filtered, sf)
vardf = scranpy.model_gene_variances(normalized)
hvgs = scranpy.choose_highly_variable_genes(vardf.residual)
pca = scranpy.run_pca(normalized[hvgs,:])
nn_out = scranpy.run_all_neighbor_steps(pca.components)
clusters = nn_out.cluster_graph.membership
markers = scranpy.score_markers(normalized, groups=clusters)
Check out analyze.py
for more details.
FAQs
Analyze multi-modal single-cell data!
We found that scranpy demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 2 open source maintainers collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Fluent Assertions is facing backlash after dropping the Apache license for a commercial model, leaving users blindsided and questioning contributor rights.
Research
Security News
Socket researchers uncover the risks of a malicious Python package targeting Discord developers.
Security News
The UK is proposing a bold ban on ransomware payments by public entities to disrupt cybercrime, protect critical services, and lead global cybersecurity efforts.