Module with utility functions to process CRISPR-based screens, and a method to correct gene-independent copy-number effects.
Description
Crispy uses the scikit-learn implementation of Gaussian Process Regression and fits each sample independently.
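To illustrate what fitting each sample independently means, here is a minimal sketch that fits one scikit-learn `GaussianProcessRegressor` per sample. It regresses synthetic sgRNA fold-changes on a copy-number feature and subtracts the predicted mean. The helper name `correct_sample`, the kernel choice and the toy data are assumptions made for illustration. This is not Crispy's internal code.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel


def correct_sample(features, fold_changes):
    """Hypothetical helper (not Crispy's API): fit a GP of fold-change on
    copy-number features for ONE sample and return fold-change minus the
    GP-predicted mean."""
    # Smooth trend (RBF) plus independent noise (WhiteKernel); assumed kernel
    kernel = ConstantKernel() * RBF(length_scale=1.0) + WhiteKernel()
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
    gpr.fit(features, fold_changes)
    # Remove the copy-number-driven component estimated by the GP
    return fold_changes - gpr.predict(features)


# Synthetic toy data: a higher copy-number ratio gives a more negative fold-change
rng = np.random.default_rng(0)
samples = {}
for name in ["sample_A", "sample_B"]:
    ratio = rng.uniform(0.5, 3.0, size=(60, 1))
    fc = -0.5 * (ratio[:, 0] - 1.0) + rng.normal(0.0, 0.1, size=60)
    # Each sample gets its own, independently fitted GP
    corrected = correct_sample(ratio, fc)
    samples[name] = (ratio, fc, corrected)
```

Fitting one model per sample lets the copy-number trend differ between cell lines, at the cost of fitting a separate GP for each.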
Install
Install pybedtools:
```bash
conda install -c bioconda pybedtools
```
Then install Crispy:
```bash
pip install cy
```
Examples
Support for importing CRISPR-Cas9 libraries:
```python
from crispy.CRISPRData import Library

master_lib = Library.load_library("MasterLib_v1.csv.gz")
minimal_lib = Library.load_library("MinLibCas9.csv.gz")
brunello_lib = Library.load_library("Brunello_v1.csv.gz")
```
Select sgRNAs (across multiple CRISPR-Cas9 libraries) for a given gene:
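```python
from crispy.GuideSelection import GuideSelection

gselection = GuideSelection()

# Select 5 sgRNAs for MCL1 using off-target, JACKS and Rule Set 2 thresholds
gene_guides = gselection.select_sgrnas(
    "MCL1", n_guides=5, offtarget=[1, 0], jacks_thres=1, ruleset2_thres=.4
)

# Alternatively, select sgRNAs for TRIM49 through successive selection rounds,
# including the optional amber and red rounds
gene_guides = gselection.selection_rounds(
    "TRIM49", n_guides=5, do_amber_round=True, do_red_round=True
)
```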
Copy-number correction:
```python
import crispy as cy
import matplotlib.pyplot as plt
from crispy.CRISPRData import ReadCounts, Library

# Import sample data
rawcounts, copynumber = cy.Utils.get_example_data()

# Import CRISPR-Cas9 library
# Important:
#   - The library must have the columns "Chr", "Start", "End" and "Approved_Symbol"
#   - The library and the copy-number segments must use the same "Chr" format:
#     "Chr1", "chr1" or "1"
#   - The "Start" and "End" columns must be integers
lib = Library.load_library("Yusa_v1.1.csv.gz")
lib = lib.rename(
    columns=dict(start="Start", end="End", chr="Chr", Gene="Approved_Symbol")
).dropna(subset=["Chr", "Start", "End"])
# Prefix chromosome names with "chr" so they match the copy-number segments
lib["Chr"] = "chr" + lib["Chr"].astype(str)
lib["Start"] = lib["Start"].astype(int)
lib["End"] = lib["End"].astype(int)

# Calculate fold-changes relative to the plasmid (reference) sample(s)
plasmids = ["ERS717283"]
rawcounts = ReadCounts(rawcounts).remove_low_counts(plasmids)
sgrna_fc = rawcounts.norm_rpm().foldchange(plasmids)

# Correct CRISPR-Cas9 sgRNA fold-changes
# (fold-changes are averaged across columns, i.e. axis=1, before correction)
crispy = cy.Crispy(
    sgrna_fc=sgrna_fc.mean(1), copy_number=copynumber, library=lib.loc[sgrna_fc.index]
)
bed_df = crispy.correct(n_sgrna=10)
print(bed_df.head())

# Plot fold-change against copy-number ratio for the fitted GP
crispy.gpr.plot(x_feature="ratio", y_feature="fold_change")
plt.show()
```
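The library requirements noted in the example above (required columns, a "Chr" format consistent with the segments, integer coordinates) can be enforced with a small pandas helper before building the `Crispy` object. `harmonize_library` is a hypothetical function, not part of Crispy. It assumes chromosome names may arrive as "1", "Chr1", "chr1", or float-parsed values such as "1.0".

```python
import pandas as pd

REQUIRED_COLUMNS = ["Chr", "Start", "End", "Approved_Symbol"]


def harmonize_library(lib, chr_prefix="chr"):
    """Hypothetical helper (not part of Crispy): check required columns,
    drop rows without coordinates, cast Start/End to int and rewrite "Chr"
    as chr_prefix + chromosome name (e.g. "chr1")."""
    missing = [c for c in REQUIRED_COLUMNS if c not in lib.columns]
    if missing:
        raise ValueError(f"Library is missing columns: {missing}")
    lib = lib.dropna(subset=["Chr", "Start", "End"]).copy()
    chrom = (
        lib["Chr"].astype(str)
        .str.replace(r"^[Cc]hr", "", regex=True)  # drop any existing prefix
        .str.replace(r"\.0$", "", regex=True)  # "1.0" -> "1" if parsed as float
    )
    lib["Chr"] = chr_prefix + chrom
    lib["Start"] = lib["Start"].astype(int)
    lib["End"] = lib["End"].astype(int)
    return lib


# Toy library with mixed chromosome formats and one row lacking coordinates
example = pd.DataFrame({
    "Chr": ["1", "Chr2", "chrX", None],
    "Start": [100.0, 200.0, 300.0, 400.0],
    "End": [123.0, 223.0, 323.0, 423.0],
    "Approved_Symbol": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
})
clean = harmonize_library(example)
print(clean["Chr"].tolist())  # ['chr1', 'chr2', 'chrX']
```

Pass a different `chr_prefix` (for example `""`) if the copy-number segments use bare chromosome numbers.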
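In the example, `ReadCounts(...).norm_rpm().foldchange(plasmids)` turns raw read counts into sgRNA fold-changes relative to the plasmid. The pandas sketch below shows one common way to do this: reads-per-million scaling, then a log2 ratio against the mean plasmid value with a pseudocount. The function name, the pseudocount of 0.5 and the order of operations are assumptions and may differ from Crispy's implementation.

```python
import numpy as np
import pandas as pd


def rpm_log2_foldchange(counts, plasmids, pseudocount=0.5):
    """Hypothetical sketch (not Crispy's API): reads-per-million scaling,
    then log2 fold-change of each sample against the plasmid column(s)."""
    # Scale each column (sample) so that it sums to one million reads
    rpm = counts / counts.sum(axis=0) * 1e6
    # Reference: mean normalised count across the plasmid column(s)
    reference = rpm[plasmids].mean(axis=1)
    samples = rpm.drop(columns=plasmids)
    # log2((sample + pseudocount) / (reference + pseudocount)), row-wise
    return np.log2(samples.add(pseudocount).div(reference + pseudocount, axis=0))


# Toy counts: one plasmid column and one screened sample
counts = pd.DataFrame(
    {"plasmid": [100, 300, 600], "sample_1": [50, 300, 650]},
    index=["sgRNA_1", "sgRNA_2", "sgRNA_3"],
)
fc = rpm_log2_foldchange(counts, plasmids=["plasmid"])
print(fc)
```

Here sgRNA_1 is depleted roughly two-fold (log2 fold-change close to -1), while sgRNA_2 is unchanged.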
Credits and License
Developed at the Wellcome Sanger Institute (2017-2020).
For citation, please refer to:
Gonçalves E, Behan FM, Louzada S, Arnol D, Stronach EA, Yang F, Yusa K, Stegle O, Iorio F, Garnett MJ (2019) Structural rearrangements generate cell-specific, gene-independent CRISPR-Cas9 loss of fitness effects. Genome Biol 20: 27.
Gonçalves E, Thomas M, Behan FM, Picco G, Pacini C, Allen F, Parry-Smith D, Iorio F, Parts L, Yusa K, Garnett MJ (2019) Minimal genome-wide human CRISPR-Cas9 library. bioRxiv.