CPA
CPA (Compositional Perturbation Autoencoder) is a framework for learning the effects of perturbations at the single-cell level. CPA encodes and learns phenotypic drug responses across different cell types, doses, and combinations. CPA allows:
You can install CPA using pip, or directly from GitHub to access the latest development version. See detailed instructions here.
Several tutorials are available here to get you started with CPA. The following table contains the list of tutorials:
We provide an example script that uses CPA's built-in hyperparameter optimization function (based on the scvi-tools hyperparameter optimizer). You can find the script at examples/tune_script.py.
After hyperparameter optimization with tune_script.py finishes, result_grid.pkl is saved to your current directory using the pickle library. You can load the results with the following code:
import pickle

# Load the Ray Tune ResultGrid saved by tune_script.py
with open('result_grid.pkl', 'rb') as f:
    result_grid = pickle.load(f)
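Choosing the best hyperparameters amounts to scanning the trials for the best value of a reported metric. The sketch below shows that logic, assuming each trial object exposes `.metrics` and `.config` dictionaries as Ray Tune `Result` objects do. The function name `best_trial_config` and the metric name `"val_loss"` are hypothetical; use whatever metric your tuning run actually reported. Ray's own `result_grid.get_best_result(metric=..., mode=...)` performs this selection directly.

```python
import math
from typing import Any, Iterable


def best_trial_config(results: Iterable[Any], metric: str, mode: str = "min") -> dict:
    """Return the config of the trial with the best value of `metric`.

    Assumes each item has `.metrics` (dict or None) and `.config` (dict),
    like Ray Tune `Result` objects. Trials that did not report the metric
    (e.g. failed trials) or reported NaN are skipped.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    best_value, best_config = None, None
    for result in results:
        value = (result.metrics or {}).get(metric)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        if best_value is None or (value < best_value if mode == "min" else value > best_value):
            best_value, best_config = value, result.config
    if best_config is None:
        raise ValueError(f"no trial reported metric {metric!r}")
    return best_config


# Usage with the loaded ResultGrid (hypothetical metric name):
# results = [result_grid[i] for i in range(len(result_grid))]
# config = best_trial_config(results, metric="val_loss", mode="min")
```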
From here, you can follow the instructions in the Ray documentation to analyze the run and choose the best hyperparameters for your data.
You can also use the wandb integration to log the hyperparameter optimization results. You can find the script at examples/tune_script_wandb.py; set use_wandb=True to enable logging.
Everything is based on Ray Tune. You can find more information about hyperparameter optimization in the Ray Tune documentation.
The tuner is adapted from scvi-tools v1.2.0 (unreleased); see its release notes.
Datasets and pre-trained models are available here.
If you have access to your raw data, you can follow the steps below to pre-process your dataset. A raw dataset should be an AnnData object (as used by scanpy) containing raw counts and the required metadata (e.g. perturbation, dosage, etc.).
Check for required information in cell metadata:
a) Perturbation information should be in adata.obs.
b) Dosage information should be in adata.obs. In cases such as CRISPR gene knockouts, disease states, or time perturbations, you can create and add a dummy dosage to adata.obs. For example:
adata.obs['dosage'] = adata.obs['perturbation'].astype(str).apply(lambda x: '+'.join(['1.0' for _ in x.split('+')])).values
c) [If available] Cell type information should be in adata.obs.
d) [Multi-batch integration] Batch information should be in adata.obs.
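The dummy-dosage one-liner above assigns one dose of 1.0 to each component of a combined perturbation, so 'A+B' becomes '1.0+1.0'. The sketch below restates that rule as a plain function and adds a check for missing metadata columns. The helper names and the column names 'perturbation' and 'dosage' are assumptions taken from the one-liner; adjust them to your own adata.obs.

```python
from typing import Iterable, List

# Hypothetical column names, matching the pandas one-liner above.
REQUIRED_OBS_COLUMNS = ["perturbation", "dosage"]


def dummy_dosage(perturbation: str, dose: str = "1.0", sep: str = "+") -> str:
    """One placeholder dose per component of a combined perturbation."""
    return sep.join(dose for _ in str(perturbation).split(sep))


def missing_obs_columns(obs_columns: Iterable[str], required: List[str] = REQUIRED_OBS_COLUMNS) -> List[str]:
    """Return the required metadata columns that are absent, in order."""
    present = set(obs_columns)
    return [col for col in required if col not in present]


# Usage (assuming an AnnData object `adata`):
# adata.obs['dosage'] = adata.obs['perturbation'].map(dummy_dosage).values
# print(missing_obs_columns(adata.obs.columns))
```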
Filter out cells with a low number of counts (sc.pp.filter_cells). For example:
sc.pp.filter_cells(adata, min_counts=100)
[optional] Filter out genes with a low number of counts (sc.pp.filter_genes):
sc.pp.filter_genes(adata, min_counts=5)
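With min_counts, both filters keep an observation only if its total count is at least the threshold: per cell for filter_cells, per gene for filter_genes. The NumPy sketch below illustrates this on a dense count matrix; it is a simplified stand-in for the scanpy functions (which also handle sparse matrices and other criteria), and the function name is hypothetical.

```python
import numpy as np


def filter_by_min_counts(X: np.ndarray, min_cell_counts: int = 100, min_gene_counts: int = 5):
    """Dense-matrix sketch of min_counts filtering on a (cells x genes) count matrix.

    Cells are filtered first, then genes are filtered on the remaining cells,
    following the order of the steps above. Returns the filtered matrix and
    the boolean masks that were applied.
    """
    cell_mask = X.sum(axis=1) >= min_cell_counts   # total counts per cell
    X = X[cell_mask]
    gene_mask = X.sum(axis=0) >= min_gene_counts   # total counts per gene
    return X[:, gene_mask], cell_mask, gene_mask
```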
Save the raw counts in adata.layers['counts'].
adata.layers['counts'] = adata.X.copy()
Normalize the counts (sc.pp.normalize_total).
sc.pp.normalize_total(adata, target_sum=1e4, exclude_highly_expressed=True)
Log-transform the normalized counts (sc.pp.log1p).
sc.pp.log1p(adata)
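The normalization and log steps above can be sketched in NumPy as follows. Each cell is scaled so that its counts sum to target_sum; with exclude_highly_expressed=True, genes that account for more than max_fraction of a cell's total counts in at least one cell are left out when computing the per-cell size factor (scanpy's default max_fraction is 0.05). This is a simplified dense-matrix illustration, not scanpy's implementation, and the function name is hypothetical.

```python
import numpy as np


def normalize_and_log(X, target_sum=1e4, exclude_highly_expressed=True, max_fraction=0.05):
    """Dense sketch of total-count normalization followed by natural-log log1p."""
    X = np.asarray(X, dtype=float)
    counts_per_cell = X.sum(axis=1, keepdims=True)
    if exclude_highly_expressed:
        # A gene is "highly expressed" if it exceeds max_fraction of the
        # total counts in at least one cell; it is excluded from size factors.
        highly = (X > max_fraction * counts_per_cell).any(axis=0)
        size_factors = X[:, ~highly].sum(axis=1, keepdims=True)
    else:
        size_factors = counts_per_cell
    # Avoid division by zero for cells with no (non-excluded) counts.
    size_factors = np.where(size_factors == 0, 1.0, size_factors)
    X_norm = X / size_factors * target_sum
    return np.log1p(X_norm)
```

After this transform, np.expm1 recovers the normalized values, whose non-excluded genes sum to target_sum in every cell.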
Highly variable gene selection:
There are two options:
1. Use the sc.pp.highly_variable_genes function to select highly variable genes.
sc.pp.highly_variable_genes(adata, n_top_genes=5000, subset=True)
2. (Highly recommended, especially for multi-batch integration scenarios) Use scIB's highly variable gene selection function. This function is more robust to batch effects and can be used to select highly variable genes across multiple datasets.
import scib
adata_hvg = scib.pp.hvg_batch(adata, batch_key='batch', target_genes=5000, adataOut=True)
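The idea behind batch-aware selection is to prefer genes that are highly variable in many batches rather than in the pooled data. The sketch below is a simplified illustration of that ranking, not scIB's exact algorithm (scIB breaks ties by mean normalized dispersion; here ties are broken by the mean rank within batches). It assumes per-batch HVG lists, ordered from most to least variable with no duplicates, were computed separately; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def rank_genes_across_batches(per_batch_hvgs: Dict[str, List[str]], n_genes: int) -> List[str]:
    """Rank genes by how many batches select them, then by mean within-batch rank."""
    counts: Counter = Counter()
    positions: Dict[str, List[int]] = {}
    for genes in per_batch_hvgs.values():
        for pos, gene in enumerate(genes):
            counts[gene] += 1
            positions.setdefault(gene, []).append(pos)

    def sort_key(gene: str):
        mean_pos = sum(positions[gene]) / len(positions[gene])
        # More batches first, then better (smaller) mean rank, then name for determinism.
        return (-counts[gene], mean_pos, gene)

    return sorted(counts, key=sort_key)[:n_genes]
```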
Congrats! Your dataset is now ready to be used with CPA. Don't forget to save your pre-processed dataset with the adata.write_h5ad function.
If you have a question, or a new architecture or model that could be integrated into our pipeline, you can post an issue.
If CPA is helpful in your research, please consider citing Lotfollahi et al. (2023):
@article{lotfollahi2023predicting,
title={Predicting cellular responses to complex perturbations in high-throughput screens},
author={Lotfollahi, Mohammad and Klimovskaia Susmelj, Anna and De Donno, Carlo and Hetzel, Leon and Ji, Yuge and Ibarra, Ignacio L and Srivatsan, Sanjay R and Naghipourfar, Mohsen and Daza, Riza M and
Martin, Beth and others},
journal={Molecular Systems Biology},
pages={e11517},
year={2023}
}