This is a Python package for 3D cell shape features and classes using deep learning. Please refer to our preprint here.
cellshape is the main package which imports from sub-packages:
The software requires Python 3.7 or greater. The following package dependencies are installed automatically when cellshape is installed: PyTorch, pyntcloud, numpy, scikit-learn, tensorboard, and tqdm (the full list is shown in the setup.py file). This repo makes extensive use of cellshape-cloud, cellshape-cluster, cellshape-helper, and cellshape-voxel. To reproduce the results in our paper, only cellshape-cloud and cellshape-cluster are needed.
conda create --name cellshape-env python=3.8 -y
conda activate cellshape-env
pip install --upgrade pip
pip install cellshape
This should take about 5 minutes or less.
We have tested this software on Ubuntu 20.04 LTS and 18.04 LTS with 128 GB RAM and an NVIDIA Quadro RTX 6000 GPU.
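A quick way to confirm that the installation worked is to check the Python version and look for the dependencies listed above. The following is a minimal sketch, not part of cellshape itself. The import names are assumptions (PyTorch is imported as `torch`, scikit-learn as `sklearn`, and the package itself is assumed to import as `cellshape`), and the helper names are hypothetical.

```python
import importlib.util
import sys

# Import names assumed for the packages listed above
# (PyTorch imports as "torch", scikit-learn as "sklearn").
REQUIRED_MODULES = [
    "cellshape", "torch", "pyntcloud", "numpy", "sklearn", "tensorboard", "tqdm",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """cellshape requires Python 3.7 or greater."""
    return tuple(version_info[:2]) >= (3, 7)


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list of missing modules suggests that `pip install cellshape` pulled in everything it needs.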
Datasets to reproduce our results in our paper are available here.
Use cellshape-helper to test our point cloud generation code. We suggest testing our code on the data contained in SamplePointCloudData.zip. This data is structured in the following way:
cellshapeSamplePointCloudDatset/
    small_data.csv
    Plate1/
        stacked_pointcloud/
            Binimetinib/
                0010_0120_accelerator_20210315_bakal01_erk_main_21-03-15_12-37-27.ply
                ...
            Blebbistatin/
                ...
    Plate2/
        stacked_pointcloud/
    Plate3/
        stacked_pointcloud/
This data structure is only necessary if you want to use our data. If you would like to use your own dataset, you may structure it in any way, as long as the point cloud files have the extension .ply. If using your own data structure, please set the parameter --dataset_type to "Other".
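Before training, it can help to confirm which point clouds a directory actually contains. The sketch below is an illustration only and is not part of cellshape. `index_point_clouds` assumes the sample layout shown above (`<plate>/stacked_pointcloud/<treatment>/<file>.ply`), while `find_all_ply` makes no assumption about layout and so suits the "Other" case. Both function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def index_point_clouds(root):
    """Group .ply file names by (plate, treatment).

    Assumes the sample layout:
    <root>/<plate>/stacked_pointcloud/<treatment>/<file>.ply
    """
    groups = defaultdict(list)
    for ply in sorted(Path(root).glob("*/stacked_pointcloud/*/*.ply")):
        treatment = ply.parent.name   # e.g. "Binimetinib"
        plate = ply.parents[2].name   # e.g. "Plate1"
        groups[(plate, treatment)].append(ply.name)
    return dict(groups)


def find_all_ply(root):
    """Return every .ply file under root, whatever the directory layout."""
    return sorted(Path(root).rglob("*.ply"))
```

For example, `index_point_clouds("./cellshapeSamplePointCloudDataset/")` would return a dictionary keyed by pairs such as `("Plate1", "Binimetinib")`.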
The following steps assume that you already have point cloud representations of cells or nuclei. If you need to generate point clouds from 3D binary masks, please go to cellshape-helper.
We suggest testing our code on the data contained in SamplePointCloudData.zip. Please download the data and unzip the contents into a directory of your choice. We recommend doing this in your ~/Documents/ folder. This location is used as a parameter in the steps below, so please remember where you download the data to. Downloading and unzipping the data can be done in the terminal. You might first need to install wget and unzip with apt-get (e.g. apt-get install wget).
To download and unzip the data into your ~/Documents/ folder with wget:
cd ~/Documents
wget https://sandbox.zenodo.org/record/1080300/files/SamplePointCloudDataset.zip
unzip SamplePointCloudDataset.zip
This will create a directory called cellshapeSamplePointCloudDatset under your ~/Documents/ folder, i.e. /home/USER/Documents/cellshapeSamplePointCloudDatset/ (USER will be different for you).
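If wget and unzip are not available, the same download can be sketched with the Python standard library. This is an illustrative alternative, not something cellshape provides. The URL is the one used in the wget command above, and the function names are hypothetical.

```python
import urllib.request
import zipfile
from pathlib import Path

# Same URL as in the wget command above.
DATA_URL = "https://sandbox.zenodo.org/record/1080300/files/SamplePointCloudDataset.zip"


def extract_zip(zip_path, target_dir):
    """Unzip zip_path into target_dir and return the top-level names it contains."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        top_level = {name.split("/")[0] for name in archive.namelist() if name}
    return sorted(top_level)


def fetch_and_extract(url, target_dir):
    """Download the archive at url into target_dir, then unzip it there."""
    target = Path(target_dir).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    zip_path = target / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(zip_path))
    return extract_zip(zip_path, target)


# Example (needs network access):
# fetch_and_extract(DATA_URL, "~/Documents")
```

The returned list shows the name of the directory that was actually created, which is the value to pass to --cloud_dataset_path in the steps below.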
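The training procedure follows two steps: first, the autoencoder is trained without the additional clustering layer; second, the clustering layer is added to refine the model weights.
Inference can be done after each step.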
Our training functions are run through a command line interface with the command cellshape-train.
For help on all command line options, run the following in the terminal:
cellshape-train -h
The first step trains the autoencoder without the additional clustering layer. Run the following in the terminal. Remember to change the --cloud_dataset_path, --dataframe_path, and --output_dir parameters to match your directories if you have saved the data somewhere else. To test the code, we train for 5 epochs. First make sure you are in the directory where you downloaded the data. If this is your ~/Documents/ folder, go into it:
cd ~/Documents
Then run the following:
cellshape-train \
--model_type "cloud" \
--pretrain "True" \
--train_type "pretrain" \
--cloud_dataset_path "./cellshapeSamplePointCloudDataset/" \
--dataset_type "SingleCell" \
--dataframe_path "./cellshapeSamplePointCloudDataset/small_data.csv" \
--output_dir "./cellshapeOutput/" \
--num_epochs_autoencoder 5 \
--encoder_type "dgcnn" \
--decoder_type "foldingnetbasic" \
--num_features 128
This step will create an output directory /home/USER/Documents/cellshapeOutput/ with the subfolders nets, reports, and runs, which contain the model weights, logged outputs, and tensorboard runs, respectively, for each experiment. Each experiment is named with the convention {encoder_type}_{decoder_type}_{num_features}_{train_type}_{xxx}, where {xxx} is a counter. For example, if this is the first experiment you have run, the trained model weights will be saved to /home/USER/Documents/cellshapeOutput/nets/dgcnn_foldingnetbasic_128_pretrained_001.pt. This path is used in the next step for the --pretrained_path parameter.
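The naming convention can be reproduced in a few lines, which is handy when a script needs to locate the weights from an earlier run. This is a minimal sketch, not cellshape's own code. The function names are hypothetical, and the three-digit zero padding of the counter is an assumption based on the "001" in the example path. Note that the example path uses "pretrained" as the train-type component.

```python
from pathlib import Path


def experiment_name(encoder_type, decoder_type, num_features, train_type, counter):
    """Build a name following the convention described above.

    The three-digit zero padding is an assumption based on the "001" example.
    """
    return f"{encoder_type}_{decoder_type}_{num_features}_{train_type}_{counter:03d}"


def weights_path(output_dir, name):
    """Path of the weights file inside the nets/ subfolder of output_dir."""
    return Path(output_dir) / "nets" / f"{name}.pt"


name = experiment_name("dgcnn", "foldingnetbasic", 128, "pretrained", 1)
print(weights_path("./cellshapeOutput", name))
```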
The next step is to add the clustering layer to refine the model weights. As before, run the following in the terminal. Remember to change the --cloud_dataset_path, --dataframe_path, --output_dir, and --pretrained_path parameters to match your directories. If you have followed the previous steps, you will still be in the ~/Documents/ directory. In the same terminal, run:
cellshape-train \
--model_type "cloud" \
--train_type "DEC" \
--pretrain False \
--cloud_dataset_path "./cellshapeSamplePointCloudDataset/" \
--dataset_type "SingleCell" \
--dataframe_path "./cellshapeSamplePointCloudDataset/small_data.csv" \
--output_dir "./cellshapeOutput/" \
--num_features 128 \
--num_clusters 5 \
--pretrained_path "./cellshapeOutput/nets/dgcnn_foldingnetbasic_128_pretrained_001.pt"
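When running several experiments, the command above can be assembled from a dictionary of options instead of being typed by hand. The sketch below only builds and prints the argument list. The option names and values are copied from the command above, while `build_command` is a hypothetical helper and not part of cellshape.

```python
import shlex


def build_command(options, program="cellshape-train"):
    """Turn a dict of option names and values into an argument list."""
    args = [program]
    for name, value in options.items():
        args += [f"--{name}", str(value)]
    return args


# Options copied from the DEC command above.
dec_options = {
    "model_type": "cloud",
    "train_type": "DEC",
    "pretrain": False,
    "cloud_dataset_path": "./cellshapeSamplePointCloudDataset/",
    "dataset_type": "SingleCell",
    "dataframe_path": "./cellshapeSamplePointCloudDataset/small_data.csv",
    "output_dir": "./cellshapeOutput/",
    "num_features": 128,
    "num_clusters": 5,
    "pretrained_path": "./cellshapeOutput/nets/dgcnn_foldingnetbasic_128_pretrained_001.pt",
}

command = build_command(dec_options)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in command))
# To launch training, pass the list to subprocess.run(command, check=True).
```

Passing the list form to `subprocess.run` avoids shell quoting problems with paths that contain spaces.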
To monitor the training using Tensorboard, in a new terminal run:
pip install tensorboard
cd ~/Documents
tensorboard --logdir "./cellshapeOutput/runs/"
To run both steps with a single command, state that you would like to pretrain (--pretrain True) and that you want to train DEC (--train_type "DEC"):
cellshape-train \
--model_type "cloud" \
--train_type "DEC" \
--pretrain True \
--cloud_dataset_path "./cellshapeSamplePointCloudDataset/" \
--dataset_type "SingleCell" \
--dataframe_path "./cellshapeSamplePointCloudDataset/small_data.csv" \
--output_dir "./cellshapeOutput/" \
--num_features 128 \
--num_clusters 5
Example inference notebooks can be found in the docs/notebooks/ folder.
If you have any problems, please raise an issue here.
@article{DeVries2022single,
author = {Matt De Vries and Lucas Dent and Nathan Curry and Leo Rowe-Brown and Vicky Bousgouni and Adam Tyson and Christopher Dunsby and Chris Bakal},
title = {3D single-cell shape analysis using geometric deep learning},
elocation-id = {2022.06.17.496550},
year = {2023},
doi = {10.1101/2022.06.17.496550},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2023/03/27/2022.06.17.496550},
eprint = {https://www.biorxiv.org/content/early/2023/03/27/2022.06.17.496550.full.pdf},
journal = {bioRxiv}
}
[1] An Tao, 'Unsupervised Point Cloud Reconstruction for Classific Feature Learning', GitHub Repo, 2020