misato-dataset

UNOFFICIAL Misato dataset pypi package. For instructions on dataset download see official GitHub page (https://github.com/t7morgen/misato-dataset).

Version 0.0.7 (PyPI)

MISATO - Machine learning dataset of protein-ligand complexes for structure-based drug discovery


:earth_americas: Where we are:

  • Quantum Mechanics: 19443 ligands, curated and refined
  • Molecular Dynamics: 16972 simulated protein-ligand structures, 10 ns each
  • AI: PyTorch dataloaders and 3 baseline models, for MD, QM, and binding affinity prediction

:electron: Vision:

We are a drug discovery community project :hugs:

  • highest possible accuracy for ligand molecules
  • represent the systems' dynamics on reasonable timescales
  • innovative AI models for drug discovery predictions

Let's crack 100+ ns MD, 30000+ protein-ligand structures, and a whole new world of AI models for drug discovery together.

Check out the paper!


:purple_heart: Community

Want to get hands-on with AI for drug discovery?

Join our Discord server!

Check out our Hugging Face spaces to run and visualize the adaptability model and to perform QM property predictions.

📌  Introduction

In this repository, we show how to download the MISATO database and use it with AI models. You can access the calculated properties of different protein-ligand structures and use them for training with PyTorch-based dataloaders. We provide a small sample of the dataset along with the repo.

You can freely download the FULL MISATO dataset from Zenodo using the links below:

  • MD (133 GiB)
  • QM (0.3 GiB)
  • electronic densities (6 GiB)
  • MD restart and topology files (55 GiB)

For example, to download the MD and QM files into the project's data directories:
wget -O data/MD/h5_files/MD.hdf5 https://zenodo.org/record/7711953/files/MD.hdf5
wget -O data/QM/h5_files/QM.hdf5 https://zenodo.org/record/7711953/files/QM.hdf5
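
The full download is large, so it can help to confirm that the target disk has enough free space before starting wget. The following is a minimal Python sketch and not part of the MISATO repository: the helper names (`required_bytes`, `has_room`) are hypothetical, and the sizes are simply the ones listed above.

```python
import shutil

# Sizes in GiB, copied from the list of Zenodo downloads above.
MISATO_SIZES_GIB = {
    "MD": 133.0,
    "QM": 0.3,
    "electronic densities": 6.0,
    "MD restart and topology files": 55.0,
}

GIB = 1024 ** 3


def required_bytes(parts):
    """Total download size in bytes for the selected dataset parts."""
    return int(sum(MISATO_SIZES_GIB[name] for name in parts) * GIB)


def has_room(target_dir, parts, free_bytes=None):
    """Return True if target_dir has enough free space for the selected parts.

    free_bytes can be passed explicitly (handy for testing); otherwise it is
    read from the filesystem with shutil.disk_usage.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= required_bytes(parts)


if __name__ == "__main__":
    # Hypothetical helper usage: check the current directory's filesystem.
    for parts in (["QM"], ["MD", "QM"]):
        status = "ok" if has_room(".", parts) else "not enough free space"
        print(f"{' + '.join(parts)}: {status}")
```

The check only compares the listed sizes against free space; it does not account for temporary files or for anything else written during the download.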

Start with the notebook src/getting_started.ipynb to:

  • Understand the structure of our dataset and how to access each molecule's properties.
  • Load the PyTorch Dataloaders of each dataset.
  • Load the PyTorch Lightning DataModules of each dataset.
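
For a first look at how a file is organized before opening the notebook, the sketch below walks an HDF5 file and lists its groups and datasets. It assumes nothing about MISATO's internal layout and is not taken from the repository: `summarize` and `summarize_file` are hypothetical helpers that rely only on the standard h5py interface (groups expose `keys()`, datasets expose `shape` and `dtype`).

```python
def summarize(node, depth=2, limit=3, indent=0):
    """Return lines describing a nested, HDF5-like hierarchy.

    Works on anything dict-like: h5py.File and h5py.Group expose keys() and
    item access, while leaf datasets expose shape and dtype.
    """
    lines = []
    names = list(node.keys())
    pad = "  " * indent
    for name in names[:limit]:
        child = node[name]
        if hasattr(child, "keys"):  # group: recurse until depth is exhausted
            lines.append(f"{pad}{name}/")
            if depth > 1:
                lines.extend(summarize(child, depth - 1, limit, indent + 1))
        else:  # dataset-like leaf
            shape = getattr(child, "shape", None)
            dtype = getattr(child, "dtype", None)
            lines.append(f"{pad}{name}: shape={shape}, dtype={dtype}")
    if len(names) > limit:
        # Only the first `limit` entries per level are shown.
        lines.append(f"{pad}... ({len(names) - limit} more)")
    return lines


def summarize_file(path, **kwargs):
    """Open an HDF5 file read-only and summarize it (requires h5py)."""
    import h5py  # imported lazily so summarize() works without it

    with h5py.File(path, "r") as handle:
        return summarize(handle, **kwargs)
```

With the QM file downloaded to the path used by the wget command above, `print("\n".join(summarize_file("data/QM/h5_files/QM.hdf5")))` would print the first few entries at each level. The notebook remains the authoritative description of what each entry means.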

🚀  Quickstart

We recommend pulling our MISATO image from DockerHub or creating your own image (see docker/). The images use CUDA version 11.8. To ensure that the drivers work correctly, we recommend installing CUDA 11.8 or later on your own system.

# clone project
git clone https://github.com/t7morgen/misato-dataset.git
cd misato-dataset

For Singularity, use:

# get the container image and save it as misato.sif
singularity pull misato.sif docker://sab148/misato-dataset
singularity shell misato.sif

For Docker, use:

sudo docker pull sab148/misato-dataset:latest
bash docker/run_bash_in_container.sh
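
The CUDA 11.8 recommendation above can be checked from Python once PyTorch is available, for example inside the container. This is a minimal sketch under stated assumptions rather than part of the repository: `parse_version` and `cuda_is_sufficient` are hypothetical helpers, and `torch.version.cuda` reports the CUDA version PyTorch was built against, which is not necessarily the version installed on the host.

```python
def parse_version(text):
    """Turn a version string such as '11.8' or '12.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def cuda_is_sufficient(version, minimum="11.8"):
    """Return True if version is at least minimum; None means no CUDA at all."""
    if version is None:
        return False
    # Compare as integer tuples so that '11.10' ranks above '11.8'.
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        built_for = torch.version.cuda  # None for CPU-only builds
        print("PyTorch built for CUDA:", built_for)
        print("meets 11.8 minimum:", cuda_is_sufficient(built_for))
        print("GPU visible:", torch.cuda.is_available())
```

If the build version is sufficient but no GPU is visible, the host driver is the more likely place to look.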

Project Structure

├── data                   <- Project data
│   ├── MD
│   │   ├── h5_files           <- storage of dataset
│   │   └── splits             <- train, val, test splits
│   └── QM
│       ├── h5_files           <- storage of dataset
│       └── splits             <- train, val, test splits
│
├── src                    <- Source code
│   ├── data
│   │   ├── components           <- Datasets and transforms
│   │   ├── md_datamodule.py     <- MD Lightning data module
│   │   ├── qm_datamodule.py     <- QM Lightning data module
│   │   │
│   │   └── processing           <- Scripts for preprocessing, inference and conversion
│   │       ├── ...
│   ├── getting_started.ipynb     <- notebook: how to load data and interact with it
│   └── inference.ipynb           <- notebook: how to run inference
│
├── docker                    <- Dockerfile and execution script
└── README.md



Installation using your own conda environment

If you want to use conda for your own installation, please create a new misato environment.

To install PyTorch Geometric, we recommend using pip (within conda) and following the official installation instructions: pytorch-geometric/install

The instructions vary depending on your CUDA version. We show an example for CUDA 11.8.

conda create --name misato python=3
conda activate misato
conda install -c anaconda pandas pip h5py
pip3 install torch --index-url https://download.pytorch.org/whl/cu118 --no-cache
pip install joblib matplotlib
# the wheel index below must match the installed torch and CUDA versions
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
pip install pytorch-lightning==1.8.3
pip install torch-geometric
pip install ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7
conda install -c conda-forge nb_conda_kernels
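
After running the commands above, a quick check that every package can be located in the active environment can save a failed notebook run later. The sketch below is not part of the repository: `missing_packages` is a hypothetical helper, and the import names are an assumed mapping from the pip and conda package names (for example `pytorch-lightning` to `pytorch_lightning`).

```python
from importlib.util import find_spec

# Import names for the packages installed above (assumed mapping from the
# pip/conda names, e.g. torch-geometric -> torch_geometric).
EXPECTED = [
    "pandas", "h5py", "torch", "joblib", "matplotlib",
    "pyg_lib", "torch_scatter", "torch_sparse", "torch_cluster",
    "torch_spline_conv", "pytorch_lightning", "torch_geometric",
    "ipykernel", "ipywidgets", "nglview",
]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(EXPECTED)
    print("all packages found" if not missing else f"missing: {missing}")
```

`find_spec` only locates a package without importing it, so this confirms presence, not that the compiled extensions load against the installed torch build.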

To run inference for MD, you have to install AmberTools. We recommend installing it in a separate conda environment.

conda create --name ambertools python=3
conda activate ambertools
conda install -c conda-forge ambertools nb_conda_kernels
pip install h5py jupyter ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7

Citation

If you found this work useful, please consider citing the article.
