A parameter-efficient molecular featuriser that generalises well to biological tasks thanks to effective pre-training on biological and quantum mechanical datasets.
The model was introduced in the paper 𝙼𝚒𝚗𝚒𝙼𝚘𝚕: A Parameter-Efficient Foundation Model for Molecular Learning, published at the ICML 2024 workshop on Accessible and Efficient Foundation Models for Biological Discovery.
Embeddings can be generated in four lines of code:
```python
from minimol import Minimol

model = Minimol()
smiles = [
    'COc1ccc2cc(C(=O)NC3(C(=O)N[C@H](Cc4ccccc4)C(=O)NCC4CCN(CC5CCOCC5)CC4)CCCC3)sc2c1',
    'Nc1nc(=O)c2c([nH]1)NCC(CNc1ccc(C(=O)NC(CCC(=O)O)C(=O)O)cc1)N2C=O',
    'O=C1CCCN1CCCCN1CCN(c2cc(C(F)(F)F)ccn2)CC1',
    'c1ccc(-c2cccnc2)cc1',
]
model(smiles)
# >> a list of 4 tensors, each of shape (512,)
```
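Because each embedding is a fixed-length vector, molecules can be compared directly in embedding space. The sketch below is a hypothetical helper, not part of the `minimol` API: it assumes the tensors returned by `model(smiles)` have first been converted to plain Python lists (for example with `[t.tolist() for t in model(smiles)]`) and finds the most similar pair of molecules by cosine similarity.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def most_similar_pair(embeddings: Sequence[Sequence[float]]) -> Tuple[int, int, float]:
    """Return (i, j, similarity) for the most similar pair of embeddings."""
    best = None
    # Compare every unordered pair of molecules exactly once.
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        if best is None or sim > best[2]:
            best = (i, j, sim)
    if best is None:
        raise ValueError("need at least two embeddings")
    return best


# Hypothetical usage with MiniMol (requires the package and its dependencies):
# from minimol import Minimol
# model = Minimol()
# embeddings = [t.tolist() for t in model(smiles)]
# print(most_similar_pair(embeddings))
```

Plain lists keep the sketch dependency-free; with many molecules, stacking the tensors and using a vectorised library routine would be faster.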
For a Colab notebook showing how to use Minimol's fingerprints to achieve SoTA results on a downstream task, click here:
When using CUDA, run `nvcc --version` to see which CUDA toolkit version is installed on your machine, and use it to select the matching wheel (cuXXX):

```bash
pip install torch-sparse torch-cluster torch-scatter -f https://pytorch-geometric.com/whl/torch-2.3.0+cu124.html
pip install minimol
```
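The wheel-selection step can be scripted. The sketch below uses two hypothetical helpers (not part of `minimol` or PyG): one parses the `release X.Y` line that `nvcc --version` prints into a `cuXY` tag, the other fills in the wheel-index URL pattern from the install command. It assumes PyG publishes an index for that exact torch/CUDA combination, which is not true for every CUDA version, so check that the generated URL exists before using it.

```python
import re
import subprocess


def cuda_wheel_tag(nvcc_output: str) -> str:
    """Turn `nvcc --version` output into a wheel tag such as 'cu124'."""
    # nvcc prints a line like "Cuda compilation tools, release 12.4, V12.4.131".
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no CUDA release found in nvcc output")
    major, minor = match.groups()
    return f"cu{major}{minor}"


def pyg_wheel_index(torch_version: str, tag: str) -> str:
    """Build the -f URL using the pattern from the install command."""
    return f"https://pytorch-geometric.com/whl/torch-{torch_version}+{tag}.html"


def detect_cuda_tag() -> str:
    """Run nvcc locally (requires the CUDA toolkit on PATH)."""
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    )
    return cuda_wheel_tag(result.stdout)


# Example (not executed here): print(pyg_wheel_index("2.3.0", detect_cuda_tag()))
```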
Alternatively, install from source:

```bash
git clone git@github.com:graphcore-research/minimol.git
cd minimol
mamba env create -f env.yml -n minimol_venv
mamba activate minimol_venv
```
To install mamba, see the official documentation.
The model was evaluated on 22 benchmarks from the ADMET group of the Therapeutics Data Commons (TDC). The results below compare it against MolE and the top 5 models on the TDC leaderboard (as of June 2024):
Dataset | Size | Metric | TDC leaderboard SoTA | MolE result | MolE rank | MiniMol (GINE) result | MiniMol (GINE) rank |
---|---|---|---|---|---|---|---|
Absorption | | | | | | | |
Caco2 Wang | 906 | MAE | 0.276 ± 0.005 | 0.310 ± 0.010 | 6 | 0.350 ± 0.018 | 7 |
Bioavailability Ma | 640 | AUROC | 0.748 ± 0.033 | 0.654 ± 0.028 | 7 | 0.689 ± 0.020 | 5 |
Lipophilicity AZ | 4,200 | MAE | 0.467 ± 0.006 | 0.469 ± 0.009 | 3 | 0.456 ± 0.008 | 1 |
Solubility AqSolDB | 9,982 | MAE | 0.761 ± 0.025 | 0.792 ± 0.005 | 5 | 0.741 ± 0.013 | 1 |
HIA Hou | 578 | AUROC | 0.989 ± 0.001 | 0.963 ± 0.019 | 7 | 0.993 ± 0.005 | 1 |
Pgp Broccatelli | 1,212 | AUROC | 0.938 ± 0.002 | 0.915 ± 0.005 | 7 | 0.942 ± 0.002 | 1 |
Distribution | | | | | | | |
BBB Martins | 1,975 | AUROC | 0.916 ± 0.001 | 0.903 ± 0.005 | 7 | 0.924 ± 0.003 | 1 |
PPBR AZ | 1,797 | MAE | 7.526 ± 0.106 | 8.073 ± 0.335 | 6 | 7.696 ± 0.125 | 4 |
VDss Lombardo | 1,130 | Spearman | 0.713 ± 0.007 | 0.654 ± 0.031 | 3 | 0.535 ± 0.027 | 7 |
Metabolism | | | | | | | |
CYP2C9 Veith | 12,092 | AUPRC | 0.859 ± 0.001 | 0.801 ± 0.003 | 5 | 0.823 ± 0.006 | 4 |
CYP2D6 Veith | 13,130 | AUPRC | 0.790 ± 0.001 | 0.682 ± 0.008 | 6 | 0.719 ± 0.004 | 5 |
CYP3A4 Veith | 12,328 | AUPRC | 0.916 ± 0.000 | 0.867 ± 0.003 | 7 | 0.877 ± 0.001 | 4 |
CYP2C9 Substrate | 666 | AUPRC | 0.441 ± 0.033 | 0.446 ± 0.062 | 2 | 0.474 ± 0.025 | 1 |
CYP2D6 Substrate | 664 | AUPRC | 0.736 ± 0.024 | 0.699 ± 0.018 | 7 | 0.695 ± 0.032 | 6 |
CYP3A4 Substrate | 667 | AUROC | 0.662 ± 0.031 | 0.670 ± 0.018 | 1 | 0.663 ± 0.008 | 2 |
Excretion | | | | | | | |
Half Life Obach | 667 | Spearman | 0.562 ± 0.008 | 0.549 ± 0.024 | 4 | 0.495 ± 0.042 | 6 |
Clearance Hepatocyte | 1,102 | Spearman | 0.498 ± 0.009 | 0.381 ± 0.038 | 7 | 0.446 ± 0.029 | 3 |
Clearance Microsome | 1,020 | Spearman | 0.630 ± 0.010 | 0.607 ± 0.027 | 6 | 0.628 ± 0.005 | 2 |
Toxicity | | | | | | | |
LD50 Zhu | 7,385 | MAE | 0.552 ± 0.009 | 0.823 ± 0.019 | 7 | 0.585 ± 0.005 | 2 |
hERG | 648 | AUROC | 0.880 ± 0.002 | 0.813 ± 0.009 | 7 | 0.846 ± 0.016 | 4 |
Ames | 7,255 | AUROC | 0.871 ± 0.002 | 0.883 ± 0.005 | 1 | 0.849 ± 0.004 | 5 |
DILI | 475 | AUROC | 0.925 ± 0.005 | 0.577 ± 0.021 | 7 | 0.956 ± 0.006 | 1 |
Mean rank | | | | | 5.2 | | 3.3 |
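The mean-rank row appears to be the arithmetic mean of the 22 per-dataset ranks. The sketch below transcribes only the MiniMol columns and the leaderboard SoTA means from the table, reproduces MiniMol's mean rank of 3.3, and lists the datasets where MiniMol's mean score improves on the listed SoTA. It assumes MAE is lower-is-better while AUROC, AUPRC and Spearman are higher-is-better, and it compares means only, ignoring the ± standard deviations. The helper names are hypothetical.

```python
from typing import List, Tuple

# (dataset, metric, leaderboard SoTA mean, MiniMol mean, MiniMol rank), from the table.
Row = Tuple[str, str, float, float, int]
ROWS: List[Row] = [
    ("Caco2 Wang", "MAE", 0.276, 0.350, 7),
    ("Bioavailability Ma", "AUROC", 0.748, 0.689, 5),
    ("Lipophilicity AZ", "MAE", 0.467, 0.456, 1),
    ("Solubility AqSolDB", "MAE", 0.761, 0.741, 1),
    ("HIA Hou", "AUROC", 0.989, 0.993, 1),
    ("Pgp Broccatelli", "AUROC", 0.938, 0.942, 1),
    ("BBB Martins", "AUROC", 0.916, 0.924, 1),
    ("PPBR AZ", "MAE", 7.526, 7.696, 4),
    ("VDss Lombardo", "Spearman", 0.713, 0.535, 7),
    ("CYP2C9 Veith", "AUPRC", 0.859, 0.823, 4),
    ("CYP2D6 Veith", "AUPRC", 0.790, 0.719, 5),
    ("CYP3A4 Veith", "AUPRC", 0.916, 0.877, 4),
    ("CYP2C9 Substrate", "AUPRC", 0.441, 0.474, 1),
    ("CYP2D6 Substrate", "AUPRC", 0.736, 0.695, 6),
    ("CYP3A4 Substrate", "AUROC", 0.662, 0.663, 2),
    ("Half Life Obach", "Spearman", 0.562, 0.495, 6),
    ("Clearance Hepatocyte", "Spearman", 0.498, 0.446, 3),
    ("Clearance Microsome", "Spearman", 0.630, 0.628, 2),
    ("LD50 Zhu", "MAE", 0.552, 0.585, 2),
    ("hERG", "AUROC", 0.880, 0.846, 4),
    ("Ames", "AUROC", 0.871, 0.849, 5),
    ("DILI", "AUROC", 0.925, 0.956, 1),
]

# Error metrics: smaller is better. All other metrics here: larger is better.
LOWER_IS_BETTER = {"MAE"}


def improves_on(metric: str, baseline: float, score: float) -> bool:
    """True if `score` is strictly better than `baseline` for this metric."""
    if metric in LOWER_IS_BETTER:
        return score < baseline
    return score > baseline


def mean_rank(rows: List[Row]) -> float:
    """Arithmetic mean of the per-dataset ranks."""
    return sum(row[4] for row in rows) / len(rows)


def beats_sota(rows: List[Row]) -> List[str]:
    """Datasets where the MiniMol mean improves on the listed SoTA mean."""
    return [name for name, metric, sota, score, _ in rows
            if improves_on(metric, sota, score)]


print(round(mean_rank(ROWS), 1))  # 3.3
print(len(beats_sota(ROWS)))  # 8
```

CYP3A4 Substrate is counted even though MiniMol ranks 2nd there, because MolE (not the listed SoTA entry) scores higher on that dataset.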
Copyright (c) 2024 Graphcore Ltd. Licensed under the MIT License.