datamol - molecular processing made easy

Datamol is a Python library for working with molecules. It's a layer built on top of RDKit and aims to be as light as possible.
- 🐍 Simple pythonic API
- ⚗️ RDKit first: all you manipulate are rdkit.Chem.Mol objects.
- ✅ Manipulating molecules often relies on many options; Datamol provides good defaults by design.
- 🧠 Performance matters: built-in efficient parallelization when possible with an optional progress bar.
- 🕹️ Modern IO: out-of-the-box support for remote paths using fsspec to read and write multiple formats (sdf, xlsx, csv, etc.).
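The "Performance matters" bullet refers to datamol's own built-in parallelization. To illustrate the general pattern (fan work out to a pool of workers, keep results in input order, report progress as items finish), here is a minimal stdlib-only sketch. It is not datamol's implementation: `parallel_map` and its `progress(done, total)` callback are hypothetical names chosen for this example, and `len` merely stands in for a real per-molecule function.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_map(
    fn: Callable[[T], R],
    inputs: Iterable[T],
    n_jobs: int = 4,
    progress: Optional[Callable[[int, int], None]] = None,
) -> List[R]:
    """Hypothetical helper: apply `fn` to each input on a thread pool.

    Results are returned in input order. If `progress` is given, it is called
    as progress(done, total) each time one item finishes.
    """
    items = list(inputs)
    total = len(items)
    results: Dict[int, R] = {}
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Remember which input index each future belongs to.
        futures = {pool.submit(fn, item): i for i, item in enumerate(items)}
        # as_completed yields futures as they finish, so progress advances steadily.
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            if progress is not None:
                progress(done, total)
    # Reassemble the results in the original input order.
    return [results[i] for i in range(total)]


smiles = ["CCO", "c1ccccc1", "O=C(C)Oc1ccccc1C(=O)O"]
# `len` stands in for a real per-molecule function such as a SMILES parser.
lengths = parallel_map(len, smiles, n_jobs=2, progress=lambda done, total: print(f"{done}/{total}"))
print(lengths)  # [3, 8, 21]
```

Threads keep the sketch easy to run anywhere; for CPU-bound pure-Python work, `ProcessPoolExecutor` offers the same interface and sidesteps the GIL, at the cost of pickling inputs and outputs.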
Try Online
Visit
and try Datamol online.
Documentation
Visit https://docs.datamol.io.
Installation
Install from conda-forge using mamba (or conda, which accepts the same arguments):
```bash
mamba install -c conda-forge datamol
```
Quick API Tour
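```python
import datamol as dm

# Conversions: SMILES to molecule, then fingerprint, SELFIES and InChI
mol = dm.to_mol("O=C(C)Oc1ccccc1C(=O)O", sanitize=True)
fp = dm.to_fp(mol)
selfies = dm.to_selfies(mol)
inchi = dm.to_inchi(mol)

# Fix, sanitize and standardize a molecule
mol = dm.to_mol("O=C(C)Oc1ccccc1C(=O)O")
mol = dm.fix_mol(mol)
mol = dm.sanitize_mol(mol)
mol = dm.standardize_mol(mol)

# Load the FreeSolv example dataset and convert it to a list of molecules
df = dm.data.freesolv()
mols = dm.from_df(df)

# 2D depiction of the first ten molecules, labeled with their SMILES
legends = [dm.to_smiles(mol) for mol in mols[:10]]
dm.viz.to_image(mols[:10], legends=legends)

# Generate 3D conformers
smiles = "O=C(C)Oc1ccccc1C(=O)O"
mol = dm.to_mol(smiles)
mol_with_conformers = dm.conformers.generate(mol)

# 3D view (uses nglview): pass the molecule that actually carries the conformers
dm.viz.conformers(mol_with_conformers, n_confs=10)

# Solvent-accessible surface area of the conformers (uses freesasa)
sasa = dm.conformers.sasa(mol_with_conformers)

# Read and write SDF files on remote storage through fsspec (example bucket paths)
mols = dm.read_sdf("s3://my-awesome-data-lake/smiles.sdf", as_df=False)
dm.to_sdf(mols, "gs://data-bucket/smiles.sdf")
```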
How to cite
Please cite Datamol if you use it in your research:
Compatibilities
Version compatibility is an essential topic for production software stacks, so we carefully document compatibility between Datamol, Python and RDKit.
The table below lists, for each minor version of Datamol, the Python and RDKit versions it was tested against throughout its lifecycle. Other combinations may work, but they are not tested.
| Datamol | Python | RDKit |
|---|---|---|
| 0.12.x | [3.10, 3.11] | [2023.03, 2023.09] |
| 0.11.x | [3.9, 3.10, 3.11] | [2022.09, 2023.03] |
| 0.10.x | [3.9, 3.10, 3.11] | [2022.03, 2022.09] |
| 0.9.x | [3.9, 3.10, 3.11] | [2022.03, 2022.09] |
| 0.8.x | [3.8, 3.9, 3.10] | [2021.09, 2022.03, 2022.09] |
| 0.7.x | [3.8, 3.9] | [2021.09, 2022.03] |
| 0.6.x | [3.8, 3.9] | [2021.09] |
| 0.5.x | [3.8, 3.9] | [2021.03, 2021.09] |
| 0.4.x | [3.8, 3.9] | [2020.09, 2021.03] |
| 0.3.x | [3.8, 3.9] | [2020.09, 2021.03] |
CI Status
The CI runs tests and performs code quality checks for the following combinations:
- The three major platforms: Windows, OSX and Linux.
- The two latest Python versions.
- The two latest RDKit versions.
The CI includes the following workflows:
- Lib build & Testing
- Code Sanity (linting and type analysis)
- Documentation Build
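The combinations listed above multiply out to 3 platforms × 2 Python versions × 2 RDKit versions = 12 test jobs, assuming the CI runs the full cross product. The sketch below enumerates such a matrix with `itertools.product`. The concrete version strings are an assumption borrowed from the 0.12.x row of the compatibility table, the platform labels are illustrative, and `ci_matrix` is a hypothetical helper rather than the project's actual CI configuration.

```python
from itertools import product
from typing import Dict, List, Sequence

# Assumed values: platforms from the list above, versions from the 0.12.x table row.
PLATFORMS = ["windows", "macos", "linux"]
PYTHON_VERSIONS = ["3.10", "3.11"]
RDKIT_VERSIONS = ["2023.03", "2023.09"]


def ci_matrix(
    platforms: Sequence[str], pythons: Sequence[str], rdkits: Sequence[str]
) -> List[Dict[str, str]]:
    """Return one job description per (platform, Python, RDKit) combination."""
    return [
        {"os": os_name, "python": py, "rdkit": rd}
        for os_name, py, rd in product(platforms, pythons, rdkits)
    ]


jobs = ci_matrix(PLATFORMS, PYTHON_VERSIONS, RDKIT_VERSIONS)
print(len(jobs))  # 12
print(jobs[0])  # {'os': 'windows', 'python': '3.10', 'rdkit': '2023.03'}
```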
License
Under the Apache-2.0 license. See LICENSE.