Can I buy this molecule? Returns results in about 500 ns and consumes about 100 MB of RAM (or 2 GB if using all of ZINC20).
pip install molbloom
from molbloom import buy
buy('CCCO')
# True
buy('ONN1CCCC1')
# False
If buy returns True, the molecule may be purchasable, with a measured error (false positive) rate of 0.0003. If it returns False, it is not purchasable.
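The asymmetry between the two answers comes from how a Bloom filter works in general. The following is a minimal sketch of a toy Bloom filter, not molbloom's actual implementation; the class name `ToyBloomFilter`, the bit count, and the SHA-256 based hashing are all assumptions chosen for illustration.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative Bloom filter; NOT molbloom's implementation."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every probed bit is set; one unset bit proves absence.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


catalog = ToyBloomFilter(num_bits=1024, num_hashes=3)
for smiles in ["CCCO", "CCO", "c1ccccc1"]:
    catalog.add(smiles)

# An item that was added is always reported as present (no false negatives).
print("CCCO" in catalog)
```

An item that was never added can still be reported as present if its bit positions happen to collide with bits set by other items, which is why a True answer only means "maybe".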
The catalog information is from ZINC20. Add canonicalize=True if your SMILES are not canonicalized (requires installing rdkit).
There are other catalogs available; see the options with molbloom.catalogs(). Most catalogs require an initial download. buy('CCCO', catalog='zinc-instock-mini') doesn't require a download and is included in the package. It is useful for testing, but has a high false positive rate of 1%.
Just because buy returns True doesn't mean you can buy it. You should follow up with a real query at ZINC, or use the search feature in SmallWorld to find similar purchasable molecules.
from smallworld_api import SmallWorld
sw = SmallWorld()
aspirin = 'O=C(C)Oc1ccccc1C(=O)O'
results = sw.search(aspirin, dist=5, db=sw.REAL_dataset)
This will query ZINC Small World.
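The filter-then-verify workflow can be sketched as a small triage step: use the fast filter to discard molecules that are definitely not in the catalog, and keep only the remainder for a real query. In this sketch, `triage` and `fake_buy` are hypothetical names; `fake_buy` is a stand-in predicate so the example runs without molbloom installed.

```python
from typing import Callable, Iterable


def triage(
    smiles_list: Iterable[str],
    may_be_purchasable: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Split SMILES into (candidates to verify, rejected)."""
    to_verify: list[str] = []
    rejected: list[str] = []
    for smiles in smiles_list:
        if may_be_purchasable(smiles):
            # A True answer can be a false positive, so verify it later.
            to_verify.append(smiles)
        else:
            # A False answer is definitive for the chosen catalog.
            rejected.append(smiles)
    return to_verify, rejected


def fake_buy(smiles: str) -> bool:
    # Hypothetical stand-in for molbloom's buy(); uses a tiny hard-coded set.
    return smiles in {"CCCO", "CCO"}


to_verify, rejected = triage(["CCCO", "ONN1CCCC1", "CCO"], fake_buy)
print(to_verify)  # ['CCCO', 'CCO']
print(rejected)   # ['ONN1CCCC1']
```

In real use, the predicate passed to `triage` would be `buy` imported from molbloom, and only the `to_verify` list would be sent on to ZINC or SmallWorld.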
Do you have your own list of SMILES? There are two ways to build a filter. You can use a C tool that is very fast (about 1M SMILES per second) if your SMILES are in a file and already canonical. Or you can use the Python API to programmatically build a filter and canonicalize as you go. Both are described below.
Once your custom filter is built:
from molbloom import BloomFilter
bf = BloomFilter('myfilter.bloom')
# usage:
'CCCO' in bf
You can build your own filter using the code in the tool/ directory.
cd tool
make
./molbloom-bloom <MB of final filter> <filter name> <approx number of compounds> <input file 1> <input file 2> ...
where each input file has one SMILES per line, in the first column, and is already canonicalized. The higher the MB, the lower the rate of false positives. If you want to choose the false positive rate rather than the size, you can use the equation:
$$ M = - \frac{N \ln \epsilon}{(\ln 2)^2} $$
where $M$ is the size in bits, $N$ is the number of compounds, and $\epsilon$ is the false positive rate.
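As a worked example of the equation, the sketch below computes the filter size for a target false positive rate and converts it from bits to megabytes, since the command-line tool takes its size in MB. The function names are hypothetical, and the conversion assumes 1 MB = 10^6 bytes, which may differ from how the tool counts megabytes.

```python
import math


def bloom_filter_bits(num_compounds: int, false_positive_rate: float) -> int:
    """Return M = -N ln(eps) / (ln 2)^2, rounded up to a whole bit."""
    if num_compounds <= 0:
        raise ValueError("num_compounds must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    bits = -num_compounds * math.log(false_positive_rate) / math.log(2) ** 2
    return math.ceil(bits)


def bits_to_megabytes(bits: int) -> float:
    # Assumption: 1 MB = 10**6 bytes; the tool's definition may differ.
    return bits / 8 / 1e6


# One million compounds at a 0.1% false positive rate.
bits = bloom_filter_bits(1_000_000, 0.001)
print(f"{bits} bits, about {bits_to_megabytes(bits):.2f} MB")
```

For these inputs the result is roughly 14.4 bits per compound, or about 1.8 MB in total. Rounding the MB value up before passing it to the tool keeps the false positive rate at or below the target.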
You can also build a filter using Python as follows:
from molbloom import CustomFilter, canon
bf = CustomFilter(100, 1000, 'myfilter')
bf.add('CCCO')
# canonicalize one record
s = canon("CCCOC")
bf.add(s)
# finalize filter into a file
bf.save('test.bloom')
@software{White_molbloom_quick_assessment_2022,
author = {White, Andrew D},
doi = {10.5281/zenodo.7426402},
month = {12},
title = {{molbloom: quick assessment of compound purchasability with bloom filters}},
url = {https://github.com/whitead/molbloom},
version = {2.0.0},
year = {2022}
}