Research
Security News
Malicious npm Packages Inject SSH Backdoors via Typosquatted Libraries
Socket’s threat research team has detected six malicious npm packages typosquatting popular libraries to insert SSH backdoors.
Map4 is a MinHash-based molecular fingerprint.
As usual, you can simply install the package using pip:
pip install map4
Given a SMILES string, you can generate the MAP4 fingerprint as follows:
from rdkit.Chem import MolFromSmiles, Mol # pylint: disable=import-error,no-name-in-module
import numpy as np
from map4 import MAP4
map4 = MAP4(
# The size of the MinHash-based fingerprint
dimensions=2048,
# The radius of the circular substructures to consider
radius=2,
# Whether to include duplicated shingles, which we can
# make unique by extending them with a counter
include_duplicated_shingles=False,
)
molecule: Mol = MolFromSmiles("CCO")
fingerprint: np.ndarray = map4.calculate(molecule)
assert fingerprint.shape == (2048,)
Map4 also provides a multiprocessing-based implementation to calculate the fingerprints of a list of molecules:
from typing import List
import numpy as np
from rdkit.Chem import Mol, MolFromSmiles # pylint: disable=import-error,no-name-in-module
from map4 import MAP4
map4 = MAP4(
dimensions=2048,
radius=2,
include_duplicated_shingles=False,
)
molecules: List[Mol] = [MolFromSmiles("CCO"), MolFromSmiles("CCN")]
fingerprints: np.ndarray = map4.calculate_many(
molecules,
# The number of threads to use
number_of_threads=2,
# Whether to show a progress bar
verbose=True,
)
assert len(fingerprints) == 2
assert fingerprints[0].shape == (2048,)
assert fingerprints[1].shape == (2048,)
Finally, the fingerprints can be visualized using the visualize
method, which computes a TSNE of the fingerprints of the provided molecules.
You can find an example of how to use the visualize
method in the test_visualize.py
file. Here's a preview:
Map4 also provides a command-line entry-point called map4
. This command-line interface (CLI) provides a way to compute MAP4 fingerprints for a batch of molecules using SMILES input. The fingerprints can be customized via various options such as fingerprint dimensions, radius, and batch size. The entry-point is available once the package is installed, so no additional setup is required.
map4 --input-path <input_file> --output-path <output_file> [options]
--input-path, -i
: Path to the input file containing molecules in SMILES format.--output-path, -o
: Path to the output file where the fingerprints will be saved.--dimensions, -d
: Number of dimensions for the MinHashed fingerprint. Choices: [128, 512, 1024, 2048]
. Default: 1024
.--radius, -r
: Radius of the fingerprint. Default: 2
.--include-duplicated-shingles
: Whether to include duplicated shingles in the fingerprint. Default: False
.--clean-mols
: Whether to clean and canonicalize the molecules before fingerprint calculation. Default: True
.--delimiter
: Delimiter used in both input and output files. Default: \t
.--fp-delimiter
: Delimiter used between the numbers in the fingerprint output. Default: ;
.--batch-size, -b
: Number of molecules to process in a batch. Default: 500
.map4 -i molecules.smi -o fingerprints.txt -d 1024 -r 2 --clean-mols True --batch-size 1000
This command processes molecules from molecules.smi
, computes 1024-dimensional MAP4 fingerprints, and outputs them to fingerprints.txt
.
Folder description:
Extended-Benchmark
: compounds and query lists used for the peptide benchmarkMAP4-Similarity-Search
: source code for the similarity search appmap4
: MAP4 fingerprint source codeThe canonical, not isomeric, and rooted SMILES of the circular substructures CS
from radius one up to a user-given radius n
(default n=2
, MAP4
) are generated for each atom. All atom pairs are extracted, and their minimum topological distance TP
is calculated. For each atom pair jk
, for each considered radius r
, a Shingle
is encoded as: CS
rj
|TP
jk
|CS
rk
, where the two CS
are annotated in alphabetical order, resulting in n Shingles for each atom pairs.
The resulting list of Shingles is hashed using the unique mapping SHA-1
to a set of integers S
i
, and its correspondent transposed vector s
T
i
is MinHashed.
Draw a structure or paste its SMILES, or write a natural peptides linear sequence. Search for its analogs in the MAP4 or MHFP6 space of ChEMBL, of the Human Metabolome Database (HMDB), or of the 'below 50 residues subset' of SwissProt.
The MAP4 search can be found at: http://map-search.gdb.tools/.
The code of the MAP4 similarity search can be found in this repository folder MAP4-Similarity-Search
To run the app locally:
docker run -p 8080:5000 --mount type=bind,target=/MAP4SearchData,source=/your/absolut/path/MAP4SearchData --restart always --name mapsearch alicecapecchi/map-search:latest
Compounds and training list used to extend the Riniker et. al. fingerprint benchmark (Riniker, G. Landrum, J. Cheminf., 5, 26 (2013), DOI: 10.1186/1758-2946-5-26, URL: http://www.jcheminf.com/content/5/1/26, GitHub page: https://github.com/rdkit/benchmarking_platform) to peptides.
FAQs
MinHashed AtomPair Fingerprint of Radius 2
We found that map4 demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
Security News
Socket’s threat research team has detected six malicious npm packages typosquatting popular libraries to insert SSH backdoors.
Security News
MITRE's 2024 CWE Top 25 highlights critical software vulnerabilities like XSS, SQL Injection, and CSRF, reflecting shifts due to a refined ranking methodology.
Security News
In this segment of the Risky Business podcast, Feross Aboukhadijeh and Patrick Gray discuss the challenges of tracking malware discovered in open source softare.