mol_eval
mol_eval is a tool for evaluating SMILES data, particularly for distinguishing between real and fake SMILES sequences. It uses configurable thresholds and molecular descriptors to assess similarity and other properties such as solubility.
To install mol_eval, you can use pip:
pip install mol_eval
Before running the tool, you'll need to prepare your dataset and configuration file.
real_data.csv: This file should contain two columns:
cmpd_name: The name of the compound.
smile: The SMILES string representing the molecule.
fake_data.csv: This file should contain one column:
smile: The SMILES string of synthetic molecules.
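As a quick way to try the tool, the two input files described above can be generated with Python's standard csv module. This is only a sketch: the helper name write_example_datasets and the example molecules are illustrative assumptions; only the column names (cmpd_name, smile) and file names come from the description above.

```python
import csv
from pathlib import Path


def write_example_datasets(folder):
    """Write tiny real_data.csv and fake_data.csv files into `folder`.

    Hypothetical helper for illustration; not part of mol_eval.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # Real compounds: a name plus a SMILES string.
    real_rows = [
        {"cmpd_name": "ethanol", "smile": "CCO"},
        {"cmpd_name": "benzene", "smile": "c1ccccc1"},
    ]
    # Synthetic (fake) molecules: SMILES strings only.
    fake_rows = [{"smile": "CCN"}, {"smile": "CC(=O)O"}]

    real_path = folder / "real_data.csv"
    fake_path = folder / "fake_data.csv"

    with real_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["cmpd_name", "smile"])
        writer.writeheader()
        writer.writerows(real_rows)

    with fake_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["smile"])
        writer.writeheader()
        writer.writerows(fake_rows)

    return real_path, fake_path
```

Calling write_example_datasets("data") would leave data/real_data.csv and data/fake_data.csv ready to pass to the command line tool.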
The configuration file allows you to set various thresholds and other parameters used in the evaluation. Here's an example configuration file:
{
  "LEVENSHTEIN_THRESHOLD": 0.5,
  "VERY_HIGH_SIMILARITY_THRESHOLD": 0.9,
  "HIGH_SIMILARITY_THRESHOLD": 0.88,
  "LOW_SIMILARITY_THRESHOLD": 0.3,
  "SOLUBILITY_THRESHOLDS": {
    "VERY_HIGH": -1,
    "HIGH": 0,
    "MODERATE": 2,
    "LOW": 4,
    "VERY_LOW": "Infinity"
  },
  "RELEVANT_DESCRIPTORS": [
    "MolWt", "MolLogP", "TPSA"
  ],
  "TANIMOTO_THRESHOLDS": {
    "VERY_HIGH": 0.9,
    "HIGH": 0.88,
    "MODERATE": 0.3
  },
  "VALID_SOLUBILITY_LABELS": ["VERY_HIGH", "HIGH", "MODERATE"],
  "VALID_TANIMOTO_LABELS": ["HIGH", "MODERATE", "LOW"],
  "MAX_SUBSTRUCTURES_MATCHES": 0,
  "REPORT_FOLDER": "./report"
}
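Before running an evaluation, it can help to sanity-check a configuration file like the one above. The sketch below loads the JSON, converts the solubility thresholds to floats (so the string "Infinity" becomes a numeric infinity), and runs two basic checks. The function names load_config and check_thresholds are hypothetical, and the checks themselves (similarity thresholds in the range 0 to 1, Tanimoto thresholds in descending order) are assumptions about what a sensible configuration looks like, not rules taken from mol_eval itself.

```python
import json


def load_config(path):
    """Load a config JSON file and normalise the solubility thresholds.

    Hypothetical helper; mol_eval's own loader may behave differently.
    """
    with open(path) as fh:
        config = json.load(fh)
    # float("Infinity") is valid in Python and yields math.inf.
    solubility = config.get("SOLUBILITY_THRESHOLDS", {})
    config["SOLUBILITY_THRESHOLDS"] = {
        label: float(value) for label, value in solubility.items()
    }
    return config


def check_thresholds(config):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []

    # Assumption: these similarity thresholds are fractions in [0, 1].
    for key in (
        "LEVENSHTEIN_THRESHOLD",
        "VERY_HIGH_SIMILARITY_THRESHOLD",
        "HIGH_SIMILARITY_THRESHOLD",
        "LOW_SIMILARITY_THRESHOLD",
    ):
        value = config.get(key)
        if value is not None and not 0 <= value <= 1:
            problems.append(f"{key}={value} is outside [0, 1]")

    # Assumption: a stricter label should not have a lower cut-off.
    tanimoto = config.get("TANIMOTO_THRESHOLDS", {})
    ordered = [tanimoto.get(k) for k in ("VERY_HIGH", "HIGH", "MODERATE")]
    present = [v for v in ordered if v is not None]
    if present != sorted(present, reverse=True):
        problems.append("TANIMOTO_THRESHOLDS are not in descending order")

    return problems
```

For the example configuration above, check_thresholds(load_config("config.json")) would be expected to return an empty list.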
Thresholds: Customize similarity and solubility thresholds for better evaluation.
Descriptors: Choose molecular descriptors for evaluation, such as molecular weight (MolWt), logP (MolLogP), and polar surface area (TPSA).
Tanimoto and Levenshtein: Fine-tune the thresholds for calculating molecular similarity.
Solubility Labels: Define the solubility categories based on the solubility values.
Report Folder: Define where to save evaluation reports.
After installing the package and preparing your dataset and configuration file, you can run the evaluation tool via the command line.
Run the Evaluation
Use the following command to evaluate your datasets:
mol_eval --real_data /path/to/real_data.csv --fake_data /path/to/fake_data.csv --configs /path/to/config.json
usage: mol_eval [-h] --real_data REAL_DATA --fake_data FAKE_DATA --configs CONFIGS
Molecule Evaluator: Evaluate real and fake SMILES data using a configuration file.
options:
-h, --help Show this help message and exit.
--real_data REAL_DATA Path to the real SMILES data file (CSV).
--fake_data FAKE_DATA Path to the fake SMILES data file (CSV).
--configs CONFIGS Path to the configuration JSON file.
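If you prefer to launch the evaluation from a Python script (for example inside a pipeline), the same command can be assembled and run with the standard subprocess module. This is a minimal sketch: build_command and run_evaluation are hypothetical helper names, and only the three flags shown in the usage text above are used.

```python
import subprocess


def build_command(real_data, fake_data, configs):
    """Assemble the mol_eval command line as an argument list."""
    return [
        "mol_eval",
        "--real_data", str(real_data),
        "--fake_data", str(fake_data),
        "--configs", str(configs),
    ]


def run_evaluation(real_data, fake_data, configs):
    """Run mol_eval as a child process (requires mol_eval to be installed).

    check=True raises subprocess.CalledProcessError on a non-zero exit code.
    """
    return subprocess.run(build_command(real_data, fake_data, configs), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.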
The tool generates a report in the folder specified by REPORT_FOLDER in the configuration file (default is ./report). The report contains detailed information on the evaluation of the SMILES sequences, including similarity metrics, solubility predictions, and substructure matching.
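To give a feel for the similarity metrics named in the configuration (Levenshtein and Tanimoto), here is a small pure-Python sketch of both. It is not mol_eval's implementation, which is not described here. In cheminformatics, Tanimoto similarity is normally computed over molecular fingerprint bits; the function below simply takes any two collections of features, and the normalisation of the Levenshtein distance by the longer string's length is an assumption made for illustration.

```python
def levenshtein_distance(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Similarity in [0, 1]: 1 minus distance over the longer length (assumed)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) coefficient between two feature collections."""
    set_a, set_b = set(features_a), set(features_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty feature sets
    return len(set_a & set_b) / len(union)
```

For example, the SMILES strings "CCO" and "CCN" differ by one character, so their Levenshtein distance is 1 and, under the normalisation assumed above, their similarity is 2/3.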
Contributions are welcome! Feel free to open issues or submit pull requests. Please ensure all tests pass and that the code follows the PEP 8 style guide.
This project is licensed under the terms of the GNU General Public License, Version 3.