Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More →

promptsmiles

Package Overview

Dependencies

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

promptsmiles

A conveniant package to manipulate SMILES strings for iterative prompting with chemical language models.

1.4.2
PyPI

Maintainers: 1

PromptSMILES: Prompting for scaffold decoration and fragment linking in chemical language models

This library contains code to manipulate SMILES strings to facilitate iterative prompting to be coupled with a trained chemical language model (CLM) that uses SMILES notation.

Installation

The libary can be installed via pip

pip install promptsmiles

Or via obtaining a copy of this repo, promptsmiles requires RDKit.

git clone https://github.com/compsciencelab/PromptSMILES.git
cd PromptSMILES
pip install ./

Use

PromptSMILES is designed as a wrapper to CLM sampling that can accept a prompt (i.e., an initial string to begin autoregressive token generation). Therefore, it requires two callable functions, described later. PromptSMILES has 3 main classes, DeNovo (a dummy wrapper to make code consistent), ScaffoldDecorator, and FragmentLinker.

Scaffold Decoration

from promptsmiles import ScaffoldDecorator, FragmentLinker

SD = ScaffoldDecorator(
    scaffold="N1(*)CCN(CC1)CCCCN(*)",
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False, # CLM.sampler accepts a list of prompts or not
    optimize_prompts=True,
    shuffle=True, # Randomly select attachment points within a batch or not
    return_all=False,
    )
smiles = SD.sample(batch_size=3, return_all=True) # Parameters can be overriden here if desired

alt text

Fragment linking / scaffold hopping

FL = FragmentLinker(
    fragments=["N1(*)CCNCC1", "C1CC1(*)"],
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False,
    optimize_prompts=True,
    shuffle=True,
    scan=False, # Optional when combining 2 fragments, otherwise is set to true
    return_all=False,
)
smiles = FL.sample(batch_size=3)

alt text

Required chemical language model functions

Notice the callable functions required CLM.sampler and CLM.evaluater. The first is a function that samples from the CLM given a prompt.

def CLM_sampler(prompt: Union[str, list[str]], batch_size: int):
    """
    Input: Must have a prompt and batch_size argument.
    Output: SMILES [list]
    """
    # Encode prompt and sample as per model implementation
    return smiles

Note: For a more efficient implementation, prompt should accept a list of prompts equal to batch_size and batch_prompts should be set to True in the promptsmiles class used.

The second is a function that evaluates the NLL of a list of SMILES

def CLM_evaluater(smiles: list[str]):
    """
    Input: A list of SMILES
    Output: NLLs [list, np.array, torch.tensor](CPU w.o. gradient)
    """
    return nlls

FAQs

What is promptsmiles?

Is promptsmiles well maintained?

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

promptsmiles

PromptSMILES: Prompting for scaffold decoration and fragment linking in chemical language models

Installation

Use

Scaffold Decoration

Fragment linking / scaffold hopping

Required chemical language model functions

Related posts

Malicious npm Campaign Targets Ethereum Developers with Fake Hardhat Packages

Quasar RAT Disguised as an npm Package for Detecting Vulnerabilities in Ethereum Smart Contracts