A convenient package to manipulate SMILES strings for iterative prompting with chemical language models.
This library contains code for manipulating SMILES strings to facilitate iterative prompting with a trained chemical language model (CLM) that uses SMILES notation.
The library can be installed via pip:
pip install promptsmiles
Alternatively, install it from a copy of this repository. Note that promptsmiles requires RDKit.
git clone https://github.com/compsciencelab/PromptSMILES.git
cd PromptSMILES
pip install ./
PromptSMILES is designed as a wrapper around CLM sampling that can accept a prompt (i.e., an initial string from which autoregressive token generation begins). It therefore requires two callable functions, described later. PromptSMILES has three main classes: DeNovo (a dummy wrapper to keep code consistent), ScaffoldDecorator, and FragmentLinker.
from promptsmiles import ScaffoldDecorator, FragmentLinker
SD = ScaffoldDecorator(
    scaffold="N1(*)CCN(CC1)CCCCN(*)",
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False,  # Whether CLM.sampler accepts a list of prompts
    optimize_prompts=True,
    shuffle=True,  # Whether to randomly select attachment points within a batch
    return_all=False,
)
smiles = SD.sample(batch_size=3, return_all=True)  # Parameters can be overridden here if desired
FL = FragmentLinker(
    fragments=["N1(*)CCNCC1", "C1CC1(*)"],
    batch_size=64,
    sample_fn=CLM.sampler,
    evaluate_fn=CLM.evaluater,
    batch_prompts=False,
    optimize_prompts=True,
    shuffle=True,
    scan=False,  # Optional when combining two fragments; otherwise set to True
    return_all=False,
)
smiles = FL.sample(batch_size=3)
Note the two required callable functions, CLM.sampler and CLM.evaluater. The first is a function that samples from the CLM given a prompt:
from typing import Union

def CLM_sampler(prompt: Union[str, list[str]], batch_size: int):
    """
    Input: Must have a prompt and batch_size argument.
    Output: SMILES [list]
    """
    # Encode prompt and sample as per model implementation
    return smiles
Note: For a more efficient implementation, prompt should accept a list of prompts equal in length to batch_size, and batch_prompts should be set to True in the promptsmiles class used.
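To make the sampler interface concrete, here is a minimal stand-in that accepts either a single prompt or a list of prompts. It is only a sketch: `toy_sampler` and `TOY_TOKENS` are hypothetical names, the "model" simply appends random tokens, and the outputs are not guaranteed to be valid molecules. A real implementation would encode the prompt and continue generation with the trained CLM.

```python
import random
from typing import Union

# Hypothetical toy vocabulary; a real CLM would use its own tokenizer.
TOY_TOKENS = ["C", "N", "O", "c1ccccc1", "(C)", "=O"]


def toy_sampler(prompt: Union[str, list[str]], batch_size: int) -> list[str]:
    """Return batch_size strings, each one starting with its prompt."""
    # Broadcast a single prompt so both call styles share one code path.
    prompts = [prompt] * batch_size if isinstance(prompt, str) else list(prompt)
    if len(prompts) != batch_size:
        raise ValueError("expected one prompt per sample")
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    # "Generation" here is just appending 1 to 3 random tokens to each prompt.
    return [
        p + "".join(rng.choices(TOY_TOKENS, k=rng.randint(1, 3)))
        for p in prompts
    ]


# Single prompt, as used when batch_prompts=False.
single = toy_sampler("N1CCNCC1", batch_size=3)
# One prompt per sample, as used when batch_prompts=True.
batched = toy_sampler(["C1CC1", "N1CCNCC1"], batch_size=2)
```

Handling both call styles in one function means the same callable can be used whichever value batch_prompts takes.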
The second is a function that evaluates the negative log-likelihood (NLL) of a list of SMILES:
def CLM_evaluater(smiles: list[str]):
    """
    Input: A list of SMILES
    Output: NLLs [list, np.array, torch.tensor] (on CPU, without gradients)
    """
    # Evaluate each SMILES as per model implementation
    return nlls
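As an illustration of the evaluater contract (one NLL per input SMILES), the sketch below scores strings under a fixed, context-free character distribution. The names `toy_evaluater`, `TOY_PROBS`, and `UNKNOWN_PROB` are hypothetical and the probabilities are made up. A real CLM would instead sum the negative log-probabilities of each token conditioned on the preceding tokens.

```python
import math

# Hypothetical per-character probabilities, for illustration only.
TOY_PROBS = {"C": 0.5, "N": 0.2, "O": 0.2}
UNKNOWN_PROB = 0.1  # assumed fallback for any other character


def toy_evaluater(smiles: list[str]) -> list[float]:
    """Return one NLL per input string: the sum of -log p over its characters."""
    return [
        sum(-math.log(TOY_PROBS.get(ch, UNKNOWN_PROB)) for ch in s)
        for s in smiles
    ]


nlls = toy_evaluater(["CC", "CCO"])
```

Because NLL is a sum of non-negative terms, longer or less likely strings receive larger values. Here "CCO" scores higher than "CC".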