
selfies

Maintainers: 1, Versions: 16


selfies - PyPI Package: Compare versions

Comparing version
1.0.3
to
1.0.4
+24
-5
PKG-INFO
Metadata-Version: 2.1
Name: selfies
Version: 1.0.3
Version: 1.0.4
Summary: SELFIES (SELF-referencIng Embedded Strings) is a general-purpose, sequence-based, robust representation of semantically constrained graphs.

@@ -23,3 +23,5 @@ Home-page: https://github.com/aspuru-guzik-group/selfies

[*Machine Learning: Science and Technology* **1**, 045024 (2020)](https://iopscience.iop.org/article/10.1088/2632-2153/aba947), [extensive blog post January 2021](https://aspuru.substack.com/p/molecular-graph-representations-and).<br>
Major contributors since v1.0.0: _[Alston Lo](https://github.com/aspuru-guzik-group/selfies/commits?author=alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_
[Talk on YouTube about SELFIES](https://www.youtube.com/watch?v=CaIyUmfGXDk).<br>
Major contributors since v1.0.0: _[Alston Lo](https://github.com/aspuru-guzik-group/selfies/commits?author=alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_<br>
Chemistry Advisor: [Robert Pollice](https://scholar.google.at/citations?user=JR2N3JIAAAAJ)

@@ -144,8 +146,24 @@ A main objective is to use SELFIES as direct input into machine learning models,<br>

* More examples can be found in the ``examples/`` directory, including a
variational autoencoder that runs on the SELFIES language.
[variational autoencoder that runs on the SELFIES](https://github.com/aspuru-guzik-group/selfies/tree/master/examples/vae_example) language.
* This [ICLR2020 paper](https://arxiv.org/abs/1909.11655) used SELFIES in a
genetic algorithm to achieve state-of-the-art performance for inverse design,
with the [code here](https://github.com/aspuru-guzik-group/GA).
* SELFIES allows for [highly efficient exploration and interpolation of the chemical space](https://chemrxiv.org/articles/preprint/Beyond_Generative_Models_Superfast_Traversal_Optimization_Novelty_Exploration_and_Discovery_STONED_Algorithm_for_Molecules_using_SELFIES/13383266), with [deterministic algorithms, see code](https://github.com/aspuru-guzik-group/stoned-selfies).
* We use SELFIES for [Deep Molecular dreaming](https://arxiv.org/abs/2012.09712), a new generative model inspired by interpretable neural networks in computer vision. See the [code of PASITHEA here](https://github.com/aspuru-guzik-group/Pasithea).
* Kohulan Rajan, Achim Zielesny, and Christoph Steinbeck show in two papers that SELFIES outperforms other representations in [img2string](https://link.springer.com/article/10.1186/s13321-020-00469-w) and [string2string](https://chemrxiv.org/articles/preprint/STOUT_SMILES_to_IUPAC_Names_Using_Neural_Machine_Translation/13469202/1) translation tasks; see the code of [DECIMER](https://github.com/Kohulan/DECIMER-Image-to-SMILES) and [STOUT](https://github.com/Kohulan/Smiles-TO-iUpac-Translator).
## Handling invalid inputs
If an invalid input is presented to the encoder or decoder, the return value is `None`.
The reason for the error can be printed by passing `print_error=True`, e.g. `encoder(..., print_error=True)`.
```python
import selfies as sf
invalid_smiles = "C[C@H](O)[C@@(*)C1=CC=CC=C1"
selfies_string = sf.encoder(invalid_smiles)
if selfies_string is None:
    # re-run with print_error=True to print the reason for the failure
    selfies_string = sf.encoder(invalid_smiles, print_error=True)
    # prints: Encoding error 'C[C@H](O)[C@@(*)C1=CC=CC=C1': wildcard atom '*' not supported.
```
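Because invalid inputs yield `None` instead of raising, a batch of SMILES can be encoded without any `try`/`except`. The sketch below is illustrative only: `encode_all` and `toy_encoder` are hypothetical names, and `toy_encoder` merely imitates the return-`None`-on-failure convention; it does not produce real SELFIES.

```python
from typing import Callable, Dict, List, Optional, Tuple


def encode_all(
    smiles_list: List[str],
    encoder: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Split inputs into successfully encoded strings and failures.

    Assumes `encoder` returns None (rather than raising) on invalid input.
    """
    encoded: Dict[str, str] = {}
    failed: List[str] = []
    for smiles in smiles_list:
        result = encoder(smiles)
        if result is None:  # invalid input: record it and keep going
            failed.append(smiles)
        else:
            encoded[smiles] = result
    return encoded, failed


def toy_encoder(smiles: str) -> Optional[str]:
    """Hypothetical stand-in for sf.encoder; NOT a real SELFIES encoder.

    It only mimics the None-on-failure convention by rejecting '*'.
    """
    if "*" in smiles:
        return None
    return "ENC(" + smiles + ")"


encoded, failed = encode_all(["CCO", "C[C@H](O)[C@@(*)C1=CC=CC=C1"], toy_encoder)
```

With the real library, one would presumably pass `sf.encoder` in place of `toy_encoder`, then re-run the entries in `failed` with `print_error=True` to see why each was rejected.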
## Tests

@@ -190,4 +208,5 @@ SELFIES uses `pytest` with `tox` as its testing framework.

We thank Jacques Boitreaud, Andrew Brereton, Matthew Carbone (x94carbone), Nathan Frey (ncfrey), Theophile Gaudin,
Hyunmin Kim (hmkim), Minjie Li, Vincent Mallet, Kevin Ryan (LeanAndMean), Benjamin Sanchez-Lengeling,
and Zhenpeng Yao for their suggestions and bug reports, and Robert Pollice for chemistry advice.
HelloJocelynLu, Hyunmin Kim (hmkim), Minjie Li, Vincent Mallet, Alexander Minidis (DocMinus), Kevin Ryan (LeanAndMean),
Benjamin Sanchez-Lengeling, and Zhenpeng Yao for their suggestions and bug reports,
and Robert Pollice for chemistry advice.

@@ -194,0 +213,0 @@ ## License

@@ -15,3 +15,5 @@ # SELFIES

[*Machine Learning: Science and Technology* **1**, 045024 (2020)](https://iopscience.iop.org/article/10.1088/2632-2153/aba947), [extensive blog post January 2021](https://aspuru.substack.com/p/molecular-graph-representations-and).<br>
Major contributors since v1.0.0: _[Alston Lo](https://github.com/aspuru-guzik-group/selfies/commits?author=alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_
[Talk on YouTube about SELFIES](https://www.youtube.com/watch?v=CaIyUmfGXDk).<br>
Major contributors since v1.0.0: _[Alston Lo](https://github.com/aspuru-guzik-group/selfies/commits?author=alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_<br>
Chemistry Advisor: [Robert Pollice](https://scholar.google.at/citations?user=JR2N3JIAAAAJ)

@@ -136,8 +138,24 @@ A main objective is to use SELFIES as direct input into machine learning models,<br>

* More examples can be found in the ``examples/`` directory, including a
variational autoencoder that runs on the SELFIES language.
[variational autoencoder that runs on the SELFIES](https://github.com/aspuru-guzik-group/selfies/tree/master/examples/vae_example) language.
* This [ICLR2020 paper](https://arxiv.org/abs/1909.11655) used SELFIES in a
genetic algorithm to achieve state-of-the-art performance for inverse design,
with the [code here](https://github.com/aspuru-guzik-group/GA).
* SELFIES allows for [highly efficient exploration and interpolation of the chemical space](https://chemrxiv.org/articles/preprint/Beyond_Generative_Models_Superfast_Traversal_Optimization_Novelty_Exploration_and_Discovery_STONED_Algorithm_for_Molecules_using_SELFIES/13383266), with [deterministic algorithms, see code](https://github.com/aspuru-guzik-group/stoned-selfies).
* We use SELFIES for [Deep Molecular dreaming](https://arxiv.org/abs/2012.09712), a new generative model inspired by interpretable neural networks in computer vision. See the [code of PASITHEA here](https://github.com/aspuru-guzik-group/Pasithea).
* Kohulan Rajan, Achim Zielesny, and Christoph Steinbeck show in two papers that SELFIES outperforms other representations in [img2string](https://link.springer.com/article/10.1186/s13321-020-00469-w) and [string2string](https://chemrxiv.org/articles/preprint/STOUT_SMILES_to_IUPAC_Names_Using_Neural_Machine_Translation/13469202/1) translation tasks; see the code of [DECIMER](https://github.com/Kohulan/DECIMER-Image-to-SMILES) and [STOUT](https://github.com/Kohulan/Smiles-TO-iUpac-Translator).
## Handling invalid inputs
If an invalid input is presented to the encoder or decoder, the return value is `None`.
The reason for the error can be printed by passing `print_error=True`, e.g. `encoder(..., print_error=True)`.
```python
import selfies as sf
invalid_smiles = "C[C@H](O)[C@@(*)C1=CC=CC=C1"
selfies_string = sf.encoder(invalid_smiles)
if selfies_string is None:
    # re-run with print_error=True to print the reason for the failure
    selfies_string = sf.encoder(invalid_smiles, print_error=True)
    # prints: Encoding error 'C[C@H](O)[C@@(*)C1=CC=CC=C1': wildcard atom '*' not supported.
```
## Tests

@@ -182,4 +200,5 @@ SELFIES uses `pytest` with `tox` as its testing framework.

We thank Jacques Boitreaud, Andrew Brereton, Matthew Carbone (x94carbone), Nathan Frey (ncfrey), Theophile Gaudin,
Hyunmin Kim (hmkim), Minjie Li, Vincent Mallet, Kevin Ryan (LeanAndMean), Benjamin Sanchez-Lengeling,
and Zhenpeng Yao for their suggestions and bug reports, and Robert Pollice for chemistry advice.
HelloJocelynLu, Hyunmin Kim (hmkim), Minjie Li, Vincent Mallet, Alexander Minidis (DocMinus), Kevin Ryan (LeanAndMean),
Benjamin Sanchez-Lengeling, and Zhenpeng Yao for their suggestions and bug reports,
and Robert Pollice for chemistry advice.

@@ -186,0 +205,0 @@ ## License


@@ -34,2 +34,5 @@ #!/usr/bin/env python

    "get_semantic_robust_alphabet",
    "get_default_constraints",
    "get_octet_rule_constraints",
    "get_hypervalent_constraints",
    "get_semantic_constraints",

@@ -50,2 +53,5 @@ "set_semantic_constraints",

    get_semantic_robust_alphabet,
    get_default_constraints,
    get_octet_rule_constraints,
    get_hypervalent_constraints,
    get_semantic_constraints,

@@ -52,0 +58,0 @@ set_semantic_constraints,

from collections import OrderedDict
from typing import Dict, Iterable, List, Optional, Tuple, Union
from selfies.grammar_rules import get_bond_from_num, get_n_from_symbols, \
    get_next_branch_state, get_next_state, get_num_from_bond
from selfies.grammar_rules import (get_bond_from_num,
                                   get_hypervalent_constraints,
                                   get_n_from_symbols, get_next_branch_state,
                                   get_next_state, get_num_from_bond,
                                   get_octet_rule_constraints,
                                   get_semantic_constraints,
                                   set_semantic_constraints)
def decoder(selfies: str, print_error: bool = False) -> Optional[str]:
def decoder(selfies: str,
            print_error: bool = False,
            constraints: Optional[str] = None) -> Optional[str]:
    """Translates a SELFIES into a SMILES.

@@ -22,2 +29,6 @@

        Defaults to False.
    :param constraints: if ``'octet_rule'`` or ``'hypervalent'``,
        the corresponding preset bond constraints will be used instead.
        If ``None``, :func:`selfies.decoder` will use the
        currently configured bond constraints. Defaults to ``None``.
    :return: the SMILES translation of ``selfies``. If an error occurs,

@@ -31,4 +42,29 @@ and ``selfies`` cannot be translated, ``None`` is returned instead.

    'C=CF'

    .. seealso:: The
        `"octet_rule" <https://en.wikipedia.org/wiki/Octet_rule>`_
        and
        `"hypervalent" <https://en.wikipedia.org/wiki/Hypervalent_molecule>`_
        preset bond constraints
        can be viewed with :func:`selfies.get_octet_rule_constraints` and
        :func:`selfies.get_hypervalent_constraints`, respectively. These
        presets are variants of the "default" bond constraints, which can
        be viewed with :func:`selfies.get_default_constraints`. Their
        differences can be summarized as follows:

        * def. : ``Cl``, ``Br``, ``I``: 1, ``N``: 3, ``P``: 5, ``P+1``: 6, ``P-1``: 4, ``S``: 6, ``S+1``: 7, ``S-1``: 5
        * oct. : ``Cl``, ``Br``, ``I``: 1, ``N``: 3, ``P``: 3, ``P+1``: 4, ``P-1``: 2, ``S``: 2, ``S+1``: 3, ``S-1``: 1
        * hyp. : ``Cl``, ``Br``, ``I``: 7, ``N``: 5, ``P``: 5, ``P+1``: 6, ``P-1``: 4, ``S``: 6, ``S+1``: 7, ``S-1``: 5
    """

    old_constraints = get_semantic_constraints()
    if constraints is None:
        pass
    elif constraints == 'octet_rule':
        set_semantic_constraints(get_octet_rule_constraints())
    elif constraints == 'hypervalent':
        set_semantic_constraints(get_hypervalent_constraints())
    else:
        raise ValueError("unrecognized constraint type")

    try:

@@ -43,5 +79,11 @@ all_smiles = [] # process dot-separated fragments separately

        if constraints is not None:  # restore old constraints
            set_semantic_constraints(old_constraints)
        return '.'.join(all_smiles)

    except ValueError as err:
        if constraints is not None:  # restore old constraints
            set_semantic_constraints(old_constraints)
        if print_error:

@@ -48,0 +90,0 @@ print("Decoding error '{}': {}.".format(selfies, err))
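The new `constraints` argument follows a save, apply, restore pattern: `decoder` records the current constraints, installs a preset, and restores the old settings on both the success path and the `ValueError` path. A minimal sketch of the same idea, assuming a toy module-level dict in place of selfies' internal state, can express the two restore branches as a single `try`/`finally`. All names here (`run_with_preset`, `get_constraints`, `set_constraints`, `_PRESETS`) are hypothetical, and the capacities are a small subset taken from the docstring summary above.

```python
from typing import Callable, Dict, Optional

# Toy stand-ins for selfies' internal constraint state (hypothetical names).
_DEFAULT: Dict[str, int] = {"N": 3, "P": 5, "S": 6}
_PRESETS: Dict[str, Dict[str, int]] = {
    "octet_rule": {**_DEFAULT, "P": 3, "S": 2},
    "hypervalent": {**_DEFAULT, "N": 5},
}
_current: Dict[str, int] = dict(_DEFAULT)


def get_constraints() -> Dict[str, int]:
    return dict(_current)


def set_constraints(constraints: Dict[str, int]) -> None:
    global _current
    _current = dict(constraints)


def run_with_preset(preset: Optional[str], work: Callable[[], str]) -> str:
    """Run `work` under a named preset, then restore the previous settings."""
    # Validate before touching any state, as decoder() does.
    if preset is not None and preset not in _PRESETS:
        raise ValueError("unrecognized constraint type")
    old = get_constraints()
    if preset is not None:
        set_constraints(_PRESETS[preset])
    try:
        return work()
    finally:
        # Runs after a normal return *and* after any exception, replacing
        # the two separate restore branches in decoder().
        if preset is not None:
            set_constraints(old)
```

Unlike `decoder`, this sketch lets exceptions from `work` propagate instead of returning `None`. A side effect of `finally` is that the settings are also restored if `work` raises something other than `ValueError`, a case that the `except ValueError` branch shown in the diff would not appear to cover.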

@@ -9,7 +9,17 @@ from itertools import product

    'C': 4, 'C+1': 5, 'C-1': 3,
    'P': 5, 'P+1': 6, 'P-1': 4,
    'S': 6, 'S+1': 7, 'S-1': 5,
    'P': 7, 'P+1': 8, 'P-1': 6,
    '?': 8,
    '?': 8
}

octet_rule_bond_constraints = dict(default_bond_constraints)
octet_rule_bond_constraints.update(
    {'S': 2, 'S+1': 3, 'S-1': 1, 'P': 3, 'P+1': 4, 'P-1': 2}
)

hypervalent_bond_constraints = dict(default_bond_constraints)
hypervalent_bond_constraints.update(
    {'Cl': 7, 'Br': 7, 'I': 7, 'N': 5}
)

_bond_constraints = default_bond_constraints
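The preset tables are built by copying the defaults and overriding a handful of keys. The sketch below reproduces that derivation on a subset of the table (only entries visible in this diff or in the `decoder` docstring summary; the package's full table has more atom types) and adds a hypothetical helper, `diff_from_default`, that recovers the def./oct./hyp. differences listed in the docstring. It also shows why the `'P': 7, 'P+1': 8, 'P-1': 6,` line mattered, assuming it is the line dropped in 1.0.4: in a Python dict literal a repeated key keeps its last value, so that line would silently override `'P': 5`.

```python
from typing import Dict, Tuple

# Subset of the default table, using only values stated in this diff.
default_bond_constraints: Dict[str, int] = {
    "Cl": 1, "Br": 1, "I": 1,
    "N": 3,
    "C": 4, "C+1": 5, "C-1": 3,
    "P": 5, "P+1": 6, "P-1": 4,
    "S": 6, "S+1": 7, "S-1": 5,
    "?": 8,
}

# Same derivation as in the diff: copy the defaults, then override some keys.
octet_rule_bond_constraints = dict(default_bond_constraints)
octet_rule_bond_constraints.update(
    {"S": 2, "S+1": 3, "S-1": 1, "P": 3, "P+1": 4, "P-1": 2}
)
hypervalent_bond_constraints = dict(default_bond_constraints)
hypervalent_bond_constraints.update({"Cl": 7, "Br": 7, "I": 7, "N": 5})


def diff_from_default(preset: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each overridden atom type to its (default, preset) capacity."""
    return {
        atom: (default_bond_constraints[atom], cap)
        for atom, cap in preset.items()
        if cap != default_bond_constraints[atom]
    }


# A repeated key in a dict literal keeps its last value, so a stray
# later 'P': 7 entry silently overrides an earlier 'P': 5.
stale = {"P": 5, "P": 7}
assert stale["P"] == 7
```

For example, `diff_from_default(hypervalent_bond_constraints)` yields `{'Cl': (1, 7), 'Br': (1, 7), 'I': (1, 7), 'N': (3, 5)}`, matching the "hyp." row of the docstring.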

@@ -57,2 +67,43 @@

def get_default_constraints() -> Dict[str, int]:
    """Returns the preset "default" bond constraint settings.

    :return: the default constraint settings.
    """
    global default_bond_constraints
    return dict(default_bond_constraints)


def get_octet_rule_constraints() -> Dict[str, int]:
    """Returns the preset "octet rule" bond constraint settings. These
    constraints are a stricter version of the default constraints, so that
    the `octet rule <https://en.wikipedia.org/wiki/Octet_rule>`_
    is obeyed. In particular, ``S`` and ``P`` are
    restricted to bond capacities of 2 and 3, respectively (and similarly
    with ``S+1``, ``S-1``, ``P+1``, ``P-1``).

    :return: the octet rule constraint settings.
    """
    global octet_rule_bond_constraints
    return dict(octet_rule_bond_constraints)


def get_hypervalent_constraints() -> Dict[str, int]:
    """Returns the preset "hypervalent" bond constraint settings. These
    constraints are a relaxed version of the default constraints, to allow
    for `hypervalent molecules
    <https://en.wikipedia.org/wiki/Hypervalent_molecule>`_.
    In particular, ``Cl``, ``Br``, and ``I``
    are relaxed to a 7 bond capacity, and ``N`` is relaxed to a 5 bond
    capacity.

    :return: the hypervalent constraint settings.
    """
    global hypervalent_bond_constraints
    return dict(hypervalent_bond_constraints)


def get_semantic_constraints() -> Dict[str, int]:

@@ -59,0 +110,0 @@ """Returns the semantic bond constraints that :mod:`selfies` is currently

+1
-1

@@ -10,3 +10,3 @@ #!/usr/bin/env python

    name="selfies",
    version="1.0.3",
    version="1.0.4",
    author="Mario Krenn",

@@ -13,0 +13,0 @@ author_email="mario.krenn@utoronto.ca, alan@aspuru.com",