selfies
+24
-5
| Metadata-Version: 2.1 | ||
| Name: selfies | ||
| Version: 1.0.3 | ||
| Version: 1.0.4 | ||
| Summary: SELFIES (SELF-referencIng Embedded Strings) is a general-purpose, sequence-based, robust representation of semantically constrained graphs. | ||
@@ -23,3 +23,5 @@ Home-page: https://github.com/aspuru-guzik-group/selfies | ||
| [*Machine Learning: Science and Technology* **1**, 045024 (2020)](https://iopscience.iop.org/article/10.1088/2632-2153/aba947), [extensive blog post January 2021](https://aspuru.substack.com/p/molecular-graph-representations-and).<br> | ||
| Major contributors since v1.0.0: _[Alston Lo](https://github.com/aspuru-guzik-group/selfies/commits?author=alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_ | ||
| [Talk on youtube about SELFIES](https://www.youtube.com/watch?v=CaIyUmfGXDk).<br> | ||
| Major contributors since v1.0.0: _[Alston Lo](https://github.com/aspuru-guzik-group/selfies/commits?author=alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_<br> | ||
| Chemistry Advisor: [Robert Pollice](https://scholar.google.at/citations?user=JR2N3JIAAAAJ) | ||
@@ -144,8 +146,24 @@ A main objective is to use SELFIES as direct input into machine learning models,<br> | ||
| * More examples can be found in the ``examples/`` directory, including a | ||
| variational autoencoder that runs on the SELFIES language. | ||
| [variational autoencoder that runs on the SELFIES](https://github.com/aspuru-guzik-group/selfies/tree/master/examples/vae_example) language. | ||
| * This [ICLR2020 paper](https://arxiv.org/abs/1909.11655) used SELFIES in a | ||
| genetic algorithm to achieve state-of-the-art performance for inverse design, | ||
| with the [code here](https://github.com/aspuru-guzik-group/GA). | ||
| * SELFIES allows for [highly efficient exploration and interpolation of the chemical space](https://chemrxiv.org/articles/preprint/Beyond_Generative_Models_Superfast_Traversal_Optimization_Novelty_Exploration_and_Discovery_STONED_Algorithm_for_Molecules_using_SELFIES/13383266), with a [deterministic algorithms, see code](https://github.com/aspuru-guzik-group/stoned-selfies). | ||
| * We use SELFIES for [Deep Molecular dreaming](https://arxiv.org/abs/2012.09712), a new generative model inspired by interpretable neural networks in computational vision. See the [code of PASITHEA here](https://github.com/aspuru-guzik-group/Pasithea). | ||
| * Kohulan Rajan, Achim Zielesny, Christoph Steinbeck show in two papers that SELFIES outperforms other representations in [img2string](https://link.springer.com/article/10.1186/s13321-020-00469-w) and [string2string](https://chemrxiv.org/articles/preprint/STOUT_SMILES_to_IUPAC_Names_Using_Neural_Machine_Translation/13469202/1) translation tasks, see the codes of [DECIMER](https://github.com/Kohulan/DECIMER-Image-to-SMILES) and [STOUT](https://github.com/Kohulan/Smiles-TO-iUpac-Translator). | ||
| ## Handling invalid inputs | ||
| If an invalid input is presented to the encoder or decoder, the return value is `None`. | ||
| The error can be analysed by using the `encoder(...,print_error=True)` option. | ||
| ```python | ||
| import selfies as sf | ||
| invalid_smiles="C[C@H](O)[C@@(*)C1=CC=CC=C1" | ||
| selfies_string=sf.encoder(invalid_smiles) | ||
| if selfies_string==None: | ||
| selfies_string=sf.encoder(invalid_smiles,print_error=True) | ||
| # 'Encoding error 'C[C@H](O)[C@@(*)C1=CC=CC=C1': wildcard atom '*' not supported.' | ||
| ``` | ||
| ## Tests | ||
@@ -190,4 +208,5 @@ SELFIES uses `pytest` with `tox` as its testing framework. | ||
| We thank Jacques Boitreaud, Andrew Brereton, Matthew Carbone (x94carbone), Nathan Frey (ncfrey), Theophile Gaudin, | ||
| Hyunmin Kim (hmkim), Minjie Li, Vincent Mallet, Kevin Ryan (LeanAndMean), Benjamin Sanchez-Lengeling, | ||
| and Zhenpeng Yao for their suggestions and bug reports, and Robert Pollice for chemistry advices. | ||
| HelloJocelynLu, Hyunmin Kim (hmkim), Minjie Li, Vincent Mallet, Alexander Minidis (DocMinus), Kevin Ryan (LeanAndMean), | ||
| Benjamin Sanchez-Lengeling, and Zhenpeng Yao for their suggestions and bug reports, | ||
| and Robert Pollice for chemistry advices. | ||
@@ -194,0 +213,0 @@ ## License |
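Reviewer note: the "Handling invalid inputs" section added in this diff states that the encoder and decoder return `None` for invalid input. The sketch below shows one way a caller might convert that convention into an exception. `translate_or_raise` and `fake_encoder` are hypothetical names and are not part of selfies; the stand-in encoder only mimics the documented `None`-on-failure behaviour so that the snippet runs without the package installed.

```python
from typing import Callable, Optional


def translate_or_raise(translate: Callable[[str], Optional[str]], text: str) -> str:
    """Call an encoder/decoder-style function and raise if it returns None."""
    result = translate(text)
    if result is None:  # identity check, since None is the failure signal
        raise ValueError("could not translate {!r}".format(text))
    return result


def fake_encoder(smiles: str) -> Optional[str]:
    """Hypothetical stand-in: rejects wildcard atoms, as in the README example."""
    if "*" in smiles:
        return None
    return "<encoded:{}>".format(smiles)
```

Assuming the 1.0.x behaviour described in the README, the real `sf.encoder` or `sf.decoder` could presumably be passed in place of `fake_encoder`.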
+23
-4
@@ -15,3 +15,5 @@ # SELFIES | ||
| [*Machine Learning: Science and Technology* **1**, 045024 (2020)](https://iopscience.iop.org/article/10.1088/2632-2153/aba947), [extensive blog post January 2021](https://aspuru.substack.com/p/molecular-graph-representations-and).<br> | ||
| Major contributors since v1.0.0: _[Alston Lo](https://github.com/aspuru-guzik-group/selfies/commits?author=alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_ | ||
| [Talk on youtube about SELFIES](https://www.youtube.com/watch?v=CaIyUmfGXDk).<br> | ||
| Major contributors since v1.0.0: _[Alston Lo](https://github.com/aspuru-guzik-group/selfies/commits?author=alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_<br> | ||
| Chemistry Advisor: [Robert Pollice](https://scholar.google.at/citations?user=JR2N3JIAAAAJ) | ||
@@ -136,8 +138,24 @@ A main objective is to use SELFIES as direct input into machine learning models,<br> | ||
| * More examples can be found in the ``examples/`` directory, including a | ||
| variational autoencoder that runs on the SELFIES language. | ||
| [variational autoencoder that runs on the SELFIES](https://github.com/aspuru-guzik-group/selfies/tree/master/examples/vae_example) language. | ||
| * This [ICLR2020 paper](https://arxiv.org/abs/1909.11655) used SELFIES in a | ||
| genetic algorithm to achieve state-of-the-art performance for inverse design, | ||
| with the [code here](https://github.com/aspuru-guzik-group/GA). | ||
| * SELFIES allows for [highly efficient exploration and interpolation of the chemical space](https://chemrxiv.org/articles/preprint/Beyond_Generative_Models_Superfast_Traversal_Optimization_Novelty_Exploration_and_Discovery_STONED_Algorithm_for_Molecules_using_SELFIES/13383266), with a [deterministic algorithms, see code](https://github.com/aspuru-guzik-group/stoned-selfies). | ||
| * We use SELFIES for [Deep Molecular dreaming](https://arxiv.org/abs/2012.09712), a new generative model inspired by interpretable neural networks in computational vision. See the [code of PASITHEA here](https://github.com/aspuru-guzik-group/Pasithea). | ||
| * Kohulan Rajan, Achim Zielesny, Christoph Steinbeck show in two papers that SELFIES outperforms other representations in [img2string](https://link.springer.com/article/10.1186/s13321-020-00469-w) and [string2string](https://chemrxiv.org/articles/preprint/STOUT_SMILES_to_IUPAC_Names_Using_Neural_Machine_Translation/13469202/1) translation tasks, see the codes of [DECIMER](https://github.com/Kohulan/DECIMER-Image-to-SMILES) and [STOUT](https://github.com/Kohulan/Smiles-TO-iUpac-Translator). | ||
| ## Handling invalid inputs | ||
| If an invalid input is presented to the encoder or decoder, the return value is `None`. | ||
| The error can be analysed by using the `encoder(...,print_error=True)` option. | ||
| ```python | ||
| import selfies as sf | ||
| invalid_smiles="C[C@H](O)[C@@(*)C1=CC=CC=C1" | ||
| selfies_string=sf.encoder(invalid_smiles) | ||
| if selfies_string==None: | ||
| selfies_string=sf.encoder(invalid_smiles,print_error=True) | ||
| # 'Encoding error 'C[C@H](O)[C@@(*)C1=CC=CC=C1': wildcard atom '*' not supported.' | ||
| ``` | ||
| ## Tests | ||
@@ -182,4 +200,5 @@ SELFIES uses `pytest` with `tox` as its testing framework. | ||
| We thank Jacques Boitreaud, Andrew Brereton, Matthew Carbone (x94carbone), Nathan Frey (ncfrey), Theophile Gaudin, | ||
| Hyunmin Kim (hmkim), Minjie Li, Vincent Mallet, Kevin Ryan (LeanAndMean), Benjamin Sanchez-Lengeling, | ||
| and Zhenpeng Yao for their suggestions and bug reports, and Robert Pollice for chemistry advices. | ||
| HelloJocelynLu, Hyunmin Kim (hmkim), Minjie Li, Vincent Mallet, Alexander Minidis (DocMinus), Kevin Ryan (LeanAndMean), | ||
| Benjamin Sanchez-Lengeling, and Zhenpeng Yao for their suggestions and bug reports, | ||
| and Robert Pollice for chemistry advices. | ||
@@ -186,0 +205,0 @@ ## License |
@@ -34,2 +34,5 @@ #!/usr/bin/env python | ||
| "get_semantic_robust_alphabet", | ||
| "get_default_constraints", | ||
| "get_octet_rule_constraints", | ||
| "get_hypervalent_constraints", | ||
| "get_semantic_constraints", | ||
@@ -50,2 +53,5 @@ "set_semantic_constraints", | ||
| get_semantic_robust_alphabet, | ||
| get_default_constraints, | ||
| get_octet_rule_constraints, | ||
| get_hypervalent_constraints, | ||
| get_semantic_constraints, | ||
@@ -52,0 +58,0 @@ set_semantic_constraints, |
+45
-3
| from collections import OrderedDict | ||
| from typing import Dict, Iterable, List, Optional, Tuple, Union | ||
| from selfies.grammar_rules import get_bond_from_num, get_n_from_symbols, \ | ||
| get_next_branch_state, get_next_state, get_num_from_bond | ||
| from selfies.grammar_rules import (get_bond_from_num, | ||
| get_hypervalent_constraints, | ||
| get_n_from_symbols, get_next_branch_state, | ||
| get_next_state, get_num_from_bond, | ||
| get_octet_rule_constraints, | ||
| get_semantic_constraints, | ||
| set_semantic_constraints) | ||
| def decoder(selfies: str, print_error: bool = False) -> Optional[str]: | ||
| def decoder(selfies: str, | ||
| print_error: bool = False, | ||
| constraints: Optional[str] = None) -> Optional[str]: | ||
| """Translates a SELFIES into a SMILES. | ||
@@ -22,2 +29,6 @@ | ||
| Defaults to False. | ||
| :param constraints: if ``'octet_rule'`` or ``'hypervalent'``, | ||
| the corresponding preset bond constraints will be used instead. | ||
| If ``None``, :func:`selfies.decoder` will use the | ||
| currently configured bond constraints. Defaults to ``None``. | ||
| :return: the SMILES translation of ``selfies``. If an error occurs, | ||
@@ -31,4 +42,29 @@ and ``selfies`` cannot be translated, ``None`` is returned instead. | ||
| 'C=CF' | ||
| .. seealso:: The | ||
| `"octet_rule" <https://en.wikipedia.org/wiki/Octet_rule>`_ | ||
| and | ||
| `"hypervalent" <https://en.wikipedia.org/wiki/Hypervalent_molecule>`_ | ||
| preset bond constraints | ||
| can be viewed with :func:`selfies.get_octet_rule_constraints` and | ||
| :func:`selfies.get_hypervalent_constraints`, respectively. These | ||
| presets are variants of the "default" bond constraints, which can | ||
| be viewed with :func:`selfies.get_default_constraints`. Their | ||
| differences can be summarized as follows: | ||
| * def. : ``Cl``, ``Br``, ``I``: 1, ``N``: 3, ``P``: 5, ``P+1``: 6, ``P-1``: 4, ``S``: 6, ``S+1``: 7, ``S-1``: 5 | ||
| * oct. : ``Cl``, ``Br``, ``I``: 1, ``N``: 3, ``P``: 3, ``P+1``: 4, ``P-1``: 2, ``S``: 2, ``S+1``: 3, ``S-1``: 1 | ||
| * hyp. : ``Cl``, ``Br``, ``I``: 7, ``N``: 5, ``P``: 5, ``P+1``: 6, ``P-1``: 4, ``S``: 6, ``S+1``: 7, ``S-1``: 5 | ||
| """ | ||
| old_constraints = get_semantic_constraints() | ||
| if constraints is None: | ||
| pass | ||
| elif constraints == 'octet_rule': | ||
| set_semantic_constraints(get_octet_rule_constraints()) | ||
| elif constraints == 'hypervalent': | ||
| set_semantic_constraints(get_hypervalent_constraints()) | ||
| else: | ||
| raise ValueError("unrecognized constraint type") | ||
| try: | ||
@@ -43,5 +79,11 @@ all_smiles = [] # process dot-separated fragments separately | ||
| if constraints is not None: # restore old constraints | ||
| set_semantic_constraints(old_constraints) | ||
| return '.'.join(all_smiles) | ||
| except ValueError as err: | ||
| if constraints is not None: # restore old constraints | ||
| set_semantic_constraints(old_constraints) | ||
| if print_error: | ||
@@ -48,0 +90,0 @@ print("Decoding error '{}': {}.".format(selfies, err)) |
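Reviewer note: in the hunks above, `decoder` saves the current constraints, swaps in a preset, and restores the old value in two places (before the successful `return` and inside `except ValueError`). A common alternative for this kind of temporary global override is a context manager with `try`/`finally`, which restores the state on every exit path. The sketch below is self-contained and does not use selfies; `_current`, `get_constraints`, `set_constraints`, and `temporary_constraints` are hypothetical stand-ins for the package's global state and accessors.

```python
from contextlib import contextmanager
from typing import Dict, Iterator

# Hypothetical module-level state standing in for a library's global constraints.
_current: Dict[str, int] = {"S": 6, "P": 5}


def get_constraints() -> Dict[str, int]:
    """Return a copy so callers cannot mutate the global table."""
    return dict(_current)


def set_constraints(new: Dict[str, int]) -> None:
    """Replace the global table with a copy of `new`."""
    global _current
    _current = dict(new)


@contextmanager
def temporary_constraints(preset: Dict[str, int]) -> Iterator[None]:
    """Apply `preset` inside the with-block, then restore the previous table."""
    old = get_constraints()
    set_constraints(preset)
    try:
        yield
    finally:
        # Runs on normal exit and on any exception type, not only ValueError.
        set_constraints(old)
```

With this shape, the body of a decoder-like function would sit inside `with temporary_constraints(...)`, and the restore logic would appear only once.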
@@ -9,7 +9,17 @@ from itertools import product | ||
| 'C': 4, 'C+1': 5, 'C-1': 3, | ||
| 'P': 5, 'P+1': 6, 'P-1': 4, | ||
| 'S': 6, 'S+1': 7, 'S-1': 5, | ||
| 'P': 7, 'P+1': 8, 'P-1': 6, | ||
| '?': 8, | ||
| '?': 8 | ||
| } | ||
| octet_rule_bond_constraints = dict(default_bond_constraints) | ||
| octet_rule_bond_constraints.update( | ||
| {'S': 2, 'S+1': 3, 'S-1': 1, 'P': 3, 'P+1': 4, 'P-1': 2} | ||
| ) | ||
| hypervalent_bond_constraints = dict(default_bond_constraints) | ||
| hypervalent_bond_constraints.update( | ||
| {'Cl': 7, 'Br': 7, 'I': 7, 'N': 5} | ||
| ) | ||
| _bond_constraints = default_bond_constraints | ||
@@ -57,2 +67,43 @@ | ||
| def get_default_constraints() -> Dict[str, int]: | ||
| """Returns the preset "default" bond constraint settings. | ||
| :return: the default constraint settings. | ||
| """ | ||
| global default_bond_constraints | ||
| return dict(default_bond_constraints) | ||
| def get_octet_rule_constraints() -> Dict[str, int]: | ||
| """Returns the preset "octet rule" bond constraint settings. These | ||
| constraints are a harsher version of the default constraints, so that | ||
| the `octet rule <https://en.wikipedia.org/wiki/Octet_rule>`_ | ||
| is obeyed. In particular, ``S`` and ``P`` are | ||
| restricted to a 2 and 3 bond capacity, respectively (and similarly with | ||
| ``S+``, ``S-``, ``P+``, ``P-``). | ||
| :return: the octet rule constraint settings. | ||
| """ | ||
| global octet_rule_bond_constraints | ||
| return dict(octet_rule_bond_constraints) | ||
| def get_hypervalent_constraints() -> Dict[str, int]: | ||
| """Returns the preset "hypervalent" bond constraint settings. These | ||
| constraints are a relaxed version of the default constraints, to allow | ||
| for `hypervalent molecules | ||
| <https://en.wikipedia.org/wiki/Hypervalent_molecule>`_. | ||
| In particular, ``Cl``, ``Br``, and ``I`` | ||
| are relaxed to a 7 bond capacity, and ``N`` is relaxed to a 5 bond | ||
| capacity. | ||
| :return: the hypervalent constraint settings. | ||
| """ | ||
| global hypervalent_bond_constraints | ||
| return dict(hypervalent_bond_constraints) | ||
| def get_semantic_constraints() -> Dict[str, int]: | ||
@@ -59,0 +110,0 @@ """Returns the semantic bond constraints that :mod:`selfies` is currently |
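Reviewer note: the presets in this hunk are built by copying the default table and overriding a few keys, and each getter returns a fresh `dict(...)` copy. The sketch below reproduces that pattern on a subset of the table, using only the values quoted in the `decoder` docstring and in this hunk, and adds a small helper that lists which entries a preset changes. `default_subset` and `changed_keys` are hypothetical names for illustration; the full default table contains more atoms than are shown here.

```python
from typing import Dict, Tuple

# Subset of the default table, limited to values quoted in the diff.
default_subset: Dict[str, int] = {
    "Cl": 1, "Br": 1, "I": 1, "N": 3,
    "P": 5, "P+1": 6, "P-1": 4,
    "S": 6, "S+1": 7, "S-1": 5,
}

# Copy first, then override, so the default table itself is never modified.
octet_rule = dict(default_subset)
octet_rule.update({"S": 2, "S+1": 3, "S-1": 1, "P": 3, "P+1": 4, "P-1": 2})

hypervalent = dict(default_subset)
hypervalent.update({"Cl": 7, "Br": 7, "I": 7, "N": 5})


def changed_keys(base: Dict[str, int],
                 variant: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each key whose value differs to (base value, variant value)."""
    return {k: (base[k], variant[k]) for k in base if base[k] != variant[k]}
```

Calling `changed_keys(default_subset, octet_rule)` reports only the `S` and `P` entries, and `changed_keys(default_subset, hypervalent)` reports only `Cl`, `Br`, `I`, and `N`, which matches the summary in the docstring.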
+1
-1
@@ -10,3 +10,3 @@ #!/usr/bin/env python | ||
| name="selfies", | ||
| version="1.0.3", | ||
| version="1.0.4", | ||
| author="Mario Krenn", | ||
@@ -13,0 +13,0 @@ author_email="mario.krenn@utoronto.ca, alan@aspuru.com", |