selfies - npm Package Compare versions

Comparing version 2.0.0 to 2.1.0

PKG-INFO (+81, -47)
Metadata-Version: 2.1
Name: selfies
-Version: 2.0.0
+Version: 2.1.0
Summary: SELFIES (SELF-referencIng Embedded Strings) is a general-purpose, sequence-based, robust representation of semantically constrained graphs.

@@ -25,3 +25,4 @@ Home-page: https://github.com/aspuru-guzik-group/selfies

[Blog explaining SELFIES in Japanese language](https://blacktanktop.hatenablog.com/entry/2021/08/12/115613)\
-Major contributors since v1.0.0: _[Alston Lo](https://github.com/alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_\
+Major contributors of v1.0.n: _[Alston Lo](https://github.com/alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_\
+Main developer of v2.0.0: _[Alston Lo](https://github.com/alstonlo)_\
Chemistry Advisor: [Robert Pollice](https://scholar.google.at/citations?user=JR2N3JIAAAAJ)

@@ -47,3 +48,3 @@

To check if the correct version of ``selfies`` is installed, use
the following pip command.

@@ -54,9 +55,9 @@ ```bash

To upgrade to the latest release of ``selfies`` if you are using an
older version, use the following pip command. Please see the
[CHANGELOG](https://github.com/aspuru-guzik-group/selfies/blob/master/CHANGELOG.md)
to review the changes between versions of `selfies`, before upgrading:
```bash
pip install selfies --upgrade
```
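Before upgrading, it can be handy to check the installed version programmatically rather than by eye. A minimal sketch (a hypothetical helper, not part of the ``selfies`` API — it uses only the standard library's ``importlib.metadata``):

```python
from importlib import metadata


def meets_minimum(installed: str, required: str) -> bool:
    """Return True if `installed` >= `required`, for dotted integer versions."""
    def parse(version: str):
        return tuple(int(part) for part in version.split("."))
    return parse(installed) >= parse(required)


if __name__ == "__main__":
    try:
        version = metadata.version("selfies")
        status = "OK" if meets_minimum(version, "2.1.0") else "needs upgrade"
        print("selfies", version, status)
    except metadata.PackageNotFoundError:
        print("selfies is not installed")
```

Tuple comparison handles multi-digit components correctly (so 2.10.0 sorts after 2.9.0, unlike a plain string compare).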

@@ -70,16 +71,16 @@

Please refer to the [documentation](https://selfiesv2.readthedocs.io/en/latest/),
which contains a thorough tutorial for getting started with ``selfies``
and detailed descriptions of the functions
that ``selfies`` provides. We summarize some key functions below.
| Function | Description |
| ------------------------------------- | ----------------------------------------------------------------- |
| ``selfies.encoder`` | Translates a SMILES string into its corresponding SELFIES string. |
| ``selfies.decoder`` | Translates a SELFIES string into its corresponding SMILES string. |
| ``selfies.set_semantic_constraints`` | Configures the semantic constraints that ``selfies`` operates on. |
| ``selfies.len_selfies`` | Returns the number of symbols in a SELFIES string. |
| ``selfies.split_selfies`` | Tokenizes a SELFIES string into its individual symbols. |
| ``selfies.get_alphabet_from_selfies`` | Constructs an alphabet from an iterable of SELFIES strings. |
| ``selfies.selfies_to_encoding`` | Converts a SELFIES string into its label and/or one-hot encoding. |
| ``selfies.encoding_to_selfies`` | Converts a label or one-hot encoding into a SELFIES string. |
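As the table suggests, a SELFIES string is a flat sequence of bracketed symbols, so tokenization is mechanical. A rough pure-Python sketch of what ``selfies.split_selfies`` and ``selfies.len_selfies`` do (illustrative only — use the library functions in practice; the real implementations may differ, e.g. in how they treat the ``.`` fragment separator):

```python
import re
from typing import Iterator

# A SELFIES string such as "[C][N][C]" is a run of bracketed symbols;
# "." separates molecular fragments.
_SYMBOL = re.compile(r"\[[^\]]*\]|\.")


def split_selfies_sketch(selfies: str) -> Iterator[str]:
    """Yield each symbol of a SELFIES string, e.g. '[C]', '[Branch1]', '.'."""
    for match in _SYMBOL.finditer(selfies):
        yield match.group(0)


def len_selfies_sketch(selfies: str) -> int:
    """Count the symbols in a SELFIES string."""
    return sum(1 for _ in split_selfies_sketch(selfies))


print(list(split_selfies_sketch("[C][N][C]")))  # ['[C]', '[N]', '[C]']
print(len_selfies_sketch("[C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]"))  # 10
```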

@@ -111,24 +112,6 @@

#### Customizing SELFIES:
In this example, we relax the semantic constraints of ``selfies`` to allow
for hypervalences (caution: hypervalence rules are much less understood
than octet rules. Some molecules containing hypervalences are important,
but generally, it is not known which molecules are stable and reasonable).
```python
import selfies as sf
hypervalent_sf = sf.encoder('O=I(O)(O)(O)(O)O', strict=False) # orthoperiodic acid
standard_derived_smi = sf.decoder(hypervalent_sf)
# OI (the default constraints for I allows for only 1 bond)
sf.set_semantic_constraints("hypervalent")
relaxed_derived_smi = sf.decoder(hypervalent_sf)
# O=I(O)(O)(O)(O)O (the hypervalent constraints for I allows for 7 bonds)
```
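Conceptually, a constraint preset is just a map from atom symbol to the maximum number of bonds the decoder will give that atom. A toy illustration (the bond counts for iodine mirror the comments in the example above; the dictionaries and helper are assumptions, not the library's actual internal representation):

```python
# Assumed values: default iodine cap of 1 and hypervalent cap of 7 come from
# the example's comments; O and C use their standard valences.
DEFAULT_CONSTRAINTS = {"I": 1, "O": 2, "C": 4}
HYPERVALENT_CONSTRAINTS = {"I": 7, "O": 2, "C": 4}


def max_bonds(symbol: str, preset: dict, fallback: int = 8) -> int:
    """Bond capacity of `symbol` under a preset; unknown atoms get a fallback."""
    return preset.get(symbol, fallback)


# Under the default preset, iodine keeps a single bond, so O=I(O)(O)(O)(O)O
# collapses to OI; under the hypervalent preset, all 7 bonds survive.
print(max_bonds("I", DEFAULT_CONSTRAINTS))      # 1
print(max_bonds("I", HYPERVALENT_CONSTRAINTS))  # 7
```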
#### Integer and one-hot encoding SELFIES:
In this example, we first build an alphabet from a dataset of SELFIES strings,
and then convert a SELFIES string into its padded encoding. Note that we use the

@@ -163,2 +146,52 @@ ``[nop]`` ([no operation](https://en.wikipedia.org/wiki/NOP_(code) ))
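The example above is elided by the diff, but the idea is to build an alphabet, pad with the ``[nop]`` no-operation symbol, and emit label and one-hot encodings. A self-contained sketch of the same idea (the helper name is hypothetical — the library provides ``selfies.get_alphabet_from_selfies`` and ``selfies.selfies_to_encoding`` for this):

```python
import re


def to_label_and_one_hot(selfies: str, alphabet: list, pad_to_len: int):
    """Sketch of label + one-hot encoding with '[nop]' padding."""
    stoi = {symbol: i for i, symbol in enumerate(alphabet)}
    symbols = re.findall(r"\[[^\]]*\]", selfies)
    symbols += ["[nop]"] * (pad_to_len - len(symbols))  # pad with no-ops
    labels = [stoi[s] for s in symbols]
    one_hot = [[int(i == label) for i in range(len(alphabet))] for label in labels]
    return labels, one_hot


alphabet = ["[nop]", "[C]", "[N]", "[O]"]
labels, one_hot = to_label_and_one_hot("[C][O]", alphabet, pad_to_len=4)
print(labels)      # [1, 3, 0, 0]
print(one_hot[0])  # [0, 1, 0, 0]
```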

#### Explaining Translation:
You can get an "attribution" list that traces the connection between input and output tokens. For example, let's see which tokens in the SELFIES string ``[C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]`` are responsible for the output SMILES tokens.
```python
import selfies as sf

selfies = "[C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]"
smiles, attr = sf.decoder(
    selfies, attribute=True)
print('SELFIES', selfies)
print('SMILES', smiles)
print('Attribution:')
for smiles_token, a in attr:
    print(smiles_token)
    if a:
        for j, selfies_token in a:
            print(f'\t{j}:{selfies_token}')
# output
# SELFIES [C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]
# SMILES C1NC(P)CC1
# Attribution:
# AttributionMap(index=0, token='C', attribution=[Attribution(index=0, token='[C]')])
# AttributionMap(index=4, token='N', attribution=[Attribution(index=1, token='[N]')])
# AttributionMap(index=6, token='C', attribution=[Attribution(index=2, token='[C]')])
# AttributionMap(index=9, token='P', attribution=[Attribution(index=3, token='[Branch1]'), Attribution(index=5, token='[P]')])
# AttributionMap(index=12, token='C', attribution=[Attribution(index=6, token='[C]')])
# AttributionMap(index=14, token='C', attribution=[Attribution(index=7, token='[C]')])
```
``attr`` is a list of `AttributionMap`s containing the output token, its index, and input tokens that led to it. For example, the ``P`` appearing in the output SMILES at that location is a result of both the ``[Branch1]`` token at position 3 and the ``[P]`` token at index 5. This works for both encoding and decoding. For finer control of tracking the translation (like tracking rings), you can access attributions in the underlying molecular graph with ``get_attribution``.
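The printed output shows the shape of the attribution data. A minimal sketch of those structures (field names are inferred from the reprs above — check the library for the authoritative definitions) and of walking one map back to its SELFIES tokens:

```python
from collections import namedtuple

# Field names inferred from the printed reprs; illustrative only.
Attribution = namedtuple("Attribution", ["index", "token"])
AttributionMap = namedtuple("AttributionMap", ["index", "token", "attribution"])

attr = [
    AttributionMap(index=9, token="P",
                   attribution=[Attribution(index=3, token="[Branch1]"),
                                Attribution(index=5, token="[P]")]),
]

# Each output SMILES token points back at the SELFIES tokens that produced it.
for entry in attr:
    sources = ", ".join(f"{a.index}:{a.token}" for a in entry.attribution)
    print(f"SMILES '{entry.token}' at {entry.index} <- {sources}")
# SMILES 'P' at 9 <- 3:[Branch1], 5:[P]
```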
### More Usages and Examples

@@ -173,4 +206,5 @@

* We use SELFIES for [Deep Molecular dreaming](https://arxiv.org/abs/2012.09712), a new generative model inspired by interpretable neural networks in computational vision. See the [code of PASITHEA here](https://github.com/aspuru-guzik-group/Pasithea).
* Kohulan Rajan, Achim Zielesny, and Christoph Steinbeck show in two papers that SELFIES outperforms other representations in [img2string](https://link.springer.com/article/10.1186/s13321-020-00469-w) and [string2string](https://chemrxiv.org/articles/preprint/STOUT_SMILES_to_IUPAC_Names_Using_Neural_Machine_Translation/13469202/1) translation tasks; see the code of [DECIMER](https://github.com/Kohulan/DECIMER-Image-to-SMILES) and [STOUT](https://github.com/Kohulan/Smiles-TO-iUpac-Translator).
* Nathan Frey, Vijay Gadepally, and Bharath Ramsundar used SELFIES with normalizing flows to develop the [FastFlows](https://arxiv.org/abs/2201.12419) framework for deep chemical generative modeling.
* As an improvement on the old genetic algorithm, the authors have also released [JANUS](https://arxiv.org/abs/2106.04011), which allows for more efficient optimization in the chemical space. JANUS makes use of [STONED-SELFIES](https://pubs.rsc.org/en/content/articlepdf/2021/sc/d1sc00231g) and a neural network for efficient sampling.

@@ -180,3 +214,3 @@ ## Tests

All tests can be found in the `tests/` directory. To run the test suite for
SELFIES, install ``tox`` and run:

@@ -195,5 +229,5 @@ ```bash

* 36M+ molecules from the [eMolecules Database](https://www.emolecules.com/info/products-data-downloads.html).
Due to its large size, this dataset is not included in the repository. To run tests
on it, please download the dataset into the ``tests/test_sets`` directory
and run the ``tests/run_on_large_dataset.py`` script.

@@ -216,10 +250,10 @@ ## Version History

Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.5
-Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
-Requires-Python: >=3.5
+Requires-Python: >=3.7
Description-Content-Type: text/markdown
README.md (+77, -43) — same content changes as in PKG-INFO above.


@@ -18,5 +18,4 @@ README.md

selfies/utils/encoding_utils.py
selfies/utils/linked_list.py
selfies/utils/matching_utils.py
selfies/utils/selfies_utils.py
selfies/utils/smiles_utils.py
import warnings
from typing import List, Union, Tuple

@@ -14,3 +15,3 @@ from selfies.compatibility import modernize_symbol

)
from selfies.mol_graph import MolecularGraph
from selfies.mol_graph import MolecularGraph, Attribution
from selfies.utils.selfies_utils import split_selfies

@@ -20,3 +21,7 @@ from selfies.utils.smiles_utils import mol_to_smiles

def decoder(selfies: str, compatible: bool = False) -> str:
def decoder(
selfies: str,
compatible: bool = False,
attribute: bool = False) ->\
Union[str, Tuple[str, List[Tuple[str, List[Tuple[int, str]]]]]]:
"""Translates a SELFIES string into its corresponding SMILES string.

@@ -35,2 +40,4 @@

Defaults to ``False``.
:param attribute: if ``True``, an attribution map connecting SELFIES
tokens to SMILES tokens is also returned.
:return: a SMILES string derived from the input SELFIES string.

@@ -51,8 +58,9 @@ :raises DecoderError: if the input SELFIES string is malformed.

mol = MolecularGraph()
mol = MolecularGraph(attributable=attribute)
rings = []
attribution_index = 0
for s in selfies.split("."):
_derive_mol_from_symbols(
symbol_iter=_tokenize_selfies(s, compatible),
n = _derive_mol_from_symbols(
symbol_iter=enumerate(_tokenize_selfies(s, compatible)),
mol=mol,

@@ -63,6 +71,9 @@ selfies=selfies,

root_atom=None,
rings=rings
rings=rings,
attribute_stack=[] if attribute else None,
attribution_index=attribution_index
)
attribution_index += n
_form_rings_bilocally(mol, rings)
return mol_to_smiles(mol)
return mol_to_smiles(mol, attribute)
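The decoder hunk above swaps the plain symbol iterator for `enumerate(...)`, so each derived atom or bond can later be traced back to the SELFIES token that produced it. A minimal, self-contained sketch of that indexing idea (the regex tokenizer here is a simplified stand-in for `selfies.utils.selfies_utils.split_selfies`, not the real implementation):

```python
import re
from typing import Iterator, List, Tuple

def split_symbols(selfies: str) -> Iterator[str]:
    # Simplified stand-in for split_selfies: yields each
    # bracketed symbol of the SELFIES string in order.
    for match in re.finditer(r"\[[^\]]*\]", selfies):
        yield match.group(0)

def indexed_symbols(selfies: str) -> List[Tuple[int, str]]:
    # Mirrors the decoder's enumerate(_tokenize_selfies(...)) pattern:
    # pair every symbol with its position so attributions can
    # reference the originating token later.
    return list(enumerate(split_symbols(selfies)))

print(indexed_symbols("[C][=C][O]"))
# → [(0, '[C]'), (1, '[=C]'), (2, '[O]')]
```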

@@ -91,3 +102,3 @@

symbol_iter, mol, selfies, max_derive,
init_state, root_atom, rings
init_state, root_atom, rings, attribute_stack, attribution_index
):

@@ -101,3 +112,3 @@ n_derived = 0

try: # retrieve next symbol
symbol = next(symbol_iter)
index, symbol = next(symbol_iter)
n_derived += 1

@@ -123,3 +134,7 @@ except StopIteration:

symbol_iter, mol, selfies, (Q + 1),
init_state=binit_state, root_atom=prev_atom, rings=rings
init_state=binit_state, root_atom=prev_atom, rings=rings,
attribute_stack=attribute_stack +
[Attribution(index + attribution_index, symbol)
] if attribute_stack is not None else None,
attribution_index=attribution_index
)

@@ -162,7 +177,20 @@

if state == 0:
mol.add_atom(atom, True)
o = mol.add_atom(atom, True)
mol.add_attribution(
o, attribute_stack +
[Attribution(index + attribution_index, symbol)]
if attribute_stack is not None else None)
else:
mol.add_atom(atom)
o = mol.add_atom(atom)
mol.add_attribution(
o, attribute_stack +
[Attribution(index + attribution_index, symbol)]
if attribute_stack is not None else None)
src, dst = prev_atom.index, atom.index
mol.add_bond(src=src, dst=dst, order=bond_order, stereo=stereo)
o = mol.add_bond(src=src, dst=dst,
order=bond_order, stereo=stereo)
mol.add_attribution(
o, attribute_stack +
[Attribution(index + attribution_index, symbol)]
if attribute_stack is not None else None)
prev_atom = atom

@@ -195,3 +223,3 @@

try:
index_symbols.append(next(symbol_iter))
index_symbols.append(next(symbol_iter)[-1])
except StopIteration:

@@ -198,0 +226,0 @@ index_symbols.append(None)

from selfies.exceptions import EncoderError, SMILESParserError
from selfies.grammar_rules import get_selfies_from_index
from selfies.utils.linked_list import SinglyLinkedList
from selfies.utils.smiles_utils import (

@@ -10,4 +9,6 @@ atom_to_smiles,

from selfies.mol_graph import AttributionMap
def encoder(smiles: str, strict: bool = True) -> str:
def encoder(smiles: str, strict: bool = True, attribute: bool = False) -> str:
"""Translates a SMILES string into its corresponding SELFIES string.

@@ -37,3 +38,6 @@

Defaults to ``True``.
:return: a SELFIES string translated from the input SMILES string.
:param attribute: if ``True``, an attribution list is also returned.
:return: a SELFIES string translated from the input SMILES string if
attribute is ``False``, otherwise a tuple is returned of
SELFIES string and attribution list.
:raises EncoderError: if the input SMILES string is invalid,

@@ -63,3 +67,3 @@ cannot be kekulized, or violates the semantic constraints with

try:
mol = smiles_to_mol(smiles)
mol = smiles_to_mol(smiles, attributable=attribute)
except SMILESParserError as err:

@@ -85,6 +89,13 @@ err_msg = "failed to parse input\n\tSMILES: {}".format(smiles)

fragments = []
attribution_maps = []
attribution_index = 0
for root in mol.get_roots():
derived = list(_fragment_to_selfies(mol, None, root))
derived = list(_fragment_to_selfies(
mol, None, root, attribution_maps, attribution_index))
attribution_index += len(derived)
fragments.append("".join(derived))
return ".".join(fragments)
# trim attribution map of empty tokens
attribution_maps = [a for a in attribution_maps if a.token]
result = ".".join(fragments), attribution_maps
return result if attribute else result[0]
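Both `encoder` and `mol_to_smiles` keep backward compatibility by always building the `(string, attribution_maps)` tuple and returning it in full only when `attribute=True`. The pattern in isolation, with a hypothetical `translate` stand-in (its body is illustrative only; the real functions assemble fragments and attribution maps this same way):

```python
from typing import List, Tuple, Union

def translate(text: str, attribute: bool = False
              ) -> Union[str, Tuple[str, List[Tuple[int, str]]]]:
    # Hypothetical stand-in for encoder()/mol_to_smiles().
    fragments = [text.upper()]
    # Trim the attribution map of empty tokens, as the real code does.
    attribution_maps = [(i, t) for i, t in enumerate(fragments) if t]
    result = ".".join(fragments), attribution_maps
    # Callers that never pass attribute=True see the old str return type.
    return result if attribute else result[0]

print(translate("abc"))                  # → ABC
print(translate("abc", attribute=True))  # → ('ABC', [(0, 'ABC')])
```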

@@ -137,4 +148,5 @@

def _fragment_to_selfies(mol, bond_into_root, root):
derived = SinglyLinkedList()
def _fragment_to_selfies(mol, bond_into_root, root,
attribution_maps, attribution_index=0):
derived = []

@@ -144,4 +156,9 @@ bond_into_curr, curr = bond_into_root, root

curr_atom = mol.get_atom(curr)
derived.append(_atom_to_selfies(bond_into_curr, curr_atom))
token = _atom_to_selfies(bond_into_curr, curr_atom)
derived.append(token)
attribution_maps.append(AttributionMap(
len(derived) - 1 + attribution_index,
token, mol.get_attribution(curr_atom)))
out_bonds = mol.get_out_dirbonds(curr)

@@ -163,4 +180,10 @@ for i, bond in enumerate(out_bonds):

derived.append(ring_symbol)
attribution_maps.append(AttributionMap(
len(derived) - 1 + attribution_index,
ring_symbol, mol.get_attribution(bond)))
for symbol in Q_as_symbols:
derived.append(symbol)
attribution_maps.append(AttributionMap(
len(derived) - 1 + attribution_index,
symbol, mol.get_attribution(bond)))

@@ -171,3 +194,4 @@ elif i == len(out_bonds) - 1:

else:
branch = _fragment_to_selfies(mol, bond, bond.dst)
branch = _fragment_to_selfies(
mol, bond, bond.dst, attribution_maps, len(derived))
Q_as_symbols = get_selfies_from_index(len(branch) - 1)

@@ -180,4 +204,10 @@ branch_symbol = "[{}Branch{}]".format(

derived.append(branch_symbol)
attribution_maps.append(AttributionMap(
len(derived) - 1 + attribution_index,
branch_symbol, mol.get_attribution(bond)))
for symbol in Q_as_symbols:
derived.append(symbol)
attribution_maps.append(AttributionMap(
len(derived) - 1 + attribution_index,
symbol, mol.get_attribution(bond)))
derived.extend(branch)

@@ -184,0 +214,0 @@

import functools
import itertools
from typing import List, Optional, Union
from dataclasses import dataclass, field

@@ -10,2 +11,25 @@ from selfies.bond_constraints import get_bonding_capacity

@dataclass
class Attribution:
"""A dataclass that contains token string and its index.
"""
#: token index
index: int
#: token string
token: str
@dataclass
class AttributionMap:
"""A mapping from input to single output token showing which
input tokens created the output token.
"""
#: Index of output token
index: int
#: Output token
token: str
#: List of input tokens that created the output token
attribution: List[Attribution] = field(default_factory=list)
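These two dataclasses are the whole attribution vocabulary: an `Attribution` is one input token with its index, and an `AttributionMap` ties one output token to the input tokens that produced it. A small self-contained demonstration (dataclass definitions condensed from the source above; the token values are illustrative):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Attribution:
    index: int  # token index
    token: str  # token string

@dataclass
class AttributionMap:
    index: int  # index of output token
    token: str  # output token
    # Input tokens that created the output token
    attribution: List[Attribution] = field(default_factory=list)

# Output SELFIES token "[C]" at position 0, produced by
# input SMILES token "C" at position 0.
amap = AttributionMap(index=0, token="[C]",
                      attribution=[Attribution(0, "C")])
print(amap.attribution[0].token)  # → C
```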
class Atom:

@@ -74,3 +98,3 @@ """An atom with associated specifications (e.g. charge, chirality).

def __init__(self):
def __init__(self, attributable=False):
self._roots = list() # stores root atoms, where traversal begins

@@ -83,2 +107,4 @@ self._atoms = list() # stores atoms in this graph

self._delocal_subgraph = dict() # delocalization subgraph
self._attribution = dict() # attribution of each atom/bond
self._attributable = attributable

@@ -96,2 +122,10 @@ def __len__(self):

def get_attribution(
self,
o: Union[DirectedBond, Atom]
) -> List[Attribution]:
if self._attributable and o in self._attribution:
return self._attribution[o]
return None
def get_roots(self) -> List[int]:

@@ -115,3 +149,3 @@ return self._roots

def add_atom(self, atom: Atom, mark_root: bool = False) -> None:
def add_atom(self, atom: Atom, mark_root: bool = False) -> Atom:
atom.index = len(self)

@@ -127,7 +161,19 @@

self._delocal_subgraph[atom.index] = list()
return atom
def add_attribution(
self,
o: Union[DirectedBond, Atom],
attr: List[Attribution]
) -> None:
if self._attributable:
if o in self._attribution:
self._attribution[o].extend(attr)
else:
self._attribution[o] = attr
def add_bond(
self, src: int, dst: int,
order: Union[int, float], stereo: str
) -> None:
) -> DirectedBond:
assert src < dst

@@ -143,2 +189,3 @@

self._delocal_subgraph.setdefault(dst, []).append(src)
return bond
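As the hunks above show, `add_atom` and `add_bond` now return the object they create, so the caller can immediately register an attribution against it. The extend-or-set bookkeeping of `add_attribution` can be sketched with a plain dict (a simplified stand-in for `MolecularGraph`, not the real class):

```python
from typing import Dict, List, Tuple

class AttributionStore:
    # Simplified stand-in for the attribution bookkeeping
    # inside MolecularGraph.
    def __init__(self, attributable: bool = False):
        self._attributable = attributable
        self._attribution: Dict[object, List[Tuple[int, str]]] = {}

    def add_attribution(self, o, attr) -> None:
        # No-op unless the graph was built with attributable=True;
        # otherwise extend an existing entry or start a new one.
        if self._attributable and attr is not None:
            if o in self._attribution:
                self._attribution[o].extend(attr)
            else:
                self._attribution[o] = attr

    def get_attribution(self, o):
        if self._attributable and o in self._attribution:
            return self._attribution[o]
        return None

store = AttributionStore(attributable=True)
store.add_attribution("atom0", [(0, "C")])
store.add_attribution("atom0", [(1, "[C]")])
print(store.get_attribution("atom0"))  # → [(0, 'C'), (1, '[C]')]
```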

@@ -145,0 +192,0 @@ def add_placeholder_bond(self, src: int) -> int:

import enum
import re
from collections import deque
from typing import Iterator, Optional, Tuple, Union
from typing import Iterator, Optional, Tuple, Union, List
from selfies.constants import AROMATIC_SUBSET, ELEMENTS, ORGANIC_SUBSET
from selfies.exceptions import SMILESParserError
from selfies.mol_graph import Atom, DirectedBond, MolecularGraph
from selfies.mol_graph import Atom, Attribution, \
AttributionMap, DirectedBond, MolecularGraph

@@ -40,3 +41,4 @@ SMILES_BRACKETED_ATOM_PATTERN = re.compile(

bond_idx: Optional[int],
start_idx: int, end_idx: int, token_type: SMILESTokenTypes
start_idx: int, end_idx: int, token_type: SMILESTokenTypes,
token: str
):

@@ -47,2 +49,3 @@ self.bond_idx = bond_idx

self.token_type = token_type
self.token = token

@@ -55,3 +58,6 @@ def extract_bond_char(self, smiles):

def __str__(self):
return self.token
def tokenize_smiles(smiles: str) -> Iterator[SMILESToken]:

@@ -68,3 +74,3 @@ """Splits a SMILES string into its tokens.

if smiles[i] == ".":
yield SMILESToken(None, i, i + 1, SMILESTokenTypes.DOT)
yield SMILESToken(None, i, i + 1, SMILESTokenTypes.DOT, smiles[i])
i += 1

@@ -84,5 +90,7 @@ continue

if smiles[i: i + 2] in ("Br", "Cl"): # two-letter elements
token = SMILESToken(bond_idx, i, i + 2, SMILESTokenTypes.ATOM)
token = SMILESToken(bond_idx, i, i + 2,
SMILESTokenTypes.ATOM, smiles[i: i + 2])
else: # one-letter elements (e.g. C, N, ...)
token = SMILESToken(bond_idx, i, i + 1, SMILESTokenTypes.ATOM)
token = SMILESToken(bond_idx, i, i + 1,
SMILESTokenTypes.ATOM, smiles[i:i + 1])

@@ -93,3 +101,4 @@ elif smiles[i] == "[": # atoms encased in brackets (e.g. [NH])

raise SMILESParserError(smiles, "hanging bracket [", i)
token = SMILESToken(bond_idx, i, r_idx + 1, SMILESTokenTypes.ATOM)
token = SMILESToken(bond_idx, i, r_idx + 1,
SMILESTokenTypes.ATOM, smiles[i:r_idx + 1])

@@ -99,6 +108,8 @@ elif smiles[i] in ("(", ")"): # open and closed branch brackets

raise SMILESParserError(smiles, "hanging_bond", bond_idx)
token = SMILESToken(None, i, i + 1, SMILESTokenTypes.BRANCH)
token = SMILESToken(
None, i, i + 1, SMILESTokenTypes.BRANCH, smiles[i:i+1])
elif smiles[i].isdigit(): # one-digit ring number
token = SMILESToken(bond_idx, i, i + 1, SMILESTokenTypes.RING)
token = SMILESToken(bond_idx, i, i + 1,
SMILESTokenTypes.RING, smiles[i:i+1])

@@ -110,3 +121,4 @@ elif smiles[i] == "%": # two-digit ring number (e.g. %12)

raise SMILESParserError(smiles, err_msg, i)
token = SMILESToken(bond_idx, i, i + 3, SMILESTokenTypes.RING)
token = SMILESToken(bond_idx, i, i + 3,
SMILESTokenTypes.RING, smiles[i:i+3])
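Each `SMILESToken` now carries the exact substring it was cut from (`token`), so `str(tok)` can feed straight into an `Attribution`. The slicing pattern can be illustrated with a toy tokenizer (a deliberately simplified stand-in for `tokenize_smiles`: it ignores bond indices, token types, and error handling):

```python
import re
from typing import Iterator, Tuple

TOKEN_RE = re.compile(
    r"Br|Cl"             # two-letter elements first, so "Br" beats "B"
    r"|\[[^\]]+\]"       # bracketed atoms, e.g. [NH]
    r"|%\d{2}"           # two-digit ring numbers, e.g. %12
    r"|[A-Za-z()=#\d.]"  # one-letter atoms, branches, bonds, rings, dots
)

def tokenize(smiles: str) -> Iterator[Tuple[int, int, str]]:
    # Yields (start_idx, end_idx, token), storing the matched substring
    # on each token the way SMILESToken.token / __str__ now do.
    for m in TOKEN_RE.finditer(smiles):
        yield m.start(), m.end(), m.group(0)

print([t for _, _, t in tokenize("CC(Br)=O")])
# → ['C', 'C', '(', 'Br', ')', '=', 'O']
```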

@@ -197,6 +209,7 @@ else:

def smiles_to_mol(smiles: str) -> MolecularGraph:
def smiles_to_mol(smiles: str, attributable: bool) -> MolecularGraph:
"""Reads a molecular graph from a SMILES string.
:param smiles: the input SMILES string.
:param attributable: if ``True``, the molecular graph records attributions.
:return: a molecular graph that the input SMILES string represents.

@@ -209,10 +222,11 @@ :raises SMILESParserError: if the input SMILES is invalid.

mol = MolecularGraph()
mol = MolecularGraph(attributable=attributable)
tokens = deque(tokenize_smiles(smiles))
i = 0
while tokens:
_derive_mol_from_tokens(mol, smiles, tokens)
i = _derive_mol_from_tokens(mol, smiles, tokens, i)
return mol
def _derive_mol_from_tokens(mol, smiles, tokens):
def _derive_mol_from_tokens(mol, smiles, tokens, i):
tok = None

@@ -240,3 +254,3 @@ prev_stack = deque() # keep track of previous atom on the current chain

curr = _attach_atom(mol, bond_char, curr, prev_atom)
curr, i = _attach_atom(mol, bond_char, curr, prev_atom, i, tok)
prev_stack.pop()

@@ -277,2 +291,3 @@ prev_stack.append(curr)

raise Exception("invalid symbol type")
i += 1

@@ -291,8 +306,11 @@ if len(mol) == 0:

raise SMILESParserError(smiles, err_msg, tok.start_idx)
return i
def _attach_atom(mol, bond_char, atom, prev_atom):
def _attach_atom(mol, bond_char, atom, prev_atom, i, tok):
is_root = (prev_atom is None)
mol.add_atom(atom, mark_root=is_root)
if bond_char:
i += 1
o = mol.add_atom(atom, mark_root=is_root)
mol.add_attribution(o, [Attribution(i, str(tok))])
if not is_root:

@@ -303,4 +321,5 @@ src, dst = prev_atom.index, atom.index

order = 1.5 # handle implicit aromatic bonds, e.g. cc
mol.add_bond(src=src, dst=dst, order=order, stereo=stereo)
return atom
o = mol.add_bond(src=src, dst=dst, order=order, stereo=stereo)
mol.add_attribution(o, [Attribution(i, str(tok))])
return atom, i

@@ -399,3 +418,6 @@

def mol_to_smiles(mol: MolecularGraph) -> str:
def mol_to_smiles(
mol: MolecularGraph,
attribute: bool = False
) -> Union[str, Tuple[str, List[Tuple[str, List[Tuple[int, str]]]]]]:
"""Converts a molecular graph into its SMILES representation, maintaining

@@ -405,3 +427,6 @@ the traversal order indicated by the input graph.

:param mol: the input molecule.
:return: a SMILES string representing the input molecule.
:param attribute: if ``True``, an attribution list is also returned.
:return: a SMILES string representing the input molecule if
attribute is ``False``, otherwise a tuple is returned of
SMILES string and attribution list.
"""

@@ -411,13 +436,29 @@ assert mol.is_kekulized()

fragments = []
attribution_maps = []
attribution_index = 0
ring_log = dict()
for root in mol.get_roots():
derived = []
_derive_smiles_from_fragment(derived, mol, root, ring_log)
_derive_smiles_from_fragment(
derived, mol, root, ring_log, attribution_maps, attribution_index)
attribution_index += len(derived)
fragments.append("".join(derived))
return ".".join(fragments)
# trim attribution map of empty tokens
attribution_maps = [a for a in attribution_maps if a.token]
result = ".".join(fragments), attribution_maps
return result if attribute else result[0]
def _derive_smiles_from_fragment(derived, mol, root, ring_log):
def _derive_smiles_from_fragment(
derived,
mol,
root,
ring_log,
attribution_maps, attribution_index=0):
curr_atom, curr = mol.get_atom(root), root
derived.append(atom_to_smiles(curr_atom))
token = atom_to_smiles(curr_atom)
derived.append(token)
attribution_maps.append(AttributionMap(
len(derived) - 1 + attribution_index,
token, mol.get_attribution(curr_atom)))

@@ -427,3 +468,7 @@ out_bonds = mol.get_out_dirbonds(curr)

if bond.ring_bond:
derived.append(bond_to_smiles(bond))
token = bond_to_smiles(bond)
derived.append(token)
attribution_maps.append(AttributionMap(
len(derived) - 1 + attribution_index,
token, mol.get_attribution(bond)))
ends = (min(bond.src, bond.dst), max(bond.src, bond.dst))

@@ -439,6 +484,12 @@ rnum = ring_log.setdefault(ends, len(ring_log) + 1)

derived.append(bond_to_smiles(bond))
_derive_smiles_from_fragment(derived, mol, bond.dst, ring_log)
token = bond_to_smiles(bond)
derived.append(token)
attribution_maps.append(AttributionMap(
len(derived) - 1 + attribution_index,
token, mol.get_attribution(bond)))
_derive_smiles_from_fragment(
derived, mol, bond.dst, ring_log,
attribution_maps, attribution_index)
if i < len(out_bonds) - 1:
derived.append(")")
return attribution_maps

@@ -10,3 +10,3 @@ #!/usr/bin/env python

name="selfies",
version="2.0.0",
version="2.1.0",
author="Mario Krenn, Alston Lo, and many other contributors",

@@ -23,6 +23,6 @@ author_email="mario.krenn@utoronto.ca, alan@aspuru.com",

"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.5",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3 :: Only",

@@ -32,3 +32,3 @@ "License :: OSI Approved :: Apache Software License",

],
python_requires='>=3.5'
python_requires='>=3.7'
)
from typing import Any
class SinglyLinkedList:
"""A simple singly linked list that supports O(1) append and O(1) extend.
"""
def __init__(self):
self._head = None
self._tail = None
self._count = 0
def __len__(self):
return self._count
def __iter__(self):
return SinglyLinkedListIterator(self)
@property
def head(self):
return self._head
def append(self, item: Any) -> None:
node = [item, None]
if self._head is None:
self._head = node
self._tail = node
else:
self._tail[1] = node
self._tail = node
self._count += 1
def extend(self, other) -> None:
assert isinstance(other, SinglyLinkedList)
if other._head is None:
return
if self._head is None:
self._head = other._head
self._tail = other._tail
else:
self._tail[1] = other._head
self._tail = other._tail
self._count += len(other)
class SinglyLinkedListIterator:
def __init__(self, linked_list):
self._curr = linked_list.head
def __iter__(self):
return self
def __next__(self):
if self._curr is None:
raise StopIteration
else:
item = self._curr[0]
self._curr = self._curr[1]
return item
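The linked list above exists purely for O(1) `append` and O(1) `extend` while building long symbol sequences (the v2.1.0 encoder hunk switched `derived` back to a plain list, but the class remains in `selfies/utils/linked_list.py`). A condensed copy with a usage example (the iterator class is folded into a generator here for brevity; nodes are two-element lists `[item, next]` as in the source):

```python
from typing import Any

class SinglyLinkedList:
    # Condensed from selfies/utils/linked_list.py.
    def __init__(self):
        self._head = None
        self._tail = None
        self._count = 0

    def __len__(self):
        return self._count

    def __iter__(self):
        curr = self._head
        while curr is not None:
            yield curr[0]
            curr = curr[1]

    def append(self, item: Any) -> None:
        node = [item, None]
        if self._head is None:
            self._head = node
        else:
            self._tail[1] = node
        self._tail = node
        self._count += 1

    def extend(self, other: "SinglyLinkedList") -> None:
        # O(1): splice the other list's node chain onto this tail,
        # instead of copying item by item.
        if other._head is None:
            return
        if self._head is None:
            self._head = other._head
        else:
            self._tail[1] = other._head
        self._tail = other._tail
        self._count += len(other)

a, b = SinglyLinkedList(), SinglyLinkedList()
for s in ("[C]", "[C]"):
    a.append(s)
b.append("[O]")
a.extend(b)
print(list(a), len(a))  # → ['[C]', '[C]', '[O]'] 3
```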