selfies - npm Package Compare versions

+81

-47

PKG-INFO

		Metadata-Version: 2.1
		Name: selfies
		Version: 2.0.0
		Version: 2.1.0
		Summary: SELFIES (SELF-referencIng Embedded Strings) is a general-purpose, sequence-based, robust representation of semantically constrained graphs.
		@@ -25,3 +25,4 @@ Home-page: https://github.com/aspuru-guzik-group/selfies
		[Blog explaining SELFIES in Japanese language](https://blacktanktop.hatenablog.com/entry/2021/08/12/115613)\
		Major contributors since v1.0.0: _[Alston Lo](https://github.com/alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_\
		Major contributors of v1.0.n: _[Alston Lo](https://github.com/alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_\
		Main developer of v2.0.0: _[Alston Lo](https://github.com/alstonlo)_\
		Chemistry Advisor: [Robert Pollice](https://scholar.google.at/citations?user=JR2N3JIAAAAJ)
		@@ -47,3 +48,3 @@
		To check if the correct version of ``selfies`` is installed, use
		the following pip command.
		the following pip command.

		@@ -54,9 +55,9 @@ ```bash

		To upgrade to the latest release of ``selfies`` if you are using an
		older version, use the following pip command. Please see the
		[CHANGELOG](https://github.com/aspuru-guzik-group/selfies/blob/master/CHANGELOG.md)
		to review the changes between versions of `selfies`, before upgrading:
		To upgrade to the latest release of ``selfies`` if you are using an
		older version, use the following pip command. Please see the
		[CHANGELOG](https://github.com/aspuru-guzik-group/selfies/blob/master/CHANGELOG.md)
		to review the changes between versions of `selfies`, before upgrading:

		```bash
		pip install selfies --upgrade
		pip install selfies --upgrade
		```
		@@ -70,16 +71,16 @@
		Please refer to the [documentation](https://selfiesv2.readthedocs.io/en/latest/),
		which contains a thorough tutorial for getting started with ``selfies``
		which contains a thorough tutorial for getting started with ``selfies``
		and detailed descriptions of the functions
		that ``selfies`` provides. We summarize some key functions below.

		\| Function \| Description \|
		\| -------- \| ----------- \|
		\| ``selfies.encoder`` \| Translates a SMILES string into its corresponding SELFIES string. \|
		\| ``selfies.decoder`` \| Translates a SELFIES string into its corresponding SMILES string. \|
		\| ``selfies.set_semantic_constraints`` \| Configures the semantic constraints that ``selfies`` operates on. \|
		\| ``selfies.len_selfies`` \| Returns the number of symbols in a SELFIES string. \|
		\| ``selfies.split_selfies`` \| Tokenizes a SELFIES string into its individual symbols. \|
		\| ``selfies.get_alphabet_from_selfies`` \| Constructs an alphabet from an iterable of SELFIES strings. \|
		\| ``selfies.selfies_to_encoding`` \| Converts a SELFIES string into its label and/or one-hot encoding. \|
		\| ``selfies.encoding_to_selfies`` \| Converts a label or one-hot encoding into a SELFIES string. \|
		\| Function \| Description \|
		\| ------------------------------------- \| ----------------------------------------------------------------- \|
		\| ``selfies.encoder`` \| Translates a SMILES string into its corresponding SELFIES string. \|
		\| ``selfies.decoder`` \| Translates a SELFIES string into its corresponding SMILES string. \|
		\| ``selfies.set_semantic_constraints`` \| Configures the semantic constraints that ``selfies`` operates on. \|
		\| ``selfies.len_selfies`` \| Returns the number of symbols in a SELFIES string. \|
		\| ``selfies.split_selfies`` \| Tokenizes a SELFIES string into its individual symbols. \|
		\| ``selfies.get_alphabet_from_selfies`` \| Constructs an alphabet from an iterable of SELFIES strings. \|
		\| ``selfies.selfies_to_encoding`` \| Converts a SELFIES string into its label and/or one-hot encoding. \|
		\| ``selfies.encoding_to_selfies`` \| Converts a label or one-hot encoding into a SELFIES string. \|

		@@ -111,24 +112,6 @@

		#### Customizing SELFIES:

		In this example, we relax the semantic constraints of ``selfies`` to allow
		for hypervalences (caution: hypervalence rules are much less understood
		than octet rules. Some molecules containing hypervalences are important,
		but generally, it is not known which molecules are stable and reasonable).

		```python
		import selfies as sf

		hypervalent_sf = sf.encoder('O=I(O)(O)(O)(O)O', strict=False) # orthoperiodic acid
		standard_derived_smi = sf.decoder(hypervalent_sf)
		# OI (the default constraints for I allow for only 1 bond)

		sf.set_semantic_constraints("hypervalent")
		relaxed_derived_smi = sf.decoder(hypervalent_sf)
		# O=I(O)(O)(O)(O)O (the hypervalent constraints for I allow for 7 bonds)
		```

		#### Integer and one-hot encoding SELFIES:

		In this example, we first build an alphabet from a dataset of SELFIES strings,
		In this example, we first build an alphabet from a dataset of SELFIES strings,
		and then convert a SELFIES string into its padded encoding. Note that we use the
		@@ -163,2 +146,52 @@ ``[nop]`` ([no operation](https://en.wikipedia.org/wiki/NOP_(code) ))

		#### Customizing SELFIES:

		In this example, we relax the semantic constraints of ``selfies`` to allow
		for hypervalences (caution: hypervalence rules are much less understood
		than octet rules. Some molecules containing hypervalences are important,
		but generally, it is not known which molecules are stable and reasonable).

		```python
		import selfies as sf

		hypervalent_sf = sf.encoder('O=I(O)(O)(O)(O)O', strict=False) # orthoperiodic acid
		standard_derived_smi = sf.decoder(hypervalent_sf)
		# OI (the default constraints for I allow for only 1 bond)

		sf.set_semantic_constraints("hypervalent")
		relaxed_derived_smi = sf.decoder(hypervalent_sf)
		# O=I(O)(O)(O)(O)O (the hypervalent constraints for I allow for 7 bonds)
		```

		#### Explaining Translation:

		You can get an "attribution" list that traces the connection between input and output tokens. For example, let's see which tokens in the SELFIES string ``[C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]`` are responsible for each output SMILES token.

		```python
		selfies = "[C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]"
		smiles, attr = sf.decoder(
		    selfies, attribute=True)
		print('SELFIES', selfies)
		print('SMILES', smiles)
		print('Attribution:')
		for smiles_token, a in attr:
		    print(smiles_token)
		    if a:
		        for j, selfies_token in a:
		            print(f'\t{j}:{selfies_token}')

		# output
		SELFIES [C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]
		SMILES C1NC(P)CC1
		Attribution:
		AttributionMap(index=0, token='C', attribution=[Attribution(index=0, token='[C]')])
		AttributionMap(index=4, token='N', attribution=[Attribution(index=1, token='[N]')])
		AttributionMap(index=6, token='C', attribution=[Attribution(index=2, token='[C]')])
		AttributionMap(index=9, token='P', attribution=[Attribution(index=3, token='[Branch1]'), Attribution(index=5, token='[P]')])
		AttributionMap(index=12, token='C', attribution=[Attribution(index=6, token='[C]')])
		AttributionMap(index=14, token='C', attribution=[Attribution(index=7, token='[C]')])
		```

		``attr`` is a list of `AttributionMap`s, each containing an output token, its index, and the input tokens that led to it. For example, the ``P`` in the output SMILES (reported with ``index=9``) results from both the ``[Branch1]`` token at index 3 and the ``[P]`` token at index 5. Attribution works for both encoding and decoding. For finer control over tracking the translation (such as tracking rings), you can access attributions in the underlying molecular graph with ``get_attribution``.

		### More Usages and Examples
		@@ -173,4 +206,5 @@
		* We use SELFIES for [Deep Molecular dreaming](https://arxiv.org/abs/2012.09712), a new generative model inspired by interpretable neural networks in computational vision. See the [code of PASITHEA here](https://github.com/aspuru-guzik-group/Pasithea).
		* Kohulan Rajan, Achim Zielesny, Christoph Steinbeck show in two papers that SELFIES outperforms other representations in [img2string](https://link.springer.com/article/10.1186/s13321-020-00469-w) and [string2string](https://chemrxiv.org/articles/preprint/STOUT_SMILES_to_IUPAC_Names_Using_Neural_Machine_Translation/13469202/1) translation tasks, see the codes of [DECIMER](https://github.com/Kohulan/DECIMER-Image-to-SMILES) and [STOUT](https://github.com/Kohulan/Smiles-TO-iUpac-Translator).
		* An improvement to the old genetic algorithm, the authors have also released [JANUS](https://arxiv.org/abs/2106.04011), which allows for more efficient optimization in the chemical space. JANUS makes use of [STONED-SELFIES](https://pubs.rsc.org/en/content/articlepdf/2021/sc/d1sc00231g) and a neural network for efficient sampling.
		* Kohulan Rajan, Achim Zielesny, Christoph Steinbeck show in two papers that SELFIES outperforms other representations in [img2string](https://link.springer.com/article/10.1186/s13321-020-00469-w) and [string2string](https://chemrxiv.org/articles/preprint/STOUT_SMILES_to_IUPAC_Names_Using_Neural_Machine_Translation/13469202/1) translation tasks, see the codes of [DECIMER](https://github.com/Kohulan/DECIMER-Image-to-SMILES) and [STOUT](https://github.com/Kohulan/Smiles-TO-iUpac-Translator).
		* Nathan Frey, Vijay Gadepally, and Bharath Ramsundar used SELFIES with normalizing flows to develop the [FastFlows](https://arxiv.org/abs/2201.12419) framework for deep chemical generative modeling.
		* An improvement to the old genetic algorithm, the authors have also released [JANUS](https://arxiv.org/abs/2106.04011), which allows for more efficient optimization in the chemical space. JANUS makes use of [STONED-SELFIES](https://pubs.rsc.org/en/content/articlepdf/2021/sc/d1sc00231g) and a neural network for efficient sampling.

		@@ -180,3 +214,3 @@ ## Tests
		All tests can be found in the `tests/` directory. To run the test suite for
		SELFIES, install ``tox`` and run:
		SELFIES, install ``tox`` and run:

		@@ -195,5 +229,5 @@ ```bash
		* 36M+ molecules from the [eMolecules Database](https://www.emolecules.com/info/products-data-downloads.html).
		Due to its large size, this dataset is not included on the repository. To run tests
		on it, please download the dataset into the ``tests/test_sets`` directory
		and run the ``tests/run_on_large_dataset.py`` script.
		Due to its large size, this dataset is not included on the repository. To run tests
		on it, please download the dataset into the ``tests/test_sets`` directory
		and run the ``tests/run_on_large_dataset.py`` script.

		@@ -216,10 +250,10 @@ ## Version History
		Classifier: Programming Language :: Python :: 3
		Classifier: Programming Language :: Python :: 3.5
		Classifier: Programming Language :: Python :: 3.6
		Classifier: Programming Language :: Python :: 3.7
		Classifier: Programming Language :: Python :: 3.8
		Classifier: Programming Language :: Python :: 3.9
		Classifier: Programming Language :: Python :: 3.10
		Classifier: Programming Language :: Python :: 3 :: Only
		Classifier: License :: OSI Approved :: Apache Software License
		Classifier: Operating System :: OS Independent
		Requires-Python: >=3.5
		Requires-Python: >=3.7
		Description-Content-Type: text/markdown
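The metadata change above raises ``Requires-Python`` from ``>=3.5`` to ``>=3.7``. A minimal sketch of a guard that downstream code could run before upgrading; the helper name `meets_minimum` is hypothetical and not part of ``selfies``:

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 7)):
    """Return True if the (major, minor) version satisfies the minimum.

    `minimum` mirrors the new ``Requires-Python: >=3.7`` line; the helper
    itself is illustrative only.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum():
    # pip would normally refuse the install; this just makes the reason explicit.
    print("This interpreter is older than Python 3.7, which selfies 2.1.0 requires.")
```

In practice pip enforces ``Requires-Python`` on its own, so a check like this is mainly useful in scripts that want a clearer message.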

+77

-43

README.md

		@@ -17,3 +17,4 @@ # SELFIES
		[Blog explaining SELFIES in Japanese language](https://blacktanktop.hatenablog.com/entry/2021/08/12/115613)\
		Major contributors since v1.0.0: _[Alston Lo](https://github.com/alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_\
		Major contributors of v1.0.n: _[Alston Lo](https://github.com/alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_\
		Main developer of v2.0.0: _[Alston Lo](https://github.com/alstonlo)_\
		Chemistry Advisor: [Robert Pollice](https://scholar.google.at/citations?user=JR2N3JIAAAAJ)
		@@ -39,3 +40,3 @@
		To check if the correct version of ``selfies`` is installed, use
		the following pip command.
		the following pip command.

		@@ -46,9 +47,9 @@ ```bash

		To upgrade to the latest release of ``selfies`` if you are using an
		older version, use the following pip command. Please see the
		[CHANGELOG](https://github.com/aspuru-guzik-group/selfies/blob/master/CHANGELOG.md)
		to review the changes between versions of `selfies`, before upgrading:
		To upgrade to the latest release of ``selfies`` if you are using an
		older version, use the following pip command. Please see the
		[CHANGELOG](https://github.com/aspuru-guzik-group/selfies/blob/master/CHANGELOG.md)
		to review the changes between versions of `selfies`, before upgrading:

		```bash
		pip install selfies --upgrade
		pip install selfies --upgrade
		```
		@@ -62,16 +63,16 @@
		Please refer to the [documentation](https://selfiesv2.readthedocs.io/en/latest/),
		which contains a thorough tutorial for getting started with ``selfies``
		which contains a thorough tutorial for getting started with ``selfies``
		and detailed descriptions of the functions
		that ``selfies`` provides. We summarize some key functions below.

		\| Function \| Description \|
		\| -------- \| ----------- \|
		\| ``selfies.encoder`` \| Translates a SMILES string into its corresponding SELFIES string. \|
		\| ``selfies.decoder`` \| Translates a SELFIES string into its corresponding SMILES string. \|
		\| ``selfies.set_semantic_constraints`` \| Configures the semantic constraints that ``selfies`` operates on. \|
		\| ``selfies.len_selfies`` \| Returns the number of symbols in a SELFIES string. \|
		\| ``selfies.split_selfies`` \| Tokenizes a SELFIES string into its individual symbols. \|
		\| ``selfies.get_alphabet_from_selfies`` \| Constructs an alphabet from an iterable of SELFIES strings. \|
		\| ``selfies.selfies_to_encoding`` \| Converts a SELFIES string into its label and/or one-hot encoding. \|
		\| ``selfies.encoding_to_selfies`` \| Converts a label or one-hot encoding into a SELFIES string. \|
		\| Function \| Description \|
		\| ------------------------------------- \| ----------------------------------------------------------------- \|
		\| ``selfies.encoder`` \| Translates a SMILES string into its corresponding SELFIES string. \|
		\| ``selfies.decoder`` \| Translates a SELFIES string into its corresponding SMILES string. \|
		\| ``selfies.set_semantic_constraints`` \| Configures the semantic constraints that ``selfies`` operates on. \|
		\| ``selfies.len_selfies`` \| Returns the number of symbols in a SELFIES string. \|
		\| ``selfies.split_selfies`` \| Tokenizes a SELFIES string into its individual symbols. \|
		\| ``selfies.get_alphabet_from_selfies`` \| Constructs an alphabet from an iterable of SELFIES strings. \|
		\| ``selfies.selfies_to_encoding`` \| Converts a SELFIES string into its label and/or one-hot encoding. \|
		\| ``selfies.encoding_to_selfies`` \| Converts a label or one-hot encoding into a SELFIES string. \|
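To make the table more concrete, here is a dependency-free sketch of what tokenizing, building an alphabet, and label/one-hot encoding with ``[nop]`` padding involve. The functions `split_symbols`, `build_alphabet`, and `to_labels` are hypothetical stand-ins written for illustration; they are not the ``selfies`` implementations and ignore edge cases such as the ``.`` separator.

```python
import re


def split_symbols(selfies_string):
    """Split a SELFIES string into bracketed symbols (simplified stand-in)."""
    return re.findall(r"\[[^\]]*\]", selfies_string)


def build_alphabet(dataset):
    """Collect the set of symbols that appear in an iterable of strings."""
    alphabet = set()
    for s in dataset:
        alphabet.update(split_symbols(s))
    return alphabet


def to_labels(selfies_string, vocab, pad_to_len):
    """Map symbols to integer labels, right-padding with [nop]."""
    symbols = split_symbols(selfies_string)
    symbols += ["[nop]"] * (pad_to_len - len(symbols))
    return [vocab[s] for s in symbols]


dataset = ["[C][O][C]", "[F][C][F]", "[O][=O]"]
alphabet = build_alphabet(dataset)
alphabet.add("[nop]")  # padding symbol
vocab = {symbol: i for i, symbol in enumerate(sorted(alphabet))}

labels = to_labels("[C][O][C]", vocab, pad_to_len=5)
# One row per symbol, with a single 1 at the symbol's label position.
one_hot = [[int(i == label) for i in range(len(vocab))] for label in labels]
print(labels)
```

The library functions listed in the table cover the same steps and should be preferred in real code.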

		@@ -103,24 +104,6 @@

		#### Customizing SELFIES:

		In this example, we relax the semantic constraints of ``selfies`` to allow
		for hypervalences (caution: hypervalence rules are much less understood
		than octet rules. Some molecules containing hypervalences are important,
		but generally, it is not known which molecules are stable and reasonable).

		```python
		import selfies as sf

		hypervalent_sf = sf.encoder('O=I(O)(O)(O)(O)O', strict=False) # orthoperiodic acid
		standard_derived_smi = sf.decoder(hypervalent_sf)
		# OI (the default constraints for I allow for only 1 bond)

		sf.set_semantic_constraints("hypervalent")
		relaxed_derived_smi = sf.decoder(hypervalent_sf)
		# O=I(O)(O)(O)(O)O (the hypervalent constraints for I allow for 7 bonds)
		```

		#### Integer and one-hot encoding SELFIES:

		In this example, we first build an alphabet from a dataset of SELFIES strings,
		In this example, we first build an alphabet from a dataset of SELFIES strings,
		and then convert a SELFIES string into its padded encoding. Note that we use the
		@@ -155,2 +138,52 @@ ``[nop]`` ([no operation](https://en.wikipedia.org/wiki/NOP_(code) ))

		#### Customizing SELFIES:

		In this example, we relax the semantic constraints of ``selfies`` to allow
		for hypervalences (caution: hypervalence rules are much less understood
		than octet rules. Some molecules containing hypervalences are important,
		but generally, it is not known which molecules are stable and reasonable).

		```python
		import selfies as sf

		hypervalent_sf = sf.encoder('O=I(O)(O)(O)(O)O', strict=False) # orthoperiodic acid
		standard_derived_smi = sf.decoder(hypervalent_sf)
		# OI (the default constraints for I allow for only 1 bond)

		sf.set_semantic_constraints("hypervalent")
		relaxed_derived_smi = sf.decoder(hypervalent_sf)
		# O=I(O)(O)(O)(O)O (the hypervalent constraints for I allow for 7 bonds)
		```

		#### Explaining Translation:

		You can get an "attribution" list that traces the connection between input and output tokens. For example, let's see which tokens in the SELFIES string ``[C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]`` are responsible for each output SMILES token.

		```python
		selfies = "[C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]"
		smiles, attr = sf.decoder(
		    selfies, attribute=True)
		print('SELFIES', selfies)
		print('SMILES', smiles)
		print('Attribution:')
		for smiles_token, a in attr:
		    print(smiles_token)
		    if a:
		        for j, selfies_token in a:
		            print(f'\t{j}:{selfies_token}')

		# output
		SELFIES [C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]
		SMILES C1NC(P)CC1
		Attribution:
		AttributionMap(index=0, token='C', attribution=[Attribution(index=0, token='[C]')])
		AttributionMap(index=4, token='N', attribution=[Attribution(index=1, token='[N]')])
		AttributionMap(index=6, token='C', attribution=[Attribution(index=2, token='[C]')])
		AttributionMap(index=9, token='P', attribution=[Attribution(index=3, token='[Branch1]'), Attribution(index=5, token='[P]')])
		AttributionMap(index=12, token='C', attribution=[Attribution(index=6, token='[C]')])
		AttributionMap(index=14, token='C', attribution=[Attribution(index=7, token='[C]')])
		```

		``attr`` is a list of `AttributionMap`s, each containing an output token, its index, and the input tokens that led to it. For example, the ``P`` in the output SMILES (reported with ``index=9``) results from both the ``[Branch1]`` token at index 3 and the ``[P]`` token at index 5. Attribution works for both encoding and decoding. For finer control over tracking the translation (such as tracking rings), you can access attributions in the underlying molecular graph with ``get_attribution``.
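The printed output above suggests that each entry of ``attr`` exposes ``index``, ``token``, and ``attribution`` fields. A sketch of walking such a structure, using stand-in dataclasses defined here so that it runs without ``selfies`` installed; the class definitions and the helper `sources_for` are assumptions modeled on the printed reprs, not the library's actual code:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attribution:
    """Stand-in for one input token and its index (assumed shape)."""
    index: int
    token: str


@dataclass
class AttributionMap:
    """Stand-in for one output token and the inputs behind it (assumed shape)."""
    index: int
    token: str
    attribution: List[Attribution]


def sources_for(attr, output_index):
    """Return (index, token) pairs of the input tokens behind one output token."""
    for entry in attr:
        if entry.index == output_index:
            return [(a.index, a.token) for a in entry.attribution]
    return []


# Two entries copied from the output shown above.
attr = [
    AttributionMap(index=0, token='C',
                   attribution=[Attribution(index=0, token='[C]')]),
    AttributionMap(index=9, token='P',
                   attribution=[Attribution(index=3, token='[Branch1]'),
                                Attribution(index=5, token='[P]')]),
]

print(sources_for(attr, 9))
```

Looking entries up by their reported ``index`` avoids relying on list position, which may not match the position of the token in the output string.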

		### More Usages and Examples
		@@ -165,4 +198,5 @@
		* We use SELFIES for [Deep Molecular dreaming](https://arxiv.org/abs/2012.09712), a new generative model inspired by interpretable neural networks in computational vision. See the [code of PASITHEA here](https://github.com/aspuru-guzik-group/Pasithea).
		* Kohulan Rajan, Achim Zielesny, Christoph Steinbeck show in two papers that SELFIES outperforms other representations in [img2string](https://link.springer.com/article/10.1186/s13321-020-00469-w) and [string2string](https://chemrxiv.org/articles/preprint/STOUT_SMILES_to_IUPAC_Names_Using_Neural_Machine_Translation/13469202/1) translation tasks, see the codes of [DECIMER](https://github.com/Kohulan/DECIMER-Image-to-SMILES) and [STOUT](https://github.com/Kohulan/Smiles-TO-iUpac-Translator).
		* An improvement to the old genetic algorithm, the authors have also released [JANUS](https://arxiv.org/abs/2106.04011), which allows for more efficient optimization in the chemical space. JANUS makes use of [STONED-SELFIES](https://pubs.rsc.org/en/content/articlepdf/2021/sc/d1sc00231g) and a neural network for efficient sampling.
		* Kohulan Rajan, Achim Zielesny, Christoph Steinbeck show in two papers that SELFIES outperforms other representations in [img2string](https://link.springer.com/article/10.1186/s13321-020-00469-w) and [string2string](https://chemrxiv.org/articles/preprint/STOUT_SMILES_to_IUPAC_Names_Using_Neural_Machine_Translation/13469202/1) translation tasks, see the codes of [DECIMER](https://github.com/Kohulan/DECIMER-Image-to-SMILES) and [STOUT](https://github.com/Kohulan/Smiles-TO-iUpac-Translator).
		* Nathan Frey, Vijay Gadepally, and Bharath Ramsundar used SELFIES with normalizing flows to develop the [FastFlows](https://arxiv.org/abs/2201.12419) framework for deep chemical generative modeling.
		* An improvement to the old genetic algorithm, the authors have also released [JANUS](https://arxiv.org/abs/2106.04011), which allows for more efficient optimization in the chemical space. JANUS makes use of [STONED-SELFIES](https://pubs.rsc.org/en/content/articlepdf/2021/sc/d1sc00231g) and a neural network for efficient sampling.

		@@ -172,3 +206,3 @@ ## Tests
		All tests can be found in the `tests/` directory. To run the test suite for
		SELFIES, install ``tox`` and run:
		SELFIES, install ``tox`` and run:

		@@ -187,5 +221,5 @@ ```bash
		* 36M+ molecules from the [eMolecules Database](https://www.emolecules.com/info/products-data-downloads.html).
		Due to its large size, this dataset is not included on the repository. To run tests
		on it, please download the dataset into the ``tests/test_sets`` directory
		and run the ``tests/run_on_large_dataset.py`` script.
		Due to its large size, this dataset is not included on the repository. To run tests
		on it, please download the dataset into the ``tests/test_sets`` directory
		and run the ``tests/run_on_large_dataset.py`` script.

		@@ -192,0 +226,0 @@ ## Version History

+81

-47

selfies.egg-info/PKG-INFO

		Metadata-Version: 2.1
		Name: selfies
		Version: 2.0.0
		Version: 2.1.0
		Summary: SELFIES (SELF-referencIng Embedded Strings) is a general-purpose, sequence-based, robust representation of semantically constrained graphs.
		@@ -25,3 +25,4 @@ Home-page: https://github.com/aspuru-guzik-group/selfies
		[Blog explaining SELFIES in Japanese language](https://blacktanktop.hatenablog.com/entry/2021/08/12/115613)\
		Major contributors since v1.0.0: _[Alston Lo](https://github.com/alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_\
		Major contributors of v1.0.n: _[Alston Lo](https://github.com/alstonlo) and [Seyone Chithrananda](https://github.com/seyonechithrananda)_\
		Main developer of v2.0.0: _[Alston Lo](https://github.com/alstonlo)_\
		Chemistry Advisor: [Robert Pollice](https://scholar.google.at/citations?user=JR2N3JIAAAAJ)
		@@ -47,3 +48,3 @@
		To check if the correct version of ``selfies`` is installed, use
		the following pip command.
		the following pip command.

		@@ -54,9 +55,9 @@ ```bash

		To upgrade to the latest release of ``selfies`` if you are using an
		older version, use the following pip command. Please see the
		[CHANGELOG](https://github.com/aspuru-guzik-group/selfies/blob/master/CHANGELOG.md)
		to review the changes between versions of `selfies`, before upgrading:
		To upgrade to the latest release of ``selfies`` if you are using an
		older version, use the following pip command. Please see the
		[CHANGELOG](https://github.com/aspuru-guzik-group/selfies/blob/master/CHANGELOG.md)
		to review the changes between versions of `selfies`, before upgrading:

		```bash
		pip install selfies --upgrade
		pip install selfies --upgrade
		```
		@@ -70,16 +71,16 @@
		Please refer to the [documentation](https://selfiesv2.readthedocs.io/en/latest/),
		which contains a thorough tutorial for getting started with ``selfies``
		which contains a thorough tutorial for getting started with ``selfies``
		and detailed descriptions of the functions
		that ``selfies`` provides. We summarize some key functions below.

		\| Function \| Description \|
		\| -------- \| ----------- \|
		\| ``selfies.encoder`` \| Translates a SMILES string into its corresponding SELFIES string. \|
		\| ``selfies.decoder`` \| Translates a SELFIES string into its corresponding SMILES string. \|
		\| ``selfies.set_semantic_constraints`` \| Configures the semantic constraints that ``selfies`` operates on. \|
		\| ``selfies.len_selfies`` \| Returns the number of symbols in a SELFIES string. \|
		\| ``selfies.split_selfies`` \| Tokenizes a SELFIES string into its individual symbols. \|
		\| ``selfies.get_alphabet_from_selfies`` \| Constructs an alphabet from an iterable of SELFIES strings. \|
		\| ``selfies.selfies_to_encoding`` \| Converts a SELFIES string into its label and/or one-hot encoding. \|
		\| ``selfies.encoding_to_selfies`` \| Converts a label or one-hot encoding into a SELFIES string. \|
		\| Function \| Description \|
		\| ------------------------------------- \| ----------------------------------------------------------------- \|
		\| ``selfies.encoder`` \| Translates a SMILES string into its corresponding SELFIES string. \|
		\| ``selfies.decoder`` \| Translates a SELFIES string into its corresponding SMILES string. \|
		\| ``selfies.set_semantic_constraints`` \| Configures the semantic constraints that ``selfies`` operates on. \|
		\| ``selfies.len_selfies`` \| Returns the number of symbols in a SELFIES string. \|
		\| ``selfies.split_selfies`` \| Tokenizes a SELFIES string into its individual symbols. \|
		\| ``selfies.get_alphabet_from_selfies`` \| Constructs an alphabet from an iterable of SELFIES strings. \|
		\| ``selfies.selfies_to_encoding`` \| Converts a SELFIES string into its label and/or one-hot encoding. \|
		\| ``selfies.encoding_to_selfies`` \| Converts a label or one-hot encoding into a SELFIES string. \|

		@@ -111,24 +112,6 @@

		#### Customizing SELFIES:

		In this example, we relax the semantic constraints of ``selfies`` to allow
		for hypervalences (caution: hypervalence rules are much less understood
		than octet rules. Some molecules containing hypervalences are important,
		but generally, it is not known which molecules are stable and reasonable).

		```python
		import selfies as sf

		hypervalent_sf = sf.encoder('O=I(O)(O)(O)(O)O', strict=False) # orthoperiodic acid
		standard_derived_smi = sf.decoder(hypervalent_sf)
		# OI (the default constraints for I allow for only 1 bond)

		sf.set_semantic_constraints("hypervalent")
		relaxed_derived_smi = sf.decoder(hypervalent_sf)
		# O=I(O)(O)(O)(O)O (the hypervalent constraints for I allow for 7 bonds)
		```

		#### Integer and one-hot encoding SELFIES:

		In this example, we first build an alphabet from a dataset of SELFIES strings,
		In this example, we first build an alphabet from a dataset of SELFIES strings,
		and then convert a SELFIES string into its padded encoding. Note that we use the
		@@ -163,2 +146,52 @@ ``[nop]`` ([no operation](https://en.wikipedia.org/wiki/NOP_(code) ))

		#### Customizing SELFIES:

		In this example, we relax the semantic constraints of ``selfies`` to allow
		for hypervalences (caution: hypervalence rules are much less understood
		than octet rules. Some molecules containing hypervalences are important,
		but generally, it is not known which molecules are stable and reasonable).

		```python
		import selfies as sf

		hypervalent_sf = sf.encoder('O=I(O)(O)(O)(O)O', strict=False) # orthoperiodic acid
		standard_derived_smi = sf.decoder(hypervalent_sf)
		# OI (the default constraints for I allow for only 1 bond)

		sf.set_semantic_constraints("hypervalent")
		relaxed_derived_smi = sf.decoder(hypervalent_sf)
		# O=I(O)(O)(O)(O)O (the hypervalent constraints for I allow for 7 bonds)
		```

		#### Explaining Translation:

		You can get an "attribution" list that traces the connection between input and output tokens. For example, let's see which tokens in the SELFIES string ``[C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]`` are responsible for each output SMILES token.

		```python
		selfies = "[C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]"
		smiles, attr = sf.decoder(
		    selfies, attribute=True)
		print('SELFIES', selfies)
		print('SMILES', smiles)
		print('Attribution:')
		for smiles_token, a in attr:
		    print(smiles_token)
		    if a:
		        for j, selfies_token in a:
		            print(f'\t{j}:{selfies_token}')

		# output
		SELFIES [C][N][C][Branch1][C][P][C][C][Ring1][=Branch1]
		SMILES C1NC(P)CC1
		Attribution:
		AttributionMap(index=0, token='C', attribution=[Attribution(index=0, token='[C]')])
		AttributionMap(index=4, token='N', attribution=[Attribution(index=1, token='[N]')])
		AttributionMap(index=6, token='C', attribution=[Attribution(index=2, token='[C]')])
		AttributionMap(index=9, token='P', attribution=[Attribution(index=3, token='[Branch1]'), Attribution(index=5, token='[P]')])
		AttributionMap(index=12, token='C', attribution=[Attribution(index=6, token='[C]')])
		AttributionMap(index=14, token='C', attribution=[Attribution(index=7, token='[C]')])
		```

		``attr`` is a list of `AttributionMap`s, each containing an output token, its index, and the input tokens that led to it. For example, the ``P`` in the output SMILES (reported with ``index=9``) results from both the ``[Branch1]`` token at index 3 and the ``[P]`` token at index 5. Attribution works for both encoding and decoding. For finer control over tracking the translation (such as tracking rings), you can access attributions in the underlying molecular graph with ``get_attribution``.

		### More Usages and Examples
		@@ -173,4 +206,5 @@
		* We use SELFIES for [Deep Molecular dreaming](https://arxiv.org/abs/2012.09712), a new generative model inspired by interpretable neural networks in computational vision. See the [code of PASITHEA here](https://github.com/aspuru-guzik-group/Pasithea).
		* Kohulan Rajan, Achim Zielesny, Christoph Steinbeck show in two papers that SELFIES outperforms other representations in [img2string](https://link.springer.com/article/10.1186/s13321-020-00469-w) and [string2string](https://chemrxiv.org/articles/preprint/STOUT_SMILES_to_IUPAC_Names_Using_Neural_Machine_Translation/13469202/1) translation tasks, see the codes of [DECIMER](https://github.com/Kohulan/DECIMER-Image-to-SMILES) and [STOUT](https://github.com/Kohulan/Smiles-TO-iUpac-Translator).
		* An improvement to the old genetic algorithm, the authors have also released [JANUS](https://arxiv.org/abs/2106.04011), which allows for more efficient optimization in the chemical space. JANUS makes use of [STONED-SELFIES](https://pubs.rsc.org/en/content/articlepdf/2021/sc/d1sc00231g) and a neural network for efficient sampling.
		* Kohulan Rajan, Achim Zielesny, Christoph Steinbeck show in two papers that SELFIES outperforms other representations in [img2string](https://link.springer.com/article/10.1186/s13321-020-00469-w) and [string2string](https://chemrxiv.org/articles/preprint/STOUT_SMILES_to_IUPAC_Names_Using_Neural_Machine_Translation/13469202/1) translation tasks, see the codes of [DECIMER](https://github.com/Kohulan/DECIMER-Image-to-SMILES) and [STOUT](https://github.com/Kohulan/Smiles-TO-iUpac-Translator).
		* Nathan Frey, Vijay Gadepally, and Bharath Ramsundar used SELFIES with normalizing flows to develop the [FastFlows](https://arxiv.org/abs/2201.12419) framework for deep chemical generative modeling.
		* An improvement to the old genetic algorithm, the authors have also released [JANUS](https://arxiv.org/abs/2106.04011), which allows for more efficient optimization in the chemical space. JANUS makes use of [STONED-SELFIES](https://pubs.rsc.org/en/content/articlepdf/2021/sc/d1sc00231g) and a neural network for efficient sampling.

		@@ -180,3 +214,3 @@ ## Tests
		All tests can be found in the `tests/` directory. To run the test suite for
		SELFIES, install ``tox`` and run:
		SELFIES, install ``tox`` and run:

		@@ -195,5 +229,5 @@ ```bash
		* 36M+ molecules from the [eMolecules Database](https://www.emolecules.com/info/products-data-downloads.html).
		Due to its large size, this dataset is not included on the repository. To run tests
		on it, please download the dataset into the ``tests/test_sets`` directory
		and run the ``tests/run_on_large_dataset.py`` script.
		Due to its large size, this dataset is not included on the repository. To run tests
		on it, please download the dataset into the ``tests/test_sets`` directory
		and run the ``tests/run_on_large_dataset.py`` script.

		@@ -216,10 +250,10 @@ ## Version History
		Classifier: Programming Language :: Python :: 3
		Classifier: Programming Language :: Python :: 3.5
		Classifier: Programming Language :: Python :: 3.6
		Classifier: Programming Language :: Python :: 3.7
		Classifier: Programming Language :: Python :: 3.8
		Classifier: Programming Language :: Python :: 3.9
		Classifier: Programming Language :: Python :: 3.10
		Classifier: Programming Language :: Python :: 3 :: Only
		Classifier: License :: OSI Approved :: Apache Software License
		Classifier: Operating System :: OS Independent
		Requires-Python: >=3.5
		Requires-Python: >=3.7
		Description-Content-Type: text/markdown

+0

-1

selfies.egg-info/SOURCES.txt

		@@ -18,5 +18,4 @@ README.md
		selfies/utils/encoding_utils.py
		selfies/utils/linked_list.py
		selfies/utils/matching_utils.py
		selfies/utils/selfies_utils.py
		selfies/utils/smiles_utils.py

+42

-14

selfies/decoder.py

		import warnings
		from typing import List, Union, Tuple

		@@ -14,3 +15,3 @@ from selfies.compatibility import modernize_symbol
		)
		from selfies.mol_graph import MolecularGraph
		from selfies.mol_graph import MolecularGraph, Attribution
		from selfies.utils.selfies_utils import split_selfies
		@@ -20,3 +21,7 @@ from selfies.utils.smiles_utils import mol_to_smiles

		def decoder(selfies: str, compatible: bool = False) -> str:
		def decoder(
		selfies: str,
		compatible: bool = False,
		attribute: bool = False) ->\
		Union[str, Tuple[str, List[Tuple[str, List[Tuple[int, str]]]]]]:
		"""Translates a SELFIES string into its corresponding SMILES string.
		@@ -35,2 +40,4 @@
		Defaults to ``False``.
		:param attribute: if ``True``, an attribution map connecting selfies
		tokens to smiles tokens is output.
		:return: a SMILES string derived from the input SELFIES string.
		@@ -51,8 +58,9 @@ :raises DecoderError: if the input SELFIES string is malformed.

		mol = MolecularGraph()
		mol = MolecularGraph(attributable=attribute)

		rings = []
		attribution_index = 0
		for s in selfies.split("."):
		_derive_mol_from_symbols(
		symbol_iter=_tokenize_selfies(s, compatible),
		n = _derive_mol_from_symbols(
		symbol_iter=enumerate(_tokenize_selfies(s, compatible)),
		mol=mol,
		@@ -63,6 +71,9 @@ selfies=selfies,
		root_atom=None,
		rings=rings
		rings=rings,
		attribute_stack=[] if attribute else None,
		attribution_index=attribution_index
		)
		attribution_index += n
		_form_rings_bilocally(mol, rings)
		return mol_to_smiles(mol)
		return mol_to_smiles(mol, attribute)

		@@ -91,3 +102,3 @@
		symbol_iter, mol, selfies, max_derive,
		init_state, root_atom, rings
		init_state, root_atom, rings, attribute_stack, attribution_index
		):
		@@ -101,3 +112,3 @@ n_derived = 0
		try: # retrieve next symbol
		symbol = next(symbol_iter)
		index, symbol = next(symbol_iter)
		n_derived += 1
		@@ -123,3 +134,7 @@ except StopIteration:
		symbol_iter, mol, selfies, (Q + 1),
		init_state=binit_state, root_atom=prev_atom, rings=rings
		init_state=binit_state, root_atom=prev_atom, rings=rings,
		attribute_stack=attribute_stack +
		[Attribution(index + attribution_index, symbol)
		] if attribute_stack is not None else None,
		attribution_index=attribution_index
		)
		@@ -162,7 +177,20 @@
		if state == 0:
		mol.add_atom(atom, True)
		o = mol.add_atom(atom, True)
		mol.add_attribution(
		o, attribute_stack +
		[Attribution(index + attribution_index, symbol)]
		if attribute_stack is not None else None)
		else:
		mol.add_atom(atom)
		o = mol.add_atom(atom)
		mol.add_attribution(
		o, attribute_stack +
		[Attribution(index + attribution_index, symbol)]
		if attribute_stack is not None else None)
		src, dst = prev_atom.index, atom.index
		mol.add_bond(src=src, dst=dst, order=bond_order, stereo=stereo)
		o = mol.add_bond(src=src, dst=dst,
		order=bond_order, stereo=stereo)
		mol.add_attribution(
		o, attribute_stack +
		[Attribution(index + attribution_index, symbol)]
		if attribute_stack is not None else None)
		prev_atom = atom
		@@ -195,3 +223,3 @@
		try:
		index_symbols.append(next(symbol_iter))
		index_symbols.append(next(symbol_iter)[-1])
		except StopIteration:
		@@ -198,0 +226,0 @@ index_symbols.append(None)

+40

-10

selfies/encoder.py

		from selfies.exceptions import EncoderError, SMILESParserError
		from selfies.grammar_rules import get_selfies_from_index
		from selfies.utils.linked_list import SinglyLinkedList
		from selfies.utils.smiles_utils import (
		@@ -10,4 +9,6 @@ atom_to_smiles,

		from selfies.mol_graph import AttributionMap

		def encoder(smiles: str, strict: bool = True) -> str:

		def encoder(smiles: str, strict: bool = True, attribute: bool = False) -> str:
		"""Translates a SMILES string into its corresponding SELFIES string.
		@@ -37,3 +38,6 @@
		Defaults to ``True``.
		:return: a SELFIES string translated from the input SMILES string.
		:param attribute: if an attribution should be returned
		:return: a SELFIES string translated from the input SMILES string if
		attribute is ``False``, otherwise a tuple is returned of
		SELFIES string and attribution list.
		:raises EncoderError: if the input SMILES string is invalid,
		@@ -63,3 +67,3 @@ cannot be kekulized, or violates the semantic constraints with
		try:
		mol = smiles_to_mol(smiles)
		mol = smiles_to_mol(smiles, attributable=attribute)
		except SMILESParserError as err:
		@@ -85,6 +89,13 @@ err_msg = "failed to parse input\n\tSMILES: {}".format(smiles)
		fragments = []
		attribution_maps = []
		attribution_index = 0
		for root in mol.get_roots():
		derived = list(_fragment_to_selfies(mol, None, root))
		derived = list(_fragment_to_selfies(
		mol, None, root, attribution_maps, attribution_index))
		attribution_index += len(derived)
		fragments.append("".join(derived))
		return ".".join(fragments)
		# trim attribution map of empty tokens
		attribution_maps = [a for a in attribution_maps if a.token]
		result = ".".join(fragments), attribution_maps
		return result if attribute else result[0]

		@@ -137,4 +148,5 @@

		def _fragment_to_selfies(mol, bond_into_root, root):
		derived = SinglyLinkedList()
		def _fragment_to_selfies(mol, bond_into_root, root,
		attribution_maps, attribution_index=0):
		derived = []

		@@ -144,4 +156,9 @@ bond_into_curr, curr = bond_into_root, root
		curr_atom = mol.get_atom(curr)
		derived.append(_atom_to_selfies(bond_into_curr, curr_atom))
		token = _atom_to_selfies(bond_into_curr, curr_atom)
		derived.append(token)

		attribution_maps.append(AttributionMap(
		len(derived) - 1 + attribution_index,
		token, mol.get_attribution(curr_atom)))

		out_bonds = mol.get_out_dirbonds(curr)
		@@ -163,4 +180,10 @@ for i, bond in enumerate(out_bonds):
		derived.append(ring_symbol)
		attribution_maps.append(AttributionMap(
		len(derived) - 1 + attribution_index,
		ring_symbol, mol.get_attribution(bond)))
		for symbol in Q_as_symbols:
		derived.append(symbol)
		attribution_maps.append(AttributionMap(
		len(derived) - 1 + attribution_index,
		symbol, mol.get_attribution(bond)))

		@@ -171,3 +194,4 @@ elif i == len(out_bonds) - 1:
		else:
		branch = _fragment_to_selfies(mol, bond, bond.dst)
		branch = _fragment_to_selfies(
		mol, bond, bond.dst, attribution_maps, len(derived))
		Q_as_symbols = get_selfies_from_index(len(branch) - 1)
		@@ -180,4 +204,10 @@ branch_symbol = "[{}Branch{}]".format(
		derived.append(branch_symbol)
		attribution_maps.append(AttributionMap(
		len(derived) - 1 + attribution_index,
		branch_symbol, mol.get_attribution(bond)))
		for symbol in Q_as_symbols:
		derived.append(symbol)
		attribution_maps.append(AttributionMap(
		len(derived) - 1 + attribution_index,
		symbol, mol.get_attribution(bond)))
		derived.extend(branch)
		@@ -184,0 +214,0 @@
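
The encoder changes above follow one pattern: a running offset (`attribution_index`) keeps token indices global across dot-separated fragments, empty tokens are trimmed, and the return value is a plain string unless `attribute` is set. The sketch below isolates that pattern. It is a standalone illustration, not the library's code: `join_fragments` is a hypothetical name, and plain `(index, token)` tuples stand in for `AttributionMap`.

```python
from typing import List, Tuple, Union

IndexedToken = Tuple[int, str]  # simplified stand-in for AttributionMap


def join_fragments(
    fragments: List[List[str]],
    attribute: bool = False,
) -> Union[str, Tuple[str, List[IndexedToken]]]:
    """Join per-fragment token lists, optionally returning indexed tokens."""
    pieces = []
    indexed = []
    offset = 0  # plays the role of attribution_index in the diff
    for derived in fragments:
        for local_idx, token in enumerate(derived):
            # shift the fragment-local index by the tokens seen so far
            indexed.append((local_idx + offset, token))
        offset += len(derived)
        pieces.append("".join(derived))
    # trim entries whose token is empty, as the diff does
    indexed = [(i, t) for i, t in indexed if t]
    result = ".".join(pieces), indexed
    return result if attribute else result[0]
```

Called without `attribute`, the function behaves like the 2.0.0 signature and returns only the string, so existing callers would be unaffected under this pattern.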

+50

-3

selfies/mol_graph.py

		import functools
		import itertools
		from typing import List, Optional, Union
		from dataclasses import dataclass, field

		@@ -10,2 +11,25 @@ from selfies.bond_constraints import get_bonding_capacity

		@dataclass
		class Attribution:
		"""A dataclass that contains token string and its index.
		"""
		#: token index
		index: int
		#: token string
		token: str


		@dataclass
		class AttributionMap:
		"""A mapping from input to single output token showing which
		input tokens created the output token.
		"""
		#: Index of output token
		index: int
		#: Output token
		token: str
		#: List of input tokens that created the output token
		attribution: List[Attribution] = field(default_factory=list)


		class Atom:
		@@ -74,3 +98,3 @@ """An atom with associated specifications (e.g. charge, chirality).

		def __init__(self):
		def __init__(self, attributable=False):
		self._roots = list() # stores root atoms, where traversal begins
		@@ -83,2 +107,4 @@ self._atoms = list() # stores atoms in this graph
		self._delocal_subgraph = dict() # delocalization subgraph
		self._attribution = dict() # attribution of each atom/bond
		self._attributable = attributable

		@@ -96,2 +122,10 @@ def __len__(self):

		def get_attribution(
		self,
		o: Union[DirectedBond, Atom]
		) -> List[Attribution]:
		if self._attributable and o in self._attribution:
		return self._attribution[o]
		return None

		def get_roots(self) -> List[int]:
		@@ -115,3 +149,3 @@ return self._roots

		def add_atom(self, atom: Atom, mark_root: bool = False) -> None:
		def add_atom(self, atom: Atom, mark_root: bool = False) -> Atom:
		atom.index = len(self)
		@@ -127,7 +161,19 @@
		self._delocal_subgraph[atom.index] = list()
		return atom

		def add_attribution(
		self,
		o: Union[DirectedBond, Atom],
		attr: List[Attribution]
		) -> None:
		if self._attributable:
		if o in self._attribution:
		self._attribution[o].extend(attr)
		else:
		self._attribution[o] = attr

		def add_bond(
		self, src: int, dst: int,
		order: Union[int, float], stereo: str
		) -> None:
		) -> DirectedBond:
		assert src < dst
		@@ -143,2 +189,3 @@
		self._delocal_subgraph.setdefault(dst, []).append(src)
		return bond

		@@ -145,0 +192,0 @@ def add_placeholder_bond(self, src: int) -> int:
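
As a reading aid for the `mol_graph.py` hunks above, here is a minimal standalone sketch of how the two new dataclasses and the `add_attribution` / `get_attribution` pair fit together. `AttributionStore` is a hypothetical stand-in for the attribution-related part of `MolecularGraph`; any hashable key is accepted in place of `Atom` or `DirectedBond`.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Optional


@dataclass
class Attribution:
    index: int  # position of the input token
    token: str  # the input token itself


@dataclass
class AttributionMap:
    index: int  # position of the output token
    token: str  # the output token itself
    attribution: List[Attribution] = field(default_factory=list)


class AttributionStore:
    """Hypothetical stand-in for the attribution part of MolecularGraph."""

    def __init__(self, attributable: bool = False):
        self._attributable = attributable
        self._attribution: Dict[Hashable, List[Attribution]] = {}

    def add_attribution(self, o: Hashable, attr: List[Attribution]) -> None:
        if not self._attributable:
            return  # recording is a no-op unless attribution was requested
        # Extend a list owned by the store, so the caller's list is never
        # aliased (a choice of this sketch; the diff stores the passed list).
        self._attribution.setdefault(o, []).extend(attr)

    def get_attribution(self, o: Hashable) -> Optional[List[Attribution]]:
        if self._attributable:
            return self._attribution.get(o)
        return None


store = AttributionStore(attributable=True)
store.add_attribution("atom0", [Attribution(0, "[C]")])
store.add_attribution("atom0", [Attribution(1, "[=O]")])
amap = AttributionMap(0, "C", store.get_attribution("atom0"))
```

With `attributable=False` both methods do nothing, which mirrors how the diff keeps the default code path free of attribution bookkeeping.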

+81

-30

selfies/utils/smiles_utils.py

		import enum
		import re
		from collections import deque
		from typing import Iterator, Optional, Tuple, Union
		from typing import Iterator, Optional, Tuple, Union, List

		from selfies.constants import AROMATIC_SUBSET, ELEMENTS, ORGANIC_SUBSET
		from selfies.exceptions import SMILESParserError
		from selfies.mol_graph import Atom, DirectedBond, MolecularGraph
		from selfies.mol_graph import Atom, Attribution, \
		AttributionMap, DirectedBond, MolecularGraph

		@@ -40,3 +41,4 @@ SMILES_BRACKETED_ATOM_PATTERN = re.compile(
		bond_idx: Optional[int],
		start_idx: int, end_idx: int, token_type: SMILESTokenTypes
		start_idx: int, end_idx: int, token_type: SMILESTokenTypes,
		token: str
		):
		@@ -47,2 +49,3 @@ self.bond_idx = bond_idx
		self.token_type = token_type
		self.token = token

		@@ -55,3 +58,6 @@ def extract_bond_char(self, smiles):

		def __str__(self):
		return self.token


		def tokenize_smiles(smiles: str) -> Iterator[SMILESToken]:
		@@ -68,3 +74,3 @@ """Splits a SMILES string into its tokens.
		if smiles[i] == ".":
		yield SMILESToken(None, i, i + 1, SMILESTokenTypes.DOT)
		yield SMILESToken(None, i, i + 1, SMILESTokenTypes.DOT, smiles[i])
		i += 1
		@@ -84,5 +90,7 @@ continue
		if smiles[i: i + 2] in ("Br", "Cl"): # two-letter elements
		token = SMILESToken(bond_idx, i, i + 2, SMILESTokenTypes.ATOM)
		token = SMILESToken(bond_idx, i, i + 2,
		SMILESTokenTypes.ATOM, smiles[i: i + 2])
		else: # one-letter elements (e.g. C, N, ...)
		token = SMILESToken(bond_idx, i, i + 1, SMILESTokenTypes.ATOM)
		token = SMILESToken(bond_idx, i, i + 1,
		SMILESTokenTypes.ATOM, smiles[i:i + 1])

		@@ -93,3 +101,4 @@ elif smiles[i] == "[": # atoms encased in brackets (e.g. [NH])
		raise SMILESParserError(smiles, "hanging bracket [", i)
		token = SMILESToken(bond_idx, i, r_idx + 1, SMILESTokenTypes.ATOM)
		token = SMILESToken(bond_idx, i, r_idx + 1,
		SMILESTokenTypes.ATOM, smiles[i:r_idx + 1])

		@@ -99,6 +108,8 @@ elif smiles[i] in ("(", ")"): # open and closed branch brackets
		raise SMILESParserError(smiles, "hanging_bond", bond_idx)
		token = SMILESToken(None, i, i + 1, SMILESTokenTypes.BRANCH)
		token = SMILESToken(
		None, i, i + 1, SMILESTokenTypes.BRANCH, smiles[i:i+1])

		elif smiles[i].isdigit(): # one-digit ring number
		token = SMILESToken(bond_idx, i, i + 1, SMILESTokenTypes.RING)
		token = SMILESToken(bond_idx, i, i + 1,
		SMILESTokenTypes.RING, smiles[i:i+1])

		@@ -110,3 +121,4 @@ elif smiles[i] == "%": # two-digit ring number (e.g. %12)
		raise SMILESParserError(smiles, err_msg, i)
		token = SMILESToken(bond_idx, i, i + 3, SMILESTokenTypes.RING)
		token = SMILESToken(bond_idx, i, i + 3,
		SMILESTokenTypes.RING, smiles[i:i+3])

		@@ -197,6 +209,7 @@ else:

		def smiles_to_mol(smiles: str) -> MolecularGraph:
		def smiles_to_mol(smiles: str, attributable: bool) -> MolecularGraph:
		"""Reads a molecular graph from a SMILES string.

		:param smiles: the input SMILES string.
		:param attributable: if molecular graph needs to include attributions
		:return: a molecular graph that the input SMILES string represents.
		@@ -209,10 +222,11 @@ :raises SMILESParserError: if the input SMILES is invalid.

		mol = MolecularGraph()
		mol = MolecularGraph(attributable=attributable)
		tokens = deque(tokenize_smiles(smiles))
		i = 0
		while tokens:
		_derive_mol_from_tokens(mol, smiles, tokens)
		i = _derive_mol_from_tokens(mol, smiles, tokens, i)
		return mol


		def _derive_mol_from_tokens(mol, smiles, tokens):
		def _derive_mol_from_tokens(mol, smiles, tokens, i):
		tok = None
		@@ -240,3 +254,3 @@ prev_stack = deque() # keep track of previous atom on the current chain

		curr = _attach_atom(mol, bond_char, curr, prev_atom)
		curr, i = _attach_atom(mol, bond_char, curr, prev_atom, i, tok)
		prev_stack.pop()
		@@ -277,2 +291,3 @@ prev_stack.append(curr)
		raise Exception("invalid symbol type")
		i += 1

		@@ -291,8 +306,11 @@ if len(mol) == 0:
		raise SMILESParserError(smiles, err_msg, tok.start_idx)
		return i


		def _attach_atom(mol, bond_char, atom, prev_atom):
		def _attach_atom(mol, bond_char, atom, prev_atom, i, tok):
		is_root = (prev_atom is None)
		mol.add_atom(atom, mark_root=is_root)

		if bond_char:
		i += 1
		o = mol.add_atom(atom, mark_root=is_root)
		mol.add_attribution(o, [Attribution(i, str(tok))])
		if not is_root:
		@@ -303,4 +321,5 @@ src, dst = prev_atom.index, atom.index
		order = 1.5 # handle implicit aromatic bonds, e.g. cc
		mol.add_bond(src=src, dst=dst, order=order, stereo=stereo)
		return atom
		o = mol.add_bond(src=src, dst=dst, order=order, stereo=stereo)
		mol.add_attribution(o, [Attribution(i, str(tok))])
		return atom, i

		@@ -399,3 +418,6 @@

		def mol_to_smiles(mol: MolecularGraph) -> str:
		def mol_to_smiles(
		mol: MolecularGraph,
		attribute: bool = False
		) -> Union[str, Tuple[str, List[Tuple[str, List[Tuple[int, str]]]]]]:
		"""Converts a molecular graph into its SMILES representation, maintaining
		@@ -405,3 +427,6 @@ the traversal order indicated by the input graph.
		:param mol: the input molecule.
		:return: a SMILES string representing the input molecule.
		:param attribute: if an attribution should be returned
		:return: a SMILES string representing the input molecule if
		attribute is ``False``, otherwise a tuple is returned of
		SMILES string and attribution list.
		"""
		@@ -411,13 +436,29 @@ assert mol.is_kekulized()
		fragments = []
		attribution_maps = []
		attribution_index = 0
		ring_log = dict()
		for root in mol.get_roots():
		derived = []
		_derive_smiles_from_fragment(derived, mol, root, ring_log)
		_derive_smiles_from_fragment(
		derived, mol, root, ring_log, attribution_maps, attribution_index)
		attribution_index += len(derived)
		fragments.append("".join(derived))
		return ".".join(fragments)
		# trim attribution map of empty tokens
		attribution_maps = [a for a in attribution_maps if a.token]
		result = ".".join(fragments), attribution_maps
		return result if attribute else result[0]


		def _derive_smiles_from_fragment(derived, mol, root, ring_log):
		def _derive_smiles_from_fragment(
		derived,
		mol,
		root,
		ring_log,
		attribution_maps, attribution_index=0):
		curr_atom, curr = mol.get_atom(root), root
		derived.append(atom_to_smiles(curr_atom))
		token = atom_to_smiles(curr_atom)
		derived.append(token)
		attribution_maps.append(AttributionMap(
		len(derived) - 1 + attribution_index,
		token, mol.get_attribution(curr_atom)))

		@@ -427,3 +468,7 @@ out_bonds = mol.get_out_dirbonds(curr)
		if bond.ring_bond:
		derived.append(bond_to_smiles(bond))
		token = bond_to_smiles(bond)
		derived.append(token)
		attribution_maps.append(AttributionMap(
		len(derived) - 1 + attribution_index,
		token, mol.get_attribution(bond)))
		ends = (min(bond.src, bond.dst), max(bond.src, bond.dst))
		@@ -439,6 +484,12 @@ rnum = ring_log.setdefault(ends, len(ring_log) + 1)

		derived.append(bond_to_smiles(bond))
		_derive_smiles_from_fragment(derived, mol, bond.dst, ring_log)

		token = bond_to_smiles(bond)
		derived.append(token)
		attribution_maps.append(AttributionMap(
		len(derived) - 1 + attribution_index,
		token, mol.get_attribution(bond)))
		_derive_smiles_from_fragment(
		derived, mol, bond.dst, ring_log,
		attribution_maps, attribution_index)
		if i < len(out_bonds) - 1:
		derived.append(")")
		return attribution_maps
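
The tokenizer hunks above add a `token` string to every `SMILESToken` so that `str(tok)` can be recorded in an `Attribution`. The following is a simplified, standalone sketch of that idea, not the library's tokenizer: `Tok` and `tokenize` are hypothetical names, token kinds are plain strings rather than `SMILESTokenTypes`, and error handling is reduced to `ValueError`.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Tok:
    bond_idx: Optional[int]  # index of a preceding bond character, if any
    start_idx: int
    end_idx: int
    kind: str
    token: str  # the token's own text, excluding the bond character

    def __str__(self) -> str:
        return self.token


def tokenize(smiles: str) -> Iterator[Tok]:
    """Yield tokens of a SMILES string together with their source text."""
    i = 0
    while i < len(smiles):
        bond_idx = None
        if smiles[i] in "-=#$:/\\":  # bond character before the next token
            bond_idx = i
            i += 1
            if i == len(smiles):
                raise ValueError("hanging bond")
        ch = smiles[i]
        if ch == ".":
            end, kind = i + 1, "DOT"
        elif smiles[i:i + 2] in ("Br", "Cl"):  # two-letter elements
            end, kind = i + 2, "ATOM"
        elif ch == "[":  # bracketed atom, e.g. [NH2]
            r_idx = smiles.find("]", i)
            if r_idx == -1:
                raise ValueError("hanging bracket [")
            end, kind = r_idx + 1, "ATOM"
        elif ch in "()":
            end, kind = i + 1, "BRANCH"
        elif ch.isdigit():  # one-digit ring number
            end, kind = i + 1, "RING"
        elif ch == "%":  # two-digit ring number, e.g. %12
            end, kind = i + 3, "RING"
        elif ch.isalpha():  # one-letter elements
            end, kind = i + 1, "ATOM"
        else:
            raise ValueError("unrecognized symbol {!r}".format(ch))
        yield Tok(bond_idx, i, end, kind, smiles[i:end])
        i = end
```

Because each token carries its own text, a caller can build an attribution entry from `str(tok)` without slicing the original string again.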

+4

-4

setup.py

		@@ -10,3 +10,3 @@ #!/usr/bin/env python
		name="selfies",
		version="2.0.0",
		version="2.1.0",
		author="Mario Krenn, Alston Lo, and many other contributors",
		@@ -23,6 +23,6 @@ author_email="mario.krenn@utoronto.ca, alan@aspuru.com",
		"Programming Language :: Python :: 3",
		"Programming Language :: Python :: 3.5",
		"Programming Language :: Python :: 3.6",
		"Programming Language :: Python :: 3.7",
		"Programming Language :: Python :: 3.8",
		"Programming Language :: Python :: 3.9",
		"Programming Language :: Python :: 3.10",
		"Programming Language :: Python :: 3 :: Only",
		@@ -32,3 +32,3 @@ "License :: OSI Approved :: Apache Software License",
		],
		python_requires='>=3.5'
		python_requires='>=3.7'
		)

-63

selfies/utils/linked_list.py

		from typing import Any


		class SinglyLinkedList:
		"""A simple singly linked list that supports O(1) append and O(1) extend.
		"""

		def __init__(self):
		self._head = None
		self._tail = None
		self._count = 0

		def __len__(self):
		return self._count

		def __iter__(self):
		return SinglyLinkedListIterator(self)

		@property
		def head(self):
		return self._head

		def append(self, item: Any) -> None:
		node = [item, None]

		if self._head is None:
		self._head = node
		self._tail = node
		else:
		self._tail[1] = node
		self._tail = node
		self._count += 1

		def extend(self, other) -> None:
		assert isinstance(other, SinglyLinkedList)

		if other._head is None:
		return

		if self._head is None:
		self._head = other._head
		self._tail = other._tail
		else:
		self._tail[1] = other._head
		self._tail = other._tail
		self._count += len(other)


		class SinglyLinkedListIterator:

		def __init__(self, linked_list):
		self._curr = linked_list.head

		def __iter__(self):
		return self

		def __next__(self):
		if self._curr is None:
		raise StopIteration
		else:
		item = self._curr[0]
		self._curr = self._curr[1]
		return item
