selfies
+8
-9
| Metadata-Version: 2.1 | ||
| Name: selfies | ||
| Version: 2.1.2 | ||
| Version: 2.2.0 | ||
| Summary: SELFIES (SELF-referencIng Embedded Strings) is a general-purpose, sequence-based, robust representation of semantically constrained graphs. | ||
| Home-page: https://github.com/aspuru-guzik-group/selfies | ||
| Author: Mario Krenn, Alston Lo, and many other contributors | ||
| Author-email: mario.krenn@utoronto.ca, alan@aspuru.com | ||
| Author: Mario Krenn, Alston Lo, Robert Pollice and many other contributors | ||
| Author-email: mario.krenn@mpl.mpg.de, alan@aspuru.com | ||
| Classifier: Programming Language :: Python :: 3 | ||
@@ -81,3 +81,3 @@ Classifier: Programming Language :: Python :: 3.7 | ||
| Please refer to the [documentation](https://selfiesv2.readthedocs.io/en/latest/), | ||
| Please refer to the [documentation in our code-paper](https://pubs.rsc.org/en/content/articlelanding/2023/DD/D3DD00044C), | ||
| which contains a thorough tutorial for getting started with ``selfies`` | ||
@@ -244,8 +244,7 @@ and detailed descriptions of the functions | ||
| * 50K molecules from a dataset of [non-fullerene acceptors for organic solar cells](https://www.sciencedirect.com/science/article/pii/S2542435117301307) | ||
| * 160K+ molecules from various [MoleculeNet](http://moleculenet.ai/datasets-1) datasets | ||
| * 36M+ molecules from the [eMolecules Database](https://www.emolecules.com/info/products-data-downloads.html). | ||
| Due to its large size, this dataset is not included on the repository. To run tests | ||
| on it, please download the dataset into the ``tests/test_sets`` directory | ||
| and run the ``tests/run_on_large_dataset.py`` script. | ||
| * 160K+ molecules from various [MoleculeNet](https://moleculenet.org/datasets-1) datasets | ||
| In first releases, we also tested the 36M+ molecules from the [eMolecules Database](https://downloads.emolecules.com/free/2024-12-01/). | ||
| ## Version History | ||
@@ -252,0 +251,0 @@ See [CHANGELOG](https://github.com/aspuru-guzik-group/selfies/blob/master/CHANGELOG.md). |
+5
-6
@@ -62,3 +62,3 @@ # SELFIES | ||
| Please refer to the [documentation](https://selfiesv2.readthedocs.io/en/latest/), | ||
| Please refer to the [documentation in our code-paper](https://pubs.rsc.org/en/content/articlelanding/2023/DD/D3DD00044C), | ||
| which contains a thorough tutorial for getting started with ``selfies`` | ||
@@ -225,8 +225,7 @@ and detailed descriptions of the functions | ||
| * 50K molecules from a dataset of [non-fullerene acceptors for organic solar cells](https://www.sciencedirect.com/science/article/pii/S2542435117301307) | ||
| * 160K+ molecules from various [MoleculeNet](http://moleculenet.ai/datasets-1) datasets | ||
| * 36M+ molecules from the [eMolecules Database](https://www.emolecules.com/info/products-data-downloads.html). | ||
| Due to its large size, this dataset is not included on the repository. To run tests | ||
| on it, please download the dataset into the ``tests/test_sets`` directory | ||
| and run the ``tests/run_on_large_dataset.py`` script. | ||
| * 160K+ molecules from various [MoleculeNet](https://moleculenet.org/datasets-1) datasets | ||
| In first releases, we also tested the 36M+ molecules from the [eMolecules Database](https://downloads.emolecules.com/free/2024-12-01/). | ||
| ## Version History | ||
@@ -233,0 +232,0 @@ See [CHANGELOG](https://github.com/aspuru-guzik-group/selfies/blob/master/CHANGELOG.md). |
@@ -12,5 +12,5 @@ import functools | ||
| "N": 3, "N+1": 4, "N-1": 2, | ||
| "C": 4, "C+1": 5, "C-1": 3, | ||
| "P": 5, "P+1": 6, "P-1": 4, | ||
| "S": 6, "S+1": 7, "S-1": 5, | ||
| "C": 4, "C+1": 3, "C-1": 3, | ||
| "P": 5, "P+1": 4, "P-1": 6, | ||
| "S": 6, "S+1": 5, "S-1": 5, | ||
| "?": 8 | ||
@@ -52,3 +52,3 @@ } | ||
| +-----------------+-----------+---+---+-----+-----+---+-----+-----+ | ||
| | ``default`` | 1 | 3 | 5 | 6 | 4 | 6 | 7 | 5 | | ||
| | ``default`` | 1 | 3 | 5 | 4 | 6 | 6 | 5 | 5 | | ||
| +-----------------+-----------+---+---+-----+-----+---+-----+-----+ | ||
@@ -55,0 +55,0 @@ | ``octet_rule`` | 1 | 3 | 3 | 4 | 2 | 2 | 3 | 1 | |
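The hunk above changes the `default` bonding capacities for several charged atoms. As a minimal sketch, the two versions of the table can be compared directly. The values are copied from the diff; the names `OLD_DEFAULT`, `NEW_DEFAULT` and `changed_constraints` are illustrative only and are not part of the `selfies` API.

```python
# Default bonding capacities as they appear in the diff (2.1.2 side).
OLD_DEFAULT = {
    "N": 3, "N+1": 4, "N-1": 2,
    "C": 4, "C+1": 5, "C-1": 3,
    "P": 5, "P+1": 6, "P-1": 4,
    "S": 6, "S+1": 7, "S-1": 5,
    "?": 8,
}

# The same entries as they appear on the 2.2.0 side of the diff.
NEW_DEFAULT = {
    "N": 3, "N+1": 4, "N-1": 2,
    "C": 4, "C+1": 3, "C-1": 3,
    "P": 5, "P+1": 4, "P-1": 6,
    "S": 6, "S+1": 5, "S-1": 5,
    "?": 8,
}


def changed_constraints(old, new):
    """Return {symbol: (old_value, new_value)} for every entry that differs."""
    return {key: (old[key], new[key]) for key in old if old[key] != new[key]}


changed = changed_constraints(OLD_DEFAULT, NEW_DEFAULT)
for symbol, (before, after) in changed.items():
    print(f"{symbol}: {before} -> {after}")
```

Only `C+1`, `P+1`, `P-1` and `S+1` differ; the neutral atoms and the `?` fallback are unchanged.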
@@ -24,2 +24,9 @@ ELEMENTS = { | ||
| VALENCE_ELECTRONS = { | ||
| "B": 3, "Al": 3, | ||
| "C": 4, "Si": 4, | ||
| "N": 5, "P": 5, "As": 5, | ||
| "O": 6, "S": 6, "Se": 6, "Te": 6 | ||
| } | ||
| AROMATIC_SUBSET = set(e.lower() for e in AROMATIC_VALENCES) | ||
@@ -26,0 +33,0 @@ |
+23
-8
@@ -7,3 +7,3 @@ import functools | ||
| from selfies.bond_constraints import get_bonding_capacity | ||
| from selfies.constants import AROMATIC_VALENCES | ||
| from selfies.constants import AROMATIC_VALENCES, VALENCE_ELECTRONS | ||
| from selfies.utils.matching_utils import find_perfect_matching | ||
@@ -258,3 +258,3 @@ | ||
| kept_nodes = set(itertools.filterfalse(self._prune_from_ds, ds)) | ||
| # relabel kept DS nodes to be 0, 1, 2, ... | ||
@@ -270,3 +270,3 @@ label_to_node = list(sorted(kept_nodes)) | ||
| pruned_ds[label].append(node_to_label[adj]) | ||
| matching = find_perfect_matching(pruned_ds) | ||
@@ -294,9 +294,9 @@ if matching is None: | ||
| return True # aromatic atom with no aromatic bonds | ||
| atom = self._atoms[node] | ||
| valences = AROMATIC_VALENCES[atom.element] | ||
| # each bond in DS has order 1.5 - we treat them as single bonds | ||
| used_electrons = int(self._bond_counts[node] - 0.5 * len(adj_nodes)) | ||
| if atom.h_count is None: # account for implicit Hs | ||
@@ -308,3 +308,18 @@ assert atom.charge == 0 | ||
| used_electrons += atom.h_count | ||
| free_electrons = valence - used_electrons | ||
| return not ((free_electrons >= 0) and (free_electrons % 2 != 0)) | ||
| # count the total number of bound electrons of each atom | ||
| bound_electrons = (max(0, atom.charge) + atom.h_count | ||
| + int(self._bond_counts[node]) | ||
| + int(2 * (self._bond_counts[node] % 1))) | ||
| # calculate the number of unpaired electrons of each atom | ||
| radical_electrons = (max(0, VALENCE_ELECTRONS[atom.element] | ||
| - bound_electrons) % 2) | ||
| # unpaired electrons do not contribute to the aromatic system | ||
| free_electrons = valence - used_electrons - radical_electrons | ||
| if any(used_electrons == v - atom.charge for v in valences): | ||
| return True | ||
| else: | ||
| return not ((free_electrons >= 0) and (free_electrons % 2 != 0)) |
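The radical-electron arithmetic added in this hunk can be tried in isolation. The sketch below copies the two expressions and the parity test from the diff into standalone functions. The function names are hypothetical (in the library this logic sits inline in a method), and the aromatic valence of 4 for carbon and the bond count of 3.0 for the radical atom are assumptions made for the example.

```python
# Valence-electron counts copied from the VALENCE_ELECTRONS table in the diff.
VALENCE_ELECTRONS = {
    "B": 3, "Al": 3,
    "C": 4, "Si": 4,
    "N": 5, "P": 5, "As": 5,
    "O": 6, "S": 6, "Se": 6, "Te": 6,
}


def count_radical_electrons(element, charge, h_count, bond_count):
    """Hypothetical helper mirroring the bound/radical electron expressions."""
    bound_electrons = (max(0, charge) + h_count
                       + int(bond_count)
                       + int(2 * (bond_count % 1)))
    return max(0, VALENCE_ELECTRONS[element] - bound_electrons) % 2


def has_odd_free_electrons(valence, used_electrons, radical_electrons=0):
    """Parity test from the diff: non-negative and odd number of free electrons."""
    free_electrons = valence - used_electrons - radical_electrons
    return free_electrons >= 0 and free_electrons % 2 != 0


# Radical carbon in "c1ccc[c]c1": no charge, no hydrogens, and (assumed)
# two aromatic bonds of order 1.5 each.
bond_count = 1.5 + 1.5
radical = count_radical_electrons("C", charge=0, h_count=0, bond_count=bond_count)

# As in the diff, aromatic bonds are treated as single bonds here.
used = int(bond_count - 0.5 * 2)

print(radical)                                   # 1
print(has_odd_free_electrons(4, used))           # False (no radical term)
print(has_odd_free_electrons(4, used, radical))  # True (radical term subtracted)
```

Without the radical term the atom has an even number of free electrons and the function in the diff returns `True` for it; subtracting the unpaired electron flips the parity, which appears to be what lets cases such as `c1ccc[c]c1` in `test_radical_kekulization` round-trip.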
+3
-3
@@ -10,5 +10,5 @@ #!/usr/bin/env python | ||
| name="selfies", | ||
| version="2.1.2", | ||
| author="Mario Krenn, Alston Lo, and many other contributors", | ||
| author_email="mario.krenn@utoronto.ca, alan@aspuru.com", | ||
| version="2.2.0", | ||
| author="Mario Krenn, Alston Lo, Robert Pollice and many other contributors", | ||
| author_email="mario.krenn@mpl.mpg.de, alan@aspuru.com", | ||
| description="SELFIES (SELF-referencIng Embedded Strings) is a " | ||
@@ -15,0 +15,0 @@ "general-purpose, sequence-based, robust representation of " |
@@ -11,2 +11,8 @@ import pytest | ||
| def roundtrip_eq(smiles_in, smiles_out): | ||
| sel = sf.encoder(smiles_in) | ||
| smi = sf.decoder(sel) | ||
| return smi == smiles_out | ||
| def test_branch_and_ring_at_state_X0(): | ||
@@ -334,2 +340,3 @@ """Tests SELFIES with branches and rings at state X0 (i.e. at the | ||
| def test_large_selfies_decoding(): | ||
@@ -342,2 +349,23 @@ """Test that we can decode extremely large SELFIES strings (used to cause a RecursionError) | ||
| assert decode_eq(large_selfies, expected_smiles) | ||
| assert decode_eq(large_selfies, expected_smiles) | ||
| def test_radical_kekulization(): | ||
| """Tests kekulization of aromatic systems with radicals and charges. | ||
| """ | ||
| assert roundtrip_eq("c1ccc[c]c1", "C1=CC=C[CH0]=C1") | ||
| assert roundtrip_eq("c1[c]n1(C)", "C1=[CH0]N1C") | ||
| assert roundtrip_eq("c1[C][n+]1(C)", "C=1[CH0][N+1]=1C") | ||
| assert roundtrip_eq("c1nnn[n-]1", "C1=NN=N[N-1]1") | ||
| assert roundtrip_eq("c1ccn[c-](C)[n+]1=O", "C1=CC=N[C-1](C)[N+1]1=O") | ||
| assert roundtrip_eq("c1ccs[n+]1c2ccccc2", "C=1C=CS[N+1]=1C2=CC=CC=C2") | ||
| assert roundtrip_eq("c1ccs[nH+]1", "C=1C=CS[NH1+1]=1") | ||
| def test_novel_charged_symbols(): | ||
| """Test decoding of updated constraints for charged atoms (update in 2.2.0).""" | ||
| assert decode_eq("[N][#C+1][#NH1][#C@H1]", "N#[C+1]") | ||
| assert decode_eq("[O+1][=P+1][#P-1][#C@@]", "[O+1]=[P+1]=[P-1]#[C@@]") | ||
| assert decode_eq("[=C-1][#S+1][#B]", "[C-1]#[S+1]=B") | ||