selfies - npm Package Compare versions

Comparing version 2.1.2 to 2.2.0 (+8 -9)
PKG-INFO
 Metadata-Version: 2.1
 Name: selfies
-Version: 2.1.2
+Version: 2.2.0
 Summary: SELFIES (SELF-referencIng Embedded Strings) is a general-purpose, sequence-based, robust representation of semantically constrained graphs.
 Home-page: https://github.com/aspuru-guzik-group/selfies
-Author: Mario Krenn, Alston Lo, and many other contributors
-Author-email: mario.krenn@utoronto.ca, alan@aspuru.com
+Author: Mario Krenn, Alston Lo, Robert Pollice and many other contributors
+Author-email: mario.krenn@mpl.mpg.de, alan@aspuru.com
 Classifier: Programming Language :: Python :: 3

@@ -81,3 +81,3 @@ Classifier: Programming Language :: Python :: 3.7

-Please refer to the [documentation](https://selfiesv2.readthedocs.io/en/latest/),
+Please refer to the [documentation in our code-paper](https://pubs.rsc.org/en/content/articlelanding/2023/DD/D3DD00044C),
 which contains a thorough tutorial for getting started with ``selfies``

@@ -244,8 +244,7 @@ and detailed descriptions of the functions

 * 50K molecules from a dataset of [non-fullerene acceptors for organic solar cells](https://www.sciencedirect.com/science/article/pii/S2542435117301307)
-* 160K+ molecules from various [MoleculeNet](http://moleculenet.ai/datasets-1) datasets
-* 36M+ molecules from the [eMolecules Database](https://www.emolecules.com/info/products-data-downloads.html).
-Due to its large size, this dataset is not included on the repository. To run tests
-on it, please download the dataset into the ``tests/test_sets`` directory
-and run the ``tests/run_on_large_dataset.py`` script.
+* 160K+ molecules from various [MoleculeNet](https://moleculenet.org/datasets-1) datasets
+In first releases, we also tested the 36M+ molecules from the [eMolecules Database](https://downloads.emolecules.com/free/2024-12-01/).
 ## Version History

@@ -252,0 +251,0 @@ See [CHANGELOG](https://github.com/aspuru-guzik-group/selfies/blob/master/CHANGELOG.md).

@@ -62,3 +62,3 @@ # SELFIES

-Please refer to the [documentation](https://selfiesv2.readthedocs.io/en/latest/),
+Please refer to the [documentation in our code-paper](https://pubs.rsc.org/en/content/articlelanding/2023/DD/D3DD00044C),
 which contains a thorough tutorial for getting started with ``selfies``

@@ -225,8 +225,7 @@ and detailed descriptions of the functions

 * 50K molecules from a dataset of [non-fullerene acceptors for organic solar cells](https://www.sciencedirect.com/science/article/pii/S2542435117301307)
-* 160K+ molecules from various [MoleculeNet](http://moleculenet.ai/datasets-1) datasets
-* 36M+ molecules from the [eMolecules Database](https://www.emolecules.com/info/products-data-downloads.html).
-Due to its large size, this dataset is not included on the repository. To run tests
-on it, please download the dataset into the ``tests/test_sets`` directory
-and run the ``tests/run_on_large_dataset.py`` script.
+* 160K+ molecules from various [MoleculeNet](https://moleculenet.org/datasets-1) datasets
+In first releases, we also tested the 36M+ molecules from the [eMolecules Database](https://downloads.emolecules.com/free/2024-12-01/).
 ## Version History

@@ -233,0 +232,0 @@ See [CHANGELOG](https://github.com/aspuru-guzik-group/selfies/blob/master/CHANGELOG.md).

@@ -12,5 +12,5 @@ import functools

     "N": 3, "N+1": 4, "N-1": 2,
-    "C": 4, "C+1": 5, "C-1": 3,
-    "P": 5, "P+1": 6, "P-1": 4,
-    "S": 6, "S+1": 7, "S-1": 5,
+    "C": 4, "C+1": 3, "C-1": 3,
+    "P": 5, "P+1": 4, "P-1": 6,
+    "S": 6, "S+1": 5, "S-1": 5,
     "?": 8
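To make the effect of this hunk easier to read, here is a minimal sketch that lists which default bonding capacities differ between the two versions. The two dictionaries copy only the rows visible in the hunk, reading the first three as the 2.1.2 values and the last three as the 2.2.0 values; `changed_capacities` is a hypothetical helper for illustration and is not part of ``selfies``.

```python
# Bonding capacities copied from the rows shown in the hunk above.
# Assumption: the first C/P/S rows are the removed (2.1.2) lines and the
# second C/P/S rows are the added (2.2.0) lines.
OLD_DEFAULT = {
    "C": 4, "C+1": 5, "C-1": 3,
    "P": 5, "P+1": 6, "P-1": 4,
    "S": 6, "S+1": 7, "S-1": 5,
}
NEW_DEFAULT = {
    "C": 4, "C+1": 3, "C-1": 3,
    "P": 5, "P+1": 4, "P-1": 6,
    "S": 6, "S+1": 5, "S-1": 5,
}


def changed_capacities(old, new):
    """Return {symbol: (old_value, new_value)} for entries that differ."""
    return {key: (old[key], new[key]) for key in old if old[key] != new[key]}


changes = changed_capacities(OLD_DEFAULT, NEW_DEFAULT)
for symbol, (before, after) in sorted(changes.items()):
    print(f"{symbol}: {before} -> {after}")
```

Under that reading, only the charged entries ``C+1``, ``P+1``, ``P-1`` and ``S+1`` change; the neutral capacities and ``C-1``, ``S-1`` stay the same.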

@@ -52,3 +52,3 @@ }

 +-----------------+-----------+---+---+-----+-----+---+-----+-----+
-| ``default`` | 1 | 3 | 5 | 6 | 4 | 6 | 7 | 5 |
+| ``default`` | 1 | 3 | 5 | 4 | 6 | 6 | 5 | 5 |
 +-----------------+-----------+---+---+-----+-----+---+-----+-----+

@@ -55,0 +55,0 @@ | ``octet_rule`` | 1 | 3 | 3 | 4 | 2 | 2 | 3 | 1 |

@@ -24,2 +24,9 @@ ELEMENTS = {

+VALENCE_ELECTRONS = {
+    "B": 3, "Al": 3,
+    "C": 4, "Si": 4,
+    "N": 5, "P": 5, "As": 5,
+    "O": 6, "S": 6, "Se": 6, "Te": 6
+}
 AROMATIC_SUBSET = set(e.lower() for e in AROMATIC_VALENCES)

@@ -26,0 +33,0 @@

@@ -7,3 +7,3 @@ import functools

 from selfies.bond_constraints import get_bonding_capacity
-from selfies.constants import AROMATIC_VALENCES
+from selfies.constants import AROMATIC_VALENCES, VALENCE_ELECTRONS
 from selfies.utils.matching_utils import find_perfect_matching

@@ -258,3 +258,3 @@

kept_nodes = set(itertools.filterfalse(self._prune_from_ds, ds))
# relabel kept DS nodes to be 0, 1, 2, ...

@@ -270,3 +270,3 @@ label_to_node = list(sorted(kept_nodes))

pruned_ds[label].append(node_to_label[adj])
matching = find_perfect_matching(pruned_ds)

@@ -294,9 +294,9 @@ if matching is None:

return True # aromatic atom with no aromatic bonds
atom = self._atoms[node]
valences = AROMATIC_VALENCES[atom.element]
# each bond in DS has order 1.5 - we treat them as single bonds
used_electrons = int(self._bond_counts[node] - 0.5 * len(adj_nodes))
if atom.h_count is None: # account for implicit Hs

@@ -308,3 +308,18 @@ assert atom.charge == 0

             used_electrons += atom.h_count
-            free_electrons = valence - used_electrons
-            return not ((free_electrons >= 0) and (free_electrons % 2 != 0))
+            # count the total number of bound electrons of each atom
+            bound_electrons = (max(0, atom.charge) + atom.h_count
+                               + int(self._bond_counts[node])
+                               + int(2 * (self._bond_counts[node] % 1)))
+            # calculate the number of unpaired electrons of each atom
+            radical_electrons = (max(0, VALENCE_ELECTRONS[atom.element]
+                                     - bound_electrons) % 2)
+            # unpaired electrons do not contribute to the aromatic system
+            free_electrons = valence - used_electrons - radical_electrons
+            if any(used_electrons == v - atom.charge for v in valences):
+                return True
+            else:
+                return not ((free_electrons >= 0) and (free_electrons % 2 != 0))
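The arithmetic added in this hunk can be tried in isolation. The following is a minimal stand-alone sketch, not the library's code: `radical_electrons` and `prune_from_ds` are hypothetical free functions that take plain numbers instead of the graph's atom objects. Two things are assumptions that do not appear in the hunk: that `valence` is the largest listed aromatic valence minus the charge, and that the aromatic valences of carbon are `(4,)`.

```python
# Valence-electron counts as added to the constants in this release.
VALENCE_ELECTRONS = {
    "B": 3, "Al": 3,
    "C": 4, "Si": 4,
    "N": 5, "P": 5, "As": 5,
    "O": 6, "S": 6, "Se": 6, "Te": 6,
}


def radical_electrons(element, charge, h_count, bond_count):
    """Unpaired electrons on an atom, following the arithmetic in the hunk."""
    # A fractional bond count (x.5) contributes one extra bound electron.
    bound = (max(0, charge) + h_count
             + int(bond_count)
             + int(2 * (bond_count % 1)))
    return max(0, VALENCE_ELECTRONS[element] - bound) % 2


def prune_from_ds(element, charge, h_count, bond_count, n_aromatic_bonds,
                  valences):
    """Hypothetical stand-alone version of the explicit-hydrogen branch.

    Returns True if the atom should be dropped from the delocalized
    subgraph, i.e. it cannot take part in a double bond.
    """
    # Each aromatic bond (order 1.5) is treated as a single bond.
    used = int(bond_count - 0.5 * n_aromatic_bonds) + h_count
    # ASSUMPTION (not shown in the hunk): largest valence minus the charge.
    valence = max(valences) - charge
    # Unpaired electrons do not contribute to the aromatic system.
    free = valence - used - radical_electrons(element, charge, h_count,
                                              bond_count)
    if any(used == v - charge for v in valences):
        return True
    return not (free >= 0 and free % 2 != 0)


# Radical carbon of "c1ccc[c]c1": no H, two aromatic bonds (2 * 1.5 = 3.0).
CARBON_VALENCES = (4,)  # ASSUMPTION: aromatic valences of carbon
radical_c = prune_from_ds("C", 0, 0, 3.0, 2, CARBON_VALENCES)
# Ordinary aromatic CH carbon: one H, two aromatic bonds.
plain_ch = prune_from_ds("C", 0, 1, 3.0, 2, CARBON_VALENCES)
print(radical_electrons("C", 0, 0, 3.0), radical_c, plain_ch)
```

Under these assumptions the radical carbon carries one unpaired electron and is kept in the delocalized subgraph, like its CH neighbours, which is consistent with the expected output ``C1=CC=C[CH0]=C1`` in the new kekulization test. Without the radical correction its free-electron count would be even and the atom would be pruned.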

@@ -10,5 +10,5 @@ #!/usr/bin/env python

     name="selfies",
-    version="2.1.2",
-    author="Mario Krenn, Alston Lo, and many other contributors",
-    author_email="mario.krenn@utoronto.ca, alan@aspuru.com",
+    version="2.2.0",
+    author="Mario Krenn, Alston Lo, Robert Pollice and many other contributors",
+    author_email="mario.krenn@mpl.mpg.de, alan@aspuru.com",
     description="SELFIES (SELF-referencIng Embedded Strings) is a "

@@ -15,0 +15,0 @@ "general-purpose, sequence-based, robust representation of "

@@ -11,2 +11,8 @@ import pytest

+def roundtrip_eq(smiles_in, smiles_out):
+    sel = sf.encoder(smiles_in)
+    smi = sf.decoder(sel)
+    return smi == smiles_out
 def test_branch_and_ring_at_state_X0():

@@ -334,2 +340,3 @@ """Tests SELFIES with branches and rings at state X0 (i.e. at the

def test_large_selfies_decoding():

@@ -342,2 +349,23 @@ """Test that we can decode extremely large SELFIES strings (used to cause a RecursionError)

-    assert decode_eq(large_selfies, expected_smiles)
+    assert decode_eq(large_selfies, expected_smiles)
+def test_radical_kekulization():
+    """Tests kekulization of aromatic systems with radicals and charges.
+    """
+    assert roundtrip_eq("c1ccc[c]c1", "C1=CC=C[CH0]=C1")
+    assert roundtrip_eq("c1[c]n1(C)", "C1=[CH0]N1C")
+    assert roundtrip_eq("c1[C][n+]1(C)", "C=1[CH0][N+1]=1C")
+    assert roundtrip_eq("c1nnn[n-]1", "C1=NN=N[N-1]1")
+    assert roundtrip_eq("c1ccn[c-](C)[n+]1=O", "C1=CC=N[C-1](C)[N+1]1=O")
+    assert roundtrip_eq("c1ccs[n+]1c2ccccc2", "C=1C=CS[N+1]=1C2=CC=CC=C2")
+    assert roundtrip_eq("c1ccs[nH+]1", "C=1C=CS[NH1+1]=1")
+def test_novel_charged_symbols():
+    """Test decoding of updated constraints for charged atoms (update in 2.2.0)."""
+    assert decode_eq("[N][#C+1][#NH1][#C@H1]", "N#[C+1]")
+    assert decode_eq("[O+1][=P+1][#P-1][#C@@]", "[O+1]=[P+1]=[P-1]#[C@@]")
+    assert decode_eq("[=C-1][#S+1][#B]", "[C-1]#[S+1]=B")