🐍🔍 PyJess
Cython bindings and Python interface to Jess, a 3D template matching software.
🗺️ Overview
Jess is an algorithm for constraint-based structural template matching
proposed by Jonathan Barker et al.[1]. It can be used to identify
catalytic residues from a known template inside a protein structure. Jess
is an evolution of TESS, a geometric hashing algorithm developed by
Andrew Wallace et al.[2], removing some pre-computation and
structural requirements from the original algorithm. Jess was further
updated and maintained by Ioannis Riziotis
during his PhD in the Thornton group.
PyJess is a Python module that provides bindings to Jess using
Cython. It allows creating templates, querying them
with protein structures, and retrieving the hits using a Python API without
performing any external I/O.
🔧 Installing
PyJess is available for all modern Python versions (3.7+).
It can be installed directly from PyPI,
which hosts some pre-built x86-64 wheels for Linux, macOS, and Windows,
as well as the code required to compile from source with Cython:
$ pip install pyjess
Check the install page
of the documentation for other ways to install PyJess on your machine.
💡 Example
Load templates to be used as references from different template files:
import glob
import os
import pyjess

templates = []
for path in sorted(glob.iglob("vendor/jess/examples/template_*.qry")):
    templates.append(pyjess.Template.load(path, id=os.path.basename(path)))
Create a Jess instance and use it to query a molecule (a PDB structure)
against the stored templates:
jess = pyjess.Jess(templates)
mol = pyjess.Molecule.load("vendor/jess/examples/test_pdbs/pdb1a0p.ent")
query = jess.query(mol, rmsd_threshold=2.0, distance_cutoff=3.0, max_dynamic_distance=3.0)
The hits are computed iteratively, and the different output statistics are
computed on-the-fly when requested:
for hit in query:
    print(hit.molecule.id, hit.template.id, hit.rmsd, hit.log_evalue)
    for atom in hit.atoms():
        print(atom.name, atom.x, atom.y, atom.z)
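Because hits arrive one at a time, they can be reduced on the fly without storing the whole result set. The sketch below keeps only the lowest-RMSD hit per template. It does not call PyJess: `FakeHit`, `fake_query` and the flat `template_id` field are hypothetical stand-ins, assumed here only so the example runs on its own (with real hits the identifier would be read from `hit.template.id` as shown above).

```python
from collections import namedtuple

# Hypothetical stand-in for a hit: only an identifier and an RMSD value.
FakeHit = namedtuple("FakeHit", ["template_id", "rmsd"])


def fake_query():
    """Yield made-up hits one at a time, like a lazily evaluated query."""
    yield FakeHit("template_01", 1.8)
    yield FakeHit("template_02", 0.9)
    yield FakeHit("template_01", 0.4)


def best_hit_per_template(hits):
    """Keep the lowest-RMSD hit seen so far for each template identifier."""
    best = {}
    for hit in hits:
        current = best.get(hit.template_id)
        if current is None or hit.rmsd < current.rmsd:
            best[hit.template_id] = hit
    return best


best = best_hit_per_template(fake_query())
for template_id, hit in sorted(best.items()):
    print(template_id, hit.rmsd)
```

Only one hit per template is held in memory at any time, which may matter when a query yields many candidate matches.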
🧶 Thread-safety
Once a Jess instance has been created, the templates cannot be edited anymore,
making the Jess.query method re-entrant. This allows querying several
molecules against the same templates in parallel using a thread pool:
import multiprocessing.pool

molecules = []
for path in glob.glob("vendor/jess/examples/test_pdbs/*.ent"):
    molecules.append(pyjess.Molecule.load(path))

with multiprocessing.pool.ThreadPool() as pool:
    hits = pool.map(jess.query, molecules)
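If query parameters need to be passed, or if the hits should be fully computed inside the worker threads rather than when the results are later iterated, one option is to wrap the call in a small helper. The sketch below illustrates the pattern with `functools.partial` and `concurrent.futures.ThreadPoolExecutor`. It does not call PyJess: `FakeJess` and `collect_hits` are hypothetical names, and the fake query simply yields placeholder tuples, on the assumption that a real query is a lazy iterator as described earlier.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


class FakeJess:
    """Hypothetical stand-in exposing a re-entrant, lazy query method."""

    def query(self, molecule, rmsd_threshold, distance_cutoff, max_dynamic_distance):
        # Yield one placeholder "hit" per character of the molecule name.
        for index, _ in enumerate(molecule):
            yield (molecule, index)


def collect_hits(jess, molecule, **params):
    """Run the query and consume it entirely inside the worker thread."""
    return list(jess.query(molecule, **params))


jess = FakeJess()
molecules = ["mol_a", "mol_bb", "mol_ccc"]  # placeholder molecule names

# Bind the shared instance and the query parameters once.
worker = functools.partial(
    collect_hits,
    jess,
    rmsd_threshold=2.0,
    distance_cutoff=3.0,
    max_dynamic_distance=3.0,
)

with ThreadPoolExecutor(max_workers=2) as pool:
    # map() returns results in the same order as the input molecules.
    results = list(pool.map(worker, molecules))

for molecule, hits in zip(molecules, results):
    print(molecule, len(hits))
```

Consuming the iterator in `collect_hits` is a design choice: it moves the actual work into the pool, at the cost of holding every hit of a molecule in memory at once.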
⚠️ Prior to PyJess v0.2.1, the Jess code ran some thread-unsafe operations, which have since been patched.
If running Jess in parallel, make sure to use v0.2.1 or later, which includes the patched, re-entrant functions.
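A program that relies on parallel queries could check the installed version at startup. The following is a minimal sketch using the standard `importlib.metadata` module (Python 3.8+). The helper names `parse_version` and `is_thread_safe` are hypothetical, and the parsing is deliberately naive: it reads the first three numbers of the version string and ignores any pre-release suffix.

```python
import re
from importlib import metadata

# First release with the re-entrant patches, as stated above.
MINIMUM_THREAD_SAFE = (0, 2, 1)


def parse_version(text):
    """Return the first three numbers of a version string, zero-padded."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def is_thread_safe(version_text):
    """Tell whether a version string is at least v0.2.1."""
    return parse_version(version_text) >= MINIMUM_THREAD_SAFE


try:
    installed = metadata.version("pyjess")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("pyjess is not installed")
else:
    print(installed, "thread-safe:", is_thread_safe(installed))
```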
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker
if you need to report or ask something. If you are filing a bug report,
please include as much information as you can about the issue, and try to
recreate the same bug in a simple, easily reproducible situation.
🏗️ Contributing
Contributions are more than welcome! See CONTRIBUTING.md for more details.
📋 Changelog
This project adheres to Semantic Versioning
and provides a changelog
in the Keep a Changelog format.
⚖️ License
This library is provided under the MIT License. The JESS code is distributed under the MIT License as well.
This project is in no way affiliated with, sponsored by, or otherwise endorsed
by the JESS authors. It was developed
by Martin Larralde during his PhD project
at the European Molecular Biology Laboratory in
the Zeller team.
📚 References
- [1] Barker, J. A., & Thornton, J. M. (2003). An algorithm for constraint-based structural template matching: application to 3D templates with statistical analysis. Bioinformatics (Oxford, England), 19(13), 1644–1649. doi:10.1093/bioinformatics/btg226.
- [2] Wallace, A. C., Borkakoti, N., & Thornton, J. M. (1997). TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites. Protein science : a publication of the Protein Society, 6(11), 2308–2323. doi:10.1002/pro.5560061104.