Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
Pyrodigal is a Python module that provides bindings to Prodigal using Cython. It directly interacts with the Prodigal internals, which has several advantages. In particular, results are checked against Prodigal v2.6.3+31b300a. This was verified extensively by Julian Hahnfeld and can be checked with his comparison repository.
The library now features everything from the original Prodigal CLI:
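- Run mode selection between single and metagenomic mode (prodigal -p).
- Masking of runs of unknown nucleotides, so that genes are not predicted across them (prodigal -m).
- Closed ends, so that genes are not allowed to run off the sequence edges (prodigal -c).
- Training configuration: a custom translation table can be given (prodigal -g), and the Shine-Dalgarno motif search can be forcefully bypassed (prodigal -n).
- Output files: the protein translations in FASTA format (prodigal -a), the gene sequences in FASTA format (prodigal -d), or the potential gene scores in tabular format (prodigal -s).
- Training files, which can be written and read back (prodigal -t).
In addition, the following new features are available: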
- Since v3.0.0, you can use your own metagenomic models to run Pyrodigal in meta-mode. Check for instance pyrodigal-gv, which provides additional models for giant viruses and gut phages.
Pyrodigal makes several changes compared to the original Prodigal binary regarding memory management.
pyrodigal.GeneFinder instances are thread-safe. In addition, the find_genes method is re-entrant. This means you can train a GeneFinder instance once, and then use a pool to process sequences in parallel:
import multiprocessing.pool
import pyrodigal

# training_sequence and sequences are assumed to be defined beforehand
gene_finder = pyrodigal.GeneFinder()
gene_finder.train(training_sequence)
with multiprocessing.pool.ThreadPool() as pool:
    predictions = pool.map(gene_finder.find_genes, sequences)
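Because pool.map returns results in input order, predictions can be paired back with the identifiers of the sequences they came from. The sketch below illustrates that pattern using only the standard library. The function count_atg is a hypothetical stand-in for gene_finder.find_genes, and map_by_id and the record names are made up for illustration.

```python
import multiprocessing.pool


def count_atg(sequence: bytes) -> int:
    """Hypothetical stand-in for gene_finder.find_genes: counts ATG triplets."""
    return sequence.count(b"ATG")


def map_by_id(func, records):
    """Apply func to every sequence in a thread pool, keyed by record id.

    records maps identifiers to sequences. pool.map keeps the input
    order, so ids and results can be zipped back together.
    """
    ids = list(records)
    with multiprocessing.pool.ThreadPool() as pool:
        results = pool.map(func, [records[i] for i in ids])
    return dict(zip(ids, results))


# Made-up records, used only to exercise the helper
records = {
    "contig_1": b"ATGAAATGA",
    "contig_2": b"CCCATGCCC",
    "contig_3": b"GGGG",
}
counts = map_by_id(count_atg, records)
print(counts)
```

With a trained GeneFinder, the same helper could presumably be called as map_by_id(gene_finder.find_genes, records).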
This project is supported on Python 3.7 and later.
Pyrodigal can be installed directly from PyPI, which hosts some pre-built wheels for the x86-64 architecture (Linux/MacOS/Windows) and the Aarch64 architecture (Linux/MacOS), as well as the code required to compile from source with Cython:
$ pip install pyrodigal
Otherwise, Pyrodigal is also available as a Bioconda package:
$ conda install -c bioconda pyrodigal
Check the install page of the documentation for other ways to install Pyrodigal on your machine.
Let's load a sequence from a GenBank file, use a GeneFinder to find all the genes it contains, and print the proteins in two-line FASTA format.
To use the GeneFinder in single mode (corresponding to prodigal -p single, the default operation mode of Prodigal), you must explicitly call the train method with the sequence you want to use for training before trying to find genes, or you will get a RuntimeError:
import Bio.SeqIO
import pyrodigal
record = Bio.SeqIO.read("sequence.gbk", "genbank")
orf_finder = pyrodigal.GeneFinder()
orf_finder.train(bytes(record.seq))
genes = orf_finder.find_genes(bytes(record.seq))
However, in meta mode (corresponding to prodigal -p meta), you can find genes directly:
import Bio.SeqIO
import pyrodigal
record = Bio.SeqIO.read("sequence.gbk", "genbank")
orf_finder = pyrodigal.GeneFinder(meta=True)
for i, pred in enumerate(orf_finder.find_genes(bytes(record.seq))):
    print(f">{record.id}_{i+1}")
    print(pred.translate())
On older versions of Biopython (before 1.79) you will need to use record.seq.encode() instead of bytes(record.seq).
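The two-line FASTA output printed by the loop above can also be written to a file-like object. This is a minimal sketch using only the standard library. The helper write_two_line_fasta is hypothetical, and the protein strings are placeholders standing in for the values returned by pred.translate().

```python
import io


def write_two_line_fasta(handle, sequence_id, proteins):
    """Write each protein as a header line plus one unwrapped sequence line."""
    for index, protein in enumerate(proteins, start=1):
        handle.write(f">{sequence_id}_{index}\n{protein}\n")


# Placeholder proteins; real values would come from pred.translate()
buffer = io.StringIO()
write_two_line_fasta(buffer, "seq1", ["MKT*", "MAL*"])
print(buffer.getvalue(), end="")
```

Keeping each sequence on a single line makes the output easy to process line by line, at the cost of very long lines for large proteins.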
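A sequence loaded with scikit-bio can be processed in the same way as with Biopython:
import skbio.io
import pyrodigal

seq = next(skbio.io.read("sequence.gbk", "genbank"))
# Identifier for the FASTA headers; assumes the reader stored an ACCESSION
# entry in seq.metadata, and falls back to a generic name otherwise
seq_id = seq.metadata.get("ACCESSION", "sequence")
orf_finder = pyrodigal.GeneFinder(meta=True)
for i, pred in enumerate(orf_finder.find_genes(seq.values.view('B'))):
    print(f">{seq_id}_{i+1}")
    print(pred.translate())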
We need to use the view method to get the sequence viewable by Cython as an array of unsigned char.
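The idea of reinterpreting the same memory as unsigned bytes can be illustrated without scikit-bio. The sketch below uses the standard-library memoryview as an analogy only. It is an assumption that this mirrors the situation closely enough to be useful, and it is not what Pyrodigal or scikit-bio do internally. Casting from format c to format B changes how items are read, not the underlying data.

```python
raw = b"ATGC"

# Items of a "c" view are length-1 bytes objects
as_chars = memoryview(raw).cast("c")

# Same memory viewed as unsigned char: items are integers in 0..255
as_uchar = as_chars.cast("B")

print(as_chars.format, as_chars[0])
print(as_uchar.format, as_uchar[0])
```

Both views share the buffer of raw, so no sequence data is copied by the cast.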
Pyrodigal is scientific software, with a published paper in the Journal of Open-Source Software. Please cite both Pyrodigal and Prodigal if you are using it in an academic work, for instance as:
Pyrodigal (Larralde, 2022), a Python library binding to Prodigal (Hyatt et al., 2010).
Detailed references are available on the Publications page of the online documentation.
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
Contributions are more than welcome! See CONTRIBUTING.md for more details.
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
This library is provided under the GNU General Public License v3.0.
The Prodigal code was written by Doug Hyatt and is distributed under the terms of the GPLv3 as well. See vendor/Prodigal/LICENSE for more information.
This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original Prodigal authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.