Integrated pipeline for ML phylogenetic inference from ABI trace and FASTA data
AB12PHYLO is an integrated, easy-to-use pipeline for Maximum Likelihood (ML) phylogenetic tree inference from ABI traces and FASTA data.
At its core, AB12PHYLO runs parallelized instances of RAxML-NG (Kozlov et al. 2019) or IQ-Tree (Nguyen et al. 2015) as well as a BLAST search in a reference database.
It enables visual, effortless sample identification based on phylogenetic position and sequence similarity, as well as population subset selection aided by metrics like Tajima's D for estimations of ongoing evolution, or definition of haplotypes.
There are two versions of AB12PHYLO, both started from a terminal: ab12phylo is a graphical user interface intended to be more user-friendly and intuitive, and in some details more powerful than ab12phylo-cmd. The latter, on the other hand, is a commandline-only tool for maximum reproducibility and automation of a linear pipeline.
While ab12phylo comes with its own on-screen help, and a very brief example for ab12phylo-cmd is provided below, detailed installation and usage instructions can be found in the GitHub wiki. Especially for the commandline ab12phylo-cmd, also check the in-line help via ab12phylo-cmd -h.
For more individual support or feature requests, please write an email to ab12phylo@gmail.com.
AB12PHYLO can be installed using conda or pip:
conda install -c lkndl -c conda-forge -c bioconda ab12phylo
or
pip install ab12phylo
Note for Windows users: you must use Anaconda, and run ab12phylo-init before starting the graphical ab12phylo!
When AB12PHYLO is first run, it will check the system for three important non-Python tools: RAxML-NG, IQ-Tree 2 and BLAST+. If they are missing or outdated, AB12PHYLO can download the latest static binaries from GitHub or the NCBI respectively. Check the wiki for more details, troubleshooting, installing from source or updating the package.
As implied above, start the graphical version via ab12phylo from the terminal, and invoke the commandline version via ab12phylo-cmd.
ABI trace files are the main input for AB12PHYLO. Additionally, wellsplate tables can be used to translate back to original sample IDs, provided the mapping is identical for all sequenced genes. Reference data may be included in FASTA format, and the graphical AB12PHYLO accepts FASTA sequences as the main input format as well.
A:
Sequence data is extracted from ABI trace files using a customisable quality control: sequence ends are trimmed with a sliding window until a certain number (8 out of 10 by default) of bases reach the minimal accepted phred quality score (between 0 and 60, 30 by default). Bases with low phred quality are replaced by N only if they form a consecutive stretch that is longer than a certain threshold (5 by default).
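The trimming and masking rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not AB12PHYLO's actual code: the function names `trim_ends` and `mask_low_quality` and their parameters are hypothetical, and cutting at the edge of the first passing window is an assumed interpretation of the sliding-window rule.

```python
def trim_ends(seq, quals, min_phred=30, window=10, good=8):
    """Trim both ends until a window holds at least `good` acceptable bases.

    Assumption: the sequence is cut at the outer edge of the first window
    (from either end) that passes the test.
    """
    ok = [q >= min_phred for q in quals]
    n = len(seq)
    starts = range(n - window + 1)  # empty if the read is shorter than one window
    # First passing window, scanning from the left end
    start = next((i for i in starts if sum(ok[i:i + window]) >= good), None)
    if start is None:
        return "", []  # no window reaches the required quality
    # First passing window, scanning from the right end
    end = next(i + window for i in reversed(starts)
               if sum(ok[i:i + window]) >= good)
    return seq[start:end], quals[start:end]


def mask_low_quality(seq, quals, min_phred=30, max_run=5):
    """Replace low-quality bases by N, but only in runs longer than `max_run`."""
    out = list(seq)
    i = 0
    while i < len(seq):
        if quals[i] >= min_phred:
            i += 1
            continue
        j = i
        while j < len(seq) and quals[j] < min_phred:
            j += 1  # extend the run of low-quality bases
        if j - i > max_run:
            out[i:j] = "N" * (j - i)
        i = j
    return "".join(out)
```

The phred scores for a trace could come, for example, from Biopython's ABI parser, which exposes them per base; short low-quality runs are deliberately left untouched by `mask_low_quality`, matching the "only if longer than a threshold" rule.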
B:
Samples missing for a single locus are discarded for all genes. Trimmed traces as well as reference and FASTA sequences are aligned into single-gene Multiple Sequence Alignments (MSAs), each of which is then trimmed to a user-defined level of conserved positions using Gblocks 0.91b. For multi-gene analyses, the single-gene MSAs are then concatenated into a multi-gene MSA, which is used for ML tree inference. Trees are reconstructed using either RAxML-NG or IQ-Tree 2; only the latter is available for Windows.
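The two bookkeeping steps in this stage, dropping samples that lack a locus and concatenating the single-gene MSAs, can be illustrated as follows. This is a simplified sketch, not the pipeline's implementation: `concatenate_msas` is a hypothetical helper, MSAs are modelled as plain dictionaries, and the gene labels in the usage example are placeholders.

```python
def concatenate_msas(msas, gene_order):
    """Join single-gene alignments into one multi-gene alignment.

    msas: {gene: {sample_id: aligned_sequence}}
    Samples that are missing for any gene are discarded for all genes.
    """
    # Only samples present in every single-gene MSA survive
    shared = set.intersection(*(set(msas[gene]) for gene in gene_order))
    return {
        sample: "".join(msas[gene][sample] for gene in gene_order)
        for sample in sorted(shared)
    }


# Usage with two placeholder genes: sample "c" lacks the second locus
msas = {
    "geneA": {"a": "AC-T", "b": "ACGT", "c": "ACGA"},
    "geneB": {"a": "GG", "b": "G-"},
}
supermatrix = concatenate_msas(msas, ["geneA", "geneB"])
```

Here `supermatrix` holds only samples "a" and "b", each with the two aligned genes joined in the given order.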
C:
AB12PHYLO allows editing of the resulting tree and selection of taxa by label matching, shared ancestry or manual picking. For these selected sub-populations, basic population genetics neutrality and diversity metrics are calculated from the conserved MSA positions only, with adjustable tolerance of gaps and unknown characters. The graphical ab12phylo is both less cumbersome and more capable for these applications; the wiki pages (ab12phylo, ab12phylo-cmd) have more details.
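As an illustration of one such neutrality metric, the sketch below computes Tajima's D with the standard textbook formula (Tajima 1989) for a set of aligned sequences. It is not taken from AB12PHYLO: the function name `tajimas_d` is hypothetical, and instead of an adjustable tolerance it simply skips every column that contains a gap or unknown character.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs, ignore="-N?"):
    """Tajima's D for equal-length aligned sequences, or None if undefined."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    seqs = [s.upper() for s in seqs]
    # Assumption: zero tolerance, drop columns with gaps or unknown characters
    columns = [col for col in zip(*seqs) if not any(c in ignore for c in col)]
    segregating = [col for col in columns if len(set(col)) > 1]
    S = len(segregating)
    if S == 0:
        return None  # no variation, D is undefined
    # pi: average number of pairwise differences
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in combinations(col, 2)) for col in segregating
    ) / n_pairs
    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

A negative value indicates an excess of rare variants relative to the neutral expectation, a positive value an excess of intermediate-frequency variants.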
A BLAST search for species annotation can be run on a local database, or via the public NCBI BLAST API. However, importing the XML results of a web BLAST is preferable to relying on remote API calls as the main strategy.
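To give an idea of what importing web BLAST results involves, the sketch below pulls the top hit per query out of a BLAST XML document using only the Python standard library. It is an illustration, not the importer AB12PHYLO uses: the helper `best_hits` is hypothetical, it assumes the classic BLAST XML layout (`Iteration`, `Hit`, `Hsp` elements), and it relies on hits being listed best-first.

```python
import xml.etree.ElementTree as ET


def best_hits(xml_text):
    """Map each query to (hit description, accession, percent identity)."""
    root = ET.fromstring(xml_text)
    best = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hit = iteration.find("Iteration_hits/Hit")  # first hit listed
        if hit is None:
            continue  # query without any hit
        hsp = hit.find("Hit_hsps/Hsp")
        if hsp is None:
            continue
        identical = int(hsp.findtext("Hsp_identity"))
        aligned = int(hsp.findtext("Hsp_align-len"))
        best[query] = (
            hit.findtext("Hit_def"),
            hit.findtext("Hit_accession"),
            round(100 * identical / aligned, 1),
        )
    return best
```

Queries without hits are skipped, so a sample missing from the result simply has no annotation.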
ab12phylo-cmd example
A simple real-world invocation of commandline AB12PHYLO might look like this:
ab12phylo-cmd -abi <seq_dir> \
-csv <wellsplates_dir> \
-g <barcode_gene> \
-rf <ref.fasta> \
-bst 1000 \
-dir <results>
where:
<seq_dir> contains all input ABI trace files, ending in .ab1
<wellsplates_dir> contains the .csv mappings of user-defined IDs to the sequencer's isolate coordinates
<barcode_gene> was sequenced, see here for more info
<ref.fasta> contains full GenBank reference records like this
-bst (= --bootstrap) sets how many bootstrap trees will be generated
<results> is where results will be written

Python dependencies: Biopython, NumPy, pandas, Toytree <= 1.2.0, Toyplot, matplotlib, PyYAML, lxml, xmltramp2, svgutils, Pillow, Requests, Beautiful Soup and Jinja2
The pipeline will use existing installations of the programs listed below if they are found on the system $PATH and not considered outdated. Otherwise, both ab12phylo and ab12phylo-cmd can download the latest static binaries from GitHub or the NCBI on their initial runs or if run with --initialize.
RAxML-NG version >=1.0.2
BLAST+ version >=2.9
an MSA tool: MAFFT, Clustal Omega, MUSCLE or T-Coffee (clients for an EMBL service included)
Gblocks 0.91b for MSA trimming (included)
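The "found on $PATH and not outdated" check can be pictured with a few lines of standard-library Python. This is a hedged sketch, not the package's own logic: `is_outdated` and `needs_download` are hypothetical helpers, and the installed version is passed in as a string because each tool reports its version differently.

```python
import shutil


def parse_version(text):
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def is_outdated(installed, minimum):
    """True if the installed version is older than the required minimum."""
    return parse_version(installed) < parse_version(minimum)


def needs_download(executable, installed_version, minimum_version):
    """True if the tool is missing from PATH, unversioned, or too old."""
    if shutil.which(executable) is None:
        return True  # not on the system PATH
    if installed_version is None:
        return True  # version could not be determined
    return is_outdated(installed_version, minimum_version)
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.10" would sort before "2.9", whereas `is_outdated("2.10.0", "2.9")` correctly reports that the installed version is recent enough.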
Alexey M. Kozlov, Diego Darriba, Tomáš Flouri, Benoit Morel, and Alexandros Stamatakis (2019) RAxML-NG: A fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, btz305 doi:10.1093/bioinformatics/btz305
Nguyen, L. T., Schmidt, H. A., Von Haeseler, A., and Minh, B. Q. (2015) IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution, 32, 268–274. doi:10.1093/molbev/msu300