This Python library, cflib, provides scripts to convert between fasta, VCF and counts files. Counts files are used by PoMo, an implementation of a polymorphism-aware phylogenetic model. We advise you to use PoMo as implemented in IQ-TREE.
For a reference, please see and cite:
Schrempf, D., Minh, B. Q., De Maio, N., von Haeseler, A., & Kosiol, C. (2016). Reversible Polymorphism-Aware Phylogenetic Models and their Application to Tree Inference. Journal of Theoretical Biology, in press.
cflib requires Python (version 3.x) to be installed. cflib also depends on a few Python libraries, which are pulled in automatically when cflib is installed.
Install cflib and the conversion scripts with
pip install --user cflib-pomo
Note that the name of cflib on the PyPI repository (which is used by pip) is cflib-pomo, since the name cflib was already taken.
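Because the PyPI name differs from the library name, it can be handy to check which version is installed under the distribution name cflib-pomo. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_version is made up for this example and is not part of cflib.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    # Look up the PyPI distribution name (cflib-pomo), not the library name.
    print(installed_version("cflib-pomo"))
```

The script prints None if cflib-pomo is not installed for the Python interpreter that runs it.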
If the standard Python version of your operating system is still 2.x (e.g., OSX), make sure that you use pip3.
The --user flag is optional and tells pip to install cflib and the scripts only for the current user, not system-wide.
To uninstall cflib, run
pip uninstall cflib-pomo
The conversion scripts should be directly available if your PATH environment variable is set up correctly. For my Linux installation, the Python path ~/.local/bin had to be included. This may vary for your operating system.
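If a script is not found, a quick check of the PATH setup can help. The sketch below uses only the standard library; dir_on_path is a hypothetical helper written for this example, and ~/.local/bin is just the directory mentioned above for one Linux installation.

```python
import os
import shutil


def dir_on_path(directory, path_value):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return target in entries


if __name__ == "__main__":
    # Other systems may install user scripts into a different directory.
    on_path = dir_on_path("~/.local/bin", os.environ.get("PATH", ""))
    print("~/.local/bin on PATH:", on_path)
    # shutil.which returns the full path of an executable, or None.
    print("FastaToCounts.py found at:", shutil.which("FastaToCounts.py"))
```

If the second line prints None, the directory containing the scripts is probably missing from PATH.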
Sample data can be found in examples. Assuming that you have installed cflib, we will now convert example.fasta to a counts file named example_from_fasta.cf. The script that we will use is called FastaToCounts.py.
First, we have a look at the help message:
FastaToCounts.py --help
usage: FastaToCounts.py [-h] [-v] [--iupac] fastaFile output
Convert fasta to counts format.
The (aligned) sequences in the fasta file are read in and the data is
written to a counts format file.
Sequence names are stripped at the first dash. If the stripped
sequence name coincide, individuals are put into the same population.
E.g., homo_sapiens-XXX and homo_sapiens-YYY will be in the same
population homo_sapiens.
Take care with large files, this uses a lot of memory.
The input as well as the output files can additionally be gzipped
(indicated by a .gz file ending).
If heterozygotes are encoded with IUPAC codes (e.g., 'r' for A or G),
homozygotes need to be counted twice so that the level of polymorphism
stays correct. This can be done with the `--iupac` flag.
positional arguments:
  fastaFile      path to (gzipped) fasta file
  output         name of (gzipped) outputfile in counts format
optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  turn on verbosity (-v or -vv)
  --iupac        heteorzygotes are encoded with IUPAC codes
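The help message states that input and output files may be gzipped, indicated by a .gz file ending. One way to follow the same convention in your own Python code is sketched below. This is an illustration, not necessarily how cflib does it, and open_text is a hypothetical helper name.

```python
import gzip


def open_text(path, mode="r"):
    """Open `path` as text, transparently handling a .gz file ending."""
    if path.endswith(".gz"):
        # gzip.open defaults to binary mode, so request text mode explicitly.
        return gzip.open(path, mode + "t")
    return open(path, mode)
```

With this helper, open_text("example.fasta") and open_text("example.fasta.gz") can be read in the same way.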
As the script requires, the sequence names in example.fasta are of the form Sheep-1, Sheep-2, and so on. The following command converts the file example.fasta into the counts file example_from_fasta.cf:
FastaToCounts.py example.fasta example_from_fasta.cf
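The naming rule from the help message (names are stripped at the first dash, and equal prefixes end up in the same population) can be sketched as follows. This illustrates the rule only and is not cflib's own code; both function names are invented for the example.

```python
from collections import defaultdict


def population_of(sequence_name):
    """Strip the name at the first dash, e.g. 'Sheep-1' becomes 'Sheep'."""
    return sequence_name.split("-", 1)[0]


def group_by_population(sequence_names):
    """Map each population name to the list of its individuals."""
    groups = defaultdict(list)
    for name in sequence_names:
        groups[population_of(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = ["Sheep-1", "Sheep-2", "homo_sapiens-XXX", "homo_sapiens-YYY"]
    print(group_by_population(names))
```

Here Sheep-1 and Sheep-2 fall into the population Sheep, while homo_sapiens-XXX and homo_sapiens-YYY fall into homo_sapiens.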
IUPAC codes are supported. In particular,
N denotes any base or an unknown base; the character * can also be used for this, although it is non-standard;
- or . denotes a gap or a deletion.
The other IUPAC codes are supported as well.
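To see why homozygotes are counted twice under --iupac, consider one alignment column with one character per individual. The sketch below is a simplified illustration under stated assumptions, not cflib's implementation: it handles only the six standard two-base IUPAC codes and skips N, *, gaps, and three-base codes. The function name count_site and the A, C, G, T output order are choices made for this example.

```python
BASES = "ACGT"

# Standard IUPAC codes for two-base ambiguities.
HETEROZYGOTE = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}


def count_site(column, iupac=False):
    """Count A, C, G, T in one alignment column (one character per individual)."""
    counts = dict.fromkeys(BASES, 0)
    for char in column.upper():
        if char in BASES:
            # With IUPAC-encoded heterozygotes, a plain base is a homozygote
            # and contributes two copies of that base.
            counts[char] += 2 if iupac else 1
        elif iupac and char in HETEROZYGOTE:
            # A heterozygote contributes one copy of each of its two bases.
            for base in HETEROZYGOTE[char]:
                counts[base] += 1
        # Assumption: every other character is skipped in this sketch.
    return [counts[base] for base in BASES]


if __name__ == "__main__":
    print(count_site("AAR", iupac=True))  # [5, 0, 1, 0]
    print(count_site("AAG"))              # [2, 0, 1, 0]
```

In the first call, the two homozygous A individuals contribute four copies of A and the heterozygote R contributes one A and one G, so every individual adds exactly two alleles.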
Each script comes with its own documentation. Please execute, e.g.,
FastaToCounts.py --help
All conversion scripts can be found in the scripts folder.
If you are interested in cflib itself, please refer to the cflib reference manual.