Phylogeny Embedding &
Approximate Representation
Goldman Group - European Bioinformatics Institute
PEAR can:
- Compute the distance matrix given a set of phylogenetic trees;
- Embed and represent the distance matrix in 2D or 3D.
See also the autogenerated documentation and PyPI .
PEAR usage
Pear is both a python software and library. It can be installed with python -m pip install pear_ebi
or downloaded from Github. Pear is currently compatible only with Linux.
PEAR as a python library
Once installed, Pear can be used to upload newick trees in python and represent them in embedded spaces. We recommend to use it on either jupyter notebook or lab, as these tools allow for more interaction with the graphs. On these platforms, the user is allowed to interact with widgets that allows to modify several parameteres of the plots. For specific uses and applications, see the examples.
PEAR as a program
Run pear_ebi --help
to see the complete list of arguments and flags.
Simple usage
pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF
this script calculates the unweighted Robison Foulds distances between the trees in the file "beast_run1.trees", which contains 1001 phylogenetic trees.
the flag -m indicates the method used to compute the dissimilarity between phylogeneic trees. In this case, HasRF has been used.
To embed these distances in a lower-dimensional space, we can use PCoA (MDS) or tSNE:
pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF --pcoa 2
we therefore embedded the distance matrix in 2 dimensions. Using the flag -quality one can assess the correlation between the distances in the N-dimensional space and in the embedding.
pear_ebi examples_tree_sets/beast_trees/beast_run1.trees -m hashrf_RF --pcoa 2 --plot
The flag -plot indicates that PEAR has to plot the embeddings and show them, respectively. If an embedding method is specified the plots are produced anyway. Plotting doesn't require any indication on the number of dimensions as the embeddings are represented in 2 dimensions if the distances are embedded in 2 dimensions, while it plots on 2 and 3 dimensions in any other case.
One can specify any number of files containing trees. Moreover, it is possible to specify a single directory using -dir, and possibly a pattern using -pattern, in order to select multiple files.
Tree Set
It's possible to compute the distance matrix and re-use it in subsequent runs of PEAR by specifying the distance matrix file with the flag -d. Additionally, it's possible to define the name of the output file (-o).
If any additional metadata is available, this may be specified by indicating a .csv file containing a dataframe of compatible shape.
Config file
A standard config toml file can be used for specific emebddings of multiple sets of trees. Instances of toml files are reported in the examples folder.
Using the config file allows one to use all the features of PEAR, including additional embedding methods and plot designs. The config file can also be used to specify lists of indexes of interesting trees in the sets, in order to highlight them in the final plots.
Interactive mode
pear_ebi -i
:
this script launches the program in the interactive mode. Once the program starts, it is going to guide you through its usage thanks to an intuitive interface.
Turorials and Examples
Follow this link for a complete set of basic and avanced guides and tutorials to use PEAR on the command line and as a python library.
Licensing
This project is released under the terms of the MIT Open Source License. View
LICENSE.txt for more information.