
Product
Rust Support Now in Beta
Socket's Rust support is moving to Beta: all users can scan Cargo projects and generate SBOMs, including Cargo.toml-only crates, with Rust-aware supply chain checks.
REvolutionH-tl is a powerful Python tool designed for evolutionary analysis tasks. It provides a comprehensive set of features for orthogroup analysis, gene tree reconstruction, species tree reconstruction, reconciliation of gene and species trees, and visualization of gene content evolution.
[1] Ramírez-Rafael J. A., Korchmaros A., Aviña-Padilla K., López-Sánchez A., Martinez-Medina G., Hernández-Álvarez A., Hellmuth M., Stadler P. F., and Hernandez-Rosales M. (2025) REvolutionH-tl: A Fast and Robust Tool for Decoding Evolutionary Gene Histories. Submitted.
[2] Ramírez-Rafael J. A., Korchmaros A., Aviña-Padilla K., López-Sánchez A., España-Tinajero A. A., Hellmuth M., Stadler P. F., and Hernandez-Rosales M. (2024) REvolutionH-tl: Reconstruction of Evolutionary Histories tool. In Comparative Genomics: 21st International Conference, RECOMB-CG 2024, Boston, MA, USA, April 27–28, 2024, Proceedings. Springer-Verlag, Berlin, Heidelberg, 89–109. https://doi.org/10.1007/978-3-031-58072-7_5
This guide will walk you through the installation, usage, and key functionalities of REvolutionH-tl.
To get started with REvolutionH-tl, you can easily install it using pip:
pip install revolutionhtl
Extra requirements
To perform sequence alignments, you'll also need to install Diamond or BLAST. You can download the Diamond executable from here or use the command line as follows:
wget http://github.com/bbuchfink/diamond/releases/download/v2.1.8/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz
To install BLAST follow these instructions: Unix / Windows. Or use the command line:
apt-get install ncbi-blast+
To generate the figure change in gene content, you'll need to install the following packages in your conda environment.
conda install -c conda-forge r-base=4.4.0
conda install -c conda-forge poppler imagemagick rust ffmpeg libxml2 librsvg tesseract
Next, you'll need to install the required R packages. To do this, follow these steps. Download the package installation script:
wget "https://github.com/gabiiMM/RevolutionH-tl_Plot_Extension/raw/main/R_packages.R" .
Run the R package installation script:
Rscript R_packages.R
REvolutionH-tl is a command-line tool, and it can be used with the following syntax:
python -m revolutionhtl <arguments>
In addition, use the commands for visualization of results:
python -m revolutionhtl.plot_summary <arguments>
python -m revolutionhtl.plot_reconciliation <orthogroup IDs> <arguments>
For examples click here.
Let's delve into the steps of the program, the required arguments, and output files:
The REvolutionH-tl methodology is divided into 6 steps (see figure above). You can run all of them in a row by providing a directory containing fasta files (use the argument -F <directory>
). Furthermore, if you want to run a specific step using precomputed files, you can use the argument -steps <step list>
together with the arguments required for such a step.
When several steps of REvolutionH-tl are executed in a row and the tool has to decide between manually specified files and files produced at a previous step, the tool prioritizes the second option. For example, when running steps 3 and 6, step 3 generates gene trees that will be used as input for step 6 even if the user specifies a different set of gene trees. To avoid this behavior, run step 6 independently.
Alignment Hits Computation. First, REvolutionH-tl runs a sequence aligner to obtain alignment hits and statistics like bit-score and e-value.
Required argument: -F <directory containing fasta files>
,
Output: directory tl_project.alignment_all_vs_all/
.
Best Hit, Orthogroup Selection, and Distance estimation. Best hits are the putative closest evolutionary-related genes across species, and orthogroups are a collection of genes sharing a common ancestor. Gen-to-gene distances are estimated using the scoredist approximation.
Input argument: -alignment_h <directory containing alignment hit files>
(generated at step 1).
Optional: -OG <.tsv file containing orthogroups>
.
Output files: tl_project.best_hits.tsv
, tl_project.orthogroups.tsv
, tl_project.distances.tsv
.
Gene Tree Reconstruction, Best Matches, and Orthology Assignment. Gene trees are reconstructed from the hits, and the inner nodes of the trees are labeled as 'speciation' or 'duplication' events. Best matches are the closest evolutionary-related genes across species based on gene tree topology. Two genes are orthologous if they were conserved after a speciation process.
Required argument: -best_h <.tsv file containing best hits>
(generated at step 2).
Output files: tl_project.gene_trees.tsv
, tl_project.orthologs.tsv
, tl_project.best_matches.tsv
.
Duplication Polytomie Resolution. Duplication-labeled polytomies on the gene trees are resolved using a NJ approach. The resolved nodes correspond to duplication nodes contained in a single branch of the species tree.
Required arguments:
-T <.tsv file containing gene trees>
(generated at step 3).
-D <.tsv file containig gene-to-gene distances>
(generated at step 2),
Output file: tl_project.resolved_trees.tsv
.
Species tree reconstruction. Species trees are obtained as a consensus of the speciation events in the gene trees.
Required argument: -T <.tsv file containing gene trees>
(generated at steps 3 and 4).
Output file: tl_project.species_tree.nhx
.
Tree reconciliation. Tree reconciliation depicts the evolution of genes across existing and ancestral species, it is represented as a gene tree embedded into a species tree.
Required arguments:
-T <.tsv file containing gene trees>
(generated at steps 3 and 4).
-S <single-line file containing species tree>
(generated at step 5).
Output files: tl_project.reconcilied_trees.tsv
, tl_project.corrected_trees.tsv
, tl_project.labeled_species_tree.nhx
.
You can speed up the analysis by using the multithreading arguments -t <nuber of threads>
and -j <number of jobs>
.
To see the complete list of arguments for the main workflow please use python -m revolutionhtl -h
, alternatively, you can check the list below.
Summary plot This command automatically searches in the current directory for the output files of revolutionhtl: tl_project.labeled_species_tree.nhx
, tl_project.reconcilied_trees.tsv
, tl_project.orthogroups
, and tl_project.orthologs
.
You can change the prefix of input files (tl_project.
) using the argument -files_prefix <custom prefix>
, you can also change the input/output path using -files_path <custom path>
.
Furthermore, you can specify input file names using -S <species tree>
, -R <reconcilied trees>
-OG <orthogroups>
, and -OR orthologs
.
Output figures: tl_project.reconciliation_tree.pdf
, tl_project.change_in_gene_content.pdf
Optional arguments: --OGs_mask <list of orthogroups for visualization>
, --size <length in inches>
, and --percentage_upsetplot <percentage (int) of rows shown in upsetplot>
.
Reconciliation plot This command automatically searches in the current directory for the output files of revolutionhtl: tl_project.labeled_species_tree.nhx
, and tl_project.reconcilied_trees.tsv
.
Additionally, you have to provide a list of orthogroup identifiers to be displayed. These IDs have to be present in the column 'OG' of the file tl_project.reconcilied_trees.tsv
.
You can change the prefix of input files (tl_project.
) using the argument -files_prefix <custom prefix>
, you can also change the input/output path using -files_path <custom path>
.
Output figures: tl_project.<orthogroup ID>.pdf
.
For an example with data and practical usage examples, click here. For a detailed description of the theoretical background of this tool see references [1,2].
By following these steps and using REvolutionH-tl, you can conduct comprehensive evolutionary analysis tasks, including orthogroup selection, gene tree reconstruction, species tree reconstruction, and reconciliation, all in one powerful Python tool. Go to the section "Biological Relevance" for applications.
For the example trees in this section, we'll use the example dataset (click here). In particular, we take the orthogroup inferred by REvolutionH-tl with the label 'OG7048'
Forbidden characters
Please avoid the usage of the following characters in the name of the species, genes, or other parameters of the tool: ;
, /
.
NHX format for trees
NHX format is a generalization of the Newick format. For example, the box below contains a species tree with 14 species as leaves. Note that every node of the three is associated with a node ID.
(((HUMAN[&&NHX:node_id=nID4],(MOUSE[&&NHX:node_id=nID6],RAT[&&NHX:node_id=nID7])[&&NHX:node_id=nID5])[&&NHX:node_id=nID3],(CHICK[&&NHX:node_id=nID9],DANRE[&&NHX:node_id=nID10])[&&NHX:node_id=nID8])[&&NHX:node_id=nID2],((ECOLI[&&NHX:node_id=nID13],((DICDI[&&NHX:node_id=nID16],((CAEEL[&&NHX:node_id=nID19],DROME[&&NHX:node_id=nID20])[&&NHX:node_id=nID18],ARATH[&&NHX:node_id=nID21])[&&NHX:node_id=nID17])[&&NHX:node_id=nID15],PLAF7[&&NHX:node_id=nID22])[&&NHX:node_id=nID14])[&&NHX:node_id=nID12],((SCHPO[&&NHX:node_id=nID25],YEAST[&&NHX:node_id=nID26])[&&NHX:node_id=nID24],CANAL[&&NHX:node_id=nID27])[&&NHX:node_id=nID23])[&&NHX:node_id=nID11])[&&NHX:node_id=nID1];
The box below contains a gene tree with 6 genes and two gene losses in the leaves. The label of loss leaves is 'X', and the leaves associated with existing genes have the attribute 'species'. Additionally, the label of inner nodes indicates evolutionary events; 'S' stands for 'speciation' and 'D' for 'duplication'. Finally, like in the species tree, every node of the three is associated with a node ID.
(((Phy001R8JZ[&&NHX:node_id=nID0:species=HUMAN],(Phy001RT94[&&NHX:node_id=nID5:species=MOUSE],Phy003F435[&&NHX:node_id=nID4:species=RAT])S[&&NHX:node_id=nID6])S[&&NHX:node_id=nID7],((Phy003I8P1[&&NHX:node_id=nID1:species=CHICK],Phy003ICTF[&&NHX:node_id=nID2:species=CHICK])D[&&NHX:node_id=nID3],X[&&NHX:node_id=nID9])S[&&NHX:node_id=nID8])S[&&NHX:node_id=nID10],((Phy003I6O9[&&NHX:node_id=nID11:species=CHICK],X[&&NHX:node_id=nID14])S[&&NHX:node_id=nID13],X[&&NHX:node_id=nID16])S[&&NHX:node_id=nID15])D[&&NHX:node_id=nID12];
You can use itol (Click here) for visualization of trees in nhx format. For example, the trees above look like this:
Reconciliation map format
An evolutionary scenario is a gene tree embedded inside a species tree. For this end, REvolutionH-tl constructs a reconciliation map going from nodes of the gene tree to nodes of the species tree.
The format for the reconciliation is the following: given the nodes x
and y
belonging to the gene and species trees correspondingly, we write the map from x
to y
as x:y
. To include in a single string the map of all the nodes of a gene tree, we use a comma (,
) as separator.
The tool uses the attribute node_id
for node identification. The box below provides an example of reconciliation map between the gene and the species trees in the section above.
nID0:nID4,nID1:nID9,nID2:nID9,nID3:nID9,nID4:nID7,nID5:nID6,nID6:nID5,nID7:nID3,nID8:nID8,nID9:nID10,nID10:nID2,nID11:nID9,nID12:nID2,nID13:nID8,nID14:nID10,nID15:nID2,nID16:nID3
You can use the command revolutionhtl.plot_reconciliation
for visualization of the reconciliation, as shown in the figure below. Note that markers at the inner nodes of the gene tree identify evolutionary events: blue diamond for duplication and red bullet for speciation. Similarly, markers at the leaves indicate if the gene exists in a current species (red bullet) or if the gene is extinct (black bullet).
Log file Every time you run REvolutionH-tl, the program writes in the file tl_project.log.txt
the parameters used to run the program, as well as the time and progress.
Fasta directory Fasta files are required for steps 1 and 2. Specify this input using the attribute -F <directory>
. You must have a fasta file for each species in your analysis (At least 2 species). The name of each file has to follow the format <species name>.fa
. Please avoid the usage of forbidden characters in the name of the species.
By default, REvolutionH-tl searches for files with the extension ".fa", but you can change the extension using the attribute -fext <extension>
.
Each gene in your analysis must have a unique identifier within and across species. Duplicated identifiers will raise an error. Remember to avoid forbidden characters in your identifiers.
Alignment hits directory This directory is the output of step 1, it has the suffix .alignment_all_vs_all/
.
The alignment hits directory is required for step 2. Specify it using the attribute -alignment_h <directory>
.
For each pair of species, you must have two files of alignment hits; from species one to species two, and from species two to species one. The name of each file has to follow the format <species_one>.vs.<species_two>.alignment_hits
and has to be consistent with the name of fasta files.
REvolutionH-tl requires the alignment hits in tabular form with no headers. Each row of this table describes an alignment hit throughout seven columns. Below we describe those columns in the same way as Diamond and BLAST:
Orthogroups file This file is an output of step 2, it has the suffix .orthogroups.tsv
.
Orthogroups are stored in a tabular format (.tsv file). Each row in this file represents one orthogroup. The first column assigns a unique identifier for each orthogroup. The number in the second column indicates the number of genes, and the third column shows the number of species represented in the orthogroup. The rest of the columns correspond to the species in your analysis, each of those columns contains the genes present in the orthogroup. If there is more than one gene per species, they are separated using a comma (',').
Best hits file This file is an output of step 2, it has the suffix .best_hits.tsv
.
Best hits are required as input for step 3. You can provide them using the argument -best_h <.tsv file>
.
Each row in this file describes a best hit. This file has to contain at least the following headers:
Distances file This file is an output of step 2, it has the suffix .distances.tsv
.
Distances are required as input for step 4. You can provide them using the argument -D <.tsv file>
.
Each row in this table corresponds to a pair of genes. The header of the table must include:
Gene trees files These files are generated at steps 3, 4, and 6. They have the suffixes .gene_trees.tsv
, .resolved_trees.tsv
, .corrected_trees.tsv
, and .reconcilied_trees.tsv
.
Gene trees are input of steps 4, 5, and 6. You can provide them using the argument -T <.tsv file>
. Note that gene trees are inferred at step 3 and further processed at steps 4 (tree refinement) and 6 (tree reconciliation), we recall whenever REvolutionH-tl has to decide between manually specified files and files produced at a previous step, the tool prioritizes the second option.
Each row in these tables contains the information for one gene tree. It has to contain the columns:
The leaves of a gene tree are gene identifiers or the loss label 'X'. Leaves also have the attribute "species". The species of the genes must be consistent with the species tree. The inner nodes of the gene tree are associated with evolutionary events: the letter "S" indicates a speciation event, while "D" stands for gene duplication. Additionally, for reconciliation, gene and species trees have to include the attribute 'node_ID' for all the nodes.
Furthermore, each file contains additional columns showing information added at each of the steps:
.gene_trees.tsv
Step 3 of REvolutionH-tl constructs gene trees by analysis of best-hit relationships. This file only contains the two columns mentioned above.
.resolved_trees.tsv
Step 4 resolves duplication nodes. This file includes the columns:
.corrected_trees.tsv
Step 6 checks for consistency between gene and species tree before reconciliation. Inconsistent trees are edited by pruning conflicting genes. This file includes the columns:
.reconcilied_trees.tsv
Step 6 reconciles gene and species trees by construction of a reconciliation map. This file contains the column:
Orthologs file This file is an output of step 3, it has the suffix .orthologs.tsv
.
Orthology is stored in a tabular form. Each row of this table contains an orthology relation, i.e. the information of a pair of orthologous genes, described in 6 columns:
Best matches file This file is an output of step 3, it has the suffix .best_matches.tsv
.
Best matches are stored in a tabular form. Each row of this table contains the information of a best-match throughout 5 columns:
Species tree file This file is an output of steps 5 and 6, they have the suffixes .species_tree.tsv
and .labeled_species_tree.nhx
correspondingly.
The species tree is required for step 6. You can provide it using the argument -S <.nhxx file>
.
The species tree has species as leaves. The name of the species must be consistent with the species of the genes in the gene trees. Additionally, the tree output at step 6 has the attribute 'node_id' for every node, specifying a unique identifier for each node of the tree.
RevolutionH-tl is a versatile tool that facilitates various aspects of evolutionary and comparative genomics research. Its outputs are useful for advancing our understanding of the evolution and functional roles of genes across different species. It has several biological applications:
[1] Ramírez-Rafael J. A., Korchmaros A., Aviña-Padilla K., López-Sánchez A., Martinez-Medina G., Hernández-Álvarez A., Hellmuth M., Stadler P. F., and Hernandez-Rosales M. (2025) REvolutionH-tl: A Fast and Robust Tool for Decoding Evolutionary Gene Histories. Submitted.
[2] Ramírez-Rafael J. A., Korchmaros A., Aviña-Padilla K., López-Sánchez A., España-Tinajero A. A., Hellmuth M., Stadler P. F., and Hernandez-Rosales M. (2024) REvolutionH-tl: Reconstruction of Evolutionary Histories tool. In Comparative Genomics: 21st International Conference, RECOMB-CG 2024, Boston, MA, USA, April 27–28, 2024, Proceedings. Springer-Verlag, Berlin, Heidelberg, 89–109. https://doi.org/10.1007/978-3-031-58072-7_5
[3] Buchfink B, Reuter K, Drost HG, "Sensitive protein alignments at tree-of-life scale using DIAMOND", Nature Methods 18, 366–368 (2021). doi:10.1038/s41592-021-01101-x
[4] Fassler J, Cooper P. BLAST Glossary. 2011 Jul 14. In: BLAST® Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2008-. Available from: https://www.ncbi.nlm.nih.gov/books/NBK62051/
FAQs
REvolutionH-tl: Reconstruction of Evolutionary Histories tool
We found that revolutionhtl demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Product
Socket's Rust support is moving to Beta: all users can scan Cargo projects and generate SBOMs, including Cargo.toml-only crates, with Rust-aware supply chain checks.
Product
Socket Fix 2.0 brings targeted CVE remediation, smarter upgrade planning, and broader ecosystem support to help developers get to zero alerts.
Security News
Socket CEO Feross Aboukhadijeh joins Risky Business Weekly to unpack recent npm phishing attacks, their limited impact, and the risks if attackers get smarter.