To install VAIRO and its graphical interface, VAIROGUI, run the installer script at tools/install_vairo.sh. The script sets up conda and installs all VAIRO dependencies in a dedicated environment.

Execute the installer script:

bash tools/install_vairo.sh

The script will:

Check for an existing conda installation and install it if missing.
Create and activate a conda environment for VAIRO.
Install all Python and system dependencies required by VAIRO.
Verify that system libraries like CUDA drivers or MAXIT are already present.

Usage

To run the command-line program:

vairo [-h] [-check] <config.yaml>

| Flag | Description |
|------|-------------|
| `-h` | Show help and exit |
| `-check` | Validate configuration (.yml) file parsing |

To launch the graphical interface:

vairogui

Configuration File (YAML)

The configuration file must be in valid YAML. Below are all supported sections and parameters.

1. Mandatory keys

mode (string) Choose one of: naive, guided.
output_dir (string) Directory where results will be saved.
af2_dbs_path (string) Path to the AlphaFold2 databases (must be pre-downloaded).
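
As a minimal sketch, the three mandatory keys could look like this in a configuration file. The paths are placeholders, and a real run presumably also needs a sequences section (see section 3).

```yaml
# Mandatory top-level keys; all paths are placeholders.
mode: guided
output_dir: /path/to/output
af2_dbs_path: /path/to/af2_dbs
```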

2. Common optional keys

run_dir (string, default: "run") Directory where AlphaFold2 jobs will run.
glycines (integer, default: 50) Number of glycine residues to insert between concatenated sequences.
small_bfd (boolean, default: false) Use reduced BFD library.
run_af2 (boolean, default: true) Run AlphaFold2 (otherwise stop after generating the features.pkl file).
stop_after_msa (boolean, default: false) Run AlphaFold2 up to MSA generation, then exit.
reference (string, default: "") PDB ID or path to PDB file to be used as global reference.
experimental_pdbs (list of strings, default: []) List of PDB IDs or paths to PDB files for result comparison.
mosaic (integer, default: null) Split the sequence into X partitions.
mosaic_partition (range, default: null) Residue-based partitioning.
mosaic_seq_partition (range, default: null) Sequence numbering partitioning.
cluster_templates (boolean, default: false - becomes true if mode: naive) Cluster templates from preprocessed features.pkl.
cluster_templates_msa (integer, default: -1) Number of sequences to add to the MSA (-1 = all).
cluster_templates_msa_mask (sequence range, default: null) Remove specific residues from MSA sequences.
cluster_templates_sequence (string path, default: null) Replace template sequences using the FASTA file at the given path.
show_pymol (string, default: null) PyMOL selection string (comma-separated regions) to zoom into.
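
For illustration, a few of the optional keys above can be combined in one fragment. The values are hypothetical: this sketch assumes a quick run that uses the reduced BFD library and stops once the MSA has been generated, and the reference path is a placeholder.

```yaml
# Optional keys; values chosen for illustration only.
run_dir: run                        # default value, shown explicitly
glycines: 50                        # default linker length
small_bfd: true                     # use the reduced BFD library
stop_after_msa: true                # exit after MSA generation
reference: /path/to/reference.pdb   # placeholder path; a PDB ID is also accepted
```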

3. Query sequence

Define one or more sequences to generate the query sequence. All sequences will be concatenated using glycine linkers.

sequences:
    - fasta_path (string, mandatory) Path to the FASTA file.
      num_of_copies (integer, default: 1) Number of copies of the sequence.
      positions (list of integers, default: [], any position) Insertion position in the query.
      name (string, default: file name from fasta_path) Sequence name.
      predict_region (range, default: null) Predict only this subsequence instead of the full length.
      mutations (map) Map three-letter amino acid codes to residue indices. Example:
        - 'ALA': 10, 20
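
A short sketch of a sequences section using the keys above. The file paths and the name are placeholders, and the mutations entry copies the syntax of the example given for that key. Under these assumptions the query would contain two copies of the first sequence and one copy of the second, joined by glycine linkers.

```yaml
sequences:
- fasta_path: /path/to/data/seqA.fasta   # placeholder path
  num_of_copies: 2
  name: seqA                             # hypothetical name
- fasta_path: /path/to/data/seqB.fasta   # placeholder path
  mutations:
  - 'ALA': 10, 20                        # mutate residues 10 and 20 to alanine
```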

4. Add templates

Customize PDB templates for insertion into features.pkl.

templates:
    - pdb (string, mandatory) Path to a PDB file or existing PDB ID.
      add_to_msa (boolean, default: false) Add the template’s sequence to the MSA.
      add_to_templates (boolean, default: true) Include the template in features.pkl.
      generate_multimer (boolean, default: true) Generate a multimeric assembly from the PDB.
      strict (boolean, default: true) Discard templates with E-values below threshold.
      aligned (boolean, default: false) Skip alignment if already aligned.
      legacy (boolean, default: false) Use pre-aligned, single-chain template for the full query.
      reference (string, default: null) Reference used to place the template in the query sequence.
      modifications (list) Chain-level edits before/after alignment. Each modification can include:
         - chain (string, mandatory) Chain ID or All.
           position (integer, default: null) Insertion position in query (if single chain).
           maintain_residues (list of integers, default: null) Selected residues will be kept, and the rest will be deleted.
           delete_residues (list of integers, default: null) Selected residues will be deleted, the rest will be kept.
           when (string, default: after_alignment) before_alignment or after_alignment.
           mutations (list) Mutations to apply to the residues:
              - numbering_residues (list of integers, mandatory) Residue positions where the mutations will be applied.
                mutate_with (string, mandatory) The amino acid to mutate to, specified as a three‑letter code or as a FASTA file path.

5. Add features

Merge or slice existing features.pkl files from other AlphaFold2 runs into your run.

features:
    - path (string, mandatory) Path to an existing features.pkl file.
      keep_msa (integer, default: -1) -1 = all sequences; otherwise top X by coverage.
      keep_templates (integer, default: -1) -1 = all templates; otherwise top X by coverage.
      msa_mask (range, default: null) Remove this residue range from the MSA.
      sequence (string, default: null) FASTA file to replace all template sequences.
      numbering_query (list of integers, default: null) Insertion positions in the query sequence.
      numbering_features (list of ranges, default: null) Map feature blocks into the positions given by numbering_query.
      positions (range, default: null) Insert the features.pkl at this position in the query sequence. Here the value is a sequence index, whereas numbering_query and numbering_features use residue positions in the entire query sequence.
      mutations (map) Map three-letter amino acid codes to residue indices. Example:
        - 'ALA': 10, 20

6. Append library

Append existing FASTA/PDB files from a library into your run.

append_library:
    - path (string, mandatory) Path to a directory, PDB, or FASTA file.
      add_to_msa (boolean, default: true) Append sequences to the MSA.
      add_to_templates (boolean, default: false) Append PDBs to the templates.
      numbering_query (list of integers, default: null) Insertion positions in the query.
      numbering_library (list of ranges, default: null) Residue range from the library entry to insert.
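
A minimal sketch of an append_library section, assuming a directory of FASTA/PDB files at a placeholder path. It writes the documented defaults out explicitly: entries are appended to the MSA but not to the templates.

```yaml
append_library:
- path: /path/to/library_dir   # placeholder; a single PDB or FASTA file also works
  add_to_msa: true             # documented default
  add_to_templates: false      # documented default
```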

7. Configuration file example

mode: guided
output_dir: /path/to/output
af2_dbs_path: /path/to/af2_dbs
run_af2: true
experimental_pdbs: /path/to/references/experimental.pdb

sequences:
- fasta_path: /path/to/data/seq1.fasta
  num_of_copies: 1
- fasta_path: /path/to/data/seq2.fasta
  num_of_copies: 1
- fasta_path: /path/to/data/seq3.fasta
  num_of_copies: 1
- fasta_path: /path/to/data/seq4.fasta
  num_of_copies: 1

features:
- path: /path/to/features1.pkl
  keep_msa: 30
  keep_templates: 0
  numbering_query: 1

- path: /path/to/features2.pkl
  keep_msa: 30
  keep_templates: 0
  msa_mask: 276-477, 652-857
  numbering_query: 1

- path: /path/to/features3.pkl
  keep_msa: 30
  keep_templates: 0
  msa_mask: 8-250
  numbering_query: 4

templates:
- pdb: /path/to/templates/template.pdb
  add_to_msa: true
  add_to_templates: true
  generate_multimer: false
  aligned: true
  modifications:
  - chain: A
    position: 1
    mutations:
    - numbering_residues: 276-477
      mutate_with: /path/to/data/seq1.fasta

Output information

All results are written to the output_dir directory set in the configuration file. Inside output_dir, you will find the following folders and files:

output.html: Contains the results in HTML format, including all plots, run statistics, and prediction analyses.
output.log: The log file with detailed information from the execution.
plots/: All plots generated by the output analysis.
frobenius/: Plots generated by ALEPH.
interfaces/: Results of the interface analysis performed by PISA.
clustering/: (If clustering is enabled) Contains the results related to clustering jobs.
input/: All input files used in the run.
run/: Stores runtime information and outputs (see below for details).
templates/: Templates extracted from the features.pkl, split by chains.
rankeds/: Ranked models generated by AlphaFold2, split by chains.

Inside the run/ directory, you will find:

results/: Results of the AlphaFold2 run (see below for details).
Templates folder: Subfolders named after each template, containing the databases generated to align each template.
Sequences folder: Subfolders named after each sequence, containing alignments of the templates with the corresponding sequence.

Inside the run/results/ directory, you will find:

tmp/: Contains intermediate files generated by external programs (e.g., ALEPH).
ccanalysis/ and ccanalysis_ranked/: PDB files used for the cc_analysis run.
msas/: Information generated by AlphaFold2. It contains the extracted sequences and the template alignments.
templates_nonsplit/: Templates extracted from features.pkl, not split by chains.
rankeds_split/: Ranked models generated by AlphaFold2, split by chains.
rankeds/: Ranked models generated by AlphaFold2, not split by chains.
