
Security News
Browserslist-rs Gets Major Refactor, Cutting Binary Size by Over 1MB
Browserslist-rs now uses static data to reduce binary size by over 1MB, improving memory use and performance for Rust-based frontend tools.
Computing and drawing ExP heatmap for displaying complex cross-population data
Welcome to the ExP Heatmap python
package and command-line tool. Our software is focused on displaying multidimensional data, expecially the so called cross-population data - differences/similarities/p-values/or any other parameters of your choice between several groups/populations. Our method allows the user to quickly and efficiently evaluate millions of p-values or test statistics in one figure.
This tool is being developed in the Laboratory of Genomics and Bioinformatics, Institute of Molecular Genetics of the Academy of Sciences of the Czech Republic, v. v. i.
The ExP Heatmap manual is divided into following sections:
Requirements and install
Simple example
Workflow
Command-line tool
Python package usage and examples
Licence and final remarks
This is the ExP heatmap of human lactose (LCT) gene on chromosome 2 and its surrounding genomic region displaying population differences between 26 populations of 1000 Genomes Project, phase 3. Displayed values are the adjusted rank p-values for cross-population extended haplotype homozygosity (XPEHH) selection test.
python
>= 3.8vcftools
for genomic data preparation (not needed if you want just plot your data) (repository)PyPI repository link (exp_heatmap)
pip install exp_heatmap
Install the latest version directly from this GitHub
pip install git+https://github.com/bioinfocz/exp_heatmap.git
After installing the package, try to construct ExP heatmap in three simple steps:
chr2.xpehh.example/
in your working directory: unzip chr2.xpehh.example.zip
exp_heatmap plot chr2.xpehh.example/ --begin 135287850 --end 136287850 --output LCT_xpehh
The exp_heatmap
package will read the files from chr2.xpehh.example/
folder and create the ExP heatmap and save it as LCT_xpehh.png
file.
As an workflow example we present an analysis of 1000 Genomes Project, phase 3 data of chromosome 22, chosen especially for its small size and thus reasonable fast computations. It is focused on ADM2 gene (link), which is active especially in reproductive system, and angiogenesis and cardiovascular system in general.
################
# GET THE DATA #
################
# Download chromosome 22 from 1000genomes ftp
wget "ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/ALL.chr22_GRCh38.genotypes.20170504.vcf.gz" -O chr22.genotypes.vcf.gz
wget "ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/integrated_call_samples_v3.20130502.ALL.panel" -O genotypes.panel
# OR
# The 1000 Genomes Project alternative ftp mirror (GRCh37 version);
wget "https://ddbj.nig.ac.jp/public/mirror_database/1000genomes/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" -O chr22.genotypes.vcf.gz
wget "https://ddbj.nig.ac.jp/public/mirror_database/1000genomes/release/20130502/integrated_call_samples_v3.20130502.ALL.panel" -O genotypes.panel
####################
# PREPARE THE DATA #
####################
# Filter the VCF
vcftools --gzvcf chr22.genotypes.vcf.gz \
--remove-indels \
--recode \
--recode-INFO-all \
--out chr22.genotypes
exp_heatmap prepare chr22.genotypes.recode.vcf chr22.genotypes.recode.zarr
###########################
# COMPUTE PAIRWISE VALUES #
###########################
exp_heatmap compute chr22.genotypes.recode.zarr genotypes.panel chr22.genotypes.output
#######################
# DISPLAY ExP HEATMAP #
#######################
# Plot heatmap
exp_heatmap plot chr22.genotypes.output --begin 50481556 --end 50486440 --title ADM2 --output adm2_GRCh38
# OR
# use this if you used the GRCh37 version of the VCF input files.
exp_heatmap plot chr22.genotypes.output --begin 50910000 --end 50950000 --title ADM2 --output adm2_GRCh37
# The heatmap is saved as adm2_GRCh38.png or adm2_GRCh37.png, depending on which version of plot function are you using.
After installing the exp_heatmap
using pip
as described above, you can use its basic functionality directly from the command line interface.
Extract only SNP
You can use an .vcf or .vcf.gz file
# we will use SNPs only, so we remove insertion/deletion polymorphisms
# another option would be to use only biallelic SNPs (--min-alleles 2 --max-alleles 2),
# probably with minor allele frequency above 5% (--maf 0.05)
# ouput VCF will be named DATA.recode.vcf
DATA="ALL.chr22_GRCh38.genotypes.20170504"
# Gziped VCF
vcftools --gzvcf $DATA.vcf.gz --remove-indels --recode --recode-INFO-all --out $DATA
# Plain VCF
vcftools --vcf $DATA.vcf --remove-indels --recode --recode-INFO-all --out $DATA
Prepare data for computing
# DATA.recode.vcf a vcf from previous step
# DATA.zarr is path (folder) where zarr representation of the VCF input will be saved
# this will vastly increase the speed of follow-up computations
exp_heatmap prepare DATA.recode.vcf DATA.zarr
# DATA.zarr a zarr data from previous step
# DATA.output a path (folder) where the results will be saved
# in this step, by default Cross-population extended haplotype homozygosity (XPEHH) score will be computed for all positions, together with their -log10 rank p-values.
exp_heatmap compute DATA.zarr genotypes.panel DATA.output
Besides the default cross-population extended haplotype homozygosity (XPEHH) test, you can use this exp_heatmap compute
with optional parameter -t
and one of the keywords:
xpehh
- computes cross-population extended haplotype homozygosity (XPEHH) test (default),xpnsl
- computes cross-population number of segregating sites by length (NSL) test,delta_tajima_d
- computes delta Tajima's D,hudson_fst
- computes pairwise genetic distance Fst (using the method od Hudson (1992)).# computing the XP-NSL test
exp_heatmap compute DATA.zarr genotypes.panel DATA.output -t xpnsl
--begin
, --end
(required)
--title
(optional)
--cmap
(optional)
--output
(optional)
exp_heatmap plot DATA.output --begin BEING --end END --title TITLE --output NAME
Besides using ExP Heatmap as standalone command-line tool, more options and user-defined parameters' changes are available when ExP Heatmap is imported directly into your Python script.
Test files used in these examples (p-values, test results, VCF files etc.) can be downloaded HERE. They are based on results of cross-population selection tests of the lactase (LCT) gene area ( chr2:135,787,850-135,837,184).
Here we outline a solution to 3 possible and most common scenarios where the ExP is being used. Possible model scenarios:
Your data are in a *.tsv file, tab-delimited text file (table), where the results or p-values are stored in columns, first column is 'CHROM', second column 'POS', followed by X columns of pairwise parameters (i.e. rank p-values). For 1000 Genomes data, that would mean 325 columns of pair-wise p-values for 26 populations.
from exp_heatmap.plot import plot_exp_heatmap
import pandas as pd
# input data in the form of pandas DataFrame, expected shape (x, 327)
# 327 columns consisting of CHROM, POS and 325 columns of pairwise p-values
# x represents the number of SNPs to display
# column names are expected to include the 1000 Genomes population abbreviations
data_to_plot = pd.read_csv("LCT_xpnsl_pvalues.csv", sep="\t")
plot_exp_heatmap(data_to_plot,
begin=135287850,
end=136287850,
title="XP-NSL test on LCT gene in 1000 Genomes Project (phase 3) populations",
cmap="Blues",
output=False, # enter the save file name here
populations="1000Genomes",
xlabel="LCT gene, 1 Mbp window, 2:135,287,850-136,287,850, GRCh38")
from exp_heatmap.plot import plot_exp_heatmap, create_plot_input, superpopulations, prepare_cbar_params
# input data are in the form of pairwise population results per file
# here, the results of XP-NSL test for populations of 1000 Genome Project dataset
results_directory = "chr2_xpnsl_1000Genomes.test/"
# compute ranked p-values and prepare data for ExP heatmap
data_to_plot = create_plot_input("chr2_xpnsl_1000Genomes.test/", begin=135287850, end=136287850, populations="1000Genomes")
plot_exp_heatmap(data_to_plot,
begin=135287850,
end=136287850,
title="XP-NSL test on LCT gene in 1000 Genomes Project (phase 3) populations",
cmap="Blues",
output=False, # enter the save file name here
populations="1000Genomes",
xlabel="LCT gene, 1 Mbp window, 2:135,287,850-136,287,850, GRCh38")
#######################################################################################
# you can tweak different paramaters in the ExP heatmap plot
# prepare custom colorbar parameters
cmin, cmax, cbar_ticks = prepare_cbar_params(data_to_plot, n_cbar_ticks=4)
# display custom population set
plot_exp_heatmap(data_to_plot,
begin=135000000,
end=137000000,
title="XP-NSL test results in African populations",
cmap="expheatmap", # custom heatmap
output="xpnsl_Africa", # save results
vertical_line=([135851073, "rs41525747"], [135851081, "rs41380347"], [135851176, "rs145946881"]), # 3 vertical lines marking SNPs with described selection pressure (https://doi.org/10.1093/gbe/evab065)
populations=superpopulations["AFR"], # custom population set
xlabel="LCT gene region, 2:135,000,000-137,000,000, GRCh38")
XX VCF to zarr
XX Compute test (different!)
XX Display!
XX
XX
The ExP Heatmap package is available under the MIT License. (link)
If you would be interested in using this method in your commercial software under another licence, please, contact us at edvard.ehler@img.cas.cz.
FAQs
Computing and drawing ExP heatmap for displaying complex cross-population data
We found that exp-heatmap demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Browserslist-rs now uses static data to reduce binary size by over 1MB, improving memory use and performance for Rust-based frontend tools.
Research
Security News
Eight new malicious Firefox extensions impersonate games, steal OAuth tokens, hijack sessions, and exploit browser permissions to spy on users.
Security News
The official Go SDK for the Model Context Protocol is in development, with a stable, production-ready release expected by August 2025.