bioinfokit

Bioinformatics data analysis and visualization toolkit

  • 2.1.4
  • PyPI

The bioinfokit toolkit aims to provide various easy-to-use functionalities to analyze,
visualize, and interpret the biological data generated from genome-scale omics experiments.

How to install:

bioinfokit requires

  • Python 3
  • NumPy
  • scikit-learn
  • seaborn
  • pandas
  • matplotlib
  • SciPy
  • matplotlib_venn

bioinfokit can be installed using pip, easy_install, conda, or git.

Install using pip for Python 3 (easiest way)

# install
pip install bioinfokit

# upgrade to latest version
pip install bioinfokit --upgrade

# uninstall 
pip uninstall bioinfokit

Install using easy_install for Python 3

# install latest version
easy_install bioinfokit

# specific version
easy_install bioinfokit==0.3

# uninstall 
pip uninstall bioinfokit

Install using conda

conda install -c bioconda bioinfokit

Install using git

# download and install bioinfokit (Tested on Linux, Mac, Windows) 
git clone https://github.com/reneshbedre/bioinfokit.git
cd bioinfokit
python setup.py install

Check the version of bioinfokit

>>> import bioinfokit
>>> bioinfokit.__version__
'0.4'

How to cite bioinfokit?

  • Renesh Bedre. (2020, March 5). reneshbedre/bioinfokit: Bioinformatics data analysis and visualization toolkit. Zenodo. http://doi.org/10.5281/zenodo.3698145.
  • Additionally, check Zenodo to cite a specific version of bioinfokit

Support

If you enjoy bioinfokit, consider supporting me through Buy Me A Coffee.

Getting Started

Gene expression analysis

Volcano plot

latest update v2.0.8

bioinfokit.visuz.GeneExpression.volcano(df, lfc, pv, lfc_thr, pv_thr, color, valpha, geneid, genenames, gfont, dim, r, ar, dotsize, markerdot, sign_line, gstyle, show, figtype, axtickfontsize, axtickfontname, axlabelfontsize, axlabelfontname, axxlabel, axylabel, xlm, ylm, plotlegend, legendpos, figname, legendanchor, legendlabels, theme)

Parameters | Description
--- | ---
df | Pandas dataframe table having at least gene IDs, log fold change, P-values or adjusted P-values columns
lfc | Name of a column having log or absolute fold change values [string][default:logFC]
pv | Name of a column having P-values or adjusted P-values [string][default:p_values]
lfc_thr | Log fold change cutoff for up and downregulated genes [Tuple or list][default:(1.0, 1.0)]
pv_thr | p value or adjusted p value cutoff for up and downregulated genes [Tuple or list][default:(0.05, 0.05)]
color | Tuple of three colors [Tuple or list][default: color=("green", "grey", "red")]
valpha | Transparency of points on volcano plot [float (between 0 and 1)][default: 1.0]
geneid | Name of a column having gene Ids. This is necessary for plotting gene label on the points [string][default: None]
genenames | Tuple of gene Ids to label the points. The gene Ids must be present in the geneid column. If this option is set to "deg" it will label all genes defined by lfc_thr and pv_thr [string, tuple, dict][default: None]
gfont | Font size for genenames [float][default: 10.0]. gfont is not compatible with gstyle=2.
dim | Figure size [Tuple of two floats (width, height) in inches][default: (5, 5)]
r | Figure resolution in dpi [int][default: 300]. Not compatible with show=True
ar | Rotation of X and Y-axis ticks labels [float][default: 90]
dotsize | The size of the dots in the plot [float][default: 8]
markerdot | Shape of the dot marker. See more options at https://matplotlib.org/3.1.1/api/markers_api.html [string][default: "o"]
sign_line | Show grid lines on plot with defined log fold change (lfc_thr) and P-value (pv_thr) threshold value [True or False][default:False]
gstyle | Style of the text for genenames. 1 for default text and 2 for box text [int][default: 1]
show | Show the figure on console instead of saving in current folder [True or False][default:False]
figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default:'png']
axtickfontsize | Font size for axis ticks [float][default: 9]
axtickfontname | Font name for axis ticks [string][default: 'Arial']
axlabelfontsize | Font size for axis labels [float][default: 9]
axlabelfontname | Font name for axis labels [string][default: 'Arial']
axxlabel | Label for X-axis. If you provide this option, default label will be replaced [string][default: None]
axylabel | Label for Y-axis. If you provide this option, default label will be replaced [string][default: None]
xlm | Range of ticks to plot on X-axis [float (left, right, interval)][default: None]
ylm | Range of ticks to plot on Y-axis [float (bottom, top, interval)][default: None]
plotlegend | Plot legend on volcano plot [True or False][default:False]
legendpos | Position of the legend on plot. For more options see loc parameter at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html [string][default:"best"]
figname | Name of figure [string][default:"volcano"]
legendanchor | Position of the legend outside of the plot. For more options see bbox_to_anchor parameter at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html [list][default:None]
legendlabels | Legend label names. If you provide custom label names keep the same order of label names as default [list][default:['significant up', 'not significant', 'significant down']]
theme | Change background theme. If theme is set to dark, the dark background will be produced instead of white [string][default:'None']

Returns:

Volcano plot image in same directory (volcano.png)

Working example
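
The lfc_thr and pv_thr cutoffs described above split genes into three groups, which correspond to the default legend labels. The following is a minimal plain-Python sketch of that grouping, not bioinfokit's own code. The function name classify_gene is hypothetical, and it assumes that the first element of each threshold tuple applies to upregulated genes and the second to downregulated genes.

```python
def classify_gene(lfc, pv, lfc_thr=(1.0, 1.0), pv_thr=(0.05, 0.05)):
    """Label one gene as significant up, significant down, or not significant.

    Assumption: lfc_thr[0]/pv_thr[0] apply to upregulated genes and
    lfc_thr[1]/pv_thr[1] apply to downregulated genes.
    """
    if lfc >= lfc_thr[0] and pv < pv_thr[0]:
        return "significant up"
    if lfc <= -lfc_thr[1] and pv < pv_thr[1]:
        return "significant down"
    return "not significant"


# Toy data: gene name -> (log fold change, P-value)
genes = {"geneA": (2.3, 0.001), "geneB": (-1.8, 0.01), "geneC": (0.4, 0.2)}
labels = {name: classify_gene(lfc, pv) for name, (lfc, pv) in genes.items()}
```

Whether the real comparison is strict or inclusive at the cutoff is not stated in this document, so treat the boundary behavior here as an assumption.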

Inverted Volcano plot

latest update v2.0.8

bioinfokit.visuz.GeneExpression.involcano(table, lfc, pv, lfc_thr, pv_thr, color, valpha, geneid, genenames, gfont, gstyle, dotsize, markerdot, r, dim, show, figtype, axxlabel, axylabel, axlabelfontsize, axtickfontsize, axtickfontname, plotlegend, legendpos, legendanchor, figname, legendlabels, ar, theme)

Parameters | Description
--- | ---
table | Pandas dataframe table having at least gene IDs, log fold change, P-values or adjusted P-values
lfc | Name of a column having log fold change values [default:logFC]
pv | Name of a column having P-values or adjusted P-values [default:p_values]
lfc_thr | Log fold change cutoff for up and downregulated genes [Tuple or list] [default:(1.0, 1.0)]
pv_thr | p value or adjusted p value cutoff for up and downregulated genes [Tuple or list] [default:(0.05, 0.05)]
color | Tuple of three colors [Tuple or list][default: color=("green", "grey", "red")]
valpha | Transparency of points on volcano plot [float (between 0 and 1)][default: 1.0]
geneid | Name of a column having gene Ids. This is necessary for plotting gene label on the points [string][default: None]
genenames | Tuple of gene Ids to label the points. The gene Ids must be present in the geneid column. If this option is set to "deg" it will label all genes defined by lfc_thr and pv_thr [string, Tuple, dict][default: None]
gfont | Font size for genenames [float][default: 10.0]
gstyle | Style of the text for genenames. 1 for default text and 2 for box text [int][default: 1]
dotsize | The size of the dots in the plot [float][default: 8]
markerdot | Shape of the dot marker. See more options at https://matplotlib.org/3.1.1/api/markers_api.html [string][default: "o"]
dim | Figure size [Tuple of two floats (width, height) in inches][default: (5, 5)]
r | Figure resolution in dpi [int][default: 300]. Not compatible with show=True
figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default:'png']
show | Show the figure on console instead of saving in current folder [True or False][default:False]
axxlabel | Label for X-axis. If you provide this option, default label will be replaced [string][default: None]
axylabel | Label for Y-axis. If you provide this option, default label will be replaced [string][default: None]
axlabelfontsize | Font size for axis labels [float][default: 9]
axtickfontsize | Font size for axis ticks [float][default: 9]
axtickfontname | Font name for axis ticks [string][default: 'Arial']
plotlegend | Plot legend on inverted volcano plot [True or False][default:False]
legendpos | Position of the legend on plot. For more options see loc parameter at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html [string][default:"best"]
legendanchor | Position of the legend outside of the plot. For more options see bbox_to_anchor parameter at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html [list][default:None]
figname | Name of figure [string][default:"involcano"]
legendlabels | Legend label names. If you provide custom label names keep the same order of label names as default [list][default:['significant up', 'not significant', 'significant down']]
ar | Rotation of X and Y-axis ticks labels [float][default: 90]
theme | Change background theme. If theme is set to dark, the dark background will be produced instead of white [string][default:'None']

Returns:

Inverted volcano plot image in same directory (involcano.png)

Working example

MA plot

latest update v2.0.7

bioinfokit.visuz.GeneExpression.ma(df, lfc, ct_count, st_count, pv, basemean, lfc_thr, color, dim, dotsize, show, r, valpha, figtype, axxlabel, axylabel, axlabelfontsize, axtickfontsize, axtickfontname, xlm, ylm, fclines, fclinescolor, legendpos, legendanchor, figname, legendlabels, plotlegend, ar, theme, geneid, genenames, gfont, gstyle, title)

Parameters | Description
--- | ---
df | Pandas dataframe table having at least gene IDs, log fold change, and normalized counts (control and treatment) columns
lfc | Name of a column having log fold change values [default:"logFC"]
ct_count | Name of a column having count values for control sample. Ignored if basemean provided [default:"value1"]
st_count | Name of a column having count values for treatment sample. Ignored if basemean provided [default:"value2"]
pv | Name of a column having p values or adjusted p values
basemean | Basemean (mean of normalized counts) from DESeq2 results
lfc_thr | Log fold change cutoff for up and downregulated genes [Tuple or list][default:(1.0, 1.0)]
color | Tuple of three colors [Tuple or list][default: ("green", "grey", "red")]
dotsize | The size of the dots in the plot [float][default: 8]
markerdot | Shape of the dot marker. See more options at https://matplotlib.org/3.1.1/api/markers_api.html [string][default: "o"]
valpha | Transparency of points on plot [float (between 0 and 1)][default: 1.0]
dim | Figure size [Tuple of two floats (width, height) in inches][default: (5, 5)]
r | Figure resolution in dpi [int][default: 300]. Not compatible with show=True
figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default:'png']
show | Show the figure on console instead of saving in current folder [True or False][default:False]
axxlabel | Label for X-axis. If you provide this option, default label will be replaced [string][default: None]
axylabel | Label for Y-axis. If you provide this option, default label will be replaced [string][default: None]
axlabelfontsize | Font size for axis labels [float][default: 9]
axtickfontsize | Font size for axis ticks [float][default: 9]
axtickfontname | Font name for axis ticks [string][default: 'Arial']
xlm | Range of ticks to plot on X-axis [float (left, right, interval)][default: None]
ylm | Range of ticks to plot on Y-axis [float (bottom, top, interval)][default: None]
fclines | Draw log fold change threshold lines as defined by lfc [True or False][default:False]
fclinescolor | Color of fclines [string][default: '#2660a4']
plotlegend | Plot legend on MA plot [True or False][default:False]
legendpos | Position of the legend on plot. For more options see loc parameter at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html [string][default:"best"]
legendanchor | Position of the legend outside of the plot. For more options see bbox_to_anchor parameter at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html [list][default:None]
figname | Name of figure [string][default:"ma"]
legendlabels | Legend label names. If you provide custom label names keep the same order of label names as default [list][default:['significant up', 'not significant', 'significant down']]
ar | Rotation of X and Y-axis ticks labels [float][default: 90]
theme | Change background theme. If theme is set to dark_background, the dark background will be produced instead of default white [string][default:'None']
geneid | Name of a column having gene Ids. This is necessary for plotting gene label on the points [string][default: None]
genenames | Tuple of gene Ids to label the points. The gene Ids must be present in the geneid column. If this option is set to "deg" it will label all genes defined by lfc_thr and pv_thr [string, Tuple, dict][default: None]
gfont | Font size for genenames [float][default: 10.0]
gstyle | Style of the text for genenames. 1 for default text and 2 for box text [int][default: 1]
title | Add main title to the plot [string][default: None]

Returns:

MA plot image in same directory (ma.png)

Working example

Heatmap

latest update v2.0.1

bioinfokit.visuz.gene_exp.hmap(table, cmap='seismic', scale=True, dim=(6, 8), rowclus=True, colclus=True, zscore=None, xlabel=True, ylabel=True, tickfont=(12, 12), show, r, figtype, figname, theme)

Parameters | Description
--- | ---
file | CSV delimited data file. It should not have NA or missing values
cmap | Color Palette for heatmap [string][default: 'seismic']
scale | Draw a color key with heatmap [boolean (True or False)][default: True]
dim | Heatmap figure size [Tuple of two floats (width, height) in inches][default: (6, 8)]
rowclus | Draw hierarchical clustering for rows [boolean (True or False)][default: True]
colclus | Draw hierarchical clustering for columns [boolean (True or False)][default: True]
zscore | Z-score standardization of row (0) or column (1). It works when clus is True. [None, 0, 1][default: None]
xlabel | Plot X-label [boolean (True or False)][default: True]
ylabel | Plot Y-label [boolean (True or False)][default: True]
tickfont | Fontsize for X and Y-axis tick labels [Tuple of two floats][default: (14, 14)]
show | Show the figure on console instead of saving in current folder [True or False][default:False]
r | Figure resolution in dpi [int][default: 300]. Not compatible with show=True
figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default:'png']
figname | Name of figure [string][default:"heatmap"]
theme | Change background theme. If theme is set to dark, the dark background will be produced instead of white [string][default:'None']

Returns:

heatmap plot (heatmap.png, heatmap_clus.png)

Working example

Clustering analysis

Scree plot

latest update v2.0.1

bioinfokit.visuz.cluster.screeplot(obj, axlabelfontsize, axlabelfontname, axxlabel, axylabel, figtype, r, show, dim, theme)

Parameters | Description
--- | ---
obj | List of component name and component variance
axlabelfontsize | Font size for axis labels [float][default: 9]
axlabelfontname | Font name for axis labels [string][default: 'Arial']
axxlabel | Label for X-axis. If you provide this option, default label will be replaced [string][default: None]
axylabel | Label for Y-axis. If you provide this option, default label will be replaced [string][default: None]
figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default:'png']
r | Figure resolution in dpi [int][default: 300]
show | Show the figure on console instead of saving in current folder [True or False][default:False]
dim | Figure size [Tuple of two floats (width, height) in inches][default: (6, 4)]
theme | Change background theme. If theme is set to dark, the dark background will be produced instead of white [string][default:'None']

Returns:

Scree plot image (screeplot.png will be saved in same directory)

Working Example

Principal component analysis (PCA) loadings plots

latest update v2.0.1

bioinfokit.visuz.cluster.pcaplot(x, y, z, labels, var1, var2, var3, axlabelfontsize, axlabelfontname, figtype, r, show, plotlabels, dim, theme)

Parameters | Description
--- | ---
x | Loadings (correlation coefficient) for principal component 1 (PC1)
y | Loadings (correlation coefficient) for principal component 2 (PC2)
z | Loadings (correlation coefficient) for principal component 3 (PC3)
labels | Original variables labels from dataframe used for PCA
var1 | Proportion of PC1 variance [float (0 to 1)]
var2 | Proportion of PC2 variance [float (0 to 1)]
var3 | Proportion of PC3 variance [float (0 to 1)]
axlabelfontsize | Font size for axis labels [float][default: 9]
axlabelfontname | Font name for axis labels [string][default: 'Arial']
figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default:'png']
r | Figure resolution in dpi [int][default: 300]
show | Show the figure on console instead of saving in current folder [True or False][default:False]
plotlabels | Plot labels as defined by labels parameter [True or False][default:True]
dim | Figure size [Tuple of two floats (width, height) in inches][default: (6, 4)]
theme | Change background theme. If theme is set to dark, the dark background will be produced instead of white [string][default:'None']

Returns:

PCA loadings plot 2D and 3D image (pcaplot_2d.png and pcaplot_3d.png will be saved in same directory)

Working Example

Principal component analysis (PCA) biplots

latest update v2.0.2

bioinfokit.visuz.cluster.biplot(cscore, loadings, labels, var1, var2, var3, axlabelfontsize, axlabelfontname, figtype, r, show, markerdot, dotsize, valphadot, colordot, arrowcolor, valphaarrow, arrowlinestyle, arrowlinewidth, centerlines, colorlist, legendpos, datapoints, dim, theme)

Parameters | Description
--- | ---
cscore | Principal component scores (obtained from PCA().fit_transform() function in sklearn.decomposition)
loadings | Loadings (correlation coefficient) for principal components
labels | Original variables labels from dataframe used for PCA
var1 | Proportion of PC1 variance [float (0 to 1)]
var2 | Proportion of PC2 variance [float (0 to 1)]
var3 | Proportion of PC3 variance [float (0 to 1)]
axlabelfontsize | Font size for axis labels [float][default: 9]
axlabelfontname | Font name for axis labels [string][default: 'Arial']
figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default:'png']
r | Figure resolution in dpi [int][default: 300]
show | Show the figure on console instead of saving in current folder [True or False][default:False]
markerdot | Shape of the dot on plot. See more options at https://matplotlib.org/3.1.1/api/markers_api.html [string][default: "o"]
dotsize | The size of the dots in the plot [float][default: 6]
valphadot | Transparency of dots on plot [float (between 0 and 1)][default: 1]
colordot | Color of dots on plot [string or list][default:"#4a4e4d"]
arrowcolor | Color of the arrow [string][default:"#fe8a71"]
valphaarrow | Transparency of the arrow [float (between 0 and 1)][default: 1]
arrowlinestyle | Line style of the arrow. Check more styles at https://matplotlib.org/3.1.0/gallery/lines_bars_and_markers/linestyles.html [string][default: '-']
arrowlinewidth | Line width of the arrow [float][default: 1.0]
centerlines | Draw center lines at x=0 and y=0 for 2D plot [bool (True or False)][default: True]
colorlist | List of the categories to assign the color [list][default:None]
legendpos | Position of the legend on plot. For more options see loc parameter at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html [string][default:"best"]
datapoints | Plot data points on graph [bool (True or False)][default: True]
dim | Figure size [Tuple of two floats (width, height) in inches][default: (6, 4)]
theme | Change background theme. If theme is set to dark, the dark background will be produced instead of white [string][default:'None']

Returns:

PCA biplot 2D and 3D image (biplot_2d.png and biplot_3d.png will be saved in same directory)

Working Example

t-SNE plot

latest update v2.0.1

bioinfokit.visuz.cluster.tsneplot(score, colorlist, axlabelfontsize, axlabelfontname, figtype, r, show, markerdot, dotsize, valphadot, colordot, dim, figname, legendpos, legendanchor, theme)

Parameters | Description
--- | ---
score | t-SNE component embeddings (obtained from TSNE().fit_transform() function in sklearn.manifold)
colorlist | List of the categories to assign the color [list][default:None]
axlabelfontsize | Font size for axis labels [float][default: 9]
axlabelfontname | Font name for axis labels [string][default: 'Arial']
figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default:'png']
r | Figure resolution in dpi [int][default: 300]
show | Show the figure on console instead of saving in current folder [True or False][default:False]
markerdot | Shape of the dot on plot. See more options at https://matplotlib.org/3.1.1/api/markers_api.html [string][default: "o"]
dotsize | The size of the dots in the plot [float][default: 6]
valphadot | Transparency of dots on plot [float (between 0 and 1)][default: 1]
colordot | Color of dots on plot [string or list][default:"#4a4e4d"]
legendpos | Position of the legend on plot. For more options see loc parameter at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html [string][default:"best"]
legendanchor | Position of the legend outside of the plot. For more options see bbox_to_anchor parameter at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html [list][default:None]
dim | Figure size [Tuple of two floats (width, height) in inches][default: (6, 4)]
figname | Name of figure [string][default:"tsne_2d"]
theme | Change background theme. If theme is set to dark, the dark background will be produced instead of white [string][default:'None']

Returns:

t-SNE 2D image (tsne_2d.png will be saved in same directory)

Working Example

Normalization

RPM or CPM normalization

latest update v0.8.9

Normalize raw gene expression counts into Reads per million mapped reads (RPM) or Counts per million mapped reads (CPM)

bioinfokit.analys.norm.cpm(df)

Parameters | Description
--- | ---
df | Pandas dataframe containing raw gene expression values. Genes with missing expression values (NA) will be dropped.

Returns:

RPM or CPM normalized Pandas dataframe as class attributes (cpm_norm)

Working Example
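
To make the arithmetic behind CPM concrete, here is a small plain-Python sketch for a single sample. It illustrates the standard CPM definition (count divided by the sample's total count, times one million) and is not bioinfokit's implementation. The function name cpm_sample and the toy counts are made up for illustration.

```python
def cpm_sample(counts):
    """Scale the raw counts of one sample so that they sum to one million."""
    total = sum(counts.values())
    # Multiply first, then divide, to keep the arithmetic exact for integers.
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


raw_counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm_values = cpm_sample(raw_counts)
```

With these toy counts the sample total is 10,000 reads, so geneA accounts for 5% of the library and gets a CPM of 50,000.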

RPKM or FPKM normalization

latest update v0.9

Normalize raw gene expression counts into Reads per kilo base per million mapped reads (RPKM) or Fragments per kilo base per million mapped reads (FPKM)

bioinfokit.analys.norm.rpkm(df, gl)

Parameters | Description
--- | ---
df | Pandas dataframe containing raw gene expression values. Genes with missing expression or gene length values (NA) will be dropped.
gl | Name of a column having gene length in bp [string][default: None]

Returns:

RPKM or FPKM normalized Pandas dataframe as class attributes (rpkm_norm)

Working Example
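
RPKM additionally divides by gene length. The sketch below applies the standard formula, count × 10^9 / (total reads × gene length in bp), to one sample in plain Python. It is an illustration only, not bioinfokit's code, and the function name rpkm_sample and the toy values are assumptions.

```python
def rpkm_sample(counts, lengths_bp):
    """Compute RPKM for one sample.

    counts: gene -> raw read count
    lengths_bp: gene -> gene length in base pairs
    """
    total = sum(counts.values())
    return {
        gene: count * 1_000_000_000 / (total * lengths_bp[gene])
        for gene, count in counts.items()
    }


raw_counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
gene_lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
rpkm_values = rpkm_sample(raw_counts, gene_lengths)
```

geneB has three times the reads of geneA but is also three times longer, so the two genes end up with the same RPKM.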

TPM normalization

latest update v0.9.1

Normalize raw gene expression counts into Transcripts per million (TPM)

bioinfokit.analys.norm.tpm(df, gl)

Parameters | Description
--- | ---
df | Pandas dataframe containing raw gene expression values. Genes with missing expression or gene length values (NA) will be dropped.
gl | Name of a column having gene length in bp [string][default: None]

Returns:

TPM normalized Pandas dataframe as class attributes (tpm_norm)

Working Example
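
TPM normalizes by gene length first and by library size second, so the values of a sample always sum to one million. The following plain-Python sketch shows the standard calculation for one sample. It is not taken from bioinfokit, and tpm_sample and the toy values are illustrative names and data.

```python
def tpm_sample(counts, lengths_bp):
    """Compute TPM for one sample.

    Step 1: divide each count by its gene length (reads per base).
    Step 2: rescale the rates so that they sum to one million.
    The length unit (bp or kb) cancels out in step 2.
    """
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    scale = sum(rates.values())
    return {gene: rate / scale * 1_000_000 for gene, rate in rates.items()}


raw_counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
gene_lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
tpm_values = tpm_sample(raw_counts, gene_lengths)
```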

Variant analysis

Manhattan plot

latest update v2.0.1

bioinfokit.visuz.marker.mhat(df, chr, pv, log_scale, color, dim, r, ar, gwas_sign_line, gwasp, dotsize, markeridcol, markernames, gfont, valpha, show, figtype, axxlabel, axylabel, axlabelfontsize, ylm, gstyle, figname, theme)

Parameters | Description
--- | ---
df | Pandas dataframe object with at least SNP, chromosome, and P-values columns
chr | Name of a column having chromosome numbers [string][default:None]
pv | Name of a column having P-values. Must be numeric column [string][default:None]
log_scale | Change the values provided in pv column to minus log10 scale. If set to False, the original values in pv will be used. This is useful in case of Fst values. [Boolean (True or False)][default:True]
color | List the name of the colors to be plotted. It can accept two alternate colors or the number of colors equal to chromosome number. If nothing (None) is provided, it will randomly assign the color to each chromosome [list][default:None]
gwas_sign_line | Plot statistical significant threshold line defined by option gwasp [Boolean (True or False)][default: False]
gwasp | Statistical significant threshold to identify significant SNPs [float][default: 5E-08]
dotsize | The size of the dots in the plot [float][default: 8]
markeridcol | Name of a column having SNPs. This is necessary for plotting SNP names on the plot [string][default: None]
markernames | The list of the SNPs to display on the plot. These SNPs should be present in SNP column. Additionally, it also accepts the dict of SNPs and its associated gene name. If this option is set to True, it will label all SNPs with P-value significant score defined by gwasp [string, list, Tuple, dict][default: True]
gfont | Font size for SNP names to display on the plot [float][default: 8]. gfont is not compatible with gstyle=2.
valpha | Transparency of points on plot [float (between 0 and 1)][default: 1.0]
dim | Figure size [Tuple of two floats (width, height) in inches][default: (6, 4)]
r | Figure resolution in dpi [int][default: 300]
ar | Rotation of X-axis labels [float][default: 90]
figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default:'png']
show | Show the figure on console instead of saving in current folder [Boolean (True or False)][default:False]
axxlabel | Label for X-axis. If you provide this option, default label will be replaced [string][default: None]
axylabel | Label for Y-axis. If you provide this option, default label will be replaced [string][default: None]
axlabelfontsize | Font size for axis labels [float][default: 9]
ylm | Range of ticks to plot on Y-axis [float Tuple (bottom, top, interval)][default: None]
gstyle | Style of the text for markernames. 1 for default text and 2 for box text [int][default: 1]
figname | Name of figure [string][default:"manhattan"]
theme | Change background theme. If theme is set to dark, the dark background will be produced instead of white [string][default:'None']

Returns:

Manhattan plot image in same directory (Manhattan.png)

Working example
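
The log_scale and gwasp options above come down to a minus log10 transform and a threshold comparison. The sketch below shows that idea in plain Python under stated assumptions: the function name significant_snps and the SNP IDs are hypothetical, and a strict comparison (P-value below gwasp) is assumed because the document does not say how ties are handled.

```python
import math


def significant_snps(pvalues, gwasp=5e-08):
    """Return SNP -> -log10(P) for SNPs whose P-value is below gwasp.

    Assumption: a SNP is significant when its P-value is strictly
    smaller than the gwasp threshold.
    """
    hits = {}
    for snp, p in pvalues.items():
        if p < gwasp:
            hits[snp] = -math.log10(p)
    return hits


pvalues = {"rs1": 1e-10, "rs2": 0.03, "rs3": 4e-08}
hits = significant_snps(pvalues)
# Height of the significance line on the minus log10 scale
threshold_line = -math.log10(5e-08)
```

On the transformed scale, the default threshold of 5E-08 sits at roughly 7.3, so only points above that height are treated as significant.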

Variant annotation

latest update v0.9.3

Assign genetic features and function to the variants in VCF file

bioinfokit.analys.marker.vcf_anot(file, id, gff_file, anot_attr)

Parameters | Description
--- | ---
file | VCF file
id | Chromosome id column in VCF file [string][default='#CHROM']
gff_file | GFF3 genome annotation file
anot_attr | Gene function tag in attributes field of GFF3 file

Returns:

Tab-delimited text file with annotation (annotated text file will be saved in same directory)

Working Example

Concatenate VCF files

latest update v0.9.4

Concatenate multiple VCF files into a single VCF file (for example, VCF files for each chromosome)

bioinfokit.analys.marker.concatvcf(file)

Parameters | Description
--- | ---
file | Multiple VCF files separated by comma

Returns:

Concatenated VCF file (concat_vcf.vcf)

Working example

Split VCF file

bioinfokit.analys.marker.splitvcf(file)

Split a single VCF file containing variants for all chromosomes into individual files, each containing the variants for one chromosome

Parameters | Description
--- | ---
file | VCF file to split
id | Chromosome id column in VCF file [string][default='#CHROM']

Returns:

VCF files for each chromosome

Working example

High-throughput sequence analysis

FASTQ batch downloads from SRA database

latest update v0.9.7

bioinfokit.analys.fastq.sra_bd(file, t, other_opts)

FASTQ files will be downloaded using fasterq-dump. Make sure the latest version of the NCBI SRA toolkit (version 2.10.8) is installed and its binaries are added to the system path.

Parameters | Description
--- | ---
file | List of SRA accessions for batch download. All accessions must be separated by a newline in the file.
t | Number of threads for parallel run [int][default=4]
other_opts | Provide other relevant options for fasterq-dump [str][default=None]. Provide the options as a space-separated string. You can get the detailed options for fasterq-dump using the -help option.

Returns:

FASTQ files for each SRA accession in the current directory unless specified by other_opts

Description and working example

FASTQ quality format detection

bioinfokit.analys.format.fq_qual_var(file)

Parameters | Description
--- | ---
file | FASTQ file to detect quality format [default: None]

Returns:

Quality format encoding name for FASTQ file (Supports only Sanger, Illumina 1.8+ and Illumina 1.3/1.4)

Working Example
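
Quality encodings can often be told apart by the range of ASCII codes in the quality lines. The sketch below is a rough heuristic, not bioinfokit's detection logic. It assumes the commonly cited ranges (Sanger: Phred+33 with codes 33 to 73, Illumina 1.8+: Phred+33 with codes 33 to 74, Illumina 1.3/1.4: Phred+64 with codes 64 to 104), and the function name guess_quality_encoding is hypothetical.

```python
def guess_quality_encoding(quality_lines):
    """Guess the FASTQ quality encoding from the observed ASCII range.

    Heuristic only: a file that happens to use a narrow range of
    quality characters can be ambiguous between encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    if lowest < 64:
        # Characters below '@' (ASCII 64) imply a Phred+33 offset.
        return "Illumina 1.8+" if highest > 73 else "Sanger"
    return "Illumina 1.3/1.4"


sanger_guess = guess_quality_encoding(["IIII!!"])
illumina18_guess = guess_quality_encoding(["JJJ###"])
illumina13_guess = guess_quality_encoding(["hhhh@@"])
```

In practice a reliable guess needs many reads, since a handful of high-quality reads may not show the low end of the range.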

Sequencing coverage

latest update v0.9.7

bioinfokit.analys.fastq.seqcov(file, gs)

Parameters | Description
--- | ---
file | FASTQ file
gs | Genome size in Mbp

Returns:

Sequencing coverage of the given FASTQ file

Description and Working example
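
Sequencing coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The sketch below applies that definition to FASTQ text held in memory. It assumes standard four-line FASTQ records, and the function name sequencing_coverage is hypothetical rather than part of bioinfokit.

```python
import io


def sequencing_coverage(fastq_handle, gs):
    """Estimate coverage as total sequenced bases / genome size.

    gs is the genome size in Mbp, matching the parameter table above.
    Assumes four-line FASTQ records (header, sequence, '+', quality).
    """
    total_bases = 0
    for line_number, line in enumerate(fastq_handle):
        # The sequence is the second line of every four-line record.
        if line_number % 4 == 1:
            total_bases += len(line.strip())
    return total_bases / (gs * 1_000_000)


fastq_text = (
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nGGCCTTAAGG\n+\nIIIIIIIIII\n"
)
# Toy genome of 10 bp (0.00001 Mbp) covered by 20 sequenced bases
coverage = sequencing_coverage(io.StringIO(fastq_text), gs=0.00001)
```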

Split the sequence into smaller subsequences

latest update v2.0.6

bioinfokit.analys.Fasta.split_seq(seq, seq_size, seq_overlap, any_cond, outfmt)

Parameters | Description
--- | ---
seq | Input sequence [string]
seq_size | Subsequence size [int][default: 3]
seq_overlap | Split the sequence in overlap mode [bool][default: True]
any_cond | Split sequence based on a condition. Not yet defined.
outfmt | Output format for the subsequences. If the parameter is set to 'fasta', the file will be saved in same folder with name output_chunks.fasta ['list' or 'fasta'][default: 'list']

Returns:

Subsequences in list or fasta file (output_chunks.fasta) format

Description and Working example
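
The difference between overlap and non-overlap mode is easiest to see on a short sequence. The sketch below is one plausible reading of seq_overlap, not the library's code: it assumes that overlap mode means a sliding window that advances one base at a time, and that non-overlap mode cuts consecutive chunks (the last one may be shorter). The function name split_sequence is hypothetical.

```python
def split_sequence(seq, seq_size=3, seq_overlap=True):
    """Split seq into subsequences of length seq_size.

    Assumption: overlap mode slides the window by one base; non-overlap
    mode takes consecutive chunks and keeps a shorter final chunk.
    """
    if seq_overlap:
        starts = range(0, len(seq) - seq_size + 1)
    else:
        starts = range(0, len(seq), seq_size)
    return [seq[i:i + seq_size] for i in starts]


overlapping = split_sequence("ATGCGTAA", seq_size=3, seq_overlap=True)
consecutive = split_sequence("ATGCGTAA", seq_size=3, seq_overlap=False)
```

For the 8-base input, the sliding window yields six 3-mers, while the consecutive split yields two full codons and a trailing 2-base fragment.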

Reverse complement of DNA sequence

latest update v2.1.1

bioinfokit.analys.Fasta.rev_com(sequence)

Parameters | Description
--- | ---
seq | DNA sequence to perform reverse complement
file | DNA sequence in a fasta file

Returns:

Reverse complement of original DNA sequence

Working example
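
For readers unfamiliar with the operation, a reverse complement swaps each base for its partner (A with T, C with G) and reverses the result. Here is a minimal standalone sketch in plain Python. It is not bioinfokit's implementation, the function name reverse_complement is illustrative, and characters other than A, C, G, T (such as N) are left unchanged by assumption.

```python
# Translation table for the four DNA bases, in upper and lower case
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


result = reverse_complement("ATGCCG")
```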

File format conversions

bioinfokit.analys.format

Function | Parameters | Description
--- | --- | ---
bioinfokit.analys.format.fqtofa(file) | FASTQ file | Convert FASTQ file into FASTA format
bioinfokit.analys.format.hmmtocsv(file) | HMM file | Convert HMM text output (from HMMER tool) to CSV format
bioinfokit.analys.format.tabtocsv(file) | TAB file | Convert TAB file to CSV format
bioinfokit.analys.format.csvtotab(file) | CSV file | Convert CSV file to TAB format

Returns:

Output will be saved in same directory

Working example

GFF3 to GTF file format conversion

latest update v1.0.1

bioinfokit.analys.gff.gff_to_gtf(file, trn_feature_name)

Parameters | Description
--- | ---
file | GFF3 genome annotation file
trn_feature_name | Name of the feature (column 3 of GFF3 file) of RNA transcripts if other than 'mRNA' or 'transcript'

Returns:

GTF format genome annotation file (file.gtf will be saved in same directory)

Working Example

Bioinformatics file readers and processing (FASTA, FASTQ, and VCF)

latest update v2.0.4

Function | Parameters | Description
--- | --- | ---
bioinfokit.analys.Fasta.fasta_reader(file) | FASTA file | FASTA file reader
bioinfokit.analys.fastq.fastq_reader(file) | FASTQ file | FASTQ file reader
bioinfokit.analys.marker.vcfreader(file) | VCF file | VCF file reader

Returns:

File generator object (can be iterated only once) that can be parsed for the record

Description and working example
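
The note that the returned generator "can be iterated only once" is worth seeing in action. The sketch below uses a small hand-written FASTA generator in plain Python to show the behavior. It is a stand-in for illustration, not bioinfokit's reader: the function name read_fasta and the (header, sequence) record layout are assumptions.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


records = read_fasta(io.StringIO(">seq1\nATGC\nGGTA\n>seq2\nTTAA\n"))
first_pass = list(records)   # consumes the generator
second_pass = list(records)  # empty: the generator is already exhausted
```

If the records are needed more than once, store them in a list on the first pass or create the reader again.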

Extract subsequence from FASTA files

latest update v2.0.4

bioinfokit.analys.Fasta.ext_subseq(file, id, st, end, strand)

Extract the subsequence of a specified region from a FASTA file. If the target subsequence region is on the minus strand, the reverse complement of the subsequence will be printed.

Parameters | Description
--- | ---
file | FASTA file [file]
id | The ID of sequence from FASTA file to extract the subsequence [string]
st | Start integer coordinate of subsequence [int]
end | End integer coordinate of subsequence [int]
strand | Strand of the subsequence ['plus' or 'minus'][default: 'plus']

Returns:

Subsequence to stdout

Extract sequences from FASTA file

latest update v2.1.3

bioinfokit.analys.Fasta.extract_seq(file, id)

Extract the sequences from FASTA file based on the list of sequence IDs provided from other file

| Parameters | Description |
| --- | --- |
| file | FASTA file [file] |
| id | List of sequence IDs separated by new line. This file can also contain the ID, start and end coordinates separated by TAB [file] |

Returns:

Sequences extracted from FASTA file based on the given IDs provided in id file. Output FASTA file will be saved as output.fasta in current working directory.

Description and working example

Split FASTA file into multiple FASTA files

latest update v2.0.4

bioinfokit.analys.Fasta.split_fasta(file, n, bases_per_line)

Split one big FASTA file into multiple smaller FASTA files

| Parameters | Description |
| --- | --- |
| file | FASTA file [file] |
| n | Number of FASTA files to split the big FASTA file into [int][default: 2] |
| bases_per_line | Number of bases per line for output FASTA files [int][default: 60] |

Returns:

Number of smaller FASTA files with prefix output (output_0.fasta, output_1.fasta and so on)
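
A rough sketch of the two ideas involved (distributing records over `n` outputs and wrapping each sequence at `bases_per_line`) is shown below. How bioinfokit actually distributes records across files is not described here, so the near-equal split by record count is an assumption, and both helper names are hypothetical.

```python
def split_records(records, n=2):
    """Distribute records into n groups of near-equal record count (assumed strategy)."""
    size, extra = divmod(len(records), n)
    groups, start = [], 0
    for i in range(n):
        # The first `extra` groups receive one additional record
        stop = start + size + (1 if i < extra else 0)
        groups.append(records[start:stop])
        start = stop
    return groups

def format_fasta(records, bases_per_line=60):
    """Render (header, sequence) tuples as FASTA text with wrapped sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), bases_per_line):
            lines.append(seq[i:i + bases_per_line])
    return "\n".join(lines) + "\n"

records = [("a", "A" * 7), ("b", "C" * 3), ("c", "G" * 5)]
groups = split_records(records, n=2)
first_file_text = format_fasta(groups[0], bases_per_line=4)
```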

Convert multi-line FASTA into single-line FASTA

latest update v2.1.2

bioinfokit.analys.Fasta.multi_to_single_line(file)

Convert multi-line FASTA (where sequences are on multi lines) into single-line FASTA (where sequences are in single line)

| Parameters | Description |
| --- | --- |
| file | FASTA file [file] |

Returns:

Single-line FASTA file, saved as output.fasta in the current working directory.

Description and working example

Merge counts files from featureCounts

latest update v2.0.5

bioinfokit.analys.HtsAna.merge_featureCount(pattern, gene_column_name)

Merge counts files generated from featureCounts when it runs individually on large samples. The count files must be in same folder and should end with .txt file extension.

| Parameters | Description |
| --- | --- |
| pattern | File name pattern for each count file [default: '*.txt'] |
| gene_column_name | Gene ID column name for feature and meta-features [default: 'Geneid'] |

Returns:

Merged count file (gene_matrix_count.csv) in same folder

Split BED file by chromosome

latest update v2.0.9

bioinfokit.analys.HtsAna.split_bed(bed)

Split the BED file by chromosome names

| Parameters | Description |
| --- | --- |
| bed | Input BED file [default: None] |

Returns:

BED file for each chromosome (files will be saved in same directory)
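
The grouping step behind such a split can be sketched as follows. This is an independent illustration rather than the package's code: the helper name `group_bed_by_chrom` is hypothetical, and writing each group to its own file is left out.

```python
from collections import defaultdict

def group_bed_by_chrom(lines):
    """Group BED lines by the chromosome name in column 1 (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        # Skip blank lines and common header/comment lines
        if not line.strip() or line.startswith(("#", "track ", "browser ")):
            continue
        chrom = line.split(None, 1)[0]
        groups[chrom].append(line.rstrip("\n"))
    return dict(groups)

bed_lines = [
    "chr1\t100\t200\n",
    "chr2\t50\t80\n",
    "chr1\t300\t400\n",
]
by_chrom = group_bed_by_chrom(bed_lines)
```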

Working example

Max and Min sequence lengths from Fasta

latest update v2.1.4

bioinfokit.analys.Fasta.max_min_len(fasta)

Find Max and Min sequence lengths from Fasta

| Parameters | Description |
| --- | --- |
| fasta | Input Fasta file [default: None] |

Returns:

Max and Min sequence lengths from Fasta file

Working example

Functional enrichment analysis

Gene family enrichment analysis (GenFam)

latest update v1.0.0

bioinfokit.analys.genfam.fam_enrich(id_file, species, id_type, stat_sign_test, multi_test_corr, min_map_ids, alpha)

GenFam is a comprehensive classification and enrichment analysis tool for plant genomes. It provides a unique way to characterize the large-scale gene datasets such as those from transcriptome analysis (read GenFam paper for more details)

| Parameters | Description |
| --- | --- |
| id_file | Text file containing the list of gene IDs to analyze using GenFam. IDs must be separated by newline. |
| species | Plant species ID for GenFam analysis. All plant species ID provided here |
| id_type | Plant species ID type. 1: Phytozome locus ID; 2: Phytozome transcript ID; 3: Phytozome PAC ID |
| stat_sign_test | Statistical significance test for enrichment analysis [default=1]. 1: Fisher exact test; 2: Hypergeometric distribution; 3: Binomial distribution; 4: Chi-squared distribution |
| multi_test_corr | Multiple testing correction test [default=3]. 1: Bonferroni; 2: Bonferroni-Holm; 3: Benjamini-Hochberg |
| min_map_ids | Minimum number of gene IDs from the user list (id_file) that must be mapped to the background database for performing GenFam analysis [default=5] |
| alpha | Significance level [float][default: 0.05] |

Returns:

| Attribute | Description |
| --- | --- |
| df_enrich | Enriched gene families with p < 0.05 |
| genfam_info | GenFam run information |
| Output files | Output figures and files from GenFam analysis. genfam_enrich.png: GenFam figure for enriched gene families; fam_enrich_out.txt: List of enriched gene families with mapped gene IDs, GO annotation, and detailed statistics; fam_all_out.txt: List of all gene families with mapped gene IDs, GO annotation, and detailed statistics |
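
For readers unfamiliar with the default multiple testing correction (option 3, Benjamini-Hochberg), the adjustment can be sketched in plain Python. This is a generic textbook version, not GenFam's own code; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(pvals)
# Gene families whose adjusted p-value is below alpha would be reported as enriched
significant = [p < 0.05 for p in adj]
```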

Description and working example

Check allowed ID types for plant species for GenFam

latest update v1.0.0

bioinfokit.analys.genfam.check_allowed_ids(species)

| Parameters | Description |
| --- | --- |
| species | Plant species ID to check for allowed ID type. All plant species ID provided here |

Returns:

Allowed ID types for GenFam

Description and working example

Biostatistical analysis

Correlation matrix plot

latest update v2.0.1

bioinfokit.visuz.stat.corr_mat(table, corm, cmap, r, dim, show, figtype, axtickfontsize, axtickfontname, theme)

| Parameters | Description |
| --- | --- |
| table | Dataframe object with numerical variables (columns) to find correlation. Ideally, you should have three or more variables. Dataframe should not have identifier column. |
| corm | Correlation method [pearson, kendall, spearman][default: pearson] |
| cmap | Color palette for heatmap [string][default: 'seismic']. More colormaps are available at https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html |
| r | Figure resolution in dpi [int][default: 300]. Not compatible with show=True |
| dim | Figure size [Tuple of two floats (width, height) in inches][default: (6, 5)] |
| show | Show the figure on console instead of saving in current folder [True or False][default: False] |
| figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default: 'png'] |
| axtickfontsize | Font size for axis ticks [float][default: 7] |
| axtickfontname | Font name for axis ticks [string][default: 'Arial'] |
| theme | Change background theme. If theme is set to dark, a dark background will be produced instead of white [string][default: 'None'] |

Returns:

Correlation matrix plot image in same directory (corr_mat.png)

Working example

Bar-dot plot

latest update v0.8.5

bioinfokit.visuz.stat.bardot(df, colorbar, colordot, bw, dim, r, ar, hbsize, errorbar, dotsize, markerdot, valphabar, valphadot, show, figtype, axxlabel, axylabel, axlabelfontsize, axlabelfontname, ylm, axtickfontsize, axtickfontname, yerrlw, yerrcw)

| Parameters | Description |
| --- | --- |
| df | Pandas dataframe object |
| colorbar | Color of bar graph [string or list][default: "#bbcfff"] |
| colordot | Color of dots on bar [string or list][default: "#ee8972"] |
| bw | Width of bar [float][default: 0.4] |
| dim | Figure size [Tuple of two floats (width, height) in inches][default: (6, 4)] |
| r | Figure resolution in dpi [int][default: 300] |
| ar | Rotation of X-axis labels [float][default: 0] |
| hbsize | Horizontal bar size for standard error bars [float][default: 4] |
| errorbar | Draw standard error bars [bool (True or False)][default: True] |
| dotsize | The size of the dots in the plot [float][default: 6] |
| markerdot | Shape of the dot marker. See more options at https://matplotlib.org/3.1.1/api/markers_api.html [string][default: "o"] |
| valphabar | Transparency of bars on plot [float (between 0 and 1)][default: 1] |
| valphadot | Transparency of dots on plot [float (between 0 and 1)][default: 1] |
| figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default: 'png'] |
| show | Show the figure on console instead of saving in current folder [True or False][default: False] |
| axxlabel | Label for X-axis. If you provide this option, default label will be replaced [string][default: None] |
| axylabel | Label for Y-axis. If you provide this option, default label will be replaced [string][default: None] |
| axlabelfontsize | Font size for axis labels [float][default: 9] |
| axlabelfontname | Font name for axis labels [string][default: 'Arial'] |
| ylm | Range of ticks to plot on Y-axis [float Tuple (bottom, top, interval)][default: None] |
| axtickfontsize | Font size for axis ticks [float][default: 9] |
| axtickfontname | Font name for axis ticks [string][default: 'Arial'] |
| yerrlw | Error bar line width [float][default: None] |
| yerrcw | Error bar cap width [float][default: None] |

Returns:

Bar-dot plot image in same directory (bardot.png)

Working Example

One sample and two sample Z-tests

latest update v2.1.0

bioinfokit.analys.stat.ztest(df, x, y, mu, x_std, y_std, alpha, test_type)

| Parameters | Description |
| --- | --- |
| df | Pandas dataframe for appropriate Z-test. One sample: It should have at least one variable. Two sample independent: It should have at least two variables |
| x | Column name for x group [string][default: None] |
| y | Column name for y group [string][default: None] |
| mu | Population or known mean for the one sample Z-test [float][default: None] |
| x_std | Population standard deviation for x group [float][default: None] |
| y_std | Population standard deviation for y group [float][default: None] |
| alpha | Significance level for confidence interval (CI). If alpha=0.05, then 95% CI will be calculated [float][default: 0.05] |
| test_type | Type of Z-test [int (1,2)][default: None]. 1: One sample Z-test; 2: Two sample Z-test |

Returns:

Summary output as class attribute (summary and result)
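
The arithmetic behind a one sample Z-test can be sketched with the standard library alone. This is a generic illustration of how `mu`, the population standard deviation, and `alpha` enter the calculation, not the package's implementation; the function name and return shape are assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_ztest(values, mu, sigma, alpha=0.05):
    """Two-sided one sample Z-test with a known population standard deviation."""
    n = len(values)
    xbar = mean(values)
    se = sigma / sqrt(n)                      # standard error of the mean
    z = (xbar - mu) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
    zcrit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (xbar - zcrit * se, xbar + zcrit * se)  # (1 - alpha) confidence interval
    return z, p, ci

z, p, ci = one_sample_ztest([5.1, 4.9, 5.3, 5.0, 5.2], mu=5.0, sigma=0.2)
```

With alpha=0.05 the interval returned is the 95% CI for the sample mean, matching the description of `alpha` in the table above.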

Description and Working example

One sample and two sample (independent and paired) t-tests

latest update v2.1.0

bioinfokit.analys.stat.ttest(df, xfac, res, evar, alpha, test_type, mu)

| Parameters | Description |
| --- | --- |
| df | Pandas dataframe for appropriate t-test. One sample: It should have at least a dependent (res) variable. Two sample independent: It should have independent (xfac) and dependent (res) variables. Two sample paired: It should have two dependent (res) variables |
| xfac | Independent group column name with two levels [string][default: None] |
| res | Dependent variable column name [string or list or Tuple][default: None] |
| evar | t-test with equal variance [bool (True or False)][default: True] |
| alpha | Significance level for confidence interval (CI). If alpha=0.05, then 95% CI will be calculated [float][default: 0.05] |
| test_type | Type of t-test [int (1,2,3)][default: None]. 1: One sample t-test; 2: Two sample independent t-test; 3: Two sample paired t-test |
| mu | Population or known mean for the one sample t-test [float][default: None] |

Returns:

Summary output as class attribute (summary and result)

Description and Working example

Chi-square test

latest update v0.9.5

bioinfokit.analys.stat.chisq(df, p)

| Parameters | Description |
| --- | --- |
| df | Pandas dataframe. It should be a one or two-dimensional contingency table. |
| p | Theoretical expected probabilities for each group. They must be non-negative and sum to 1. If p is provided, a Goodness of Fit test will be performed [list or Tuple][default: None] |

Returns:

Summary and expected counts as class attributes (summary and expected_df)
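
As a package-independent sketch of what the expected counts represent for a two-dimensional contingency table, the following computes them from the row and column totals together with the Pearson chi-square statistic. No continuity correction is applied here, and the function name is hypothetical.

```python
def chisq_expected(table):
    """Expected counts, Pearson chi-square statistic, and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count for a cell = row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, dof

expected, stat, dof = chisq_expected([[10, 20], [30, 40]])
```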

Working example

Linear regression analysis

bioinfokit.visuz.stat.lin_reg(df, x, y)

| Parameters | Description |
| --- | --- |
| df | Pandas dataframe object |
| x | Name of column having independent X variables [list][default: None] |
| y | Name of column having dependent Y variables [list][default: None] |

Returns:

Regression analysis summary
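
For a single predictor, the core quantities in such a summary (slope, intercept, and R-squared) follow from ordinary least squares. The sketch below is a generic illustration under that single-predictor assumption, not the package's code; the function name is hypothetical.

```python
def simple_least_squares(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - residual sum of squares / total sum of squares
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

slope, intercept, r_squared = simple_least_squares([1, 2, 3, 4], [3, 5, 7, 9])
```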

Working Example

Regression plot

latest update v2.0.1

bioinfokit.visuz.stat.regplot(df, x, y, yhat, dim, colordot, colorline, r, ar, dotsize, markerdot, linewidth, valphaline, valphadot, show, figtype, axxlabel, axylabel, axlabelfontsize, axlabelfontname, xlm, ylm, axtickfontsize, axtickfontname, theme)

| Parameters | Description |
| --- | --- |
| df | Pandas dataframe object |
| x | Name of column having independent X variables [string][default: None] |
| y | Name of column having dependent Y variables [string][default: None] |
| yhat | Name of column having predicted response of Y variable (y_hat) from regression [string][default: None] |
| dim | Figure size [Tuple of two floats (width, height) in inches][default: (6, 4)] |
| r | Figure resolution in dpi [int][default: 300] |
| ar | Rotation of X-axis labels [float][default: 0] |
| dotsize | The size of the dots in the plot [float][default: 6] |
| markerdot | Shape of the dot marker. See more options at https://matplotlib.org/3.1.1/api/markers_api.html [string][default: "o"] |
| valphaline | Transparency of regression line on plot [float (between 0 and 1)][default: 1] |
| valphadot | Transparency of dots on plot [float (between 0 and 1)][default: 1] |
| linewidth | Width of regression line [float][default: 1] |
| figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default: 'png'] |
| show | Show the figure on console instead of saving in current folder [True or False][default: False] |
| axxlabel | Label for X-axis. If you provide this option, default label will be replaced [string][default: None] |
| axylabel | Label for Y-axis. If you provide this option, default label will be replaced [string][default: None] |
| axlabelfontsize | Font size for axis labels [float][default: 9] |
| axlabelfontname | Font name for axis labels [string][default: 'Arial'] |
| xlm | Range of ticks to plot on X-axis [float Tuple (bottom, top, interval)][default: None] |
| ylm | Range of ticks to plot on Y-axis [float Tuple (bottom, top, interval)][default: None] |
| axtickfontsize | Font size for axis ticks [float][default: 9] |
| axtickfontname | Font name for axis ticks [string][default: 'Arial'] |
| theme | Change background theme. If theme is set to dark, a dark background will be produced instead of white [string][default: 'None'] |

Returns:

Regression plot image in same directory (reg_plot.png)

Working Example

Tukey HSD test

latest update v1.0.3

bioinfokit.analys.stat.tukey_hsd(df, res_var, xfac_var, anova_model, phalpha, ss_typ)

It performs multiple pairwise comparisons of treatment groups using Tukey's HSD (Honestly Significant Difference) test to check if group means are significantly different from each other. It uses the Tukey-Kramer approach if the sample sizes are unequal among the groups.

| Parameters | Description |
| --- | --- |
| df | Pandas dataframe with the variables mentioned in the res_var, xfac_var and anova_model options. It should not have missing data. The missing data will be omitted. |
| res_var | Name of a column having response variable [string][default: None] |
| xfac_var | Name of a column having factor or group for pairwise comparison [string][default: None] |
| anova_model | ANOVA model (calculated using statsmodels ols function) [string][default: None] |
| phalpha | Significance level [float][default: 0.05] |
| ss_typ | Type of sum of squares to perform ANOVA [int][default: 2] |

Returns:

| Attribute | Description |
| --- | --- |
| tukey_summary | Pairwise comparisons for main and interaction effects by Tukey HSD test |

Description and Working example

Bartlett's test

latest update v1.0.3

bioinfokit.analys.stat.bartlett(df, xfac_var, res_var)

It performs Bartlett's test to check the homogeneity of variances among the treatment groups. It accepts the input table in a stacked format. More details https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.bartlett.html

| Parameters | Description |
| --- | --- |
| df | Pandas dataframe containing response (res_var) and independent variables (xfac_var) in a stacked format. It should not have missing data. The missing data will be omitted. |
| res_var | Name of a column having response variable [string][default: None] |
| xfac_var | Name of a column having treatment groups (independent variables) [string or list][default: None] |

Returns:

| Attribute | Description |
| --- | --- |
| bartlett_summary | Pandas dataframe containing Bartlett's test statistics, degrees of freedom, and p value |

Description and Working example

Levene's test

latest update v1.0.3

bioinfokit.analys.stat.levene(df, xfac_var, res_var)

It performs Levene's test to check the homogeneity of variances among the treatment groups. It accepts the input table in a stacked format. More details https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.levene.html

| Parameters | Description |
| --- | --- |
| df | Pandas dataframe containing response (res_var) and independent variables (xfac_var) in a stacked format. It should not have missing data. The missing data will be omitted. |
| res_var | Name of a column having response variable [string][default: None] |
| xfac_var | Name of a column having treatment groups (independent variables) [string or list][default: None] |
| center | Choice for the Levene's test [string (median, mean, trimmed)][default: median]. median: Brown-Forsythe Levene-type test; mean: original Levene's test; trimmed: Brown-Forsythe Levene-type test |

Returns:

| Attribute | Description |
| --- | --- |
| levene_summary | Pandas dataframe containing Levene's test statistics, degrees of freedom, and p value |

Description and Working example

ROC plot

latest update v2.0.1

bioinfokit.visuz.stat.roc(fpr, tpr, c_line_style, c_line_color, c_line_width, diag_line, diag_line_style, diag_line_width, diag_line_color, auc, shade_auc, shade_auc_color, axxlabel, axylabel, axtickfontsize, axtickfontname, axlabelfontsize, axlabelfontname, plotlegend, legendpos, legendanchor, legendcols, legendfontsize, legendlabelframe, legend_columnspacing, dim, show, figtype, figname, r, ylm, theme)

Receiver operating characteristic (ROC) curve for visualizing classification performance

| Parameters | Description |
| --- | --- |
| fpr | Increasing false positive rates obtained from sklearn.metrics.roc_curve [list][default: None] |
| tpr | Increasing true positive rates obtained from sklearn.metrics.roc_curve [list][default: None] |
| c_line_style | Line style for ROC curve [string][default: '-'] |
| c_line_color | Line color for ROC curve [string][default: '#f05f21'] |
| c_line_width | Line width for ROC curve [float][default: 1] |
| diag_line | Plot reference line [True or False][default: True] |
| diag_line_style | Line style for reference line [string][default: '--'] |
| diag_line_width | Line width for reference line [float][default: 1] |
| diag_line_color | Line color for reference line [string][default: 'b'] |
| auc | Area under ROC curve. It can be obtained from sklearn.metrics.roc_auc_score [float][default: None] |
| shade_auc | Shade the area under the ROC curve [True or False][default: False] |
| shade_auc_color | Shade color for AUC [string][default: '#f48d60'] |
| axxlabel | Label for X-axis [string][default: 'False Positive Rate (1 - Specificity)'] |
| axylabel | Label for Y-axis [string][default: 'True Positive Rate (Sensitivity)'] |
| axtickfontsize | Font size for axis ticks [float][default: 9] |
| axtickfontname | Font name for axis ticks [string][default: 'Arial'] |
| axlabelfontsize | Font size for axis labels [float][default: 9] |
| axlabelfontname | Font name for axis labels [string][default: 'Arial'] |
| plotlegend | Plot legend [True or False][default: True] |
| legendpos | Position of the legend on plot. For more options see loc parameter at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html [string][default: 'lower right'] |
| legendanchor | Position of the legend outside of the plot. For more options see bbox_to_anchor parameter at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.legend.html [list][default: None] |
| legendcols | Number of columns for legends [int][default: 1] |
| legendfontsize | Font size for the legends [float][default: 8] |
| legendlabelframe | Box frame for the legend [True or False][default: False] |
| legend_columnspacing | Spacing between the legends [float][default: None] |
| dim | Figure size [Tuple of two floats (width, height) in inches][default: (5, 4)] |
| show | Show the figure on console instead of saving in current folder [True or False][default: False] |
| figtype | Format of figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default: 'png'] |
| figname | Name of figure [string][default: 'roc'] |
| r | Figure resolution in dpi [int][default: 300]. Not compatible with show=True |
| ylm | Range of ticks to plot on Y-axis [float Tuple (bottom, top, interval)][default: None] |
| theme | Change background theme. If theme is set to dark, a dark background will be produced instead of white [string][default: 'None'] |

Returns:

ROC plot image in same directory (roc.png)

Working example
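
As a rough, package-independent illustration of the `auc` value: for a curve given by increasing `fpr` and matching `tpr` points, the area can be approximated with the trapezoidal rule. The function name below is hypothetical, and the three-point curve is made-up toy data.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(fpr, fpr[1:], tpr, tpr[1:])
    )

# Toy curve: (0, 0) -> (0.5, 0.8) -> (1, 1)
area = trapezoid_auc([0.0, 0.5, 1.0], [0.0, 0.8, 1.0])
# The diagonal reference line always encloses an area of 0.5
chance = trapezoid_auc([0.0, 1.0], [0.0, 1.0])
```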

Regression metrics

Calculate Root Mean Square Error (RMSE), Mean squared error (MSE), Mean absolute error (MAE), and Mean absolute percent error (MAPE) from regression fit

latest update v1.0.8

bioinfokit.analys.stat.reg_metric(y, yhat, resid)

| Parameters | Description |
| --- | --- |
| y | Original values for dependent variable [numpy array][default: None] |
| yhat | Predicted values from regression [numpy array][default: None] |
| resid | Regression residuals [numpy array][default: None] |

Returns:

Pandas dataframe with values for RMSE, MSE, MAE, and MAPE
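
The four metrics follow directly from the residuals. The sketch below uses the usual textbook definitions and is not taken from the package; the function name is hypothetical, and MAPE is returned as a fraction here (the package may scale it differently). MAPE is undefined when any original value is zero.

```python
from math import sqrt

def regression_metrics(y, yhat):
    """Return RMSE, MSE, MAE, and MAPE (as a fraction) for observed vs predicted values."""
    resid = [obs - pred for obs, pred in zip(y, yhat)]
    n = len(resid)
    mse = sum(e * e for e in resid) / n
    mae = sum(abs(e) for e in resid) / n
    # Absolute error relative to each observed value; requires non-zero y
    mape = sum(abs(e / obs) for e, obs in zip(resid, y)) / n
    return {"RMSE": sqrt(mse), "MSE": mse, "MAE": mae, "MAPE": mape}

metrics = regression_metrics([2.0, 4.0, 5.0], [1.0, 4.0, 7.0])
```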

Working example

Venn Diagram

bioinfokit.visuz.venn(vennset, venncolor, vennalpha, vennlabel)

| Parameters | Description |
| --- | --- |
| vennset | Venn dataset for 3 and 2-way venn. Data should be in the format of (100,010,110,001,101,011,111) for 3-way venn and (10, 01, 11) for 2-way venn [default: (1,1,1,1,1,1,1)] |
| venncolor | Color palette for Venn [color code][default: ('#00909e', '#f67280', '#ff971d')] |
| vennalpha | Transparency of Venn [float (0 to 1)][default: 0.5] |
| vennlabel | Labels to Venn [string][default: ('A', 'B', 'C')] |

Returns:

Venn plot (venn3.png, venn2.png)
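
The `vennset` ordering can be easy to get wrong, so here is a small sketch that derives the seven region counts from three Python sets in the (100,010,110,001,101,011,111) order given above. The helper name `venn3_counts` is hypothetical and is not part of bioinfokit.

```python
def venn3_counts(a, b, c):
    """Region sizes for a 3-way Venn in the order (100,010,110,001,101,011,111)."""
    return (
        len(a - b - c),    # 100: only in A
        len(b - a - c),    # 010: only in B
        len((a & b) - c),  # 110: in A and B, not C
        len(c - a - b),    # 001: only in C
        len((a & c) - b),  # 101: in A and C, not B
        len((b & c) - a),  # 011: in B and C, not A
        len(a & b & c),    # 111: in all three
    )

counts = venn3_counts({1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 7})
```

The resulting tuple has the shape the `vennset` parameter describes for a 3-way diagram.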

Working example

References:

  • Travis E. Oliphant. A guide to NumPy, USA: Trelgol Publishing, (2006).
  • John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007), DOI:10.1109/MCSE.2007.55
  • Fernando Pérez and Brian E. Granger. IPython: A System for Interactive Scientific Computing, Computing in Science & Engineering, 9, 21-29 (2007), DOI:10.1109/MCSE.2007.53
  • Michael Waskom, Olga Botvinnik, Joel Ostblom, Saulius Lukauskas, Paul Hobson, MaozGelbart, … Constantine Evans. (2020, January 24). mwaskom/seaborn: v0.10.0 (January 2020) (Version v0.10.0). Zenodo. http://doi.org/10.5281/zenodo.3629446
  • Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay. Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 12, 2825-2830 (2011)
  • Wes McKinney. Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51-56 (2010)
  • Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, CJ Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E.A. Quintero, Charles R Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. (2020) SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17(3), 261-272.
  • David C. Howell. Multiple Comparisons With Unequal Sample Sizes. https://www.uvm.edu/~statdhtx/StatPages/MultipleComparisons/unequal_ns_and_mult_comp.html

Last updated: November 20, 2021
