biocartograph

The package was renamed from Biocarta v0.2.27 to Biocartograph because of an unintentional name clash.

Version 0.10.20 (PyPI)

Biocartograph

Creating Cartographic Representations of Biological Data

Installation

pip install biocartograph

If you have the Nix package manager installed, you can also build a Nix environment for running the code. Enter it from a terminal by issuing:

nix-shell versioned_R_and_Python.nix

Example code

We generally work with short (compact) format data frames. One describes the analytes (often abbreviated "adf"), with analytes as rows and samples as columns:

    NAME         NGT_mm12_10591   ...   DM2_mm81_10199
    215538_at         16.826041   ...        31.764484
    ...
    LDLR              19.261185   ...        30.004612

The other is a journal describing the sample metadata (often abbreviated "jdf"), with metadata labels as rows and the same samples as columns:

                 NGT_mm12_10591    ...   DM2_mm81_10199
    Disease      Prostate Cancer   ...   Gastric Cancer
    Cell-line    143B              ...   22Rv1
    Tissue       Prostate          ...   Gastric Urinary tract

If these are stored as tab-delimited text files, they are straightforward to read from disk:

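if __name__ == '__main__' :
    import numpy as np
    import pandas as pd
    from biocartograph.quantification import full_mapping
    #
    # READING IN THE ANALYTE DATA (ANALYTES AS ROWS, SAMPLES AS COLUMNS)
    adf = pd.read_csv('analytes.tsv',sep='\t',index_col=0)
    #
    # WE DO NOT WANT TO KEEP POTENTIALLY BAD ENTRIES:
    # ROWS (ANALYTES) AND COLUMNS (SAMPLES) WITH ZERO VARIANCE GIVE 1/std = inf AND ARE DROPPED
    adf = adf.iloc[ np.inf != np.abs( 1.0/np.std(adf.values,1) ) ,
                    np.inf != np.abs( 1.0/np.std(adf.values,0) ) ].copy()
    #
    # READING IN SAMPLE INFORMATION
    # THIS IS NEEDED FOR THE ALIGNED PCA TO WORK
    jdf = pd.read_csv('journal.tsv',sep='\t',index_col=0)
    # KEEP ONLY THE SURVIVING SAMPLES, IN THE SAME ORDER AS IN adf
    jdf = jdf.loc[:,adf.columns.values]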
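The variance filter above keeps a row or column only when 1/std is finite, that is, when its standard deviation is non-zero. The sketch below makes that test explicit on small made-up tables (the gene and sample names are hypothetical) and then aligns the journal with the surviving samples, as in the last line of the block above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy tables in the compact layout: analytes x samples, labels x samples.
analytes_tsv = (
    "NAME\tS1\tS2\tS3\n"
    "GENE_A\t16.8\t20.1\t31.7\n"
    "GENE_B\t5.0\t5.0\t5.0\n"          # constant row: zero variance
    "GENE_C\t19.2\t18.7\t30.0\n"
)
journal_tsv = (
    "\tS3\tS1\tS2\n"                    # journal columns in a different order
    "Disease\tGastric Cancer\tProstate Cancer\tProstate Cancer\n"
)

adf = pd.read_csv(io.StringIO(analytes_tsv), sep='\t', index_col=0)
jdf = pd.read_csv(io.StringIO(journal_tsv), sep='\t', index_col=0)

# Equivalent to the 1/std test for ordinary data: keep rows/columns with non-zero std.
row_ok = np.std(adf.values, axis=1) != 0
col_ok = np.std(adf.values, axis=0) != 0
adf = adf.iloc[row_ok, col_ok].copy()

# Reorder (and subset) the journal so its columns match the surviving samples.
jdf = jdf.loc[:, adf.columns.values]

print(adf.index.tolist())             # ['GENE_A', 'GENE_C']
print(jdf.loc['Disease'].tolist())    # ['Prostate Cancer', 'Prostate Cancer', 'Gastric Cancer']
```

Writing the test as `std != 0` avoids the divide-by-zero warning that NumPy emits for `1.0/0.0`, while selecting the same rows and columns.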

Next, we specify how to conduct the calculation and run the mapping:

    consensus_labels = ['Tissue']
    alignment_label  = 'Disease'    # SAMPLE LABEL USED FOR THE ALIGNED PCA (MUST BE A ROW OF jdf)
    results = full_mapping ( adf , jdf                                  ,
            bVerbose                    = True                          ,
            alignment_label             = alignment_label               ,
            umap_n_neighbors            = 20                            ,
            umap_local_connectivity     = 20.                           ,
            bUseUmap                    = False                         ,
            consensus_labels            = consensus_labels              ,
            distance_type               = 'coexpression'                ,
            hierarchy_cmd               = 'ward' ,
            directory                   = '../results' ,
            n_clusters                  = sorted([ 10 , 20 , 30 , 40 , 60 , 70 , 90 , 80 , 100 ,
                                                120 , 140 , 160 , 180 , 200 ,
                                                250 , 300 , 350 , 400 , 450 , 500 ,
                                                600 , 700 , 800 , 900 , 1000 ])  )
    #
    map_analytes        = results[0]
    map_samples         = results[1]
    hierarchy_analytes  = results[2]
    hierarchy_samples   = results[3]
    header_str = results[0].index.name

In this example, we did not calculate any projection properties for the Cell-line label. We also chose to output specific cuts through the hierarchical clustering solution, corresponding to different numbers of clusters (the n_clusters values). Multivariate projected PCA files are generated for all the consensus and alignment labels. Plotting the map analytes PCA projections yields: (figure: Cancer Disease mPCA Example)
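The n_clusters list in the call above requests several cuts through one hierarchical clustering solution. The sketch below shows that idea with SciPy on random toy data: rows are z-scored, clustered with Ward linkage, and the tree is cut at several cluster counts with `scipy.cluster.hierarchy.cut_tree`. It is a generic illustration, not the full_mapping implementation, and whether its 'coexpression' distance behaves like this correlation-based setup is an assumption. For z-scored rows of length n, the squared Euclidean distance between two rows equals 2n(1 - r), where r is their Pearson correlation.

```python
import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage


def hierarchy_cuts(data, n_clusters):
    """Ward-cluster the rows of `data` and cut the tree at several cluster counts.

    Rows are z-scored first, so the squared Euclidean distance between two rows
    of length n equals 2 * n * (1 - r), with r their Pearson correlation.
    Returns an integer array with one column of labels per (sorted) cluster count.
    """
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    tree = linkage(z, method='ward')          # agglomerative Ward linkage on the rows
    return cut_tree(tree, n_clusters=sorted(n_clusters))


rng = np.random.default_rng(0)
toy = rng.normal(size=(12, 6))                # 12 made-up analytes x 6 samples
cuts = hierarchy_cuts(toy, [4, 2, 3])
print(cuts.shape)                             # (12, 3): columns for 2, 3 and 4 clusters
```

Because all cuts come from the same tree, every cluster in a finer cut lies entirely inside one cluster of a coarser cut, which is what gives the set of solutions its hierarchy.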

You can also run an alternative algorithm, in which the UMAP coordinates are used directly for clustering, by setting bUseUmap=True. The resulting visualisations can be browsed by downloading the gist zip and opening the HTML index:

chromium index.html

Other generated solutions

The clustering visualisations were created using Biocartograph and hvplot:

Which groupings correspond to the biomarker variance that describes them? Here are some visualisations of that:

Cell-line, Diseases, Tissues, Single cells, Brain tissues, Blood immune cells

We can also build more elaborate visualisation applications from the information that biocartograph calculates.

Mammals

Interactive rat, interactive pig

Independent component analysis

Depending on the data distribution, it can be a good idea to optimize the fourth statistical moment (kurtosis) instead of the second (variance) when projecting feature annotations. This is done in biocartograph by setting bUseFastICA=True or distance_type='kurtosis' when performing the mapping:

    results = full_mapping ( adf , jdf                                  ,
            bVerbose                    = True                          ,
            alignment_label             = alignment_label               ,
            umap_n_neighbors            = 20                            ,
            umap_local_connectivity     = 20.                           ,
            bUseUmap                    = False                         ,
            bUseFastICA                 = True                          ,
            consensus_labels            = consensus_labels              ,
            distance_type               = 'kurtosis'                    ,
            hierarchy_cmd               = 'ward' ,
            directory                   = '../results' ,
            n_clusters                  = sorted([ 10 , 20 , 30 , 40 , 60 , 70 , 90 , 80 , 100 ,
                                                120 , 140 , 160 , 180 , 200 ,
                                                250 , 300 , 350 , 400 , 450 , 500 ,
                                                600 , 700 , 800 , 900 , 1000 ])  )

FastICA is then used instead of covariance coordinates. In the above snippet, setting either of the two options is enough to perform the ICA decomposition. Results: Cell-line, Disease, Tissue, Single cell, Brain tissues, Blood immune cells
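The difference between the second and the fourth moment is easy to see on two toy vectors with similar variance but very different shape. ICA-style methods look for projections whose distribution is far from Gaussian, and excess kurtosis (the fourth standardised moment minus 3) is the classic measure of that. The pure-Python sketch below only illustrates the statistic; it is not biocartograph's FastICA or kurtosis code.

```python
def moments(values):
    """Return (population variance, excess kurtosis) of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return var, m4 / var ** 2 - 3.0            # 0 for a Gaussian


bimodal = [-1.0, 1.0] * 5                      # two well separated groups
peaked = [0.0] * 8 + [-2.0, 2.0]               # one dense group plus rare outliers

print([round(x, 6) for x in moments(bimodal)])  # [1.0, -2.0]
print([round(x, 6) for x in moments(peaked)])   # [0.8, 2.0]
```

The variances are close (1.0 and 0.8), so a variance-based projection barely separates the two shapes, while the kurtosis flips sign between them.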

Enrichment results

If we have gmt files describing which groups our analytes may belong to, we can calculate enrichment properties for the gene groupings (clusters). One resource for such information is the Reactome database. If the pathway definitions are hierarchical, you can also supply the parent-child list and calculate treemap enrichments for all your clusters. (figure: example of a biocartograph treemap cluster)

The code for doing this might look something like:

    import pandas as pd
    from biocartograph.special import generate_atlas_files
    import biocartograph.enrichment as bEnriched
    #
    # READ THE CLUSTERING RESULTS FILE (PREFIXED WITH header_str)
    df_ = pd.read_csv( header_str + 'resdf_f.tsv' , index_col=0 , sep='\t' )
    # CLUSTER IDS OF THE OPTIMAL SOLUTION, AS STRINGS
    df_['cids.max'] = df_['cids.max'].astype(str)
    enr_dict = bEnriched.calculate_for_cluster_groups ( df_ , label = 'cids.max' ,
                    gmtfile = '../data/Reactome/reactome_v71.gmt' , pcfile = '../data/Reactome/NewestReactomeNodeRelations.txt' ,
                    group_identifier = 'R-HSA' , significance_level = 0.1 )
    # WRITE ONE TREEMAP ENRICHMENT TABLE PER CLUSTER
    for cluster_id , edf in enr_dict.items() :
        edf.to_csv( header_str + 'treemap_c' + str(cluster_id) + '.tsv' , sep='\t' )
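The enrichment call above does the statistics internally, and its exact test is not described here. As an illustration only, the sketch below scores one cluster against gmt gene sets with a one-sided hypergeometric test, a common choice for this kind of over-representation analysis. A gmt line holds a set name, a description and then the members, separated by tabs. The gene-set names, genes and universe are made up, and the helper names are hypothetical.

```python
from math import comb


def parse_gmt(lines):
    """Map gene-set name -> set of members from gmt lines (name, description, genes...)."""
    sets = {}
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) >= 3:
            sets[fields[0]] = set(fields[2:])
    return sets


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K belong to the gene set."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def cluster_enrichment(cluster, gene_sets, universe):
    """Return {set name: (overlap, one-sided p-value)} for one cluster."""
    N, n = len(universe), len(cluster & universe)
    results = {}
    for name, members in gene_sets.items():
        members = members & universe          # only count measured analytes
        k = len(cluster & members)
        results[name] = (k, hypergeom_sf(k, N, len(members), n))
    return results


gmt = [
    "SET_A\thypothetical pathway A\tG1\tG2\tG3\tG4\n",
    "SET_B\thypothetical pathway B\tG5\tG6\tG7\n",
]
universe = {f"G{i}" for i in range(1, 21)}    # 20 measured analytes
cluster = {"G1", "G2", "G3", "G9"}            # one cluster from a cut

for name, (k, p) in cluster_enrichment(cluster, parse_gmt(gmt), universe).items():
    print(name, k, round(p, 5))               # SET_A 3 0.01342, then SET_B 0 1.0
```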

You can also produce your own gmt file and pcfile from the clustering solution labels:

    from biocartograph.special import generate_atlas_files , reformat_results_and_print_gmtfile_pcfile
    cl_gmtname , cl_pcname = reformat_results_and_print_gmtfile_pcfile ( header_str , hierarchy_id = 'cids.max', hierarchy_level_label = 'HCLN' )
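reformat_results_and_print_gmtfile_pcfile takes care of this for you. To show what such files hold, the sketch below turns two nested cluster cuts into gmt lines (name, description, members, tab-separated) and parent-child pairs that link each fine cluster to the coarse cluster containing it. The cluster naming scheme, the descriptions and the two-column parent/child layout are assumptions made for the example and are not necessarily what biocartograph writes.

```python
def clusters_to_gmt_and_pc(labels_by_level):
    """Turn nested cluster cuts into gmt lines and (parent, child) pairs.

    labels_by_level maps a level key (here: the number of clusters in the cut)
    to a dict {analyte: cluster id}. Sorted keys run from coarse to fine, and
    every fine cluster is assumed to lie inside exactly one coarse cluster.
    """
    def set_name(level, cid):
        return f"L{level}_C{cid}"             # hypothetical naming scheme

    levels = sorted(labels_by_level)
    gmt_lines = []
    for level in levels:
        groups = {}
        for analyte, cid in labels_by_level[level].items():
            groups.setdefault(cid, []).append(analyte)
        for cid, members in sorted(groups.items()):
            description = f"cluster {cid} of the {level}-cluster cut"
            gmt_lines.append("\t".join([set_name(level, cid), description, *sorted(members)]))

    pc_pairs = set()
    for coarse, fine in zip(levels, levels[1:]):
        for analyte, cid in labels_by_level[fine].items():
            parent = set_name(coarse, labels_by_level[coarse][analyte])
            pc_pairs.add((parent, set_name(fine, cid)))
    return gmt_lines, sorted(pc_pairs)


cuts = {
    2: {"G1": 0, "G2": 0, "G3": 1, "G4": 1},
    3: {"G1": 0, "G2": 2, "G3": 1, "G4": 1},
}
gmt, pc = clusters_to_gmt_and_pc(cuts)
print("\n".join(gmt))                              # content for a .gmt file
print("\n".join("\t".join(pair) for pair in pc))   # content for a parent-child file
```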

For group factor enrichments, use the bEnriched.from_multivariate_group_factors method instead of calculate_for_cluster_groups. The results can be visualised as in the biocartograph gfa Reactome enrichment and the cluster label gfa enrichment examples.

Example: Visualise hierarchical dependence and significances

Here we study the hierarchical dependence of enrichment group results using a jigsaw-like approach. How the pieces fit together in the final jigsaw conveys how similar the enrichment groups are for the data, and the relative sizes of the pieces reflect the significance level of each group. We assume that an enrichment calculation has already been performed with the biocartograph functions. The approach uses a NodeGraph class and a Hilbert curve construction, both from the impetuous-gfa package. The following example code creates the graph shown below:

    from biocartograph.special import create_NodeGraph_object_from_treemap_file
    # BUILD A NodeGraph FROM A TREEMAP ENRICHMENT FILE WRITTEN IN THE ENRICHMENT STEP
    nG_ = create_NodeGraph_object_from_treemap_file( '../bioc_results/DMHMSY_Fri_Feb__2_13_16_01_2024_treemap_c4.tsv' )
    #
    if False :  # SET TO True TO INSPECT THE GRAPH
        print ( "THE JSON DATA" )
        print ( nG_.write_json() )
        print ( "THE LEAF NODES" )
        print ( nG_.retrieve_leaves( nG_.get_root_id() ) )

    from biocartograph.visualisation import create_hilbertmap
    dR = create_hilbertmap ( nG_                         ,
                quant_label = 'Significance' , # ALTERNATIVELY: quant_label = 'Area'
                search_type = 'breadth'      , # ALTERNATIVELY: search_type = 'depth'
                n = 32                       )
    P  = dR[ 'P data' ]
    NN = dR[ 'NearestN' ]
    #
    from biocartograph.visualisation import show_hilbertmap_polygons
    show_hilbertmap_polygons( dR , bAddLabels=True )
    #
    from biocartograph.visualisation import return_hilbertmap_polygons
    dP = return_hilbertmap_polygons( dR )
    show_hilbertmap_polygons( dP , bInputDataIsPolygoned=True , bAddLabels=True )
    # ASSUMED TO LIVE IN THE SAME MODULE AS THE OTHER HILBERTMAP HELPERS
    from biocartograph.visualisation import show_hilbertmap_simple
    show_hilbertmap_simple ( dR )
    #

with the result: (figure: teaser)

The colormap used for the treemap is the spectral stepping map defined in the biocartograph.special module, and the text color inversion is defined in the biocartograph.visualisation module. Both can be imported via:

from biocartograph.special import create_color
from biocartograph.visualisation import invert_color
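The hilbertmap example above lays out the enrichment nodes along a Hilbert curve, a construction taken from impetuous-gfa as noted earlier. A Hilbert curve visits every cell of an n by n grid (n a power of two), and consecutive positions are always neighbouring cells, so nodes that are close in the traversal order stay close on the map. The sketch below is the textbook index-to-coordinate conversion; it is a generic illustration, not the impetuous-gfa code, and the function names are our own.

```python
def _rotate(s, x, y, rx, ry):
    """Flip/transpose a quadrant so the sub-curve inside it has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n, d):
    """Map position d (0 <= d < n*n) along a Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:                               # build the position one quadrant level at a time
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


path = [hilbert_d2xy(4, d) for d in range(16)]
print(path[:4])                                # [(0, 0), (1, 0), (1, 1), (0, 1)]
```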

Note that some special functions are also imported into other biocartograph modules and can be called from either. A more traditional, Graphviz-dependent treemap can also be created using biocartograph functions:

    from biocartograph.visualisation import DrawGraphText
    dgt = DrawGraphText(        color_label = 'Color' , area_label = 'Area',
                        celltext_label = 'Description' , font = 'Arial' )
    #
    dgt .create_gv_node_info( nG_.get_root_id() , nG_  )
    graphtext = dgt.return_story()
    #
    import pygraphviz as pgv
    G1 = pgv.AGraph( graphtext )
    G1 .layout()
    G1 .draw("file1.svg")

with the result: (figure: teaser)

Creating a nested file structure

The biocartograph package contains a function that packages your generated results into a more easily parsed directory structure. It can be called via:

generate_atlas_files ( header_str )

This produces cluster annotation information, taken from the enrichment files, as well as the sample labels used.

Cell-line, Diseases, Tissues, Single cells, Brain tissues, Blood immune cells

Upcoming

Hopefully, an even more helpful wiki will be provided in the future.
