
Anti-correlation based feature selection for single cell (and other) omics datasets.
Requires Python 3.6 or higher.
Install from PyPI:
pip install anticor_features
Or install from source:
git clone https://bitbucket.org/scottyler892/anticor_features.git
cd anticor_features
pip install .
from anticor_features.anticor_features import get_anti_cor_genes
# exprs: array-like or HDF5 dataset with genes in rows and cells in columns
# feature_ids: list of gene IDs matching rows of exprs
# species: g:Profiler species code (e.g., "hsapiens" or "mmusculus")
anti_cor_table = get_anti_cor_genes(exprs, feature_ids, species="hsapiens")
# Filter selected genes
selected = anti_cor_table.loc[anti_cor_table["selected"], "gene"].tolist()
print(selected)
See the g:Profiler organism list for valid species codes: https://biit.cs.ut.ee/gprofiler/page/organism-list
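Because get_anti_cor_genes expects genes in rows and one feature ID per row, a quick shape check before the call can catch a transposed matrix early. The helper below is a minimal sketch: check_orientation is a hypothetical name, not part of anticor_features, and it works on a plain list of rows purely for illustration.

```python
def check_orientation(exprs, feature_ids):
    """Raise ValueError unless exprs has exactly one row per feature ID.

    Hypothetical helper for illustration; not part of anticor_features.
    exprs is a list of rows (one row per gene, one column per cell).
    Returns (n_genes, n_cells) when the layout looks consistent.
    """
    n_rows = len(exprs)
    if n_rows != len(feature_ids):
        raise ValueError(
            f"exprs has {n_rows} rows but {len(feature_ids)} feature IDs were "
            "given; the matrix may be transposed (cells in rows)"
        )
    # Every row must cover the same number of cells.
    row_lengths = {len(row) for row in exprs}
    if len(row_lengths) > 1:
        raise ValueError("rows of exprs have inconsistent lengths")
    return n_rows, (row_lengths.pop() if row_lengths else 0)


# Toy matrix: 2 genes (rows) x 3 cells (columns)
toy_exprs = [[0, 5, 2], [1, 0, 0]]
toy_ids = ["GENE_A", "GENE_B"]
shape = check_orientation(toy_exprs, toy_ids)
```

With a real array or HDF5 dataset, the same comparison can be made between the first dimension of exprs and len(feature_ids).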
pre_remove_features: list of gene IDs to exclude before analysis.
pre_remove_pathways: list of GO term codes whose genes will be removed.
min_express_n: minimum number of cells a gene must be expressed in to be considered (set to -1 to disable filtering, e.g., for non-expression or non-single-cell data).
scratch_dir: directory for temporary HDF5 files (default: system temp directory).
bin_size: number of features per batch when computing the correlation matrix.
FPR and FDR: false positive rate and false discovery rate for negative correlations.
num_pos_cor: minimum number of positive correlations to select a feature.

For datasets that are not single-cell or gene-expression matrices (e.g., bulk omics, proteomics, metabolomics, or other feature embeddings), you can skip the minimum-expression filter and run only the anti-correlation statistics by setting min_express_n=-1. For example:
# embed_df: a pandas DataFrame with features in rows (its index holds the feature IDs)
anti_cor_df = get_anti_cor_genes(
    embed_df,
    feature_ids=embed_df.index.tolist(),
    pre_remove_features=[],
    pre_remove_pathways=[],
    min_express_n=-1
)
Setting min_express_n=-1 disables the minimum-expression requirement, which is only meaningful for count-based single-cell data, so that all features are included in the statistical analysis.
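To make the effect of min_express_n concrete, here is a minimal sketch of a minimum-expression filter. It is an illustration only: filter_by_min_express is a hypothetical function, "expressed" is assumed to mean a value greater than zero, and the filter inside anticor_features may differ in detail.

```python
def filter_by_min_express(exprs, feature_ids, min_express_n):
    """Return IDs of features with a value > 0 in at least min_express_n cells.

    Illustration only (hypothetical helper, not the package's own code).
    A value of -1 disables the filter and keeps every feature.
    """
    if min_express_n == -1:
        return list(feature_ids)
    kept = []
    for gene, row in zip(feature_ids, exprs):
        # Count the cells in which this gene has a non-zero count.
        n_expressing = sum(1 for value in row if value > 0)
        if n_expressing >= min_express_n:
            kept.append(gene)
    return kept


toy_exprs = [
    [0, 3, 1, 2],  # GENE_A: expressed in 3 cells
    [0, 0, 0, 4],  # GENE_B: expressed in 1 cell
]
toy_ids = ["GENE_A", "GENE_B"]

kept_filtered = filter_by_min_express(toy_exprs, toy_ids, min_express_n=2)
kept_all = filter_by_min_express(toy_exprs, toy_ids, min_express_n=-1)
```

For continuous or signed values such as embeddings, counting non-zero entries says little about a feature, which is why disabling the filter is the sensible choice there.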
When using Scanpy (AnnData), transpose the data matrix, since AnnData stores cells in rows and genes in columns:
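import pandas as pd
from anticor_features.anticor_features import get_anti_cor_genes

# adata.X is cells x genes; transpose so that genes are in rows
anti_cor_table = get_anti_cor_genes(
    adata.X.T,
    adata.var.index.tolist(),
    species="hsapiens"
)
# Attach the results table to the per-gene annotations
adata.var = pd.concat([adata.var, anti_cor_table], axis=1)
selected = anti_cor_table.loc[anti_cor_table["selected"], "gene"].tolist()
# Keep the full dataset in .raw, then subset to the selected genes
adata.raw = adata
adata = adata[:, selected]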
python3 -m anticor_features.anticor_features \
-i exprs.tsv \
-species mmusculus \
-out_file anti_cor_features.tsv \
-scratch_dir /path/to/tmp \
-use_default_pathway_removal
Options:
-i, --infile: input expression matrix (TSV or HDF5).
-species: g:Profiler species code (default: "hsapiens").
-out_file: output file path for the results table.
-hdf5: treat input as HDF5 with dataset key "infile".
-ids: file with feature (gene) IDs (no header) for HDF5 input.
-cols: file with sample (cell) IDs (with header) for HDF5 input.
-scratch_dir: directory for temporary files.
-use_default_pathway_removal: remove default mitochondrial, ribosomal, and related pathways.
-h, --help: display full help message.

Computing time scales with number of features and batch size. Selecting anti-correlated features on ~10k genes and ~3k cells typically takes 1–2 minutes (network time for g:Profiler). Larger datasets may take longer.
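The sketch below gives a rough picture of how the number of features and the batch size interact, by splitting a feature range into batches. It is an assumption-laden illustration, not the package's implementation: feature_batches is a hypothetical helper, the bin size of 3,000 is an arbitrary example value rather than a documented default, and the block count assumes every batch is compared against every batch.

```python
def feature_batches(n_features, bin_size):
    """Yield (start, end) index pairs that cover range(n_features) in chunks.

    Hypothetical helper for illustration; not part of anticor_features.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    for start in range(0, n_features, bin_size):
        yield start, min(start + bin_size, n_features)


# Example values only: ~10k features, batches of 3,000
batches = list(feature_batches(n_features=10000, bin_size=3000))

# Assumption: a full feature-by-feature correlation matrix needs one block
# per ordered pair of batches, so the work grows with the square of the
# number of batches.
n_block_pairs = len(batches) ** 2
```

Under these assumptions, smaller batches mean more but smaller blocks, while the total number of feature pairs stays fixed by the feature count.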
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
Scott Tyler scottyler89+bitbucket@gmail.com