Use fast FFT-based mutual information screening for large datasets. Works well on MRI brain imaging data. Developed by Kai Yang, [GPG Public key Fingerprint: CC02CF153594774CF956691492B2600D18170329](https://keys.openpgp.org/vks/v1/by-fingerprint/CC02CF153594774CF956691492B2600D18170329)
This package uses FFT-based mutual information screening and an accelerated gradient method to identify important variables in (potentially very) high-dimensional large datasets.
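The sketch below makes FFT-based mutual information estimation concrete for one binary outcome and one continuous covariate. It is a minimal sketch, not the package's implementation. The function names are hypothetical, and it assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth. The sample is linearly binned onto a grid of N points, and each kernel density estimate is evaluated with an FFT convolution in O(N log N) time. The estimate uses I(X; Y) = sum over y of p(y) * KL(f(x | y) || f(x)), with f(x) taken as the mixture of the class-conditional densities.

```python
import numpy as np


def fft_kde_1d(x, grid, bw):
    """Gaussian KDE of the 1-D sample x on the evenly spaced grid,
    using linear binning followed by an FFT convolution."""
    n_grid = grid.size
    delta = grid[1] - grid[0]
    # Linear binning: split each sample's unit mass between its two neighbouring grid points.
    pos = (x - grid[0]) / delta
    left = np.clip(np.floor(pos).astype(int), 0, n_grid - 2)
    frac = np.clip(pos - left, 0.0, 1.0)
    counts = np.zeros(n_grid)
    np.add.at(counts, left, 1.0 - frac)
    np.add.at(counts, left + 1, frac)
    # Gaussian kernel evaluated at every grid offset from -(n_grid - 1) to n_grid - 1.
    offsets = np.arange(-(n_grid - 1), n_grid) * delta
    kernel = np.exp(-0.5 * (offsets / bw) ** 2) / (bw * np.sqrt(2.0 * np.pi))
    # Zero-padding to the full length makes the FFT product a linear, not circular, convolution.
    size = counts.size + kernel.size - 1
    conv = np.fft.irfft(np.fft.rfft(counts, size) * np.fft.rfft(kernel, size), size)
    # Entries n_grid - 1 .. 2 * n_grid - 2 line up with the grid points.
    return np.maximum(conv[n_grid - 1 : 2 * n_grid - 1] / x.size, 0.0)


def silverman_bw(x):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(np.std(x, ddof=1), (q75 - q25) / 1.34) * x.size ** (-0.2)


def mi_binary_continuous(y, x, N=500):
    """Plug-in estimate, in nats, of I(X; Y) for binary y (0/1) and continuous x.
    Assumes both classes are present and each has non-degenerate spread."""
    labels = (0, 1)
    bws = {k: silverman_bw(x[y == k]) for k in labels}
    pad = 4.0 * max(bws.values())
    grid = np.linspace(x.min() - pad, x.max() + pad, N)
    dx = grid[1] - grid[0]
    p_y, f_y = {}, {}
    for k in labels:
        p_y[k] = np.mean(y == k)
        dens = fft_kde_1d(x[y == k], grid, bws[k])
        f_y[k] = dens / (dens.sum() * dx)  # renormalise so the density sums to 1 on the grid
    # Marginal density of x as the p(y)-weighted mixture of the conditional densities.
    f_x = p_y[0] * f_y[0] + p_y[1] * f_y[1]
    mi = 0.0
    for k in labels:
        m = f_y[k] > 0
        # p(y) * KL(f(x | y) || f(x)), approximated by a Riemann sum on the grid.
        mi += p_y[k] * np.sum(f_y[k][m] * np.log(f_y[k][m] / f_x[m])) * dx
    return mi
```

The marginal is built as a mixture of the two class-conditional estimates, so the grid-based value stays between 0 and the entropy of the empirical class proportions (at most log 2, about 0.693 nats). A covariate that separates the classes well scores near that bound, while an independent covariate scores near 0.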
Given the size of typical data files, the most commonly used functions are the ones that run in parallel. All parallel functions carry the _parallel suffix, and they all take the following arguments:
core_num: number of CPU cores used for multiprocessing. By default all available cores are used, since the job is most likely running on a server rather than a PC.
multp: job multiplier. The job to be run in parallel is first divided into core_num * multp sub-jobs of sizes as equal as possible, and each core then takes one sub-job at a time.
verbose: how verbose the function is, with 0 being the least verbose and verbosity increasing with the number declared here.
The function implementing our proposed FFT-based mutual information estimation takes the following arguments:
N: the grid size for the 1-D FFT, with N=500 as the default value.
a_N, a_N: as above, the grid sizes for the 2-D FFT, with 300 as the default values.
kernel and bw: the kernel and bandwidth used for KDE.
norm: the norm used for KDE; this option only takes effect for 2-D KDE.
The screening functions and their arguments:
plink files:
bed_file, bim_file, fam_file: the locations of the plink files.
outcome, outcome_iid: the outcome values and the iids for the outcome. For genetic data, the order of the SNP iids and the outcome iids usually does not match. The SNP iids can be obtained from the plink1 files, but the outcome iids must be declared separately. outcome_iid should be a list of strings or a one-dimensional numpy string array.
continuous_screening_plink, continuous_screening_plink_parallel: for screening on continuous outcomes with continuous covariates.
binary_screening_plink, binary_screening_plink_parallel: for screening on binary outcomes with continuous covariates.
clump_plink_parallel: for clumping. Starting from the first covariate (i.e., the leftmost column of the data file), clumping removes every subsequent covariate whose mutual information with the covariate currently being examined is higher than clumping_threshold.
csv files:
_usecols: a list of the column labels to be used. The first element must be the outcome column label. The returned mutual information results correspond to the columns in _usecols.
csv_engine: dask for low-memory situations, pandas's read_csv engines, or fastparquet for a previously created parquet file, for faster speed. If fastparquet is chosen, declare parquet_file as the path to the parquet file. If dask is chosen to read a very large CSV, a larger sample value may need to be specified.
binary_screening_csv, binary_screening_csv_parallel: for screening on binary outcomes with continuous covariates.
binary_skMI_screening_csv_parallel, continuous_skMI_screening_csv_parallel: for screening using the mutual information estimators provided by scikit-learn, i.e., sklearn.metrics.mutual_info_score and sklearn.feature_selection.mutual_info_classif.
Pearson_screening_csv_parallel: for screening using Pearson correlation.
continuous_screening_csv, continuous_screening_csv_parallel: for screening on continuous outcomes with continuous covariates.
clump_continuous_csv_parallel: for clumping, similar to clump_plink_parallel above.
A share_memory option has been added for multiprocess computing. With it, the functions can process large .csv data in parallel in a memory-efficient manner while using FFT for KDE to estimate mutual information very quickly. A tqdm progress bar has also been added, which is useful on cloud computing platforms. The verbose option takes the values 0, 1, or 2: 2 is the most verbose, 1 shows only the progress bar, and 0 prints nothing.
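The core_num and multp arguments shared by the _parallel functions control how a screening job is split. The sketch below illustrates that scheme with multiprocessing.Pool, assuming the job is a set of covariate columns. split_subjobs, abs_pearson_chunk and screen_parallel are hypothetical names, and the multp default of 10 is an assumption. Absolute Pearson correlation stands in for the package's mutual information estimate.

```python
import os
from multiprocessing import Pool

import numpy as np


def split_subjobs(n_items, core_num, multp):
    """Split indices 0..n_items-1 into core_num * multp contiguous sub-jobs
    whose sizes differ by at most one (empty sub-jobs are dropped)."""
    chunks = np.array_split(np.arange(n_items), core_num * multp)
    return [c for c in chunks if c.size > 0]


def abs_pearson_chunk(args):
    """Hypothetical worker: |Pearson r| between y and each column of X in one sub-job."""
    y, X, cols = args
    yc = y - y.mean()
    Xc = X[:, cols] - X[:, cols].mean(axis=0)
    return np.abs(yc @ Xc) / (np.linalg.norm(yc) * np.linalg.norm(Xc, axis=0))


def screen_parallel(y, X, core_num=None, multp=10):
    """Score every column of X against y, one sub-job per pool task."""
    core_num = core_num or os.cpu_count() or 1  # default: all available cores
    jobs = split_subjobs(X.shape[1], core_num, multp)
    with Pool(processes=core_num) as pool:
        # chunksize=1 means an idle worker takes exactly one sub-job at a time;
        # map returns results in sub-job order, so concatenation restores column order.
        parts = pool.map(abs_pearson_chunk, [(y, X, cols) for cols in jobs], chunksize=1)
    return np.concatenate(parts)
```

A larger multp gives smaller sub-jobs, which balances load better when some columns are slower to score, at the cost of more task overhead. This sketch pickles X for every sub-job. Avoiding that copying is the kind of overhead a shared-memory option is meant to address. Call screen_parallel from under an `if __name__ == "__main__":` guard so that it also works with the spawn start method.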
DataFrames:
binary_screening_dataframe, binary_screening_dataframe_parallel: for screening on binary outcomes with continuous covariates.
binary_skMI_screening_dataframe_parallel, continuous_skMI_screening_dataframe_parallel: for screening using the mutual information estimators provided by scikit-learn, i.e., sklearn.metrics.mutual_info_score and sklearn.feature_selection.mutual_info_classif.
Pearson_screening_dataframe_parallel: for screening using Pearson correlation.
continuous_screening_dataframe, continuous_screening_dataframe_parallel: for screening on continuous outcomes with continuous covariates.
clump_continuous_dataframe_parallel: for clumping, similar to clump_continuous_csv_parallel above.
numpy arrays:
binary_screening_array, binary_screening_array_parallel: for screening on binary outcomes with continuous covariates.
continuous_screening_array, continuous_screening_array_parallel: for screening on continuous outcomes with continuous covariates.
binary_skMI_array_parallel, continuous_skMI_array_parallel: for screening using the mutual information estimators provided by scikit-learn, i.e., sklearn.metrics.mutual_info_score and sklearn.feature_selection.mutual_info_classif.
continuous_Pearson_array_parallel: for screening using Pearson correlation.
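The clumping functions (clump_plink_parallel, clump_continuous_csv_parallel, clump_continuous_dataframe_parallel) follow the left-to-right rule described in the plink section. Below is a minimal serial sketch of that rule, assuming a pluggable dependence measure. greedy_clump and abs_corr are hypothetical names. Absolute Pearson correlation is swapped in for mutual information only to keep the example short; a mutual information estimator could be passed as dependence instead.

```python
import numpy as np


def abs_corr(a, b):
    """Stand-in dependence measure (|Pearson r|); the package itself uses mutual information."""
    return abs(np.corrcoef(a, b)[0, 1])


def greedy_clump(X, clumping_threshold, dependence=abs_corr):
    """Return the column indices of X kept after left-to-right greedy clumping."""
    remaining = list(range(X.shape[1]))
    kept = []
    while remaining:
        lead = remaining.pop(0)  # leftmost covariate still in play
        kept.append(lead)
        # Drop every later covariate whose dependence on the lead exceeds the threshold.
        remaining = [j for j in remaining
                     if dependence(X[:, lead], X[:, j]) <= clumping_threshold]
    return kept
```

Because each kept covariate only removes columns to its right, the result depends on column order. This is why the leftmost column is processed first.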