direpack: a Python 3 library for state-of-the-art statistical dimension reduction techniques

direpack is a scikit-learn compatible Python 3 package for sundry state-of-the-art multivariate statistical methods, with a focus on dimension reduction.
The categories of methods delivered in this package are:
- Projection pursuit dimension reduction (ppdire)
- Sufficient dimension reduction (sudire)
- Sparse partial robust M regression and related robust M estimators (sprm)

each presented as scikit-learn compatible objects in the corresponding folders.

We hope that this package leads to scientific success. If it does so, we kindly ask you to cite the official direpack publication [0], as well as the original publication of the corresponding method.
The package also contains a set of tools for pre- and postprocessing:
- The preprocessing folder provides classical and robust centring and scaling, as well as spatial sign transforms [4] and the robustness-inducing wrapping transformation [15].
- The dicomo folder contains a versatile class to access a wide variety of moment and co-moment statistics, and statistics derived from those. Check out the dicomo Documentation file and the dicomo Examples Notebook.
- Plotting utilities in the plot folder
- Cross-validation utilities in the cross-validation folder

Methods in the sprm folder
- The sparse partial robust M regression estimator (sprm.py) [1]
- The sparse NIPALS (SNIPLS) estimator (snipls.py)
- The robust M regression estimator (rm.py)
- Ancillary functions for M-estimation (_m_support_functions.py)
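To give a feel for the spatial sign transform [4] listed among the preprocessing tools, here is a minimal sketch in plain Python. It is not the code from the preprocessing folder: the function name `spatial_sign` is hypothetical, and the coordinate-wise median is used as a simple robust centre, which is an assumption of this sketch.

```python
import math
from statistics import median

def spatial_sign(rows):
    """Centre each column at its median, then scale every row to unit Euclidean length."""
    n_cols = len(rows[0])
    # Assumption of this sketch: coordinate-wise median as a simple robust centre.
    centre = [median(row[j] for row in rows) for j in range(n_cols)]
    transformed = []
    for row in rows:
        centred = [x - c for x, c in zip(row, centre)]
        norm = math.sqrt(sum(v * v for v in centred))
        # A row that coincides with the centre has no direction and stays at zero.
        transformed.append([v / norm if norm > 0 else 0.0 for v in centred])
    return transformed

data = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [100.0, -50.0]]  # last row is an outlier
signs = spatial_sign(data)
for row in signs:
    print(row)
```

After the transform every observation lies on the unit sphere, so the outlying row can no longer dominate a downstream estimator through its magnitude; only its direction is retained.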
Methods in the ppdire folder

The ppdire class will give access to a wide range of projection pursuit dimension reduction techniques.
These include slower approximate estimates for well-established methods such as PCA, PLS and continuum regression.
However, the class provides unique access to a set of robust options, such as robust continuum regression (RCR) [5], through its native grid optimization algorithm, first published for RCR as well [6].
Moreover, ppdire is also a great gateway to calculate generalized betas, using the CAPI projection index [7].
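The idea behind projection pursuit can be illustrated with a toy example: search over candidate directions and keep the one that maximizes a projection index. The sketch below does this on an angular grid in two dimensions, with the sample variance as index, so that it approximates the first PCA direction. It is not the grid algorithm implemented in ppdire; the names `grid_projection_pursuit` and `variance` are hypothetical and only serve the illustration.

```python
import math
import random

def variance(values):
    """Sample variance, used here as the projection index."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def grid_projection_pursuit(points, index=variance, n_angles=180):
    """Return the unit direction on an angular grid that maximises the projection index."""
    best_dir, best_val = None, -math.inf
    for k in range(n_angles):
        # Half a circle suffices for sign-invariant indices such as the variance.
        theta = math.pi * k / n_angles
        a = (math.cos(theta), math.sin(theta))
        scores = [a[0] * x + a[1] * y for x, y in points]
        val = index(scores)
        if val > best_val:
            best_dir, best_val = a, val
    return best_dir, best_val

# Simulated data with a large spread along the diagonal and a small spread across it.
random.seed(0)
points = []
for _ in range(200):
    t = random.gauss(0.0, 3.0)
    e = random.gauss(0.0, 0.3)
    points.append(((t - e) / math.sqrt(2), (t + e) / math.sqrt(2)))

direction, value = grid_projection_pursuit(points)
print(direction, value)
```

Because the index is passed in as a function, replacing `variance` with a robust scale estimate would turn the same search into a robust variant, which is the general reason projection pursuit lends itself to robust estimation.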
The code is organized in
- ppdire.py - the main PP dimension reduction class
- capi.py - the co-moment analysis projection index.

Methods in the sudire folder

The sudire folder gives access to an extensive set of methods that fall under the umbrella of sufficient dimension reduction.
These range from long-standing, well-accepted approaches, such as sliced inverse regression (SIR) and the closely related SAVE [8,9],
to methods such as directional regression [10] and principal Hessian directions [11], and more. However, the package also contains some
of the most recently developed, state-of-the-art sufficient dimension reduction techniques, which require no distributional assumptions.
The options provided in this category are based on energy statistics (distance covariance [12] or martingale difference divergence [13]) and
ball statistics (ball covariance) [14]. All of these options can be called by setting the corresponding parameters in the sudire class, cf. the docs.
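As an illustration of the energy statistics mentioned above, the following sketch computes the squared sample distance covariance of two univariate samples in plain Python. It only evaluates the statistic; the sudire methods optimize such a statistic over projection directions, which is not shown here. The function names are hypothetical and not part of the direpack API.

```python
def _double_centred_distances(values):
    """Pairwise absolute distances with row, column and grand means removed."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def distance_covariance_sq(x, y):
    """Squared sample distance covariance (V-statistic form) of two univariate samples."""
    n = len(x)
    a = _double_centred_distances(x)
    b = _double_centred_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)

def covariance(x, y):
    """Classical covariance (dividing by n), for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]  # y depends on x, but not linearly

print("covariance:", covariance(x, y))
print("squared distance covariance:", distance_covariance_sq(x, y))
```

In this example the classical covariance is zero although y is a deterministic function of x, whereas the distance covariance is strictly positive. This sensitivity to nonlinear dependence is what makes such statistics attractive as objectives for sufficient dimension reduction.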
Note: the ball covariance option will require some lines to be uncommented as indicated. We decided not to make that option generally available,
since it depends on the Ball package that seems to be difficult to install on certain architectures.
The package is distributed through PyPI, so install through:
pip install direpack
Note that some of the key methods in the sudire subpackage rely on the IPOPT optimization package, which, according to their recommendation, can best be installed directly as:
conda install -c conda-forge cyipopt
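After installation, a quick check can confirm that both packages are visible to the current Python environment. The snippet below is a small sketch using only the standard library; it assumes that the import names are `direpack` and `cyipopt`, matching the install names above, and the helper `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

def missing_modules(names=("direpack", "cyipopt")):
    """Return the module names that cannot be found in the current environment."""
    # Assumption: the import names equal the install names used above.
    return [name for name in names if find_spec(name) is None]

missing = missing_modules()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("direpack and cyipopt are both importable.")
```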
Detailed documentation can be found on the ReadTheDocs page.
A more extensive description of the background is presented in the official direpack publication.
Examples of how to use each of the dicomo, ppdire, sprm and sudire classes are presented as Jupyter notebooks in the examples folder.
Furthermore, the docs folder contains a few markdown files on usage of the classes.
References
[0] direpack: A Python 3 package for state-of-the-art statistical dimensionality reduction methods, Emmanuel Jordy Menvouta, Sven Serneels, Tim Verdonck, SoftwareX, 21 (2023), 101282.
[1] Sparse partial robust M regression, Irene Hoffmann, Sven Serneels, Peter Filzmoser, Christophe Croux, Chemometrics and Intelligent Laboratory Systems, 149 (2015), 50-59.
[2] Partial robust M regression, Sven Serneels, Christophe Croux, Peter Filzmoser, Pierre J. Van Espen, Chemometrics and Intelligent Laboratory Systems, 79 (2005), 55-64.
[3] Sparse and robust PLS for binary classification, I. Hoffmann, P. Filzmoser, S. Serneels, K. Varmuza, Journal of Chemometrics, 30 (2016), 153-162.
[4] Spatial Sign Preprocessing: A Simple Way To Impart Moderate Robustness to Multivariate Estimators, Sven Serneels, Evert De Nolf, Pierre J. Van Espen, Journal of Chemical Information and Modeling, 46 (2006), 1402-1409.
[5] Robust Continuum Regression, Sven Serneels, Peter Filzmoser, Christophe Croux, Pierre J. Van Espen, Chemometrics and Intelligent Laboratory Systems, 76 (2005), 197-204.
[6] Robust Multivariate Methods: The Projection Pursuit Approach, Peter Filzmoser, Sven Serneels, Christophe Croux and Pierre J. Van Espen, in: From Data and Information Analysis to Knowledge Engineering, Spiliopoulou, M., Kruse, R., Borgelt, C., Nuernberger, A. and Gaul, W., eds., Springer Verlag, Berlin, Germany, 2006, pages 270-277.
[7] Projection pursuit based generalized betas accounting for higher order co-moment effects in financial market analysis, Sven Serneels, in: JSM Proceedings, Business and Economic Statistics Section. Alexandria, VA: American Statistical Association, 2019, 3009-3035.
[8] Sliced Inverse Regression for Dimension Reduction, K.-C. Li, Journal of the American Statistical Association, 86 (1991), 316-327.
[9] Sliced Inverse Regression for Dimension Reduction: Comment, R.D. Cook and Sanford Weisberg, Journal of the American Statistical Association, 86 (1991), 328-332.
[10] On directional regression for dimension reduction, B. Li and S. Wang, Journal of the American Statistical Association, 102 (2007), 997-1008.
[11] On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma, K.-C. Li, Journal of the American Statistical Association, 87 (1992), 1025-1039.
[12] Sufficient Dimension Reduction via Distance Covariance, Wenhui Sheng and Xiangrong Yin, Journal of Computational and Graphical Statistics, 25 (2016), issue 1, 91-104.
[13] A martingale-difference-divergence-based estimation of central mean subspace, Yu Zhang, Jicai Liu, Yuesong Wu and Xiangzhong Fang, Statistics and Its Interface, 12 (2019), number 3, 489-501.
[14] Robust Sufficient Dimension Reduction Via Ball Covariance, Jia Zhang and Xin Chen, Computational Statistics and Data Analysis, 140 (2019), 144-154.
[15] Fast Robust Correlation for High-Dimensional Data, Jakob Raymaekers and Peter J. Rousseeuw, Technometrics, 63 (2021), 184-198.
Release Notes can be checked out in the repository.
A list of possible topics for further development is provided as well. Additions and comments are welcome!