Simply install via conda-forge!
conda install -c conda-forge tabmat
The easiest way to start with tabmat is to use the convenience constructor tabmat.from_pandas.
import tabmat as tm
import numpy as np
dense_array = np.random.normal(size=(100, 1))
TL;DR: We provide matrix classes for efficiently building statistical algorithms with data that is partially dense, partially sparse and partially categorical.
Data used in economics, actuarial science, and many other fields is often tabular, containing rows and columns. Such data is often partly dense, partly sparse, and partly categorical.
High-performance statistical applications often require fast computation of certain operations, such as the sandwich product transpose(X) @ diag(d) @ X. A sandwich product shows up in the solution to weighted least squares, as well as in the Hessian of the likelihood in generalized linear models such as Poisson regression.
We designed this library with the above use cases in mind. We built it first for estimating generalized linear models, but expect it will be useful in a variety of econometric and statistical use cases. This library was born out of our need for speed, and its single API reflects our desire to work with one matrix interface inside our statistical algorithms.
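To make the sandwich product concrete, here is a minimal NumPy sketch. It is not tabmat's implementation, and the names sandwich, X, d, y and beta are illustrative assumptions. It shows that transpose(X) @ diag(d) @ X can be computed without building the n-by-n diagonal matrix, and how the result enters the weighted least squares normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # design matrix (illustrative data)
d = rng.uniform(0.5, 2.0, size=100)    # positive row weights
y = rng.normal(size=100)               # response vector


def sandwich(X, d):
    """Compute X.T @ diag(d) @ X without materializing diag(d)."""
    # Scaling each row of X by its weight is equivalent to diag(d) @ X.
    return X.T @ (d[:, None] * X)


naive = X.T @ np.diag(d) @ X           # builds a 100 x 100 matrix first
fast = sandwich(X, d)                  # same result, no n x n intermediate

# Weighted least squares solves (X' D X) beta = X' D y.
beta = np.linalg.solve(fast, X.T @ (d * y))
```

The naive version allocates memory quadratic in the number of rows, which is why a dedicated sandwich routine matters once the data is large or sparse.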
Design principles:
DenseMatrix and SparseMatrix subclass np.ndarray and scipy.sparse.csc_matrix respectively, and inherit behavior from those classes wherever it is not improved on.
Dimension-reducing operations (such as sum) return NumPy arrays, following NumPy conventions about the dimensions of results. The aim is to make these classes as close as possible to being drop-in replacements for numpy.ndarray. This is not always possible, however, due to the differing APIs of numpy.ndarray and scipy.sparse.
Other operations, such as toarray, mimic SciPy sparse syntax.
All matrix classes support matrix products, sandwich products, and getcol.
Individual subclasses may support significantly more operations.
DenseMatrix represents dense matrices, subclassing numpy.ndarray. It additionally supports methods getcol, toarray, sandwich, standardize, and unstandardize.
SparseMatrix represents column-major sparse data, subclassing scipy.sparse.csc_matrix. It additionally supports methods sandwich and standardize.
CategoricalMatrix represents one-hot encoded categorical matrices. Because all the non-zeros in these matrices are ones and because each row has only one non-zero, the data can be represented and multiplied much more efficiently than a generic sparse matrix.
SplitMatrix represents matrices with dense, sparse and categorical parts, allowing for a significant speedup in matrix multiplications.
StandardizedMatrix efficiently and sparsely represents a matrix that has had its columns normalized to have mean zero and variance one. Even if the underlying matrix is sparse, such a normalized matrix will be dense. However, by storing the scaling and shifting factors separately, StandardizedMatrix retains the original matrix sparsity.
See here for detailed benchmarking.
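The ideas behind CategoricalMatrix and StandardizedMatrix can be sketched in plain NumPy. This is an illustration under assumptions, not tabmat's actual code: the variable names (codes, mult, shift, lazy) are hypothetical, and a small one-hot matrix stands in for a sparse matrix. The first part replaces a one-hot matrix with one integer code per row; the second applies standardization lazily, so the underlying matrix is never modified.

```python
import numpy as np

k = 3                                          # number of categories
codes = np.array([0, 2, 1, 0, 2, 2, 1, 0])     # one integer code per row
one_hot = np.eye(k)[codes]                     # dense equivalent, for checking only

rng = np.random.default_rng(0)
v = rng.normal(size=k)                         # vector to multiply by
d = rng.uniform(0.5, 2.0, size=codes.size)     # row weights

# Categorical matrix-vector product: each row selects one entry of v.
cat_matvec = v[codes]

# Categorical sandwich product: a diagonal matrix of summed weights per category.
cat_sandwich = np.diag(np.bincount(codes, weights=d, minlength=k))

# Standardization stored as per-column factors: Z = X * mult + shift.
X = one_hot                                    # stand-in for a sparse matrix
mult = 1.0 / X.std(axis=0)
shift = -X.mean(axis=0) * mult

# Lazy product Z @ v: X stays untouched, the shift contributes one scalar.
lazy = X @ (mult * v) + shift @ v

# Dense standardized matrix, built only to verify the lazy result.
Z_dense = (X - X.mean(axis=0)) / X.std(axis=0)
```

In both cases the saving comes from never forming the large intermediate: the one-hot matrix in the first part, and the dense standardized matrix in the second.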
FAQs
Efficient matrix representations for working with tabular data.
We found that tabmat demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 3 open source maintainers collaborating on the project.