Documentation: enstat.readthedocs.io
enstat is a library to compute ensemble statistics without storing the entire ensemble in memory. In particular, it allows you to compute ensemble averages (and their variance), histograms, and binned averages.
Below you find a quick-start. For more information, see the documentation.
The key feature is to store the sum of the first and second statistical moments and the number of samples. This gives access to the mean (and variance) at all times, while you can keep adding samples.
Suppose that we have 100 realisations, each with 1000 'items', and we want to know the ensemble average of each item:
import enstat
import numpy as np
ensemble = enstat.static()
for realisation in range(100):
    sample = np.random.random(1000)
    ensemble += sample
print(ensemble.mean())
which prints an array of 1000 values, each around 0.5.
This is the equivalent of
import numpy as np
container = np.empty((100, 1000))
for realisation in range(100):
    sample = np.random.random(1000)
    container[realisation, :] = sample
print(np.mean(container, axis=0))
The key difference is that enstat only requires you to have 4 * N values in memory for a sample of size N: the sample itself, the sums of the first and second moment, and the normalisation.
By contrast, the container solution keeps all 100 realisations in memory, i.e. 100 * N values, and that number grows with every realisation you add.
Another nice feature is that you can keep adding samples to ensemble.
You can even store it and continue later.
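The idea of keeping only running sums can be sketched in plain NumPy. This is a minimal illustration, not enstat's implementation: the class RunningStats and its attribute names are hypothetical, and the variance shown is the population variance, which may be normalised differently from what enstat returns.

```python
import numpy as np


class RunningStats:
    """Hypothetical helper (not part of enstat): running mean and variance."""

    def __init__(self):
        self.first = None   # running sum of x
        self.second = None  # running sum of x ** 2
        self.norm = 0       # number of samples added so far

    def add(self, sample):
        sample = np.asarray(sample, dtype=float)
        if self.first is None:
            # allocate the sums on the first sample, so N need not be known upfront
            self.first = np.zeros_like(sample)
            self.second = np.zeros_like(sample)
        self.first += sample
        self.second += sample ** 2
        self.norm += 1

    def mean(self):
        return self.first / self.norm

    def variance(self):
        # population variance: <x^2> - <x>^2
        return self.second / self.norm - self.mean() ** 2


rng = np.random.default_rng(0)
data = rng.random((100, 1000))  # kept in full only to check the result

stats = RunningStats()
for sample in data:
    stats.add(sample)

print(np.allclose(stats.mean(), data.mean(axis=0)))
print(np.allclose(stats.variance(), data.var(axis=0)))
```

Because the state is just three objects (two arrays and a counter), it can be written to disk and restored later to continue adding samples.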
Same example, but now we want the histogram for predefined bins:
import enstat
import numpy as np
bin_edges = np.linspace(0, 1, 11)
hist = enstat.histogram(bin_edges=bin_edges)
for realisation in range(100):
    sample = np.random.random(1000)
    hist += sample
print(hist.p)
which prints the probability density of each bin (so a list of values around 0.1 for these bins).
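For comparison, the same accumulation can be sketched with NumPy alone: only the per-bin counts are kept between realisations. This is an illustration of the idea rather than enstat's code. The normalisation used here (fraction of all samples per bin) is an assumption for the sketch, and the list samples exists only to verify the result.

```python
import numpy as np

rng = np.random.default_rng(0)
bin_edges = np.linspace(0, 1, 11)
count = np.zeros(bin_edges.size - 1, dtype=np.int64)
samples = []  # kept only to check the accumulated counts below

for realisation in range(100):
    sample = rng.random(1000)
    # add this realisation's counts to the running total
    count += np.histogram(sample, bins=bin_edges)[0]
    samples.append(sample)

# assumed normalisation: fraction of all samples that fall in each bin
fraction = count / count.sum()

# the accumulated counts match a histogram of the full ensemble
reference = np.histogram(np.concatenate(samples), bins=bin_edges)[0]
print(np.array_equal(count, reference))
```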
The histogram class has two additional nice features.
First, it has several binning algorithms that NumPy does not have.
Second, it can be used for plotting with an ultra-short interface, for example:
import enstat
import numpy as np
import matplotlib.pyplot as plt
data = np.random.random(1000)
hist = enstat.histogram.from_data(data, bins=10, mode="log")
fig, ax = plt.subplots()
ax.plot(hist.x, hist.p)
plt.show()
You can even use ax.plot(*hist.plot).
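To give a feel for what a non-uniform binning involves, here is a NumPy-only sketch of logarithmically spaced bins. It assumes that mode="log" means bin edges equally spaced on a log scale; that reading is an assumption, and the code below is not taken from enstat.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(1000)
data = data[data > 0]  # log-spaced bins need strictly positive data

bins = 10
bin_edges = np.logspace(np.log10(data.min()), np.log10(data.max()), bins + 1)
# guard against round-off so that the extremes fall inside the outer bins
bin_edges[0] = data.min()
bin_edges[-1] = data.max()

count, _ = np.histogram(data, bins=bin_edges)

# probability density: counts divided by total count and by bin width
density = count / (count.sum() * np.diff(bin_edges))

# geometric bin centres, a natural choice for log-spaced bins
x = np.sqrt(bin_edges[:-1] * bin_edges[1:])
```

The arrays x and density can then be passed to ax.plot(x, density), typically with logarithmic axes.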
Suppose you have some time series (t) with multiple observables (a and b), e.g.:
import enstat
import numpy as np
t = np.linspace(0, 10, 100)
a = np.random.normal(loc=5, scale=0.1, size=t.size)
b = np.random.normal(loc=1, scale=0.5, size=t.size)
Now suppose that you want to compute the average a, b, and t based on a certain binning of t:
bin_edges = np.linspace(0, 12, 12)
binned = enstat.binned.from_data(t, a, b, names=["t", "a", "b"], bin_edges=bin_edges)
print(binned["a"].mean())
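What a binned average computes can be sketched with NumPy's digitize and bincount. This is an illustration under stated assumptions, not enstat's implementation: all values of t are assumed to lie inside the bin edges, and empty bins are marked with NaN, which may differ from how enstat reports them.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 100)
a = rng.normal(loc=5, scale=0.1, size=t.size)

bin_edges = np.linspace(0, 12, 12)
n_bins = bin_edges.size - 1

# bin index of each t: 0 .. n_bins - 1 (assumes all t lie inside the edges)
index = np.digitize(t, bin_edges) - 1

# per bin: number of entries and sum of a
norm = np.bincount(index, minlength=n_bins)
first = np.bincount(index, weights=a, minlength=n_bins)

# mean of a per bin; bins without entries stay NaN (choice made for this sketch)
mean_a = np.full(n_bins, np.nan)
filled = norm > 0
mean_a[filled] = first[filled] / norm[filled]

print(mean_a)
```

With these edges the last bin lies beyond the largest t, so it has no entries; this is why the sketch handles empty bins explicitly.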
Using conda
conda install -c conda-forge enstat
Using PyPI
python -m pip install enstat
This library is free to use under the MIT license. Any additions are very much appreciated. As always, the code comes with no guarantee. None of the developers can be held responsible for possible mistakes.