histogrammar is a Python package for creating histograms. histogrammar has multiple histogram types, supports numeric and categorical features, and works with Numpy arrays and Pandas and Spark dataframes. Once a histogram is filled, it's easy to plot it, store it in JSON format (and retrieve it), or convert it to Numpy arrays for further analysis.
At its core histogrammar is a suite of data aggregation primitives designed for use in parallel processing. In the simplest case, you can use this to compute histograms, but the generality of the primitives allows much more.
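To make the idea of aggregation primitives for parallel processing concrete, here is a minimal pure-Python sketch. The ``ToyHistogram`` class and ``fill_all`` helper are hypothetical and are not histogrammar's implementation or API. They only illustrate the property that matters: each worker can fill its own partial aggregate, and the partial results can then be merged into the same answer a single pass would give.

```python
class ToyHistogram:
    """Hypothetical fixed-width histogram used only to illustrate mergeable aggregation."""

    def __init__(self, num, low, high, counts=None):
        self.num, self.low, self.high = num, low, high
        self.counts = list(counts) if counts is not None else [0] * num

    def fill(self, x):
        # Values outside [low, high) are ignored in this sketch.
        if self.low <= x < self.high:
            idx = int((x - self.low) * self.num / (self.high - self.low))
            self.counts[idx] += 1

    def __add__(self, other):
        # Merging is bin-wise addition, so partial results combine in any order.
        if (self.num, self.low, self.high) != (other.num, other.low, other.high):
            raise ValueError("histograms must share the same binning")
        merged = [a + b for a, b in zip(self.counts, other.counts)]
        return ToyHistogram(self.num, self.low, self.high, merged)


def fill_all(values, num=5, low=0, high=50):
    """Fill a fresh ToyHistogram from an iterable of numbers."""
    hist = ToyHistogram(num, low, high)
    for value in values:
        hist.fill(value)
    return hist


ages = [3, 12, 18, 25, 31, 31, 44, 47, 49, 60]

whole = fill_all(ages)        # single pass over all data
part_a = fill_all(ages[:4])   # "worker 1"
part_b = fill_all(ages[4:])   # "worker 2"
merged = part_a + part_b      # combine the partial results

print(merged.counts == whole.counts)
```

Because merging is associative and commutative here, the data can be split across any number of workers before the partial histograms are combined.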
Several common histogram types can be plotted in Matplotlib, Bokeh and PyROOT with a single method call. If Numpy or Pandas is available, histograms and other aggregators can be filled from arrays ten to a hundred times more quickly via Numpy commands, rather than Python for loops. If PyROOT is available, histograms and other aggregators can be filled from ROOT TTrees hundreds of times more quickly by JIT-compiling a specialized C++ filler. Histograms and other aggregators may also be converted into CUDA code for inclusion in a GPU workflow. And if PyCUDA is available, they can also be filled from Numpy arrays by JIT-compiling the CUDA code.
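The difference between filling bin by bin in a Python loop and filling from a whole array at once can be sketched as follows. This is an illustration only: it uses plain ``numpy.histogram`` as a stand-in for an array-based fill and does not show histogrammar's internal code. The function names are hypothetical.

```python
import numpy as np


def fill_with_loop(values, num, low, high):
    """Count values into num equal-width bins, one Python-level step per value."""
    counts = [0] * num
    for v in values:
        if low <= v < high:
            # Integer arithmetic keeps the bin index exact for integer data.
            counts[(v - low) * num // (high - low)] += 1
    return counts


def fill_with_numpy(values, num, low, high):
    """Count the same bins with one vectorized NumPy call."""
    counts, _edges = np.histogram(values, bins=num, range=(low, high))
    return counts.tolist()


rng = np.random.default_rng(seed=0)
ages = rng.integers(0, 100, size=10_000)  # synthetic integer ages in [0, 100)

loop_counts = fill_with_loop(ages, num=10, low=0, high=100)
numpy_counts = fill_with_numpy(ages, num=10, low=0, high=100)

print(loop_counts == numpy_counts)
```

Both functions produce the same counts; the array version simply moves the per-value work out of the Python interpreter.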
This Python implementation of histogrammar has been tested to guarantee compatibility with its Scala implementation.
Latest Python release: v1.0.33 (Dec 2022).
See Changes log `here <https://github.com/histogrammar/histogrammar-python/blob/master/CHANGES.rst>`_.
With Spark 3.0, based on Scala 2.12, make sure to pick up the correct histogrammar jar file:
.. code-block:: python

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.config("spark.jars.packages", "io.github.histogrammar:histogrammar_2.12:1.0.20,io.github.histogrammar:histogrammar-sparksql_2.12:1.0.20").getOrCreate()

For Spark 2.X compiled against Scala 2.11, simply replace "2.12" with "2.11" in the string above.
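If you switch between Spark builds often, the package string can be generated rather than edited by hand. The helper below is a hypothetical convenience function, not part of histogrammar. It assumes only what is stated above: the two artifact names, the jar version 1.0.20, and that the Scala version is the only part that changes.

```python
def histogrammar_spark_packages(scala_version="2.12", jar_version="1.0.20"):
    """Build the value for Spark's "spark.jars.packages" option (hypothetical helper)."""
    artifacts = ("histogrammar", "histogrammar-sparksql")
    return ",".join(
        f"io.github.histogrammar:{name}_{scala_version}:{jar_version}"
        for name in artifacts
    )


# Spark 3.0, built against Scala 2.12
print(histogrammar_spark_packages())

# Spark 2.X, built against Scala 2.11
print(histogrammar_spark_packages("2.11"))
```

The returned string can then be passed as the second argument to ``SparkSession.builder.config("spark.jars.packages", ...)``.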
February, 2021
.. list-table::
   :widths: 80 20
   :header-rows: 1

   * - Tutorial
     - Colab link
   * - `Basic tutorial <https://nbviewer.jupyter.org/github/histogrammar/histogrammar-python/blob/master/histogrammar/notebooks/histogrammar_tutorial_basic.ipynb>`_
     - |notebook_basic_colab|
   * - `Detailed example (featuring configuration, Apache Spark and more) <https://nbviewer.jupyter.org/github/histogrammar/histogrammar-python/blob/master/histogrammar/notebooks/histogrammar_tutorial_advanced.ipynb>`_
     - |notebook_advanced_colab|
   * - `Exercises <https://nbviewer.jupyter.org/github/histogrammar/histogrammar-python/blob/master/histogrammar/notebooks/histogrammar_tutorial_exercises.ipynb>`_
     - |notebook_exercises_colab|

See `histogrammar-docs <https://histogrammar.github.io/histogrammar-docs/>`_ for a complete introduction to histogrammar.
(A bit old but still good.) There you can also find documentation about the Scala implementation of histogrammar.

The histogrammar library requires Python 3.6+ and is pip friendly. To get started, simply do:
.. code-block:: bash

  $ pip install histogrammar

or check out the code from our GitHub repository:
.. code-block:: bash

  $ git clone https://github.com/histogrammar/histogrammar-python
  $ pip install -e histogrammar-python

where in this example the code is installed in edit mode (option -e).
You can now use the package in Python with:
.. code-block:: python

  import histogrammar

Congratulations, you are now ready to use the histogrammar library!
As a quick example, you can do:
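.. code-block:: python

  import pandas as pd
  import histogrammar as hg
  from histogrammar import resources

  # open the synthetic test data shipped with the package
  df = pd.read_csv(resources.data('test.csv.gz'), parse_dates=['date'])
  df.head()

  # create a histogram for column 'age', fill it from the dataframe and plot it
  hist = hg.Histogram(num=100, low=0, high=100, quantity='age')
  hist.fill.numpy(df)
  hist.plot.matplotlib()

  # make histograms of the dataframe's features
  hists = df.hg_make_histograms()
  print(hists.keys())

  # two-dimensional histogram of longitude vs latitude
  hists = df.hg_make_histograms(features=['longitude:latitude'])
  ll = hists['longitude:latitude']
  ll.plot.matplotlib()

  # store the histogram as JSON and read it back
  ll.toJsonFile('longitude_latitude.json')
  ll2 = hg.Factory().fromJsonFile('longitude_latitude.json')
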
These examples also work with Spark dataframes (sdf):
.. code-block:: python

  from pyspark.sql.functions import col
  hist = hg.Histogram(num=100, low=0, high=100, quantity=col('age'))
  hist.fill.sparksql(sdf)

For more examples please see the example notebooks and tutorials.
This package was originally authored by DIANA-HEP and is now maintained by volunteers.
Please note that histogrammar is supported only on a best-effort basis.

histogrammar is completely free, open-source and licensed under the `Apache-2.0 license <https://en.wikipedia.org/wiki/Apache_License>`_.

.. |notebook_basic_colab| image:: https://colab.research.google.com/assets/colab-badge.svg
    :alt: Open in Colab
    :target: https://colab.research.google.com/github/histogrammar/histogrammar-python/blob/master/histogrammar/notebooks/histogrammar_tutorial_basic.ipynb
.. |notebook_advanced_colab| image:: https://colab.research.google.com/assets/colab-badge.svg
    :alt: Open in Colab
    :target: https://colab.research.google.com/github/histogrammar/histogrammar-python/blob/master/histogrammar/notebooks/histogrammar_tutorial_advanced.ipynb
.. |notebook_exercises_colab| image:: https://colab.research.google.com/assets/colab-badge.svg
    :alt: Open in Colab
    :target: https://colab.research.google.com/github/histogrammar/histogrammar-python/blob/master/histogrammar/notebooks/histogrammar_tutorial_exercises.ipynb