shades 0.17 (RubyGems, 1 maintainer)

Shades

Get a new perspective on your data. In-memory OLAP cubing, histograms, and more for Ruby.

Install

gem install shades

As a command line utility for OLAP cubing

The shades utility accepts whitespace-delimited data, one event per line, preceded by two comment lines that name the dimensions and measures.

# dimensions: timestamp transactionid customer item
# measures: quantity amount
1371958271 1 jack golfclubs  3 75.00
1371937693 1 jack gin        2 40.00
1371979661 2 jane jar        6  6.00

Each line is parsed as a Shades::Event according to the metadata given in the first two lines. For example, the line

1371937693 1 jack gin        2 40.00

creates a Shades::Event of the form:

dimensions: 
  timestamp      = 1371937693 
  transactionid  = 1
  customer       = jack
  item           = gin
measures:
  quantity       = 2
  amount         = 40.00
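
To make the input format concrete, here is a minimal sketch of a parser for it in plain Ruby. It does not use the gem and is not the gem's implementation: parse_events is a hypothetical helper, events are represented as plain hashes rather than Shades::Event objects, and it assumes measure values are always numeric.

```ruby
# Hypothetical helper, not part of the shades gem: parses the text format
# described above into plain hashes (assumption: measures are numeric).
def parse_events(text)
  dimensions = []
  measures = []
  events = []

  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if line =~ /\A#\s*dimensions:\s*(.*)\z/
      dimensions = $1.split
    elsif line =~ /\A#\s*measures:\s*(.*)\z/
      measures = $1.split
    else
      fields = line.split
      # Dimension values come first, then measure values, in header order.
      dim_values = fields.first(dimensions.size)
      measure_values = fields.drop(dimensions.size).map { |v| Float(v) }
      events << {
        dimensions: dimensions.zip(dim_values).to_h,
        measures: measures.zip(measure_values).to_h
      }
    end
  end

  events
end

input = <<~DATA
  # dimensions: timestamp transactionid customer item
  # measures: quantity amount
  1371958271 1 jack golfclubs  3 75.00
  1371937693 1 jack gin        2 40.00
  1371979661 2 jane jar        6  6.00
DATA

events = parse_events(input)
p events[1]
```

Dimension values are kept as strings because they act as grouping keys; only the measures are converted to numbers.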

Then we can perform simple aggregations. This one finds the total amount each customer has spent:

cat transactions.txt | shades "sum(amount) by customer"

customer  amount
jack      115.00
jane        6.00
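
For readers who want to see what "sum(amount) by customer" computes, the following sketch reproduces the same grouping in plain Ruby. It is an illustration only: sum_by is a hypothetical helper, the events are hand-written hashes mirroring the sample data, and nothing here reflects how the gem evaluates queries internally.

```ruby
# Events as plain hashes; this mirrors the sample data above, not the
# gem's internal types.
events = [
  { customer: "jack", item: "golfclubs", amount: 75.00 },
  { customer: "jack", item: "gin",       amount: 40.00 },
  { customer: "jane", item: "jar",       amount: 6.00 }
]

# Hypothetical helper: sum one measure, grouped by one dimension.
def sum_by(events, dimension, measure)
  events
    .group_by { |e| e[dimension] }
    .transform_values { |group| group.sum { |e| e[measure] } }
end

totals = sum_by(events, :customer, :amount)
totals.each { |customer, amount| puts format("%-8s %8.2f", customer, amount) }
```

Grouping first and summing each group afterwards keeps the helper short; a streaming implementation would instead keep one running total per key.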

As a command line utility for histogramming

Histograms are indispensable for understanding value distributions in a data set, especially distributions with a long tail or heavy skew, such as response times in computer systems or the cost of goods on Amazon. It is typically difficult to pick appropriate bin widths if you don't already have a solid understanding of the data. Shades implements dynamically rebalancing histograms based on this paper, so the bins always make sense for your data set.

Say another file with the same structure as above includes one-minute system load averages as load1:

cat hoststats.txt | histo load1
     0.174 (  7) #######
     0.805 ( 30) ##############################
     1.974 ( 11) ###########
     2.936 ( 10) ##########
     3.911 (  8) ########
     5.164 (  5) #####
     6.744 (  7) #######
     7.852 (  4) ####
     9.310 (  1) #
    20.250 (  1) #

Each of these lines is a histogram bucket, with the average value on the left and the number of items in the bucket in parentheses. So the line 5.164 ( 5) ##### can be read as "there are 5 values with a mean close to 5.164".
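
One way such mean-and-count buckets can be maintained is sketched below: every new value starts as its own bucket, and whenever the bucket limit is exceeded, the two buckets with the closest means are merged. This is an assumption about the general technique, since the paper is not named here; SketchHistogram is a hypothetical class and not the gem's API or necessarily its algorithm.

```ruby
# Hypothetical class, not the shades gem's API: a streaming histogram that
# keeps at most max_bins buckets, each stored as a mean and a count.
class SketchHistogram
  Bin = Struct.new(:mean, :count)

  attr_reader :bins

  def initialize(max_bins)
    raise ArgumentError, "max_bins must be at least 1" if max_bins < 1

    @max_bins = max_bins
    @bins = []
  end

  def add(value)
    # Insert the value as its own bucket, keeping buckets sorted by mean.
    index = @bins.index { |b| b.mean > value } || @bins.size
    @bins.insert(index, Bin.new(value.to_f, 1))
    merge_closest while @bins.size > @max_bins
    self
  end

  # Render in the same "mean (count) bar" layout as the histo output.
  def to_s
    @bins.map { |b| format("%10.3f (%3d) %s", b.mean, b.count, "#" * b.count) }.join("\n")
  end

  private

  # Merge the two adjacent buckets whose means are closest. The merged mean
  # is the count-weighted average of the two.
  def merge_closest
    i = (0...@bins.size - 1).min_by { |k| @bins[k + 1].mean - @bins[k].mean }
    left = @bins[i]
    right = @bins[i + 1]
    count = left.count + right.count
    mean = (left.mean * left.count + right.mean * right.count) / count
    @bins[i, 2] = [Bin.new(mean, count)]
  end
end

histogram = SketchHistogram.new(3)
[0.1, 0.2, 0.3, 5.0, 5.2, 20.0].each { |v| histogram.add(v) }
puts histogram
```

Because buckets are merged rather than laid out on a fixed grid, dense regions of the data keep narrow buckets while an isolated outlier such as 20.0 keeps a bucket of its own, and the counts always sum to the number of values added.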

You can even feed data cubing output into the histo utility. Take the customer transaction data from above: to get a sense of the distribution of transaction amounts, run the following.

cat transactions.txt | shades -p "sum(amount) by transactionid" | histo amount

Use in code

Shades also offers a public OLAP cubing API. See the shades and histo utilities for examples of building data cubes and histograms, respectively.

Roadmap

  • Add 'where' clauses for filtering
  • Bound the number of rows that shades outputs by including only the top-ranking rows in a set of dimensions

Package last updated on 14 Oct 2013
