documentdb-lumenize
Aggregations (Group-by, Pivot-table, and N-dimensional Cube) and Time Series Transformations as Stored Procedures in DocumentDB
Copyright (c) 2015, Lawrence S. Maccherone, Jr.
The bad news: DocumentDB does not include aggregation capability.
The good news: DocumentDB includes stored procedures, and documentdb-lumenize uses them to add aggregation capability that far exceeds what you are used to with SQL.
UPDATE: I have been able to run some latency and throughput performance testing, and I'm excited to say that documentdb-lumenize has huge latency and bandwidth benefits with only a slight penalty in request units (theoretically, a reduction in peak throughput).
To install, use:
npm install --save documentdb-lumenize
To load the sproc into your collection:
cube = require('../sprocs/cube')
client.upsertStoredProcedure(collectionLink, {id: 'cube', body: cube})
bower install --save documentdb-lumenize
Note, the bower alternative only installs the sproc in .string form. You can upload it to your collection like this:
cube = require('fs').readFileSync('./sprocs/cube.string', 'utf8')
client.upsertStoredProcedure(collectionLink, {id: 'cube', body: cube})
You can either fetch the cube.string file from this repository (or via bower) and put it in your project, or you can load the sproc into your collection directly from GitHub as follows:
Uri uri = new System.Uri("https://raw.githubusercontent.com/lmaccherone/documentdb-lumenize/master/sprocs/cube.string");
WebClient wc = new WebClient();
Stream stream = wc.OpenRead(uri);
StreamReader sr = new StreamReader(stream);
string sprocString = await sr.ReadToEndAsync();
stream.Close();
var sproc = await client.UpsertStoredProcedureAsync(documentCollection.SelfLink,
    new StoredProcedure
    {
        Id = "cube",
        Body = sprocString
    });
Keep in mind, the above code will always get the latest version from GitHub. If you want to test updates before they are used, you'll want to download the file into your project and call upsertStoredProcedure with the contents of that cube.string file.
In CoffeeScript
{OLAPCube} = require('lumenize')  # Only if you want nicely formatted output
cube = require('../sprocs/cube')

client.upsertStoredProcedure(collectionLink, {id: 'cube', body: cube}, (err, result) ->
  if err?
    throw new Error("Error loading cube")
  console.log("Sproc cube loaded to collection")
  filterQuery = 'SELECT * FROM Facts f WHERE f.Priority = 1'
  cubeConfig = {groupBy: 'state', field: 'points', f: 'sum'}  # See below for more advanced configuration
  memo = {cubeConfig, filterQuery}
  sprocLink = collectionLink + '/sprocs/cube'
  client.executeStoredProcedure(sprocLink, memo, (err, response) ->
    cube = OLAPCube.newFromSavedState(response.memo.savedCube)
    console.log(cube.toString())  # Outputs nice pretty table
  )
)
In JavaScript
var OLAPCube, cube;

OLAPCube = require('lumenize').OLAPCube;
cube = require('../sprocs/cube');

client.upsertStoredProcedure(collectionLink, {id: 'cube', body: cube}, function(err, result) {
  var cubeConfig, filterQuery, memo, sprocLink;
  if (err != null) {
    throw new Error("Error loading cube");
  }
  console.log("Sproc cube loaded to collection");
  filterQuery = 'SELECT * FROM Facts f WHERE f.Priority = 1';
  cubeConfig = {groupBy: 'state', field: 'points', f: 'sum'};
  memo = {cubeConfig: cubeConfig, filterQuery: filterQuery};
  sprocLink = collectionLink + '/sprocs/cube';
  return client.executeStoredProcedure(sprocLink, memo, function(err, response) {
    cube = OLAPCube.newFromSavedState(response.memo.savedCube);
    console.log(cube.toString());
  });
});
Note that if the stored procedure hits its timeout before it has worked through all of the data specified by your filterQuery (or every document in the collection if no filterQuery is provided), it will return partially aggregated results and a continuation token. Simply pass the memo object that came back in the response body back in as the parameter to the next call of the stored procedure (after the delay specified by the x-ms-retry-after-ms header). It will take care of continuing right where it left off. When using documentdb-utils.WrappedClient.executeStoredProcedure, this delay-and-retry logic is taken care of for you automatically, among other niceties.
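If you'd rather handle the loop yourself, here is a minimal sketch in JavaScript. The exact shape of the continuation signal (a continuation field on the returned memo here) is an assumption to verify against your own responses; documentdb-utils does all of this for you.

// Minimal sketch of the delay-and-retry loop.
// ASSUMPTION: a truthy `continuation` field on the returned memo signals partial results.
function executeCubeToCompletion(client, sprocLink, memo, callback) {
  client.executeStoredProcedure(sprocLink, memo, function(err, response, headers) {
    if (err) { return callback(err); }
    if (response.memo && response.memo.continuation) {
      // Partial results: wait as long as the service asks, then pass the memo back in.
      var delayMs = parseInt(headers['x-ms-retry-after-ms'] || '0', 10);
      return setTimeout(function() {
        executeCubeToCompletion(client, sprocLink, response.memo, callback);
      }, delayMs);
    }
    callback(null, response);  // done: response.memo.savedCube holds the full cube
  });
}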
There is a fully functional C# .NET example in DocumentDB-Lumenize.cs found in the root of this repository. But the interesting bits are as follows:
// Create the config for executing the sproc. See below for all the myriad options that can be specified in the config.
string configString = @"{
    cubeConfig: {
        groupBy: 'state',
        field: 'points',
        f: 'sum'
    },
    filterQuery: 'SELECT * FROM c'
}";

Object config = JsonConvert.DeserializeObject<Object>(configString);
dynamic result = await client.ExecuteStoredProcedureAsync<dynamic>("dbs/db1/colls/coll1/sprocs/cube", config);
Console.WriteLine(result.Response);
As with the Node.js usage above, if the stored procedure hits its timeout it will return partially aggregated results and a continuation token; pass the memo that came back in the response body back in as the parameter to the next call (after the x-ms-retry-after-ms delay) and it will continue right where it left off.
Let's assume this is the only data in your collection.
[
  {id: 1, value: 10},
  {id: 1, value: 100},
  {id: 2, value: 20},
  {id: 3, value: 30}
]
Now, let's call the cube with the following:
memo = {cubeConfig: {groupBy: 'id', field: "value", f: "sum"}}
After you call the cube stored procedure, you should expect this to be in the savedCube.cellsAsCSVStyleArray
parameter of the response. Note, the _count metric is always
calculated even when not specified.
[
[ 'id', '_count', 'value_sum' ],
[ 1, 2, 110 ],
[ 2, 1, 20 ],
[ 3, 1, 30 ]
]
All of the Lumenize aggregation functions are supported, including but not limited to:
- sum
- count
- average
- max
- min
- median
- standardDeviation
- variance
- product (careful, this can overflow fast)
- sumSquares
- sumCubes
- values
- uniqueValues
- p<your-percentile> (e.g. p75, which is the upper quartile)

The ability to provide your own aggregation function is not supported in this sproc version of Lumenize. Fetch all the data client-side and use the full Lumenize if you need that.
Please note that for any metric that cannot be incrementally calculated (median and the percentiles in the list above), it's more efficient for large aggregations to use the cube with a values metric and calculate your median or percentile after the fact. You can call Lumenize.functions.median(values) for each cell in the returned cube, as sketched below. The deriveFieldsOnOutput functionality of the full Lumenize.OLAPCube is not yet supported in this stored procedure form.
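Here is a minimal sketch of that pattern, reusing the facts from the groupBy example above. The metric name value_values follows the <field>_<f> naming convention described later; treat the exact cell field names as assumptions to verify against your own output.

// Aggregate with a `values` metric in the sproc, then compute medians client-side.
var lumenize = require('lumenize');

var memo = {cubeConfig: {groupBy: 'id', field: 'value', f: 'values'}};
// ...executeStoredProcedure(sprocLink, memo, ...) as shown above, then in the callback:
var cube = lumenize.OLAPCube.newFromSavedState(response.memo.savedCube);
cube.getCells().forEach(function(cell) {
  // ASSUMPTION: the metric is named value_values per the <field>_<f> convention
  console.log(cell.id, lumenize.functions.median(cell.value_values));
});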
Now, let's assume the same set of facts in the collection, but we add a filterQuery:
cubeConfig = {groupBy: 'id', field: "value", f: "sum"}
filterQuery = 'SELECT * FROM Facts f WHERE f.id = 1'
And we call the cube stored procedure with
{cubeConfig: cubeConfig, filterQuery: filterQuery}
You should expect to see this in the savedCube.cellsAsCSVStyleArray
parameter that is returned.
[
[ 'id', '_count', 'value_sum' ],
[ 1, 2, 110 ]
]
It's possible to pass in an Object containing both a parameterized string for the filterQuery and its parameters, following DocumentDB's standard parameterized-query spec, as sketched below. Note, when you compose your filterQuery you must make sure that all the expected fields are returned.
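A sketch of the parameterized form, reusing the example above (the query-spec shape with query and parameters is the standard DocumentDB one):

// Parameterized filterQuery: pass a query spec Object instead of a plain string.
var memo = {
  cubeConfig: {groupBy: 'id', field: 'value', f: 'sum'},
  filterQuery: {
    query: 'SELECT * FROM Facts f WHERE f.id = @id',
    parameters: [{name: '@id', value: 1}]
  }
};
client.executeStoredProcedure(sprocLink, memo, callback);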
Let's walk through a simple 2D example from facts to output. Let's say you have this set of facts in your collection, or returned by your filterQuery:
[
{ProjectHierarchy: [1, 2, 3], Priority: 1, Points: 10},
{ProjectHierarchy: [1, 2, 4], Priority: 2, Points: 5 },
{ProjectHierarchy: [5] , Priority: 1, Points: 17},
{ProjectHierarchy: [1, 2] , Priority: 1, Points: 3 }
]
The ProjectHierarchy field models its hierarchy (tree) as an array containing a materialized path. The first fact is "in" Project 3 whose parent is Project 2, whose parent is Project 1. The second fact is "in" Project 4 whose parent is Project 2 which still has Project 1 as its parent. Project 5 is another root Project like Project 1; and the fourth fact is "in" Project 2.
So the first fact will roll up the tree and be aggregated against [1] and [1, 2], as well as [1, 2, 3]. Root Project 1 will get the data from all but the third fact, which will get aggregated against root Project 5.
We specify the ProjectHierarchy field as a dimension of type 'hierarchy' and the Priority field as a normal dimension.
dimensions = [
{field: "ProjectHierarchy", type: 'hierarchy'},
{field: "Priority"}
]
This will create a 2D "cube" where each unique value for ProjectHierarchy and Priority defines a different cell. Note, this happens to be a 2D "cube" (more commonly referred to as a pivot table), but you can also have a 1D cube (a simple group-by), a 3D cube, or even an n-dimensional hypercube where n is greater than 3.
You can specify any number of metrics to be calculated for each cell in the cube.
metrics = [
{field: "Points", f: "sum", as: "Scope"}
]
You can use any of the aggregation functions found in Lumenize.functions. Whether you specify it or not, the _count metric is automatically tracked for each cell. The as specification is optional. If missing, the name of the resulting metric is built from the field name and the function name, so without the as: "Scope" the metric in the example above would have been named Points_sum.
Next, we build the config parameter from our dimension and metrics specifications.
cubeConfig = {dimensions: dimensions, metrics: metrics}
Hierarchy dimensions automatically roll up, but you can also tell the cube to keep all totals by setting config.keepTotals to true. The totals are then kept in the cells where one or more of the dimension values are set to null. Note, you can also set keepTotals on individual dimensions, which you should probably prefer if you have more than a few dimensions (a per-dimension sketch follows), but we're going to set it globally here:
cubeConfig.keepTotals = true
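For reference, the per-dimension form might look like this sketch, keeping totals only for the hierarchy:

// Keep totals only for the ProjectHierarchy dimension instead of globally.
var dimensions = [
  {field: 'ProjectHierarchy', type: 'hierarchy', keepTotals: true},
  {field: 'Priority'}
];
var cubeConfig = {dimensions: dimensions, metrics: metrics};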
Now, let's call our cube stored procedure with the following:
{cubeConfig: cubeConfig}
We can inspect savedCube.cellsAsCSVStyleArray like we did in the simple groupBy examples above, but this time let's use the full power of Lumenize's OLAPCube to look at the results. You can rehydrate the cube in the browser or in node.js by passing the value returned as savedCube into Lumenize's OLAPCube.newFromSavedState.
OLAPCube = require('lumenize').OLAPCube
cube = OLAPCube.newFromSavedState(savedCube)
You can check out the full documentation for the Lumenize OLAPCube, but here are some examples.
getCell() allows you to extract a single cell. The "total" cell for all facts where Priority = 1 can be found as follows:
console.log(cube.getCell({Priority: 1}))
# { ProjectHierarchy: null, Priority: 1, _count: 3, Scope: 30 }
Notice how the ProjectHierarchy field value is null
. This is because it is a total cell for Priority=1 regardless of ProjectHierarchy value. Think of null
values in this context as wildcards that indicate the total fields.
Similarly, we can get the total for all descendants of ProjectHierarchy = [1] regardless of Priority as follows:
console.log(cube.getCell({ProjectHierarchy: [1]}))
# { ProjectHierarchy: [ 1 ], Priority: null, _count: 3, Scope: 18 }
If you wanted the cell where ProjectHierarchy is [5] and Priority is 1, that would just be:
console.log(cube.getCell({ProjectHierarchy: [5], Priority: 1}))
# { ProjectHierarchy: [ 5 ], Priority: 1, _count: 1, Scope: 17 }
getCell() uses the cellIndex, so it's very efficient. Using getCell() and getDimensionValues(), you can extract exactly what you want from the OLAPCube, or you can use the slice() method to pull out the data in a format that is ideally suited to graphing.
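As a quick sketch of combining those two calls against the cube built above:

// For each Priority value the cube has seen, print its total cell.
// getDimensionValues() returns the unique values recorded for one dimension.
cube.getDimensionValues('Priority').forEach(function(p) {
  console.log(p, cube.getCell({Priority: p}));
});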
You can call the toString() method, which extracts a 1D or 2D slice for tabular display.
console.log(cube.toString('ProjectHierarchy', 'Priority', 'Scope'))
# |        || Total |     1     2|
# |==============================|
# |Total   ||    35 |    30     5|
# |------------------------------|
# |[1]     ||    18 |    13     5|
# |[1,2]   ||    18 |    13     5|
# |[1,2,3] ||    10 |    10      |
# |[1,2,4] ||     5 |           5|
# |[5]     ||    17 |    17      |
The following dimension types are supported:
1. Single value fields (strings, numbers, etc.)
2. Hierarchical (tree) fields modeled as materialized path arrays
3. Non-hierarchical Array fields (e.g. tags)

There is no need to tell the OLAPCube what type to use with the exception of #2. In that case, add type: 'hierarchy' to that dimension's row like this:
dimensions = [
{field: 'hierarchicalDimensionField', type: 'hierarchy'} #, ...
]
This OLAP Cube implementation assumes your hierarchies (trees) are modeled as a materialized path array. This approach is commonly used with NoSQL databases like CouchDB and MongoDB (combining materialized path and array of ancestors) and even SQL databases supporting array types like Postgres.
This approach differs from the traditional OLAP/MDX fixed/named-level hierarchy approach. In that approach, you assume that the number of levels in the hierarchy is fixed. Also, each level in the hierarchy is either represented by a different column (clothing example --> level 0: SEX column - mens vs womens; level 1: TYPE column - pants vs shorts vs shirts; etc.) or by predetermined ranges of values in a single field (date example --> level 0: year; level 1: quarter; level 2: month; etc.).
However, the approach used by this OLAPCube implementation is the more general case, because it can easily simulate fixed/named-level hierarchies whereas the reverse is not true. In the clothing example above, you would simply key your dimension off of a derived field that was a combination of the SEX and TYPE columns (e.g. ['mens', 'pants']).
If you don't specify type: 'hierarchy' and the OLAPCube sees a field whose value is an Array in a dimension field, the data in that fact gets aggregated against each element in the Array. So a non-hierarchical Array field like ['x', 'y', 'z'] would get aggregated against 'x', 'y', and 'z' rather than ['x'], ['x', 'y'], and ['x', 'y', 'z']. This functionality is useful for accomplishing analytics on tags, but it can be used in other powerful ways. For instance, let's say you have a list of events:
events = [
{name: 'Renaissance Festival', activeMonths: ['September', 'October']},
{name: 'Concert Series', activeMonths: ['July', 'August', 'September']},
{name: 'Fall Festival', activeMonths: ['September']}
]
You could figure out the number of events active in each month by specifying "activeMonths" as a dimension. Lumenize.TimeInStateCalculator (and other calculators in Lumenize) use this technique.
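A sketch of that configuration (the always-present _count metric is all you need to count events per month):

// Count events active in each month: a plain (non-hierarchy) Array dimension
// aggregates each fact against every element of its array.
var cubeConfig = {
  dimensions: [{field: 'activeMonths'}],
  metrics: []  // _count is always calculated
};
var memo = {cubeConfig: cubeConfig};
// Execute the cube sproc with `memo` as shown above; each month's _count
// is the number of events active during that month.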
require('documentdb-lumenize').cube
At this point, I have a pretty long list of things to add to documentdb-lumenize, most notably the time-series calculators from Lumenize.
Free for evaluation or non-commercial purposes. Contact me for other licensing.