
A multi-core pseudo-MapReduce implementation on NodeJS
npm install mrcluster
var mrcluster = require("mrcluster").init();
The module is written to be chainable. All settings are set via function call chains.
mrcluster
    .file("mockdata_from_mockaroo.csv")
    .lineDelimiter('\n')
    .numBlocks(9);
mrcluster
    .file("mockdata_from_mockaroo.csv")
    .start();
Specify the CSV file or files to read in.
mrcluster.file("mockdata_from_mockaroo.csv");
If an array of files is given, each Mapper parses one file as a single block.
mrcluster.file(["file1.csv","file2.csv","file3.csv"]);
Specify the delimiter that marks a new line. Default is \n.
mrcluster.lineDelimiter('\n');
Specify the number of blocks to split the file into. Default is 2.
mrcluster.numBlocks(9);
Specify the number of blocks to sample. The number of samples must be >= the number of Mappers. Default is -1 (do not sample, run everything).
mrcluster.sample(1);
Specify the number of Mappers to create. Default is 2.
mrcluster.numMappers(2);
The first argument is the mapping function applied to each line of data. The second argument (optional) is a flag specifying whether to write the output of each Mapper to disk.
The function should take a String representing a line of data and return an Array[2] representing the resulting key-value pair.
mrcluster
    .map(function (line) {
        return [line.split(',')[0], 1];
    },
    true);
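To make the contract of the mapping function concrete, here is a minimal sketch that runs outside mrcluster. The helper `runMapper` is a hypothetical stand-in for what a Mapper does with one block, not part of the module's API, and skipping blank lines is an assumption.

```javascript
// Hypothetical stand-in for a Mapper: apply the user's map function
// to every line of a block. Not part of the mrcluster API.
function runMapper(block, mapFn, lineDelimiter) {
    return block
        .split(lineDelimiter)
        .filter(function (line) { return line.length > 0; }) // assumption: blank lines are skipped
        .map(function (line) { return mapFn(line); });
}

// Same mapping function as above: key = first CSV column, value = 1.
function firstColumn(line) {
    return [line.split(',')[0], 1];
}

var pairs = runMapper("A,x\nB,y\nA,z\n", firstColumn, '\n');
console.log(pairs); // pairs: [['A', 1], ['B', 1], ['A', 1]]
```

Each line yields exactly one key-value pair. Repeated keys are left as they are at this stage and are combined later by the reduce function.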
Specify whether to run only the Mappers. Default is false.
mrcluster.mapOnly(true);
Specify the number of hashes the hash function will generate. Default is 3.
The number of Reducers is currently fixed to the number of hashes - each Reducer is assigned to one hash bin.
mrcluster.hash(3);
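The idea of hash bins can be sketched as follows. This illustrates the general technique only: `hashKey` and `partition` are hypothetical names, and the hash function mrcluster actually uses may differ.

```javascript
// Hypothetical string hash (polynomial rolling hash); an assumption,
// not the function used internally by mrcluster.
function hashKey(key, numHashes) {
    var h = 0;
    for (var i = 0; i < key.length; i++) {
        h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit integer
    }
    return h % numHashes;
}

// Group key-value pairs into one bin per hash value (one bin per Reducer).
function partition(pairs, numHashes) {
    var bins = [];
    for (var b = 0; b < numHashes; b++) {
        bins.push([]);
    }
    pairs.forEach(function (pair) {
        bins[hashKey(String(pair[0]), numHashes)].push(pair);
    });
    return bins;
}

var bins = partition([['A', 1], ['B', 1], ['A', 1]], 3);
console.log(bins);
```

Because the bin depends only on the key, all pairs with the same key end up at the same Reducer, which is what allows each Reducer to work independently.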
The first argument is the reduce function to be applied. The second argument (optional) specifies whether to write the result of each Reduce job to disk.
This function is applied once in the Mapper and once in the Reducer. In the Mapper it is applied at the end of execution, just before the mapped results are returned to the master node.
The function should take two arguments representing the values of two key-value pairs that share a key, and return the combined value for that key.
E.g. the following code sums the values of two key-value pairs: ['A',1] + ['A',1] = ['A',2]
mrcluster
    .reduce(function (a, b) {
        return a + b;
    });
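The two-stage application described above (once in the Mapper, once in the Reducer) can be sketched like this. `reduceByKey` is a hypothetical helper written for illustration and is not part of mrcluster.

```javascript
// Hypothetical helper: fold all pairs that share a key with the user's reduce function.
function reduceByKey(pairs, reduceFn) {
    var out = {};
    pairs.forEach(function (pair) {
        var key = pair[0], value = pair[1];
        out[key] = Object.prototype.hasOwnProperty.call(out, key)
            ? reduceFn(out[key], value)
            : value;
    });
    return out;
}

// Turn an associative array back into a list of key-value pairs.
function toPairs(obj) {
    return Object.keys(obj).map(function (k) { return [k, obj[k]]; });
}

function sum(a, b) { return a + b; }

// First pass: each Mapper combines its own output before sending it on.
var mapper1 = reduceByKey([['A', 1], ['B', 1], ['A', 1]], sum); // { A: 2, B: 1 }
var mapper2 = reduceByKey([['A', 1]], sum);                     // { A: 1 }

// Second pass: the Reducer merges the partial results from all Mappers.
var reduced = reduceByKey(toPairs(mapper1).concat(toPairs(mapper2)), sum);
console.log(reduced); // { A: 3, B: 1 }
```

Since the same function runs on raw values and on partial results, it should give the same answer regardless of grouping and order (as summation does).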
Specify the function to be applied at the end of the Reducer execution.
The function should take an associative array holding all the key-values produced by the Reducer. It can return any value to the master node for further collation (e.g. a sum).
mrcluster
    .post_reduce(function (obj) {
        var res = Object.keys(obj).map(function (key) {
            return obj[key];
        });
        console.log(obj);
        // The initial value 0 avoids a TypeError when a Reducer holds no keys.
        return res.reduce(function (a, b) {
            return a + b;
        }, 0);
    });
Specify the function to be applied at the end of all tasks.
The function should take an Array (one entry per hash bin) holding the values returned by the post_reduce function. For example, you can sum the sums returned by all the Reducers.
mrcluster
    .aggregate(function (hash_array) {
        console.log("Total: " + hash_array.reduce(function (a, b) {
            return a + b;
        }));
    });
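How post_reduce and aggregate fit together can be traced with plain objects. The data in `reducerOutputs` is made up for illustration, and `postReduce` and `aggregate` are local stand-ins for the callbacks, not mrcluster functions.

```javascript
// Hypothetical per-Reducer results: one associative array per hash bin.
var reducerOutputs = [
    { 'B': 1 },
    {},
    { 'A': 3 }
];

// Stand-in for the post_reduce callback: sum all values held by one Reducer.
function postReduce(obj) {
    return Object.keys(obj)
        .map(function (key) { return obj[key]; })
        .reduce(function (a, b) { return a + b; }, 0); // 0 keeps empty bins from throwing
}

// Stand-in for the aggregate callback: sum the per-bin results.
function aggregate(hashArray) {
    return hashArray.reduce(function (a, b) { return a + b; }, 0);
}

var perBin = reducerOutputs.map(function (obj) { return postReduce(obj); }); // [1, 0, 3]
var total = aggregate(perBin);                                               // 4
console.log("Total: " + total);
```

The array passed to aggregate has one entry per hash bin, so its length matches the value given to hash().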
A simple count of the number of unique domains in an email list.
var mrcluster = require("mrcluster");
mrcluster.init()
    .file("mockdata_from_mockaroo.csv")
    .lineDelimiter('\n')
    .numBlocks(9)
    .numMappers(3)
    .map(function (line) {
        // Key: the email domain from the second CSV column, or 'NA' if it is missing.
        var email = line.split(',')[1] || '';
        return [email.split('@')[1] || 'NA', 1];
    })
    .hash(3)
    .reduce(function (a, b) {
        // Collapse duplicates so that each unique domain counts once.
        return 1;
    })
    .post_reduce(function (obj) {
        var res = Object.keys(obj).map(function (key) {
            return obj[key];
        });
        console.log(obj);
        // Number of unique domains seen by this Reducer.
        return res.reduce(function (a, b) {
            return a + b;
        }, 0);
    })
    .aggregate(function (hash_array) {
        console.log("Total: " + hash_array.reduce(function (a, b) {
            return a + b;
        }, 0));
    })
    .start();
A single node multi-core pseudo-MapReduce implementation on NodeJS. Input files are automatically broken into blocks and distributed to the Mappers and Reducers. Examples of implementations can be found in the README.