New Case Study:See how Anthropic automated 95% of dependency reviews with Socket.Learn More →

mrcluster

Package Overview

Dependencies

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

mrcluster

A multi-core pseudo-MapReduce implementation on NodeJS

0.0.17
Source
npm

Version published: 10 years ago

Weekly downloads: 4; decreased by-50%

Maintainers: 1

Weekly downloads

Created: 10 years ago

Source

Node-MapReduce

A multi-core pseudo-MapReduce implementation on NodeJS

Installation

npm install mrcluster

Usage

Create a new instance

var mrcluster = require("mrcluster").init();

Chaining

The module is written to be chainable. All settings are set via function call chains.

mrcluster
	.file("mockdata_from_mockaroo.csv")
	.lineDelimiter('\n')
	.numBlocks(9);

Starting the MapReduce operation

var mrcluster
	.file("mockdata_from_mockaroo.csv")
	.start();

Settings - file

Specify the csv file or files to read in.

mrcluster.file("mockdata_from_mockaroo.csv");

If an array of files are defined, each Mapper will parse 1 file as a single block.

mrcluster.file(["file1.csv","file2.csv","file3.csv"]);

Settings - lineDelimiter

Specify the delimiter to indicate a new line. Default is \n.

mrcluster.lineDelimiter('\n');

Settings - numBlocks

Specify the number of blocks to split the file into. Default is 2.

mrcluster.numBlocks(9);

Settings - sample

Specify the number of Blocks to sample. The min number of samples must be >= number of Mappers. Default is -1 (Do not sample - run everything).

mrcluster.sample(1);

Settings - numMappers

Specify the number of mappers to create. Default is 2.

mrcluster.numMappers(2);

Settings - map

First input specifies the mapping function to be applied on each line of data. Second input (optional) is a flag to specify whether to write the content of each Mapper to disk. The function should take in a String representing a line of data, and returns an Array[2] representing the resultant key-value pair.

mrcluster    
	.map(function (line) {
        return [line.split(',')[0], 1];
    },
	true)

Settings - mapOnly

Specify whether to run only Mappers. Default is False.

mrcluster.mapOnly(true)

Settings - hash

Specify the number of hashes the hash function will generate. Default is 3. The number of Reducers are currently fixed to be the same as the max number of hashes - each Reducer is assigned to one hash bin.

mrcluster.hash(3);

Settings - reduce

First input specifies the reduce function to be applied. The second input (optional) specifies whether to write the result of each Reduce jobs to disk. This function is applied once in the Mapper and once in the Reducer. It is applied at the end of the Mapper execution, just before returning the mapped results to the master node.
The function should take 2 variables representing the the values for the two key-value pairs. And returns a value representing the resultant value for the two key-value pairs. E.g. The following codes demonstrate the summing of the values for 2 key-value pairs - ['A',1] + ['A',1] = ['A',2]

mrcluster    
	.reduce(function (a,b) {
        return a + b;
    })

Settings - post_reduce

Specify the function to be applied at the end of the Reducer execution. The function should take in an Associative Array holding all the key-values produced by the Reducer. And can return any value to the master node for further collation (e.g. sum).

mrcluster    
    .post_reduce(function (obj) {
        var res = Object.keys(obj).map(function (key) {
            return obj[key];
        });
		console.log(obj)
        return res.reduce(function (a, b) {
            return a+b;
        });
    })

Settings - aggregate

Specify the function to be applied at the end of all tasks. The function should take in an Array (representing the hash bins) holding all the returned Values produced by the post_reduce function (e.g. You can do a summation of all the returned sums of all the Reducers).

mrcluster    
    .aggregate(function (hash_array) {
        console.log("Total: " + hash_array.reduce(function (a, b) {
            return a + b;
        }))
    })

Example 1

A simple count of number of unique domains in the email list.

var mrcluster = require("mrcluster");

mrcluster.init()
    .file("mockdata_from_mockaroo.csv")
    .lineDelimiter('\n')
	.numBlocks(9)
	.numMappers(3)
    .map(function (line) {
        return [line.split(',')[1].split('@')[1] || 'NA', 1];
    })
    .hash(3)
    .reduce(function (a, b) {
        return 1;
    })
    .post_reduce(function (obj) {
        var res = Object.keys(obj).map(function (key) {
            return obj[key];
        });
		console.log(obj)
        return res.reduce(function (a, b) {
            return a+b;
        });
    })
    .aggregate(function (hash_array) {
        console.log("Total: " + hash_array.reduce(function (a, b) {
            return a + b;
        }))
    })
    .start();

Keywords

FAQs

What is mrcluster?

Is mrcluster popular?

Is mrcluster well maintained?

Package last updated on 20 Nov 2014

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

mrcluster

Node-MapReduce

Installation

Usage

Create a new instance

Chaining

Starting the MapReduce operation

Settings - file

Settings - lineDelimiter

Settings - numBlocks

Settings - sample

Settings - numMappers

Settings - map

Settings - mapOnly

Settings - hash

Settings - reduce

Settings - post_reduce

Settings - aggregate

Example 1

Keywords

Related posts

React Team Updates CRA Migration Guidance After Community Pushback

Ransomware in 2024: Record-Low Payment Rate Signals Changing Economics of Cybercrime