KMeans Engine
This JavaScript k-means implementation is optimised for large, sparse data sets: it represents a sparse matrix as an array of objects.
Most other implementations available on npm take an N x M matrix (a 2D array) as input. If the data is sparse, building that N x M matrix consumes a lot of memory. For example, the tf-idf vectors of text documents form a very large, sparse matrix. Allocating the 2D array takes a long time, and the process may even exit if there is not enough memory.
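To make the memory argument concrete, here is a minimal sketch that expands a few sparse objects into the dense N x M form and compares the number of stored values. The helpers `vocabulary` and `toDense` and the sample data are hypothetical and written only for this illustration; they are not part of kmeans-engine.

```javascript
// Illustrative only: these helpers are NOT part of kmeans-engine.
const docs = [
  { html: 5, css: 3 },
  { python: 5, mongo: 4 },
  { docker: 4, aws: 4, linux: 3 },
];

// Collect every distinct key; its size is M, the column count of a dense matrix.
function vocabulary(vectors) {
  const keys = new Set();
  for (const v of vectors) {
    for (const key of Object.keys(v)) keys.add(key);
  }
  return [...keys];
}

// Expand the sparse objects into an N x M 2D array, filling absent keys with 0.
function toDense(vectors) {
  const keys = vocabulary(vectors);
  return vectors.map((v) => keys.map((key) => (key in v ? v[key] : 0)));
}

const dense = toDense(docs);
const denseCells = dense.length * dense[0].length;
const sparseEntries = docs.reduce((n, v) => n + Object.keys(v).length, 0);

console.log(`dense cells: ${denseCells}, sparse entries: ${sparseEntries}`);
// prints "dense cells: 21, sparse entries: 7"
```

With only three vectors the dense form already stores three times as many values, and the gap widens as the number of distinct keys grows, which is the case for tf-idf vectors.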
Installation
npm install kmeans-engine
What's New
1.5.0
Upgrade dependencies to fix security alerts
1.4.0
Support an option to provide initial centroids. See the pull request for details
1.3.0
Update to a newer version of vector-object
1.2.0
Support the maxIterations parameter in options
1.1.0
Update to a newer version of vector-object
Usage
const kmeans = require('kmeans-engine');
const engineers = [
  { html: 5, angular: 5, react: 3, css: 3 },
  { html: 4, react: 5, css: 4 },
  { html: 4, react: 5, vue: 4, css: 5 },
  { html: 3, angular: 3, react: 4, vue: 2, css: 3 },
  { nodejs: 5, python: 3, mongo: 5, mysql: 4, redis: 3 },
  { java: 5, php: 4, ruby: 5, mongo: 3, mysql: 5 },
  { python: 5, php: 4, ruby: 3, mongo: 5, mysql: 4, oracle: 4 },
  { java: 5, csharp: 3, oracle: 5, mysql: 5, mongo: 4 },
  { objc: 3, swift: 5, xcode: 5, crashlytics: 3, firebase: 5, reactnative: 4 },
  { java: 4, swift: 5, androidstudio: 4 },
  { objc: 5, java: 4, swift: 3, androidstudio: 4, xcode: 4, firebase: 4 },
  { objc: 3, java: 5, swift: 3, xcode: 4, apteligent: 4 },
  { docker: 5, kubernetes: 4, aws: 4, ansible: 3, linux: 4 },
  { docker: 4, marathon: 4, aws: 4, jenkins: 5 },
  { docker: 3, marathon: 4, heroku: 4, bamboo: 4, jenkins: 4, nagios: 3 },
  { marathon: 4, heroku: 4, bamboo: 4, jenkins: 4, linux: 3, puppet: 4, nagios: 5 }
];
kmeans.clusterize(engineers, { k: 4, maxIterations: 5, debug: true }, (err, res) => {
  // Report the error and stop; res should not be read when err is set.
  if (err) {
    console.error(err);
    return;
  }
  console.log('----- Results -----');
  console.log(`Iterations: ${res.iterations}`);
  console.log('Clusters: ');
  console.log(res.clusters);
});
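If you prefer Promises over callbacks, the call above can be wrapped. The sketch below relies only on the `(err, res)` callback signature shown in the usage example. `clusterizeAsync` and `fakeEngine` are hypothetical names introduced for this illustration, and the stand-in engine does no real clustering; it exists so the snippet runs without the package installed.

```javascript
// Hypothetical helper, NOT part of kmeans-engine: wraps any object exposing
// clusterize(vectors, options, callback) in a Promise.
function clusterizeAsync(engine, vectors, options) {
  return new Promise((resolve, reject) => {
    engine.clusterize(vectors, options, (err, res) => {
      if (err) reject(err);
      else resolve(res);
    });
  });
}

// Stand-in engine (an assumption for this sketch). It only mimics the
// callback signature and returns a fixed result; it does no clustering.
const fakeEngine = {
  clusterize(vectors, options, callback) {
    if (!options || !options.k) {
      callback(new Error('k is required'));
      return;
    }
    callback(null, { iterations: 1, clusters: [] });
  },
};

clusterizeAsync(fakeEngine, [{ html: 5 }], { k: 1 })
  .then((res) => console.log(`Iterations: ${res.iterations}`))
  .catch((err) => console.error(err.message));
```

With the real package, passing `require('kmeans-engine')` in place of `fakeEngine` should work the same way, assuming the callback is always invoked with `(err, res)` as in the usage example.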
Test
npm install
npm run test
To-Dos
- enhance initial centroid picking
- speed optimisation
Authors
License
MIT