What is skmeans?
The skmeans npm package is a simple and efficient implementation of the k-means clustering algorithm. It is used for partitioning a dataset into k distinct, non-overlapping subsets (clusters). The package is designed to be lightweight and easy to use, making it suitable for quick clustering tasks in JavaScript applications.
What are skmeans's main functionalities?
Basic k-means clustering
This feature allows you to perform basic k-means clustering on a dataset. The code sample demonstrates how to cluster a simple 2D dataset into 2 clusters.
const skmeans = require('skmeans');
const data = [[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]];
const k = 2;
const result = skmeans(data, k);
console.log(result);
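To build intuition for what such a call computes, here is a minimal from-scratch sketch of Lloyd's algorithm, the standard k-means iteration. It is not skmeans's source code. `sqDist` and `lloydSketch` are hypothetical names. The sketch assumes multidimensional points and explicitly supplied starting centroids so that runs are repeatable. It imitates only the shape of the skmeans result (`it`, `k`, `idxs`, `centroids`), and its iteration count may be computed differently from skmeans's `it`.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function sqDist(a, b) {
  let s = 0;
  for (let d = 0; d < a.length; d++) s += (a[d] - b[d]) ** 2;
  return s;
}

// Hypothetical Lloyd's k-means sketch for multidimensional points, starting from
// the given centroids. maxIt = 10000 mirrors the documented skmeans default.
function lloydSketch(data, initial, maxIt = 10000) {
  let centroids = initial.map((c) => c.slice());
  const k = centroids.length;
  const dim = data[0].length;
  const idxs = new Array(data.length).fill(-1);
  let it = 0;
  while (it < maxIt) {
    it++;
    // Assignment step: move each point to its nearest centroid.
    let changed = false;
    for (let i = 0; i < data.length; i++) {
      let best = 0;
      for (let j = 1; j < k; j++) {
        if (sqDist(data[i], centroids[j]) < sqDist(data[i], centroids[best])) best = j;
      }
      if (idxs[i] !== best) {
        idxs[i] = best;
        changed = true;
      }
    }
    if (!changed) break; // assignments are stable: converged
    // Update step: each centroid becomes the mean of its assigned points.
    const sums = Array.from({ length: k }, () => new Array(dim).fill(0));
    const counts = new Array(k).fill(0);
    data.forEach((p, i) => {
      counts[idxs[i]]++;
      for (let d = 0; d < dim; d++) sums[idxs[i]][d] += p[d];
    });
    // An empty cluster keeps its previous centroid.
    centroids = sums.map((s, j) =>
      counts[j] > 0 ? s.map((v) => v / counts[j]) : centroids[j]
    );
  }
  return { it, k, idxs, centroids };
}

const data = [[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]];
console.log(lloydSketch(data, [[1, 2], [8, 9]]));
```

With these starting centroids, the sketch settles on [2, 3] and [9, 10]. These are the means of the two visually obvious groups.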
Custom initialization
This feature allows you to specify custom initial centroids for the k-means algorithm. The code sample demonstrates how to initialize the centroids manually.
const skmeans = require('skmeans');
const data = [[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]];
const k = 2;
const initialCentroids = [[1, 2], [8, 9]];
// Initial centroids go in the third argument (see the API signature below).
const result = skmeans(data, k, initialCentroids);
console.log(result);
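The initial centroids are passed directly to the algorithm, so it can help to sanity-check them first. The sketch below is a hypothetical helper. `checkInitialCentroids` is not part of skmeans, and skmeans is not claimed to perform these checks itself. The helper verifies three things: there are exactly k centroids, their dimension matches the data, and no two of them are identical.

```javascript
// Hypothetical pre-flight check for user-supplied initial centroids.
// Throws an Error if they cannot sensibly seed k clusters for `data`.
function checkInitialCentroids(data, k, centroids) {
  if (!Array.isArray(centroids) || centroids.length !== k) {
    const got = Array.isArray(centroids) ? centroids.length : typeof centroids;
    throw new Error(`expected an array of ${k} initial centroids, got ${got}`);
  }
  const multi = Array.isArray(data[0]); // N-D points are arrays, 1-D values are numbers
  const dim = multi ? data[0].length : 1;
  const seen = new Set();
  for (const c of centroids) {
    const cDim = Array.isArray(c) ? c.length : 1;
    if (Array.isArray(c) !== multi || cDim !== dim) {
      throw new Error(`centroid ${JSON.stringify(c)} does not match data dimension ${dim}`);
    }
    // Identical seeds tie on every distance, which typically leaves one cluster empty.
    const key = JSON.stringify(c);
    if (seen.has(key)) throw new Error(`duplicate initial centroid ${key}`);
    seen.add(key);
  }
  return true;
}

const data = [[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]];
console.log(checkInitialCentroids(data, 2, [[1, 2], [8, 9]])); // true
```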
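Weighted k-means clustering (not part of the documented API)
skmeans does not document weighted clustering. Its documented signature is skmeans(data,k,[centroids],[iterations]), which has no weights parameter. The fifth argument in the sample below is therefore outside the documented API, and it should not be relied on to weight the data points.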
const skmeans = require('skmeans');
const data = [[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]];
const weights = [1, 1, 1, 2, 2, 2];
const k = 2;
const result = skmeans(data, k, 'kmpp', null, weights);
console.log(result);
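The documented API offers no weights, so the following sketch only illustrates what weighting changes in k-means. In the update step, each centroid becomes the weighted mean of its assigned points instead of the plain mean. `weightedCentroids` is a hypothetical helper, not an skmeans function. The weights in the usage line are illustrative values, chosen so that one point within a cluster counts more than its neighbors.

```javascript
// Hypothetical weighted update step: each centroid is the weighted mean of the
// points currently assigned to it (idxs[i] is the cluster of point i).
function weightedCentroids(data, weights, idxs, k) {
  const dim = data[0].length;
  const sums = Array.from({ length: k }, () => new Array(dim).fill(0));
  const totals = new Array(k).fill(0);
  data.forEach((p, i) => {
    const c = idxs[i];
    totals[c] += weights[i];
    for (let d = 0; d < dim; d++) sums[c][d] += weights[i] * p[d];
  });
  // A cluster with zero total weight has no defined mean; report it as null.
  return sums.map((s, c) => (totals[c] > 0 ? s.map((v) => v / totals[c]) : null));
}

const data = [[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]];
const idxs = [0, 0, 0, 1, 1, 1];
console.log(weightedCentroids(data, [1, 1, 4, 2, 2, 2], idxs, 2)); // [ [ 2.5, 3.5 ], [ 9, 10 ] ]
```

The example above uses the weights [1, 1, 1, 2, 2, 2]. Each cluster's weights are equal there, so for a fixed assignment the weighted mean is the same as the plain mean. Weights shift a centroid only when they differ within a cluster.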
Other packages similar to skmeans
kmeans-js
kmeans-js is another JavaScript implementation of the k-means clustering algorithm. It offers similar functionality to skmeans but includes additional features like support for different distance metrics and more advanced initialization methods. It is slightly more complex but provides more flexibility for advanced users.
ml-kmeans
ml-kmeans is part of the machine learning library 'ml' and provides a robust implementation of the k-means algorithm. It is designed to work seamlessly with other machine learning tools in the 'ml' ecosystem, making it a good choice for more comprehensive machine learning projects. It offers more advanced options and better integration with other machine learning algorithms compared to skmeans.
simple-statistics
simple-statistics is a library that provides a wide range of statistical tools, including k-means clustering. While it is not solely focused on k-means, it offers a comprehensive set of statistical functions that can be useful for data analysis. It is a good choice if you need a broader set of statistical tools in addition to k-means clustering.
skmeans
Super fast, simple k-means and k-means++ implementation for unidimensional and multidimensional data. Works in Node.js and in the browser.
Installation
npm install skmeans
Usage
NodeJS
const skmeans = require("skmeans");
const data = [1,12,13,4,25,21,22,3,14,5,11,2,23,24,15];
const res = skmeans(data,3);
console.log(res);
Browser
<!doctype html>
<html>
<head>
<script src="skmeans.js"></script>
</head>
<body>
<script>
var data = [1,12,13,4,25,21,22,3,14,5,11,2,23,24,15];
var res = skmeans(data,3);
console.log(res);
</script>
</body>
</html>
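Results
Sample output of the example above. Cluster numbering and the iteration count can differ between runs, depending on the initial centroids.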
{
it: 2,
k: 3,
idxs: [ 2, 0, 0, 2, 1, 1, 1, 2, 0, 2, 0, 2, 1, 1, 0 ],
centroids: [ 13, 23, 3 ]
}
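The `idxs` array is aligned with the input, so the clusters themselves can be recovered by grouping the data on it. The sketch below applies a hypothetical helper, `groupByCluster`, to the sample result above. It also recomputes each group's mean, which matches the reported `centroids`.

```javascript
// Hypothetical helper: collect the input values belonging to each cluster.
function groupByCluster(data, result) {
  const groups = Array.from({ length: result.k }, () => []);
  result.idxs.forEach((c, i) => groups[c].push(data[i]));
  return groups;
}

const data = [1, 12, 13, 4, 25, 21, 22, 3, 14, 5, 11, 2, 23, 24, 15];
// The sample result shown above, copied verbatim.
const res = {
  it: 2,
  k: 3,
  idxs: [2, 0, 0, 2, 1, 1, 1, 2, 0, 2, 0, 2, 1, 1, 0],
  centroids: [13, 23, 3],
};

const groups = groupByCluster(data, res);
console.log(groups); // groups: [12, 13, 14, 11, 15], [25, 21, 22, 23, 24], [1, 4, 3, 5, 2]

// Each centroid is the mean of its group.
const means = groups.map((g) => g.reduce((a, b) => a + b, 0) / g.length);
console.log(means); // [ 13, 23, 3 ]
```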
API
skmeans(data,k,[centroids],[iterations])
Performs k-means clustering on unidimensional or multidimensional data. Parameters are:
- data Unidimensional or multidimensional array of values to be clustered. For unidimensional data, a simple array [1,2,3,...,n]. For multidimensional data, an NxM array [[1,2],[2,3],...,[n,m]].
- k Number of clusters
- centroids Optional. Initial centroid values. If not provided, the algorithm will try to choose appropriate ones. Alternative values can be:
- "kmrand" Cluster initialization will be random, but with extra checking so that no two initial centroids are equal.
- "kmpp" The algorithm will use the k-means++ cluster initialization method.
- iterations Optional. Maximum number of iterations. If not provided, it will be set to 10000.
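The "kmpp" option refers to k-means++ seeding. Below is a sketch of the standard k-means++ procedure. The first seed is drawn uniformly from the data. Each further seed is drawn with probability proportional to its squared distance from the nearest seed already chosen. This is a generic illustration, not skmeans's implementation. `kmppSeeds` and its `rng` parameter are hypothetical, and the sketch handles both unidimensional and multidimensional data.

```javascript
// Squared distance for 1-D values (numbers) or N-D points (arrays).
function sqDist(a, b) {
  if (!Array.isArray(a)) return (a - b) ** 2;
  let s = 0;
  for (let d = 0; d < a.length; d++) s += (a[d] - b[d]) ** 2;
  return s;
}

// Hypothetical k-means++ seeding sketch. `rng` must return floats in [0, 1);
// it defaults to Math.random but can be replaced for repeatable seeds.
function kmppSeeds(data, k, rng = Math.random) {
  const seeds = [data[Math.floor(rng() * data.length)]];
  while (seeds.length < k) {
    // D(x)^2: squared distance from each point to its nearest chosen seed.
    const d2 = data.map((x) => Math.min(...seeds.map((s) => sqDist(x, s))));
    const total = d2.reduce((a, b) => a + b, 0);
    if (total === 0) break; // no point differs from the seeds; stop early
    // Sample the next seed with probability proportional to D(x)^2.
    let r = rng() * total;
    let pick = -1;
    for (let i = 0; i < d2.length; i++) {
      if (d2[i] === 0) continue; // existing seeds can never be picked again
      pick = i; // last positive candidate doubles as a rounding fallback
      r -= d2[i];
      if (r < 0) break;
    }
    seeds.push(data[pick]);
  }
  return seeds;
}

const data = [1, 12, 13, 4, 25, 21, 22, 3, 14, 5, 11, 2, 23, 24, 15];
console.log(kmppSeeds(data, 3));
```

Passing a deterministic `rng` makes the seeding repeatable. The early exit covers data that has fewer than k distinct points.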
The function will return an object with the following data:
- it The number of iterations performed until the algorithm has converged
- k The number of clusters
- centroids The value of each cluster centroid
- idxs For each value in the data array, the index of the centroid (cluster) it was assigned to
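The result exposes `centroids`, so a new value can be placed in an existing cluster. This uses the same nearest-centroid rule that k-means applies in its assignment step. `nearestCentroid` below is a hypothetical helper, not part of the skmeans API. It assumes Euclidean distance, the usual k-means metric.

```javascript
// Hypothetical helper: index of the centroid closest to `value`.
// Works for 1-D centroids (numbers) and N-D centroids (arrays). Squared
// Euclidean distance ranks centroids the same way as Euclidean distance.
function nearestCentroid(value, centroids) {
  const sq = (a, b) =>
    Array.isArray(a) ? a.reduce((s, v, d) => s + (v - b[d]) ** 2, 0) : (a - b) ** 2;
  let best = 0;
  for (let j = 1; j < centroids.length; j++) {
    if (sq(value, centroids[j]) < sq(value, centroids[best])) best = j;
  }
  return best;
}

// Centroids from the sample result: [13, 23, 3].
console.log(nearestCentroid(20, [13, 23, 3])); // 1
console.log(nearestCentroid(7, [13, 23, 3])); // 2
```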