dclassify is a Naive Bayesian classifier for NodeJS that goes one step further than the usual binary classifier by introducing a unique probability-of-absence optimisation. In testing, this optimisation has led to a ~10% improvement in correctness over conventional binary classifiers. It is mainly intended for classifying items based on a finite set of characteristics, rather than for language processing.
General-purpose Document and DataSet classes are provided for training and test data sets.
If the applyInverse optimisation is used, dclassify will calculate probabilities based on the present tokens as usual, but will also calculate a probability-of-absence for missing tokens. This is unconventional but produces better results, particularly when working with smaller vocabularies. It's especially well-suited for classifying items based on a limited set of characteristics.
npm install dclassify
// module dependencies
var dclassify = require('dclassify');
// Utilities provided by dclassify
var Classifier = dclassify.Classifier;
var DataSet = dclassify.DataSet;
var Document = dclassify.Document;
// create some 'bad' test items (name, array of characteristics)
var item1 = new Document('item1', ['a','b','c']);
var item2 = new Document('item2', ['a','b','c']);
var item3 = new Document('item3', ['a','d','e']);
// create some 'good' items (name, characteristics)
var itemA = new Document('itemA', ['c', 'd']);
var itemB = new Document('itemB', ['e']);
var itemC = new Document('itemC', ['b','d','e']);
// create a DataSet and add test items to appropriate categories
// this is 'curated' data for training
var data = new DataSet();
data.add('bad', [item1, item2, item3]);
data.add('good', [itemA, itemB, itemC]);
// an optimisation for working with small vocabularies
var options = {
    applyInverse: true
};
// create a classifier
var classifier = new Classifier(options);
// train the classifier
classifier.train(data);
console.log('Classifier trained.');
console.log(JSON.stringify(classifier.probabilities, null, 4));
// test the classifier on a new test item
var testDoc = new Document('testDoc', ['b','d', 'e']);
var result1 = classifier.classify(testDoc);
console.log(result1);
Training produces the following per-category token probabilities:
{
"bad": {
"a": 1,
"b": 0.6666666666666666,
"c": 0.6666666666666666,
"d": 0.3333333333333333,
"e": 0.3333333333333333
},
"good": {
"a": 0,
"b": 0.3333333333333333,
"c": 0.3333333333333333,
"d": 0.6666666666666666,
"e": 0.6666666666666666
}
}
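The figures above are consistent with a simple rule: each value is the fraction of a category's training documents that contain the token. The following is a minimal sketch of that rule, not dclassify's actual implementation; `training` and `tokenProbabilities` are hypothetical names introduced for illustration, and no smoothing is assumed.

```javascript
// Training data copied from the example above: category -> list of token arrays.
const training = {
    bad:  [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'd', 'e']],
    good: [['c', 'd'], ['e'], ['b', 'd', 'e']]
};

// Hypothetical helper (not part of dclassify): for every category, compute
// the fraction of its documents that contain each token in the vocabulary.
function tokenProbabilities(trainingSet) {
    // Collect the full vocabulary across all categories.
    const vocabulary = new Set();
    Object.values(trainingSet).forEach(docs =>
        docs.forEach(tokens => tokens.forEach(t => vocabulary.add(t))));

    const table = {};
    for (const [category, docs] of Object.entries(trainingSet)) {
        table[category] = {};
        for (const token of [...vocabulary].sort()) {
            const hits = docs.filter(tokens => tokens.includes(token)).length;
            table[category][token] = hits / docs.length;
        }
    }
    return table;
}

const table = tokenProbabilities(training);
console.log(JSON.stringify(table, null, 4));
```

For example, "b" appears in two of the three 'bad' items, which gives 2/3, and "a" appears in none of the 'good' items, which gives 0.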
Standard results look like this:
{
"category": "good",
"probability": 0.6666666666666666,
"timesMoreLikely": 2,
"secondCategory": "bad",
"probabilities": [
{ "category": "good", "probability": 0.14814814814814814},
{ "category": "bad", "probability": 0.07407407407407407}
]
}
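These numbers can be reproduced by hand. The sketch below multiplies the trained probabilities of the tokens present in the test item (b, d, e) for each category, then normalises the two scores. It is an illustration under stated assumptions, not dclassify's source: `probabilities` is copied from the trained table, `scoreDocument` is a hypothetical helper, equal category priors are assumed, and every token is assumed to exist in the table.

```javascript
// Trained token probabilities, copied from the table shown earlier.
const probabilities = {
    bad:  { a: 1, b: 2 / 3, c: 2 / 3, d: 1 / 3, e: 1 / 3 },
    good: { a: 0, b: 1 / 3, c: 1 / 3, d: 2 / 3, e: 2 / 3 }
};

// Hypothetical helper: multiply the probabilities of the tokens that are
// present in the document, per category, and sort the best score first.
function scoreDocument(tokens, table) {
    return Object.keys(table)
        .map(category => ({
            category: category,
            probability: tokens.reduce((p, t) => p * table[category][t], 1)
        }))
        .sort((x, y) => y.probability - x.probability);
}

const scores = scoreDocument(['b', 'd', 'e'], probabilities);
const total = scores[0].probability + scores[1].probability;

const summary = {
    category: scores[0].category,
    probability: scores[0].probability / total,           // share of the combined score
    timesMoreLikely: scores[0].probability / scores[1].probability,
    secondCategory: scores[1].category,
    probabilities: scores
};
console.log(summary);
```

Here 'good' scores 1/3 × 2/3 × 2/3 = 4/27 and 'bad' scores 2/3 × 1/3 × 1/3 = 2/27, so 'good' takes two thirds of the combined score and is twice as likely.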
If you use the 'applyInverse' option, the results are much more emphatic: training indicates that bad items never lack the "a" token, and the test item lacks it, so the probability of 'bad' drops to zero.
{
"category": "good",
"probability": 1,
"timesMoreLikely": "Infinity",
"secondCategory": "bad",
"probabilities": [
{ "category": "good", "probability": 0.09876543209876543 },
{ "category": "bad", "probability": 0 }
]
}
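The applyInverse figures are consistent with multiplying in a probability-of-absence, (1 - p), for every vocabulary token that the test item lacks. The sketch below reproduces them under that assumption; it is not taken from dclassify's source, and `trained` and `scoreWithInverse` are hypothetical names used only for illustration.

```javascript
// Trained token probabilities, copied from the table shown earlier.
const trained = {
    bad:  { a: 1, b: 2 / 3, c: 2 / 3, d: 1 / 3, e: 1 / 3 },
    good: { a: 0, b: 1 / 3, c: 1 / 3, d: 2 / 3, e: 2 / 3 }
};

// Hypothetical helper: walk the whole vocabulary. A present token contributes
// its probability p; an absent token contributes its probability of absence, 1 - p.
function scoreWithInverse(tokens, table) {
    return Object.keys(table)
        .map(category => {
            let p = 1;
            for (const [token, prob] of Object.entries(table[category])) {
                p *= tokens.includes(token) ? prob : 1 - prob;
            }
            return { category: category, probability: p };
        })
        .sort((x, y) => y.probability - x.probability);
}

const inverseScores = scoreWithInverse(['b', 'd', 'e'], trained);
console.log(inverseScores);

// Dividing by a zero score yields Infinity in JavaScript.
const timesMoreLikely = inverseScores[0].probability / inverseScores[1].probability;
console.log(timesMoreLikely);
```

For 'good', the present tokens give 4/27 and the absent tokens "a" and "c" contribute (1 - 0) × (1 - 1/3), for a total of 8/81. For 'bad', the absent "a" contributes (1 - 1) = 0, which zeroes the whole product.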