Boruta
Feature selection is the process of choosing a subset of variables according to some method or criterion (Wiki).
It often improves a machine learning model's performance and helps with data exploration.
Boruta [1] is a feature selection method that identifies all-relevant variables, instead of just selecting a minimal subset.
Boruta.js is an almost line-by-line port of the R package Boruta to JavaScript.
It depends on the random-forest package, but can be used with other models as well.
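The all-relevant idea can be illustrated with shadow features, the mechanism described in the Boruta paper [1]: each column is shuffled to break its relation to the target, and a real feature scores a hit in a round only if its importance exceeds that of the best shuffled copy. The sketch below is not the Boruta.js code. It swaps the random forest importance for absolute Pearson correlation and leaves out the statistical test the real algorithm uses to turn hit counts into decisions. All function names here (mulberry32, shuffled, importance, countHits) are illustrative and not part of the package.

```javascript
// Seeded PRNG (mulberry32) so the sketch gives the same result on every run.
function mulberry32(a) {
  return function () {
    let t = (a += 0x6D2B79F5)
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Fisher-Yates shuffle of a copy of `values` (this is the "shadow" column).
function shuffled(values, rand) {
  const copy = values.slice()
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1))
    const tmp = copy[i]
    copy[i] = copy[j]
    copy[j] = tmp
  }
  return copy
}

// Stand-in importance (assumption): absolute Pearson correlation with the target.
function importance(column, y) {
  const n = y.length
  const mx = column.reduce((s, v) => s + v, 0) / n
  const my = y.reduce((s, v) => s + v, 0) / n
  let sxy = 0, sxx = 0, syy = 0
  for (let i = 0; i < n; i++) {
    sxy += (column[i] - mx) * (y[i] - my)
    sxx += (column[i] - mx) ** 2
    syy += (y[i] - my) ** 2
  }
  return Math.abs(sxy / Math.sqrt(sxx * syy))
}

// Count, per feature, in how many rounds it beats the best shadow feature.
function countHits(columns, y, rounds, rand) {
  const hits = columns.map(() => 0)
  for (let r = 0; r < rounds; r++) {
    const shadowBest = Math.max(...columns.map(c => importance(shuffled(c, rand), y)))
    columns.forEach((c, k) => {
      if (importance(c, y) > shadowBest) hits[k] += 1
    })
  }
  return hits
}

// Toy data: column 0 drives the target, column 1 is pure noise.
const rand = mulberry32(42)
const x0 = Array.from({ length: 200 }, () => rand())
const x1 = Array.from({ length: 200 }, () => rand())
const target = x0.map(v => 2 * v + 0.1 * rand())

const hits = countHits([x0, x1], target, 20, rand)
console.log(hits) // column 0 is expected to hit in every round
```

A feature that keeps beating the shadows is a candidate for confirmation, and one that rarely does is a candidate for rejection.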
Example
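const boruta = require('boruta')
const make = require('mkdata')

// Generate a Friedman #1 regression dataset with 1000 samples
const [X, y] = make.friedman1({ nSamples: 1000 })

// Run Boruta and print the decision for each feature, keyed by column index
const bor = boruta(X, y)
console.log(bor.finalDecision)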
Results:
{
  '0': 'Confirmed',
  '1': 'Confirmed',
  '2': 'Rejected',
  '3': 'Confirmed',
  '4': 'Rejected',
  '5': 'Rejected',
  '6': 'Rejected',
  '7': 'Rejected',
  '8': 'Rejected',
  '9': 'Rejected'
}
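To use the decisions downstream, the confirmed column indices can be pulled out of the result. This is a minimal sketch that assumes finalDecision is a plain object mapping column index to a decision string, as the output above suggests; the object is hard-coded here so the snippet runs without the package, and the name confirmed is illustrative.

```javascript
// Sample result copied from the output above; in real use this would be bor.finalDecision.
const finalDecision = {
  '0': 'Confirmed', '1': 'Confirmed', '2': 'Rejected', '3': 'Confirmed', '4': 'Rejected',
  '5': 'Rejected', '6': 'Rejected', '7': 'Rejected', '8': 'Rejected', '9': 'Rejected'
}

// Object keys are strings, so convert them back to numeric column indices.
const confirmed = Object.keys(finalDecision)
  .filter(k => finalDecision[k] === 'Confirmed')
  .map(Number)

console.log(confirmed) // [ 0, 1, 3 ]
```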
Web demo
You can try Boruta in the StatSim app: https://statsim.com/select/.
It visualizes importance scores with final decisions and also supports multiple base models (linear regression, logistic regression, KNN, random forest).
References
- [1] Miron B. Kursa, Witold R. Rudnicki (2010). Feature Selection with the Boruta Package.