stm
Takes in a document and returns a tokenised, stemmed array of terms, using Porter's algorithm.
Usage
stm.stem(text, noStopWords) returns an array of terms, stemmed and without punctuation.
text is the string (text document) to tokenise and stem.
noStopWords defaults to true. Set it to false if you want to include stop words, e.g. words such as "I" and "the".
Note: This is basically a wrapper around the stem-porter library by kastor.
var stm = require('stm');
var str = "you're simply a simplistic house, made for housing";

// Stop words are removed by default.
var stemmed = stm.stem(str);
// >> ["simpli", "simplist", "hous", "hous"]

// Pass false to keep stop words.
var withStopWords = stm.stem(str, false);
// >> ['you', 're', 'simpli', 'a', 'simplist', 'hous', 'made', 'for', 'hous']
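
For intuition, the tokenise, filter, stem pipeline described under Usage can be sketched roughly as below. This is not stm's actual source: sketchStem, toyStem and STOP_WORDS are hypothetical names, the stop-word list is a tiny made-up sample, and toyStem is a crude placeholder rather than Porter's algorithm, so its output will differ from what stm returns.

```javascript
// Hypothetical, deliberately tiny stop-word list (an assumption for this sketch;
// the list stm really uses is not documented here).
const STOP_WORDS = new Set(['a', 'i', 'the', 'for', 'you', 're']);

// Placeholder stemmer: strips one trailing "ing" or "s".
// This is NOT Porter's algorithm; it only stands in for a real stemmer.
function toyStem(word) {
  return word.replace(/(ing|s)$/, '');
}

// Sketch of the pipeline: lowercase, split on anything that is not a letter
// (which also drops punctuation), optionally remove stop words, then stem.
function sketchStem(text, noStopWords = true, stemmer = toyStem) {
  const tokens = text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  const kept = noStopWords ? tokens.filter((t) => !STOP_WORDS.has(t)) : tokens;
  return kept.map(stemmer);
}

console.log(sketchStem('The houses, housing!'));
console.log(sketchStem('The houses, housing!', false));
```

The third parameter is only there so that a real stemmer function could be swapped in for the placeholder; stm.stem itself takes just text and noStopWords.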