Text Preprocessor
Normalizing texts before any natural language processing
Installation
Using Yarn:
yarn add text-preprocessor
Or using NPM:
npm i --save text-preprocessor
Usage
const preprocessor = require('text-preprocessor');
const text = preprocessor(' that`s great! \n \t & but don’t take too long okay? \n bjŏȒk—Ɏó ');
text.clean()
  .toLowerCase()
  .unescape()
  .killUnicode()
  .normalizeSingleCurlyQuotes()
  .expandContractions();
console.log(text.toString());
Constructs a TextPreprocessor instance
Methods
new TextPreprocessor(text)
Normalizing texts before any natural language processing
textPreprocessor.clean()
Strips extra whitespace from the text, leaving at most one whitespace character between any two other characters.
Kind: instance method of TextPreprocessor
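As a rough illustration of the whitespace collapsing described for clean(), the sketch below uses a plain regular expression. collapseWhitespace is a hypothetical helper, not part of text-preprocessor, and the library's own implementation may differ, for example in how it treats leading and trailing whitespace.

```javascript
// Hypothetical helper for illustration only; not the library's implementation.
// Collapses every run of whitespace (spaces, tabs, newlines) into one space.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ');
}

const collapsed = collapseWhitespace('too \n \t long   okay?');
console.log(collapsed); // prints "too long okay?"
```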
textPreprocessor.unescape()
Converts the HTML entities &amp;, &lt;, &gt;, &quot;, and &#39; in the string to their corresponding characters.
Kind: instance method of TextPreprocessor
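One way to picture this kind of unescaping is a lookup table applied in a single pass. The sketch below is an assumption about the general technique, not the code text-preprocessor actually runs; unescapeHtml and HTML_ENTITIES are hypothetical names.

```javascript
// Hypothetical sketch; the library may delegate to another implementation.
const HTML_ENTITIES = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
};

// A single pass avoids double-unescaping input such as "&amp;lt;".
function unescapeHtml(text) {
  return text.replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => HTML_ENTITIES[entity]);
}

const unescaped = unescapeHtml('fish &amp; chips &lt;b&gt;');
console.log(unescaped); // prints "fish & chips <b>"
```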
textPreprocessor.toLowerCase()
Converts all the alphabetic characters in a string to lowercase.
Kind: instance method of TextPreprocessor
textPreprocessor.toString()
Returns the result of the chained calls so far as a string.
Kind: instance method of TextPreprocessor
textPreprocessor.expandContractions()
Replaces all English contractions with their expanded equivalents, e.g. "don't" becomes "do not".
Kind: instance method of TextPreprocessor
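A minimal sketch of dictionary-based contraction expansion follows. The three-entry table and the name expandContractionsSketch are made up for illustration; the library's real contraction list is not shown in this document and is certainly larger. The sketch also assumes lowercased input with straight apostrophes.

```javascript
// Hypothetical, deliberately tiny contraction table (assumption, not the library's list).
const CONTRACTIONS = {
  "don't": 'do not',
  "can't": 'cannot',
  "it's": 'it is', // simplification: "it's" can also mean "it has"
};

// Build one alternation pattern from the table keys.
const CONTRACTION_PATTERN = new RegExp(
  '\\b(?:' + Object.keys(CONTRACTIONS).join('|') + ')\\b',
  'g'
);

function expandContractionsSketch(text) {
  return text.replace(CONTRACTION_PATTERN, (match) => CONTRACTIONS[match]);
}

const expanded = expandContractionsSketch("don't take too long, it's late");
console.log(expanded); // prints "do not take too long, it is late"
```

This is why the usage example calls toLowerCase() and normalizeSingleCurlyQuotes() before expandContractions(): a lookup like this only matches the exact spelling stored in the table.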
textPreprocessor.killUnicode()
Replaces Latin, Cyrillic, and Greek Unicode characters with English ASCII, using a hugely ignorant and widely subjective transliteration.
Kind: instance method of TextPreprocessor
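For comparison, the sketch below shows a much narrower technique than transliteration: Unicode decomposition (NFD) followed by stripping combining marks. It only handles accented Latin letters and does nothing for Cyrillic or Greek, so it is not equivalent to killUnicode(); stripDiacritics is a hypothetical helper.

```javascript
// Hypothetical helper; NOT what killUnicode() does internally.
// NFD splits "é" into "e" + a combining accent, which the regex then removes.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

const plain = stripDiacritics('crème brûlée');
console.log(plain); // prints "creme brulee"
```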
textPreprocessor.replace(regexp, value)
Replaces any occurrence of the given expression with the given string.
Kind: instance method of TextPreprocessor
textPreprocessor.remove(regexp)
Removes any occurrence of the given expression
Kind: instance method of TextPreprocessor
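The relationship between replace and remove can be sketched with a small chainable wrapper. ChainableText is a hypothetical class written for this example; it assumes, as the usage section suggests, that each method returns the instance so calls can be chained.

```javascript
// Hypothetical stand-in for illustration; not the library's class.
class ChainableText {
  constructor(text) {
    this.text = text;
  }

  // Replace every match of regexp with value, then return this for chaining.
  replace(regexp, value) {
    this.text = this.text.replace(regexp, value);
    return this;
  }

  // Removing is replacing with the empty string.
  remove(regexp) {
    return this.replace(regexp, '');
  }

  toString() {
    return this.text;
  }
}

const masked = new ChainableText('call 555-1234 now!!!')
  .replace(/\d/g, '#')
  .remove(/!+/g)
  .toString();
console.log(masked); // prints "call ###-#### now"
```

Passing a regular expression without the g flag would only affect the first match, so global patterns are the safer default in a sketch like this.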
textPreprocessor.removeTagsAndMentions()
Removes #tags and @mentions from the start of the text.
Kind: instance method of TextPreprocessor
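A possible regular expression for stripping leading tags and mentions is sketched below. The pattern and the name stripLeadingTags are assumptions; the library's actual pattern may accept different characters in a tag.

```javascript
// Hypothetical helper; the real pattern used by the library is not documented here.
// Matches one or more "#word" or "@word" tokens anchored at the start of the text.
function stripLeadingTags(text) {
  return text.replace(/^(?:\s*[#@]\w+)+\s*/, '');
}

const stripped = stripLeadingTags('@mike #nlp hello #world');
console.log(stripped); // prints "hello #world"
```

Because the pattern is anchored with ^, tags that appear later in the text, such as #world above, are left alone.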
textPreprocessor.removeSpecialCharachters()
Removes all special characters.
Kind: instance method of TextPreprocessor
textPreprocessor.removeURLs()
Removes URLs and email addresses.
Kind: instance method of TextPreprocessor
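Matching URLs and email addresses reliably is hard, so the sketch below is intentionally simple: it assumes URLs start with http:// or https:// and uses a loose email pattern. stripUrlsAndEmails is a hypothetical helper and will behave differently from the library on edge cases.

```javascript
// Hypothetical, simplified pattern; not the library's implementation.
const URL_OR_EMAIL = /https?:\/\/\S+|\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g;

function stripUrlsAndEmails(text) {
  return text.replace(URL_OR_EMAIL, '');
}

const withoutLinks = stripUrlsAndEmails('see https://example.com/a?b=1 or mail me@example.com please');
// The surrounding spaces are kept, so a whitespace cleanup step would normally follow.
console.log(JSON.stringify(withoutLinks));
```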
textPreprocessor.removeParenthesesContents()
Removes the contents of brackets and parentheses, including the delimiters.
Kind: instance method of TextPreprocessor
Example
`Hello, this is Mike (example)` to `Hello, this is Mike `
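The example above can be reproduced with a short regular expression. This is a sketch under the assumption that groups are not nested; stripBracketed is a hypothetical name, and the library may treat nested or unbalanced brackets differently.

```javascript
// Hypothetical helper; handles only non-nested (...) and [...] groups.
function stripBracketed(text) {
  return text.replace(/\([^()]*\)|\[[^\[\]]*\]/g, '');
}

const shortened = stripBracketed('Hello, this is Mike (example)');
console.log(JSON.stringify(shortened)); // prints "Hello, this is Mike " (with the trailing space)
```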
textPreprocessor.removePunctuation()
Removes punctuation from the end of the text.
Kind: instance method of TextPreprocessor
textPreprocessor.normalizeSingleCurlyQuotes()
Coerces single curly quotes to straight single quotes, e.g. don’t becomes don't.
Kind: instance method of TextPreprocessor
textPreprocessor.normalizeDoubleCurlyQuotes()
Coerces double curly quotes to straight double quotes, e.g. it is «Khorzu” becomes it is "Khorzu".
Kind: instance method of TextPreprocessor
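Both quote normalizations amount to character-class replacements. The sketch below combines them in one hypothetical helper, straightenQuotes. The exact set of characters the library treats as curly quotes is an assumption here; guillemets are included only because the example for double quotes uses one.

```javascript
// Hypothetical helper; the character sets are assumptions, not the library's lists.
function straightenQuotes(text) {
  return text
    .replace(/[\u2018\u2019]/g, "'") // left and right single curly quotes
    .replace(/[\u201C\u201D\u00AB\u00BB]/g, '"'); // double curly quotes and guillemets
}

console.log(straightenQuotes('don\u2019t')); // prints "don't"
console.log(straightenQuotes('it is \u00ABKhorzu\u201D')); // prints: it is "Khorzu"
```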
textPreprocessor.defaults()
Applies clean, toLowerCase, unescape, killUnicode, and normalizeSingleCurlyQuotes.
Kind: instance method of TextPreprocessor
textPreprocessor.chain()
Executes a chain of the given method names.
Kind: instance method of TextPreprocessor
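Dispatching by method name can be sketched as follows. The signature of chain() is not documented here, so this example assumes it takes an array of method names; MiniPreprocessor and its two methods are hypothetical stand-ins, not the library's class.

```javascript
// Hypothetical stand-in class; chain(methodNames) taking an array is an assumption.
class MiniPreprocessor {
  constructor(text) {
    this.text = text;
  }

  clean() {
    this.text = this.text.replace(/\s+/g, ' ').trim();
    return this;
  }

  toLowerCase() {
    this.text = this.text.toLowerCase();
    return this;
  }

  // Look up each name on the instance and call it in order.
  chain(methodNames) {
    for (const name of methodNames) {
      if (typeof this[name] !== 'function') {
        throw new TypeError('Unknown method: ' + name);
      }
      this[name]();
    }
    return this;
  }

  toString() {
    return this.text;
  }
}

const chained = new MiniPreprocessor('  Hello   WORLD ')
  .chain(['clean', 'toLowerCase'])
  .toString();
console.log(chained); // prints "hello world"
```

Name-based dispatch like this makes it possible to store a pipeline as plain data, for example in a configuration file, instead of hard-coding the call chain.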
Normalizing texts before any natural language processing
Kind: global function