wink-nlp-utils
NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more.
Prepare raw text for Natural Language Processing (NLP) using wink-nlp-utils
. It offers a set of APIs to work on strings such as names, sentences, paragraphs and tokens represented as an array of strings/words. They perform the required pre-processing for many ML tasks such as semantic search, and classification.
Installation
Use npm to install:
npm install wink-nlp-utils --save
Getting Started
The wink-nlp-utils
provides over 36 utility functions for Natural Language Processing tasks. Some representative examples are extracting person's name from a string, compose training corpus for a chat bot, sentence boundary detection, tokenization and stop words removal:
var nlp = require( 'wink-nlp-utils' );
var name = nlp.string.extractPersonsName( 'Dr. Sarah Connor M. Tech., PhD. - AI' );
console.log( name );
var str = '[I] [am having|have] [a] [problem|question]';
console.log( nlp.string.composeCorpus( str ) );
var para = 'AI Inc. is focussing on AI. I work for AI Inc. My mail is r2d2@yahoo.com';
console.log( nlp.string.sentences( para ) );
var s = 'For details on wink, check out http://winkjs.org/ URL!';
console.log( nlp.string.tokenize( s, true ) );
var t = nlp.tokens.removeWords( [ 'mary', 'had', 'a', 'little', 'lamb' ] );
console.log( t );
Try experimenting with these examples on Runkit in the browser.
Documentation
Check out the wink NLP utilities API documentation to learn more.
Need Help?
If you spot a bug and the same has not yet been reported, raise a new issue or consider fixing it and sending a pull request.
About wink
Wink is a family of open source packages for Statistical Analysis, Natural Language Processing and Machine Learning in NodeJS. The code is thoroughly documented for easy human comprehension and has a test coverage of ~100% for reliability to build production grade solutions.
Copyright & License
wink-nlp-utils is copyright 2017-18 GRAYPE Systems Private Limited.
It is licensed under the terms of the MIT License.