WinkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP applications easier and faster, winkNLP is optimized for the right balance of performance and accuracy.
Its word embedding support unlocks deeper text analysis. Represent words and text as numerical vectors with ease, bringing higher accuracy in tasks like semantic similarity, text classification, and beyond – even within a browser.
It is built from the ground up with no external dependencies and has a lean code base of ~10KB minified & gzipped. A test coverage of ~100% and compliance with the Open Source Security Foundation best practices make winkNLP the ideal tool for building production grade systems with confidence.
WinkNLP has full TypeScript support and runs on Node.js, web browsers, and Deno.
Wikipedia article timeline | Context aware word cloud | Key sentences detection |
---|---|---|
Head to live examples to explore further.
WinkNLP can easily process large amounts of raw text at speeds over 650,000 tokens/second on an M1 MacBook Pro in both browser and Node.js environments. It even runs smoothly on a low-end smartphone's browser.
Environment | Benchmarking Command |
---|---|
Node.js | node benchmark/run |
Browser | How to measure winkNLP's speed on browsers? |
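As a rough illustration of what a tokens/second figure means, the sketch below times a function over repeated runs and divides the token count by the elapsed time. It is not the project's benchmark script: `naiveTokenize` is a hypothetical whitespace splitter standing in for the code under test, and `tokensPerSecond` is a helper name invented here.

```javascript
// Hypothetical stand-in tokenizer (whitespace split); swap in the call you want to measure.
const naiveTokenize = (text) => text.split(/\s+/).filter(Boolean);

// Runs `tokenize` on `text` `runs` times and returns tokens processed per second.
function tokensPerSecond(tokenize, text, runs = 5) {
  let totalTokens = 0;
  const start = performance.now();
  for (let i = 0; i < runs; i += 1) {
    totalTokens += tokenize(text).length;
  }
  const elapsedSeconds = (performance.now() - start) / 1000;
  // Guard against a zero reading from a coarse timer on very small inputs.
  return totalTokens / Math.max(elapsedSeconds, 1e-9);
}

const sample = 'The quick brown fox jumps over the lazy dog. '.repeat(10000);
console.log(Math.round(tokensPerSecond(naiveTokenize, sample)), 'tokens/second');
```

Numbers from a sketch like this vary with hardware, input text, and JIT warm-up, so several runs on representative text give a more stable reading than a single pass.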
WinkNLP has a comprehensive natural language processing (NLP) pipeline covering tokenization, sentence boundary detection (sbd), negation handling, sentiment analysis, part-of-speech (pos) tagging, named entity recognition (ner), and custom entities recognition (cer). It offers a rich feature set:
🐎 Fast, lossless & multilingual tokenizer | For example, the multilingual text string "¡Hola! नमस्कार! Hi! Bonjour chéri" is tokenized as ["¡", "Hola", "!", "नमस्कार", "!", "Hi", "!", "Bonjour", "chéri"] . The tokenizer processes text at a speed close to 4 million tokens/second on an M1 MBP's browser. |
✨ Developer friendly and intuitive API | With winkNLP, process any text using a simple, declarative syntax; most live examples have 30-40 lines of code. |
🖼 Best-in-class text visualization | Programmatically mark tokens, sentences, entities, etc. using HTML mark or any other tag of your choice. |
♻️ Extensive text processing features | Remove and/or retain tokens with specific attributes such as part-of-speech, named entity type, token type, stop word, shape and many more; compute Flesch reading ease score; generate n-grams; normalize, lemmatise or stem. Check out how, with the right kind of text preprocessing, even a Naive Bayes classifier achieves impressive (≥90%) accuracy in sentiment analysis and chatbot intent classification tasks. |
🔠 Pre-trained language models | Compact sizes starting from ~1MB (minified & gzipped) – reduce model loading time drastically down to ~1 second on a 4G network. |
↗️ Word vectors | 100-dimensional English word embeddings for over 350K English words, which are optimized for winkNLP. Allows easy computation of sentence or document embeddings. |
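To make the idea of sentence embeddings concrete, here is a minimal plain JavaScript sketch that averages word vectors into a sentence vector and compares sentences by cosine similarity. It does not use winkNLP's API: `toyVectors`, `averageVector`, and `cosine` are names invented for this example, and the 3-dimensional vectors are made up (real embeddings here would be 100-dimensional).

```javascript
// Toy 3-dimensional vectors, invented for illustration only.
const toyVectors = {
  cat: [0.9, 0.1, 0.0],
  dog: [0.8, 0.2, 0.0],
  car: [0.0, 0.1, 0.9],
  sleeps: [0.3, 0.9, 0.1],
};

// Averages the vectors of known words; returns null when no word is known.
function averageVector(words, vectors) {
  const known = words.filter((w) => vectors[w] !== undefined);
  if (known.length === 0) return null;
  const sum = new Array(vectors[known[0]].length).fill(0);
  for (const w of known) {
    vectors[w].forEach((value, i) => { sum[i] += value; });
  }
  return sum.map((value) => value / known.length);
}

// Cosine similarity of two equal-length vectors; 0 when either has zero length.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const catSleeps = averageVector(['cat', 'sleeps'], toyVectors);
const dogSleeps = averageVector(['dog', 'sleeps'], toyVectors);
const carOnly = averageVector(['car'], toyVectors);
console.log(cosine(catSleeps, dogSleeps) > cosine(catSleeps, carOnly));
// -> true
```

Simple averaging ignores word order and weighting; it is shown only because it is the easiest way to see how word vectors combine into a sentence or document vector.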
Use npm install:
npm install wink-nlp --save
To use winkNLP after installation, you also need to install a language model that matches your Node.js version. The table below lists the version-specific installation commands:
Node.js Version | Installation |
---|---|
16 or 18 | npm install wink-eng-lite-web-model --save |
14 or 12 | node -e "require('wink-nlp/models/install')" |
The wink-eng-lite-web-model is designed to work with Node.js version 16 or 18. It can also work on browsers as described in the next section. This is the recommended model.
The second command installs the wink-eng-lite-model, which works with Node.js version 14 or 12.
Enable esModuleInterop and allowSyntheticDefaultImports in the tsconfig.json file:
"compilerOptions": {
"esModuleInterop": true,
"allowSyntheticDefaultImports": true,
...
}
If you’re using winkNLP in the browser, use the wink-eng-lite-web-model. Learn about its installation and usage in our guide to using winkNLP in the browser. Explore winkNLP recipes on Observable for live browser-based examples.
Follow the example on replit.
Here is the "Hello World!" of winkNLP:
// Load wink-nlp package.
const winkNLP = require( 'wink-nlp' );
// Load English language model.
const model = require( 'wink-eng-lite-web-model' );
// Instantiate winkNLP.
const nlp = winkNLP( model );
// Obtain "its" helper to extract item properties.
const its = nlp.its;
// Obtain "as" reducer helper to reduce a collection.
const as = nlp.as;
// NLP Code.
const text = 'Hello World🌎! How are you?';
const doc = nlp.readDoc( text );
console.log( doc.out() );
// -> Hello World🌎! How are you?
console.log( doc.sentences().out() );
// -> [ 'Hello World🌎!', 'How are you?' ]
console.log( doc.entities().out( its.detail ) );
// -> [ { value: '🌎', type: 'EMOJI' } ]
console.log( doc.tokens().out() );
// -> [ 'Hello', 'World', '🌎', '!', 'How', 'are', 'you', '?' ]
console.log( doc.tokens().out( its.type, as.freqTable ) );
// -> [ [ 'word', 5 ], [ 'punctuation', 2 ], [ 'emoji', 1 ] ]
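The last line of the example reduces the token types to a frequency table. To show the shape of that result, here is a plain JavaScript sketch of an equivalent reduction. It is not winkNLP's implementation: `freqTable` is a helper written for this illustration, and sorting ties alphabetically is an assumption made here.

```javascript
// Types of the eight tokens from the example above, in order.
const tokenTypes = [
  'word', 'word', 'emoji', 'punctuation',
  'word', 'word', 'word', 'punctuation',
];

// Counts each distinct item and returns [item, count] pairs, most frequent first.
function freqTable(items) {
  const counts = new Map();
  for (const item of items) {
    counts.set(item, (counts.get(item) || 0) + 1);
  }
  // Ties are broken alphabetically (an assumption of this sketch).
  return [...counts.entries()].sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}

console.log(freqTable(tokenTypes));
// -> [ [ 'word', 5 ], [ 'punctuation', 2 ], [ 'emoji', 1 ] ]
```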
Experiment with winkNLP on RunKit.
WinkNLP processes raw text at ~650,000 tokens per second with its wink-eng-lite-web-model, when benchmarked using "Ch 13 of Ulysses by James Joyce" on an M1 MacBook Pro machine with 16GB RAM. The processing included the entire NLP pipeline: tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This speed is well ahead of the prevailing speed benchmarks.
The benchmark was conducted on Node.js versions 16 and 18.
It POS-tags a subset of the WSJ corpus with an accuracy of ~95%; this includes tokenization of raw text prior to POS tagging. The present state of the art is at ~97% accuracy, but at lower speeds, and it is generally computed using a gold-standard pre-tokenized corpus.
Its general-purpose sentiment analysis delivers an f-score of ~84.5% when validated using the Amazon Product Review Sentiment Labelled Sentences Data Set at the UCI Machine Learning Repository. The current benchmark accuracy for specifically trained models is around 95%.
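For readers unfamiliar with the metric, an f-score combines precision and recall into a single number. The sketch below computes the balanced F1 variant from confusion counts; the text does not say which variant was used, so F1 is an assumption here. The `fScore` helper and the counts in the usage line are invented for illustration and are not taken from the validation run.

```javascript
// Computes precision, recall, and F1 from confusion counts.
function fScore({ truePositives, falsePositives, falseNegatives }) {
  const predictedPositives = truePositives + falsePositives;
  const actualPositives = truePositives + falseNegatives;
  const precision = predictedPositives === 0 ? 0 : truePositives / predictedPositives;
  const recall = actualPositives === 0 ? 0 : truePositives / actualPositives;
  // F1 = 2PR / (P + R), written in terms of counts to avoid dividing by zero.
  const denominator = 2 * truePositives + falsePositives + falseNegatives;
  const f1 = denominator === 0 ? 0 : (2 * truePositives) / denominator;
  return { precision, recall, f1 };
}

// Hypothetical counts: 8 correct positives, 2 false alarms, 2 misses.
console.log(fScore({ truePositives: 8, falsePositives: 2, falseNegatives: 2 }));
// -> { precision: 0.8, recall: 0.8, f1: 0.8 }
```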
WinkNLP delivers this performance with minimal load on RAM. For example, it processes the entire History of India Volume I with a total peak memory requirement of under 80MB. The book has around 350 pages, which translates to over 125,000 tokens.
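One rough way to check memory use on your own text in Node.js is to read the resident set size before and after the work, as sketched below. This is not how the figure above was measured: `measureRssMB` is a hypothetical helper, and it samples memory only at two points, so it can miss a true peak that occurs in between.

```javascript
// Runs `work` and reports resident set size (in MB) before and after it.
function measureRssMB(work) {
  const toMB = (bytes) => bytes / (1024 * 1024);
  const before = process.memoryUsage().rss;
  const result = work();
  const after = process.memoryUsage().rss;
  return { result, beforeMB: toMB(before), afterMB: toMB(after) };
}

// Placeholder workload; replace with the text processing you want to observe.
const report = measureRssMB(() => 'some raw text '.repeat(100000).split(' ').length);
console.log(report.beforeMB.toFixed(1), 'MB ->', report.afterMB.toFixed(1), 'MB');
```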
Please ask at Stack Overflow, discuss at Wink JS GitHub Discussions, or chat with us at Wink JS Gitter Lobby.
If you spot a bug that has not yet been reported, raise a new issue or consider fixing it and sending a PR.
Looking for a new feature? Request it via the new features & ideas discussion forum or consider becoming a contributor.
WinkJS is a family of open source packages for Natural Language Processing, Machine Learning, and Statistical Analysis in Node.js. The code is thoroughly documented for easy human comprehension and has a test coverage of ~100% for reliability to build production grade solutions.
Wink NLP is copyright 2017-24 GRAYPE Systems Private Limited.
It is licensed under the terms of the MIT License.
Version 2.3.2 November 30, 2024
FAQs
Developer friendly Natural Language Processing ✨
The npm package wink-nlp receives a total of 33,188 weekly downloads. As such, wink-nlp popularity was classified as popular.
We found that wink-nlp demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 0 open source maintainers collaborating on the project.