Research
Security News
Malicious npm Packages Inject SSH Backdoors via Typosquatted Libraries
Socket’s threat research team has detected six malicious npm packages typosquatting popular libraries to insert SSH backdoors.
stopwords-iso
The stopwords-iso npm package provides a comprehensive collection of stopwords for multiple languages. Stopwords are common words that are usually filtered out in natural language processing tasks because they carry less meaningful information. This package is useful for text preprocessing in various languages.
Retrieve stopwords for a specific language
The package exports a plain object keyed by language code, so retrieving the stopwords for one language is a single property lookup that returns an array. This example retrieves the English stopwords.
const stopwords = require('stopwords-iso');
const englishStopwords = stopwords['en'];
console.log(englishStopwords);
Retrieve stopwords for multiple languages
To retrieve stopwords for several languages at once, collect the arrays for the codes you need into a new object. This example gathers the stopwords for English, French, and German.
const stopwords = require('stopwords-iso');
const languages = ['en', 'fr', 'de'];
const multiLangStopwords = languages.reduce((acc, lang) => {
  acc[lang] = stopwords[lang];
  return acc;
}, {});
console.log(multiLangStopwords);
Check if a word is a stopword
You can check whether a given word is a stopword in a specified language by testing membership in that language's array. This example checks whether 'the' is an English stopword and whether 'maison' is a French stopword.
const stopwords = require('stopwords-iso');
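// Fall back to an empty list so an unknown language code returns false instead of throwing.
const isStopword = (word, lang) => (stopwords[lang] || []).includes(word);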
console.log(isStopword('the', 'en')); // true
console.log(isStopword('maison', 'fr')); // false
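When many words are checked, scanning an array for each lookup becomes slow. The sketch below caches one Set per language so that each check is a constant-time lookup on average. It is an illustration, not part of the package: createStopwordChecker is a hypothetical helper, and sampleStopwords is a tiny stand-in for the object returned by require('stopwords-iso').

```javascript
// Hypothetical stand-in for require('stopwords-iso'): an object that maps
// ISO 639-1 codes to arrays of stopwords. The real lists are far longer.
const sampleStopwords = {
  en: ['the', 'a', 'of'],
  fr: ['le', 'la', 'de'],
};

// Returns a checker function that builds and caches one Set per language.
function createStopwordChecker(stopwords) {
  const cache = new Map();
  return function isStopword(word, lang) {
    if (!cache.has(lang)) {
      // Unknown or malformed entries become an empty Set instead of throwing.
      const list = Array.isArray(stopwords[lang]) ? stopwords[lang] : [];
      cache.set(lang, new Set(list));
    }
    return cache.get(lang).has(word);
  };
}

const isStopword = createStopwordChecker(sampleStopwords);
console.log(isStopword('the', 'en'));    // true
console.log(isStopword('maison', 'fr')); // false
console.log(isStopword('the', 'xx'));    // false (unknown language code)
```

In real use, pass the object from require('stopwords-iso') in place of sampleStopwords.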
The stopword package provides a collection of stopwords for multiple languages, similar to stopwords-iso. It also includes additional functionality, such as removing stopwords from a given text, which makes it more feature-rich for text preprocessing.
The nltk-stopwords package is a wrapper around the NLTK stopwords corpus, which is widely used in the Python ecosystem. It provides stopwords for multiple languages and is known for its reliability and extensive language support. It is a good alternative if you are looking for a well-established library.
The stopwords-json package provides stopwords in JSON format for multiple languages. It is similar to stopwords-iso in terms of functionality but focuses on providing the data in a simple JSON format, making it easy to integrate into various applications.
The most comprehensive collection of stopwords for multiple languages.
The collection is keyed by ISO 639-1 (two-letter) language codes.
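Because the keys are two-letter codes, a locale tag such as 'en-US' has to be reduced to its primary subtag before lookup. The sketch below shows one way to do that. The helpers toIso6391 and stopwordsForLocale are hypothetical names, and sampleStopwords is a small stand-in for the real object. The pattern only checks the two-letter shape and does not verify that the code is a registered ISO 639-1 code.

```javascript
// Hypothetical stand-in for require('stopwords-iso').
const sampleStopwords = {
  en: ['the', 'a', 'of'],
  fr: ['le', 'la', 'de'],
};

// Reduces a locale tag such as 'en-US' or 'pt_BR' to its two-letter primary
// subtag, lowercased. Returns null when the tag does not start with one.
function toIso6391(localeTag) {
  const match = /^([a-z]{2})(?:[-_]|$)/i.exec(String(localeTag).trim());
  return match ? match[1].toLowerCase() : null;
}

// Looks up the stopword list for a locale tag; unknown languages yield [].
function stopwordsForLocale(stopwords, localeTag) {
  const code = toIso6391(localeTag);
  const list = code === null ? undefined : stopwords[code];
  return Array.isArray(list) ? list : [];
}

console.log(toIso6391('en-US'));                            // en
console.log(toIso6391('eng'));                              // null
console.log(stopwordsForLocale(sampleStopwords, 'fr-CA'));  // [ 'le', 'la', 'de' ]
```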
If you only need stopwords for a specific language, there is a separate collection for each.
The collection is in JSON format. You are free to use this collection any way you like. It is currently published only on npm and Bower.
$ npm install stopwords-iso
$ bower install stopwords-iso
// Node
const stopwords = require('stopwords-iso'); // object of stopwords for multiple languages
const english = stopwords.en; // english stopwords
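A common use of the collection is filtering stopwords out of text before further processing. The following is a minimal sketch of that step, not something the package provides: removeStopwords is a hypothetical helper, sample stands in for the object returned by require('stopwords-iso'), and the tokenizer is a deliberately simple split on non-letter characters. It lowercases the input on the assumption that the stopword lists are lowercase.

```javascript
// Hypothetical stand-in for require('stopwords-iso'); real lists are longer.
const sample = { en: ['the', 'is', 'on', 'a'] };

// Splits text into lowercase tokens and drops those found in the stopword
// list for the given language. Unknown languages filter nothing.
function removeStopwords(text, lang, stopwords) {
  const list = Array.isArray(stopwords[lang]) ? stopwords[lang] : [];
  const stopSet = new Set(list);
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}']+/u) // split on runs of non-letter, non-digit characters
    .filter((token) => token.length > 0 && !stopSet.has(token));
}

console.log(removeStopwords('The cat is on a mat.', 'en', sample));
// [ 'cat', 'mat' ]
```

For production text processing, a language-aware tokenizer would likely serve better than this regular expression.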
If you wish to remove or update some of the stopwords, please file an issue before sending a PR on the repo of the specific language.
If you would like to add a stopword or a new set of stopwords, please add them as a new text file on the repo of the corresponding language.
All stopwords sources are listed here.
FAQs
The npm package stopwords-iso receives a total of 107,072 weekly downloads. As such, its popularity was classified as popular.
We found that stopwords-iso demonstrated an unhealthy version release cadence and project activity because the last version was released a year ago. It has one open source maintainer collaborating on the project.