Security News
Research
Data Theft Repackaged: A Case Study in Malicious Wrapper Packages on npm
The Socket Research Team breaks down a malicious wrapper package that uses obfuscation to harvest credentials and exfiltrate sensitive data.
lunr-languages
A collection of language stemmers and stopwords for the Lunr JavaScript library
The lunr-languages npm package extends the functionality of the Lunr.js library to support multiple languages. It provides language-specific stemmers, stop word lists, and other tools to enhance search indexing and querying for non-English languages.
Language-specific Stemmers
This feature allows you to use language-specific stemmers to improve search accuracy. The code sample demonstrates how to set up a French stemmer and create an index with French content.
const lunr = require('lunr');
require('lunr-languages/lunr.stemmer.support')(lunr);
require('lunr-languages/lunr.fr')(lunr);
const idx = lunr(function () {
  this.use(lunr.fr);
  this.field('title');
  this.field('body');
  this.add({
    'title': 'Bonjour',
    'body': 'Le monde est beau'
  });
});
console.log(idx.search('beau'));
Stop Word Lists
This feature allows you to use custom stop word lists to exclude common words from the index. The code sample demonstrates how to set up a custom stop word list for French.
const lunr = require('lunr');
require('lunr-languages/lunr.stemmer.support')(lunr);
require('lunr-languages/lunr.fr')(lunr);
// Replace the French stop word filter before the index is built.
// lunr.generateStopWordFilter is part of lunr itself, so no extra module is required.
lunr.fr.stopWordFilter = lunr.generateStopWordFilter(['et', 'le', 'la']);
const idx = lunr(function () {
  this.use(lunr.fr);
  this.field('title');
  this.field('body');
  this.add({
    'title': 'Bonjour',
    'body': 'Le monde est beau'
  });
});
console.log(idx.search('monde'));
Multi-language Support
This feature allows you to create a search index that supports multiple languages simultaneously. The code sample demonstrates how to set up an index that supports both French and German.
const lunr = require('lunr');
require('lunr-languages/lunr.stemmer.support')(lunr);
require('lunr-languages/lunr.multi')(lunr);
require('lunr-languages/lunr.fr')(lunr);
require('lunr-languages/lunr.de')(lunr);
const idx = lunr(function () {
  // lunr.multi.js exposes the plugin factory as lunr.multiLanguage
  this.use(lunr.multiLanguage('fr', 'de'));
  // give each document its own ref so results can be told apart
  this.ref('id');
  this.field('title');
  this.field('body');
  this.add({
    'id': 'fr-1',
    'title': 'Bonjour',
    'body': 'Le monde est beau'
  });
  this.add({
    'id': 'de-1',
    'title': 'Hallo',
    'body': 'Die Welt ist schön'
  });
});
console.log(idx.search('schön'));
Elasticlunr is a lightweight full-text search library that is similar to Lunr.js but offers more flexibility and customization options. It supports multiple languages and provides a more modular approach to building search indexes.
Search-index is a powerful and flexible search library that supports full-text search, faceted search, and more. It is designed to be highly customizable and can handle large datasets efficiently. It also supports multiple languages and offers advanced features like real-time indexing and querying.
Flexsearch is a high-performance full-text search library that offers fast indexing and querying capabilities. It supports multiple languages and provides a range of configuration options to optimize search performance. It is designed to be lightweight and efficient, making it suitable for use in both client-side and server-side applications.
Lunr Languages is a Lunr addon that helps you search in documents written in the following languages:
Lunr Languages is compatible with Lunr versions 0.6, 0.7, 1.0 and 2.X.
Lunr-languages works well with script loaders (Webpack, requirejs) and can be used in the browser and on the server.
The following example is for the German language (de).
Add the following JS files to the page:
<script src="lunr.js"></script> <!-- lunr.js library -->
<script src="lunr.stemmer.support.js"></script>
<script src="lunr.de.js"></script> <!-- or any other language you want -->
Then use the language when initializing lunr:
var idx = lunr(function () {
  // use the language (de)
  this.use(lunr.de);
  // then, the normal lunr index initialization
  this.field('title', { boost: 10 });
  this.field('body');
  // now you can call this.add(...) to add documents written in German
});
That's it. Just add the documents and you're done. When searching, the language stemmer and stopwords list will be the one you used.
Add require.js to the page:
<script src="lib/require.js"></script>
Then use the language when initializing lunr:
require(['lib/lunr.js', '../lunr.stemmer.support.js', '../lunr.de.js'], function(lunr, stemmerSupport, de) {
  // since the stemmerSupport and de add keys on the lunr object, we'll pass it as reference to them
  // in the end, we will only need lunr.
  stemmerSupport(lunr); // adds lunr.stemmerSupport
  de(lunr); // adds lunr.de key
  // at this point, lunr can be used
  var idx = lunr(function () {
    // use the language (de)
    this.use(lunr.de);
    // then, the normal lunr index initialization
    this.field('title', { boost: 10 });
    this.field('body');
    // now you can call this.add(...) to add documents written in German
  });
});
In Node.js (CommonJS), require each file and pass it the lunr object:
var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.de.js')(lunr); // or any other language you want
var idx = lunr(function () {
  // use the language (de)
  this.use(lunr.de);
  // then, the normal lunr index initialization
  this.field('title', { boost: 10 });
  this.field('body');
  // now you can call this.add(...) to add documents written in German
});
If your documents are written in more than one language, you can enable multi-language indexing. This ensures every word is properly trimmed and stemmed, every stopword is removed, and no words are lost (indexing in just one language would remove words from every other one.)
var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.ru.js')(lunr);
require('./lunr.multi.js')(lunr);
var idx = lunr(function () {
  // the reason "en" does not appear above is that "en" is built into lunr js
  this.use(lunr.multiLanguage('en', 'ru'));
  // then, the normal lunr index initialization
  // ...
});
You can combine any number of supported languages this way. The corresponding lunr language scripts must be loaded (English is built in).
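To see why single-language indexing can lose words, here is a minimal, self-contained sketch of a trimmer built from a set of allowed characters. The helper `makeTrimmer` and the character ranges are assumptions made for illustration, not the library's actual code; the point is only that a trimmer that knows one alphabet strips tokens written in another, while a combined character set keeps both.

```javascript
// Hypothetical helper: builds a trimmer that strips, from both ends of a token,
// every character that is not in the given character-class body.
function makeTrimmer(wordCharacters) {
  const leading = new RegExp('^[^' + wordCharacters + ']+');
  const trailing = new RegExp('[^' + wordCharacters + ']+$');
  return (token) => token.replace(leading, '').replace(trailing, '');
}

// Simplified character ranges, chosen for this example only.
const LATIN = 'A-Za-z0-9';
const CYRILLIC = '\u0400-\u04FF';

const trimEn = makeTrimmer(LATIN);               // knows only Latin characters
const trimMulti = makeTrimmer(LATIN + CYRILLIC); // knows both alphabets

const tokens = ['search!', 'поиск,'];
console.log(tokens.map(trimEn));    // the Russian token is trimmed away entirely
console.log(tokens.map(trimMulti)); // both tokens survive, minus punctuation
```

With the Latin-only trimmer the Russian token becomes an empty string and would never reach the index; the combined trimmer keeps it.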
If you serialize the index and load it in another script, you'll have to initialize the multi-language support in that script, too, like this:
lunr.multiLanguage('en', 'ru');
var idx = lunr.Index.load(serializedIndex);
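The reason for this extra call is that a serialized index does not contain the pipeline functions themselves, only the names they were registered under, so the loading script has to register the same functions before loading. The toy model below sketches that idea in plain JavaScript; `register`, `serializePipeline`, `loadPipeline` and the label `trimmer-en-ru` are hypothetical names for this example, not lunr's API.

```javascript
// Toy model of a label-based function registry (all names are hypothetical).
const registry = new Map();

function register(fn, label) {
  fn.label = label;
  registry.set(label, fn);
}

// Serializing keeps only the labels, not the function bodies.
function serializePipeline(fns) {
  return fns.map((fn) => fn.label);
}

// Loading looks each label up again and fails on unknown ones.
function loadPipeline(labels) {
  return labels.map((label) => {
    const fn = registry.get(label);
    if (!fn) throw new Error('Cannot load unregistered function: ' + label);
    return fn;
  });
}

// "Build" script: register the function, then serialize the pipeline.
const trimmer = (token) => token.replace(/[!?.,]+$/, '');
register(trimmer, 'trimmer-en-ru');
const saved = JSON.stringify(serializePipeline([trimmer]));

// "Search" script: simulate a fresh process with an empty registry.
registry.clear();
let failed = false;
try {
  loadPipeline(JSON.parse(saved));
} catch (err) {
  failed = true;
}
console.log('load without registration failed:', failed);

// Registering first (the role lunr.multiLanguage('en', 'ru') plays above) makes loading work.
register(trimmer, 'trimmer-en-ru');
const restored = loadPipeline(JSON.parse(saved));
console.log(restored[0]('привет!'));
```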
Check the Contributing section
Searching inside documents is not as straightforward as using indexOf(), since there are many things to consider in order to get quality search results:
Tokenization: the text is split into individual words, giving an array like ['Hope', 'you', 'like', 'using', 'Lunr', 'Languages!'].
Trimming: characters that add no value to the search are stripped from each word, turning Languages! into Languages.
Stemming: what happens if our text contains the word consignment but we want to search for consigned? It should find it, since its meaning is the same, only the form is different.
Stop words: there is no point in indexing or searching for words like the, it, so, etc. These words are called stop words.
I've created this project by compiling and wrapping stemmers together with stop words from various sources (including user contributions) so they can be directly used with all the current versions of Lunr.
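The tokenizing, trimming, stop word and stemming steps described above can be sketched as a small pipeline. This is a simplified illustration in plain JavaScript, not the code Lunr or Lunr Languages actually runs: the stop word list is a tiny made-up sample, and the `stem` function is a deliberately naive suffix stripper standing in for a real stemming algorithm.

```javascript
// Toy search pipeline: only the four steps in order, NOT lunr's implementation.

// 1. Tokenization: lowercase and split on whitespace.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// 2. Trimming: strip non-word characters from both ends of a token.
function trim(token) {
  return token.replace(/^\W+/, '').replace(/\W+$/, '');
}

// 3. Stop words: a tiny hypothetical English list.
const STOP_WORDS = new Set(['the', 'it', 'so', 'you']);

// 4. Stemming: naive suffix stripping (real stemmers apply many more rules).
function stem(token) {
  return token.replace(/(ment|ed|ing|s)$/, '');
}

// Run every token through the steps in order.
function analyze(text) {
  return tokenize(text)
    .map(trim)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t))
    .map(stem);
}

console.log(analyze('Hope you like using Lunr Languages!'));
// Both word forms reduce to the same root, so a search for one finds the other.
console.log(analyze('consignment')[0] === analyze('consigned')[0]); // true
```

Because both the indexed text and the query go through the same steps, the index only has to match on the reduced forms.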
I am providing code in the repository to you under an open source license. Because this is my personal repository, the license you receive to my code is from me and not my employer (Facebook)
FAQs
A collection of language stemmers and stopwords for the Lunr JavaScript library
The npm package lunr-languages receives a total of 88,424 weekly downloads. As such, its popularity is classified as popular.
We found that lunr-languages has an unhealthy version release cadence and level of project activity, because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.