elasticlunr
Lightweight full-text search engine in JavaScript for browser search and offline search.
Elasticlunr.js is a lightweight full-text search engine written in JavaScript for browser search and offline search. It is based on Lunr.js but is more flexible: it provides query-time boosting, field search, a more rational scoring/ranking methodology, and fast computation. Elasticlunr.js is a bit like Solr, but much smaller and not as bright; it still provides flexible configuration, query-time boosting, field search, and other features.
A very simple search index can be created with the following script:
var index = elasticlunr(function () {
  this.addField('title');
  this.addField('body');
  this.setRef('id');
});
Adding documents to the index is as simple as:
var doc1 = {
  "id": 1,
  "title": "Oracle released its latest database Oracle 12g",
  "body": "Yesterday Oracle has released its new database Oracle 12g, this would make more money for this company and lead to a nice profit report of annual year."
};
var doc2 = {
  "id": 2,
  "title": "Oracle released its profit report of 2015",
  "body": "As expected, Oracle released its profit report of 2015, during the good sales of database and hardware, Oracle's profit of 2015 reached 12.5 Billion."
};
index.addDoc(doc1);
index.addDoc(doc2);
Searching is just as simple:
index.search("Oracle database profit");
You can also apply query-time boosting by passing in a configuration:
index.search("Oracle database profit", {
  fields: {
    title: {boost: 2},
    body: {boost: 1}
  }
});
This returns a list of matching documents with a score of how closely they match the search query:
[{
  "ref": 1,
  "score": 0.5376053707962494
},
{
  "ref": 2,
  "score": 0.5237481076838757
}]
If you do not want to store the original JSON documents, you can use the following setting:
var index = elasticlunr(function () {
  this.addField('title');
  this.addField('body');
  this.setRef('id');
  this.saveDocument(false);
});
Elasticlunr.js will then not store the JSON documents. This reduces the index size, but it also makes some operations less convenient, such as updating or deleting a document by its id or reference. In practice, most users rarely update or delete documents in an index.
API documentation is available, as well as a full working example.
Elasticlunr.js is based on Lunr.js but is more flexible. It provides query-time boosting, field search, a more rational scoring/ranking methodology, flexible configuration, and so on. It is a bit like Solr, but much smaller and not as bright.
Simply include the elasticlunr.js source file in the page where you want to use it. Elasticlunr.js is supported in all modern browsers.
Browsers that do not support ES5 will require a JavaScript shim for Elasticlunr.js to work. You can use Augment.js, ES5-Shim, or any library that patches old browsers to provide an ES5-compatible JavaScript environment.
This part covers only the important aspects of elasticlunr.js; for the full documentation, please go to the API documentation.
When you first create an index instance, you need to specify which fields to index. If you do not specify any fields, no field will be searchable for your documents. You can specify fields as follows:
var index = elasticlunr(function () {
  this.addField('title');
  this.addField('body');
  this.setRef('id');
});
You can also set the document reference field with this.setRef('id'). If you do not set a document ref, elasticlunr.js uses 'id' by default.
The same index setup can also be written as follows:
var index = elasticlunr();
index.addField('title');
index.addField('body');
index.setRef('id');
You can also choose not to store the original JSON documents, which reduces the index size:
var index = elasticlunr();
index.addField('title');
index.addField('body');
index.setRef('id');
index.saveDocument(false);
Adding a document to the index is simple: prepare your document in JSON format, then add it to the index.
var doc1 = {
  "id": 1,
  "title": "Oracle released its latest database Oracle 12g",
  "body": "Yesterday Oracle has released its new database Oracle 12g, this would make more money for this company and lead to a nice profit report of annual year."
};
var doc2 = {
  "id": 2,
  "title": "Oracle released its profit report of 2015",
  "body": "As expected, Oracle released its profit report of 2015, during the good sales of database and hardware, Oracle's profit of 2015 reached 12.5 Billion."
};
index.addDoc(doc1);
index.addDoc(doc2);
If your JSON document contains a field that is not configured in the index, that field will not be indexed, which means it is not searchable.
Elasticlunr.js supports removing a document from the index: just pass the JSON document to the elasticlunr.Index.prototype.removeDoc() function.
For example:
var doc = {
  "id": 1,
  "title": "Oracle released its latest database Oracle 12g",
  "body": "Yesterday Oracle has released its new database Oracle 12g, this would make more money for this company and lead to a nice profit report of annual year."
};
index.removeDoc(doc);
Removing a document removes every token of each of its fields from the corresponding field-specific inverted index.
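To make the idea of a field-specific inverted index more concrete, here is a minimal conceptual sketch. It is not elasticlunr's implementation: `TinyIndex`, `tokenize`, and the data layout are hypothetical names and choices made for illustration only. It shows why removal needs the document's field contents: each token of each field has to be located and deleted from that field's postings.

```javascript
// Conceptual sketch only; TinyIndex and tokenize are hypothetical,
// not part of the elasticlunr API or its internals.

// Naive tokenizer: lowercase, then split on non-word characters.
function tokenize(text) {
  return String(text).toLowerCase().split(/\W+/).filter(Boolean);
}

function TinyIndex(fields) {
  this.fields = fields;
  // field -> token -> { docRef: true }; prototype-free maps avoid
  // clashes with tokens such as "constructor".
  this.inverted = Object.create(null);
  for (var i = 0; i < fields.length; i++) {
    this.inverted[fields[i]] = Object.create(null);
  }
}

TinyIndex.prototype.addDoc = function (doc) {
  var self = this;
  this.fields.forEach(function (field) {
    tokenize(doc[field] || '').forEach(function (token) {
      var postings = self.inverted[field][token];
      if (!postings) {
        postings = self.inverted[field][token] = Object.create(null);
      }
      postings[doc.id] = true;
    });
  });
};

TinyIndex.prototype.removeDoc = function (doc) {
  var self = this;
  this.fields.forEach(function (field) {
    tokenize(doc[field] || '').forEach(function (token) {
      var postings = self.inverted[field][token];
      if (!postings) { return; }
      delete postings[doc.id];
      // Drop the token entirely once no document references it.
      if (Object.keys(postings).length === 0) {
        delete self.inverted[field][token];
      }
    });
  });
};
```

In this sketch, a field that was not passed to the constructor is never tokenized, which mirrors the rule that unconfigured fields are not searchable.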
Elasticlunr.js supports updating a document in the index: just pass the JSON document to the elasticlunr.Index.prototype.update() function.
For example:
var doc = {
  "id": 1,
  "title": "Oracle released its latest database Oracle 12g",
  "body": "Yesterday Oracle has released its new database Oracle 12g, this would make more money for this company and lead to a nice profit report of annual year."
};
index.update(doc);
Elasticlunr.js provides flexible query configuration and supports query-time boosting and Boolean logic settings. You can pass a configuration that tells elasticlunr.js how to do query-time boosting, which fields to search in, and which Boolean logic to apply. Or you can simply provide a query string, which also works well because the scoring mechanism is efficient.
Because of elasticlunr.js's scoring mechanism, a simple search is enough to meet most requirements.
index.search("Oracle database profit");
The output is a results array. Each element of the array is an object containing a ref field and a score field: ref is the document reference, and score is the similarity measurement. The results array is sorted by score in descending order.
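Because each result carries only a ref and a score, an application typically needs to map refs back to its own documents. The following is one possible sketch; `docsById` and `resolveHits` are hypothetical names, and the `results` array is hard-coded to mimic the output shape described above rather than produced by a real search.

```javascript
// Hypothetical data: `results` mimics the shape returned by a search,
// and `docsById` is a lookup table the application maintains itself.
var docsById = {
  1: { id: 1, title: 'Oracle released its latest database Oracle 12g' },
  2: { id: 2, title: 'Oracle released its profit report of 2015' }
};
var results = [
  { ref: 1, score: 0.5376053707962494 },
  { ref: 2, score: 0.5237481076838757 }
];

// Map each hit back to its document, keeping the score alongside it.
// The input order (descending score) is preserved.
function resolveHits(results, docsById) {
  return results.map(function (hit) {
    return { doc: docsById[hit.ref], score: hit.score };
  });
}

var hits = resolveHits(results, docsById);
console.log(hits[0].doc.title);
```

Keeping such a lookup table outside the index is most relevant when document storage in the index is turned off.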
Specify which fields to search in by passing in a JSON configuration, and set a boost for each search field. With this configuration, elasticlunr.js searches for the query string only in the specified fields, applying the boosting weights.
The scoring mechanism used in elasticlunr.js is fairly complex; please see the details for more information.
index.search("Oracle database profit", {
  fields: {
    title: {boost: 2},
    body: {boost: 1}
  }
});
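One way to picture the effect of the boost values is sketched below. It rests on an assumption that is not stated in this document: that each field's score is multiplied by its boost and the boosted scores are then summed per document. `combineFieldScores` and the input numbers are hypothetical, and the actual scoring is more involved.

```javascript
// Sketch under an assumption: per-field scores are scaled by the field's
// boost and summed per document. combineFieldScores is a hypothetical
// helper, not an elasticlunr function.
function combineFieldScores(fieldScores, config) {
  var combined = {};
  // Only fields named in the configuration contribute to the result.
  Object.keys(config.fields).forEach(function (field) {
    var boost = config.fields[field].boost;
    var scores = fieldScores[field] || {};
    Object.keys(scores).forEach(function (ref) {
      combined[ref] = (combined[ref] || 0) + scores[ref] * boost;
    });
  });
  return combined;
}

// Made-up per-field scores for two documents.
var combined = combineFieldScores(
  { title: { 1: 0.5, 2: 0.25 }, body: { 1: 0.25 } },
  { fields: { title: { boost: 2 }, body: { boost: 1 } } }
);
// Document 1: 0.5 * 2 + 0.25 * 1 = 1.25; document 2: 0.25 * 2 = 0.5
```

Under this assumption, a match in the title counts twice as much as an equally strong match in the body.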
Elasticlunr.js also supports Boolean logic settings. If no Boolean logic is set, elasticlunr.js uses "OR" logic by default. With the default "OR" logic, elasticlunr.js can reach a high recall.
index.search("Oracle database profit", {
  fields: {
    title: {boost: 2},
    body: {boost: 1}
  },
  bool: "OR"
});
The Boolean model can be set at the global level, as in the setting above, or at the field level. If both the global and field levels contain a "bool" setting, the field-level setting overrides the global one.
index.search("Oracle database profit", {
  fields: {
    title: {boost: 2, bool: "AND"},
    body: {boost: 1}
  },
  bool: "OR"
});
The setting above searches the title field with the "AND" model and the other fields with the "OR" model.
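The difference between the two models, and the way a field-level setting takes precedence, can be illustrated with a small conceptual sketch. `effectiveBool` and `matchDocs` are hypothetical helpers, not elasticlunr functions, and the sketch assumes the query tokens are distinct and each postings list contains each ref at most once.

```javascript
// Conceptual sketch; both helpers are hypothetical.

// Field-level "bool" wins over the global one; "OR" is the fallback.
function effectiveBool(config, field) {
  return config.fields[field].bool || config.bool || 'OR';
}

// `postings` maps a token to the refs of documents containing it in one field.
function matchDocs(tokens, postings, bool) {
  var counts = {};
  tokens.forEach(function (token) {
    var refs = Object.prototype.hasOwnProperty.call(postings, token)
      ? postings[token]
      : [];
    refs.forEach(function (ref) {
      counts[ref] = (counts[ref] || 0) + 1;
    });
  });
  return Object.keys(counts).filter(function (ref) {
    // "AND": every token must match; "OR": one matching token is enough.
    return bool === 'AND' ? counts[ref] === tokens.length : true;
  });
}

var config = {
  fields: { title: { boost: 2, bool: 'AND' }, body: { boost: 1 } },
  bool: 'OR'
};
var titlePostings = { oracle: [1, 2], database: [1], profit: [2] };
var queryTokens = ['oracle', 'database'];

var orRefs = matchDocs(queryTokens, titlePostings, 'OR');   // ['1', '2']
var andRefs = matchDocs(queryTokens, titlePostings,
                        effectiveBool(config, 'title'));    // ['1']
```

The "OR" model returns more documents (higher recall), while "AND" keeps only documents that contain every query token in that field.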
Currently, if you search in multiple fields, the results from each field are merged together to give the query results. In the future, elasticlunr will support a configuration that lets the user set how to combine the results from each field, such as "most_field" or "top_field".
Sometimes a user wants to expand a query token to increase recall. The user can then set the expand option to true in the configuration; the default is false. For example, suppose the query token is "micro" and both "microwave" and "microscope" are in the index. If the user chooses to expand the query token "micro" to increase recall, both "microwave" and "microscope" are returned and searched in the index. Query results from expanded tokens are penalized because they are not exactly the same as the query token.
index.search("micro", {
  fields: {
    title: {boost: 2, bool: "AND"},
    body: {boost: 1}
  },
  bool: "OR",
  expand: true
});
Field-level expand configuration overrides the global expand configuration.
index.search("micro", {
  fields: {
    title: {
      boost: 2,
      bool: "AND",
      expand: false
    },
    body: {boost: 1}
  },
  bool: "OR",
  expand: true
});
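Token expansion can be pictured as a prefix lookup over the indexed vocabulary, with non-exact matches given a lower weight. The sketch below is only an illustration: `expandToken` is a hypothetical helper, and the 0.5 weight is an arbitrary value chosen for the example, not the penalty elasticlunr actually applies.

```javascript
// Conceptual sketch; expandToken is hypothetical and the 0.5 weight for
// non-exact matches is an arbitrary illustrative value.
function expandToken(token, vocabulary) {
  return vocabulary
    .filter(function (candidate) {
      // Keep every indexed token that starts with the query token.
      return candidate.indexOf(token) === 0;
    })
    .map(function (candidate) {
      return {
        token: candidate,
        weight: candidate === token ? 1 : 0.5
      };
    });
}

var expanded = expandToken(
  'micro',
  ['microwave', 'microscope', 'oracle', 'micro']
);
// Tokens kept: "microwave", "microscope", "micro"; only "micro" has weight 1.
```

With expansion disabled, only the exact token "micro" would be looked up.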
Elasticlunr.js contains some default English stop words. By default there are 120 of them; you can decide not to use these default stop words, or add customized stop words.
You can remove the default stop words simply with:
elasticlunr.clearStopWords();
You can also add a list of customized stop words:
var customized_stop_words = ['an', 'hello', 'xyzabc'];
elasticlunr.addStopWords(customized_stop_words);
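What stop-word filtering does to a token stream can be sketched independently of the library. The names below (`sampleStopWords`, `addCustomStopWords`, `filterStopWords`) are hypothetical, and the word list is a tiny sample rather than elasticlunr's default list.

```javascript
// Conceptual sketch; all names are hypothetical and the list is a sample.
var sampleStopWords = { a: true, an: true, the: true, of: true };

// Register extra stop words in the lookup table.
function addCustomStopWords(words) {
  words.forEach(function (word) {
    sampleStopWords[word] = true;
  });
}

// Drop every token that is registered as a stop word.
function filterStopWords(tokens) {
  return tokens.filter(function (token) {
    return sampleStopWords[token] !== true;
  });
}

addCustomStopWords(['hello', 'xyzabc']);
var kept = filterStopWords(['hello', 'the', 'profit', 'of', 'oracle']);
// kept is ['profit', 'oracle']
```

Tokens removed this way never reach the index, so they can neither match a query nor add to the index size.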
Elasticlunr supports Node.js; you can use elasticlunr in Node.js as a node module.
Install elasticlunr with:
npm install elasticlunr
Then, in your Node.js project or in the Node.js console:
var elasticlunr = require('elasticlunr');
var index = elasticlunr(function () {
  this.addField('title');
  this.addField('body');
});
var doc1 = {
  "id": 1,
  "title": "Oracle released its latest database Oracle 12g",
  "body": "Yesterday Oracle has released its new database Oracle 12g, this would make more money for this company and lead to a nice profit report of annual year."
};
var doc2 = {
  "id": 2,
  "title": "Oracle released its profit report of 2015",
  "body": "As expected, Oracle released its profit report of 2015, during the good sales of database and hardware, Oracle's profit of 2015 reached 12.5 Billion."
};
index.addDoc(doc1);
index.addDoc(doc2);
index.search("Oracle database profit");
The default supported language of elasticlunr.js is English. If you want to use elasticlunr.js to index documents in other languages, you need to combine elasticlunr.js with lunr-languages.
If you are using elasticlunr.js in the browser for other languages, download the corresponding language support from lunr-languages, then include the scripts as follows:
<script src="lunr.stemmer.support.js"></script>
<script src="lunr.de.js"></script>
Then use elasticlunr.js as normal:
var index = elasticlunr(function () {
  // use the language (de)
  this.use(elasticlunr.de);
  // then, the normal elasticlunr index initialization
  this.addField('title');
  this.addField('body');
});
Pay attention to the language-specific line:
this.use(elasticlunr.de);
If you are using another language, such as es (Spanish), download the corresponding lunr.es.js file and lunr.stemmer.support.js, and change the line above to:
this.use(elasticlunr.es);
If you are using elasticlunr.js in Node.js for other languages, download the corresponding language support from lunr-languages, put the language file (for example lunr.de.js) and lunr.stemmer.support.js in your project, then use elasticlunr.js in your Node.js module as follows:
var elasticlunr = require('elasticlunr');
require('./lunr.stemmer.support.js')(elasticlunr);
require('./lunr.de.js')(elasticlunr);
var index = elasticlunr(function () {
  // use the language (de)
  this.use(elasticlunr.de);
  // then, the normal elasticlunr index initialization
  this.addField('title');
  this.addField('body');
});
For more details, please go to lunr-languages.
See the CONTRIBUTING.mdown file.