Web's fastest and most memory-flexible full-text search library with zero dependencies.
When it comes to raw search speed, FlexSearch outperforms every other search library out there and also provides flexible search capabilities like multi-field search, phonetic transformations or partial matching.
Depending on the options used, it also provides the most memory-efficient index. Keep in mind that updating or removing existing items from the index has a significant cost. When existing items of your index need to be updated or removed continuously, BulkSearch may be a better choice.
FlexSearch also provides a non-blocking asynchronous processing model, as well as web workers for performing updates or queries on the index in parallel through dedicated balanced threads.
FlexSearch Server is available here: https://github.com/nextapps-de/flexsearch-server.
Installation Guide • API Reference • Custom Builds • Flexsearch Server • Changelog
Supported Platforms:
Demos:
Library Comparison:
Get Latest (Stable Release):
All Features:
It is also pretty simple to make Custom Builds
Benchmark Ranking
Comparison: Benchmark "Gulliver's Travels"
Query Test: "Gulliver's Travels"
| Rank | Library Name | Library Version | Single Phrase (op/s) | Multi Phrase (op/s) | Not Found (op/s) |
|---|---|---|---|---|---|
| 1 | FlexSearch *** | 0.3.6 | 363757 | 182603 | 1627219 |
| 2 | Wade | 0.3.3 | 899 | 6098 | 214286 |
| 3 | JS Search | 1.4.2 | 735 | 8889 | 800000 |
| 4 | JSii | 1.0 | 551 | 9970 | 75000 |
| 5 | Lunr.js | 2.3.5 | 355 | 1051 | 25000 |
| 6 | Elasticlunr.js | 0.9.6 | 327 | 781 | 6667 |
| 7 | BulkSearch | 0.1.3 | 265 | 535 | 2778 |
| 8 | bm25 | 0.2 | 71 | 116 | 2065 |
| 9 | Fuse | 3.3.0 | 0.5 | 0.4 | 0.7 |
Memory Test: "Gulliver's Travels"
| Rank | Library Name | Library Version | Index Size * | Memory Allocation ** |
|---|---|---|---|---|
| 1 | FlexSearch **** | 0.3.1 | 1.33 Mb | 20.31 kb |
| 2 | Wade | 0.3.3 | 3.18 Mb | 68.53 kb |
| 3 | Fuse | 3.3.0 | 0.22 Mb | 156.46 kb |
| 4 | JSii | 1.0 | 8.9 Mb | 81.03 kb |
| 5 | bm25 | 0.2 | 6.95 Mb | 137.88 kb |
| 6 | BulkSearch | 0.1.3 | 1.53 Mb | 984.30 kb |
| 7 | Elasticlunr.js | 0.9.6 | 11.83 Mb | 68.69 kb |
| 8 | Lunr.js | 2.3.5 | 16.24 Mb | 84.73 kb |
| 9 | JS Search | 1.4.2 | 36.9 Mb | 53.0 kb |
* Index Size: The size of memory the index requires
** Memory Allocation: The amount of memory which was additionally allocated during a row of 10 queries
*** The preset "fast" was used for this test
**** The preset "memory" was used for this test
Library Comparison: Benchmark "Gulliver's Travels"
Contextual Search
Note: This feature is currently not enabled by default. Read here how to enable it.
FlexSearch introduces a new scoring mechanism called Contextual Search, invented by Thomas Wilkerling, the author of this library. Contextual search boosts queries to a completely new level of performance, but also requires some additional memory (depending on depth).
The basic idea of this concept is to limit relevance by its context instead of calculating relevance across the whole (unlimited) distance.
In this way, contextual search also improves the results of relevance-based queries on large amounts of text data.
"TF-IDF and all kinds of variations (like BM25) is a big mistake in searching algorithms today. They don't provide neither: a meaningful relevance of a term nor the importance of it! Like many pseudo-intelligent algorithms this is also just an example of mathematical stupidity." — Thomas Wilkerling, Contextual-based Scoring, 2018
Model of context-based scoring
Lexical Pre-Scored Dictionary / Context-based Map
The index consists of an in-memory pre-scored dictionary as its base. The biggest complexity of this algorithm occurs during the calculation of intersections. As a consequence, each additional term in a query has significant potential to increase complexity. The contextual map comes into play when the query contains more than one term, and its effect grows with each additional term by cutting down the complexity of the intersection calculations. Instead of an increase, the complexity is lowered for each additional term. The contextual index itself is also based on a pre-scored dictionary and follows a memory-friendly strategy.
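The idea of a context-based map can be illustrated with a small sketch in plain JavaScript. This is a conceptual illustration only, not FlexSearch's actual data structure; the function and key names are made up:

```javascript
// Conceptual sketch (not FlexSearch internals): register each term of a
// phrase under its neighboring terms, so a multi-term query can intersect
// via direct lookups instead of scanning full posting lists.
function buildContextMap(words, depth, id, map) {
  map = map || {};
  for (var i = 0; i < words.length; i++) {
    var from = Math.max(0, i - depth);
    var to = Math.min(words.length - 1, i + depth);
    for (var j = from; j <= to; j++) {
      if (i === j) continue;
      var term = words[i];
      var ctx = words[j];
      (map[term] = map[term] || {});
      (map[term][ctx] = map[term][ctx] || []).push(id);
    }
  }
  return map;
}

var map = buildContextMap(["quick", "brown", "fox"], 1, 0);
// map["brown"]["quick"] -> [0], map["brown"]["fox"] -> [0]
```

With depth 1, "brown" is registered under both of its neighbors, while "quick" and "fox" only know their direct neighbor "brown".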
| Type | Complexity |
|---|---|
| Each single-term query | 1 |
| Lexical Pre-Scored Dictionary (solo) | TERM_COUNT * TERM_MATCHES |
| Lexical Pre-Scored Dictionary + Context-based Map | TERM_MATCHES / TERM_COUNT |
The complexity for one single term is always 1.
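Assuming the complexity comparison above reads as relative cost factors, the effect can be sketched as simple arithmetic (illustrative only, not a measurement):

```javascript
// Illustrative arithmetic: estimated relative cost of resolving a query,
// with and without the context-based map (per the complexity table above).
function estimatedCost(termCount, termMatches, contextual) {
  if (termCount === 1) return 1; // a single-term lookup is always 1
  return contextual
    ? termMatches / termCount   // cost shrinks with each additional term
    : termCount * termMatches;  // cost grows with each additional term
}

estimatedCost(3, 100, false); // -> 300
estimatedCost(3, 100, true);  // -> ~33.3
```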
Compare BulkSearch vs. FlexSearch
| | BulkSearch | FlexSearch |
|---|---|---|
| Access | Read-write optimized index | Read-memory optimized index |
| Memory | Large: ~ 1 Mb per 100,000 words | Tiny: ~ 100 Kb per 100,000 words |
| Index Type | Bulk of encoded string data divided into chunks | Lexical pre-scored dictionary, context-based map |
| Strengths | fast adds, fast updates, fast removals | fast queries, memory-efficient index |
| Weaknesses | less powerful contextual search, less memory efficient (has to be defragmented from time to time) | updating/removing existing items from the index is slow, adding items to an index optimized for partial matching (tokenize: "forward" / "reverse" / "full") is slow |
| Pagination | Yes | No |
| Wildcards | Yes | No |
Installation
HTML / Javascript
Use flexsearch.min.js for production and flexsearch.js for development.
<html>
<head>
<script src="js/flexsearch.min.js"></script>
</head>
...
Use latest from CDN:
<script src="https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@master/dist/flexsearch.min.js"></script>
Or a specific version:
<script src="https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.3.51/dist/flexsearch.min.js"></script>
AMD:
var FlexSearch = require("./flexsearch.js");
Node.js
npm install flexsearch
In your code include as follows:
var FlexSearch = require("flexsearch");
Or pass in options when requiring:
var index = require("flexsearch").create({});
API Overview
Global methods:
Index methods:
Usage
Create a new index
FlexSearch.create(<options>)
var index = new FlexSearch();
alternatively you can also use:
var index = FlexSearch.create();
Create a new index and choosing one of the presets:
var index = new FlexSearch("speed");
Create a new index with custom options:
var index = new FlexSearch({
encode: "balance",
tokenize: "forward",
threshold: 0,
async: false,
worker: false,
cache: false
});
Create a new index and extend a preset with custom options:
var index = new FlexSearch("memory", {
encode: "balance",
tokenize: "forward",
threshold: 0
});
See all available custom options.
Add text item to an index
Index.add(id, string)
index.add(10025, "John Doe");
Search items
Index.search(string | options, <limit>, <callback>)
index.search("John");
Limit the result:
index.search("John", 10);
Async Search
Perform queries asynchronously:
index.search("John", function(result){
});
Passing a callback will always make the query perform asynchronously, even if the "async" option was not set.
Perform queries asynchronously (Promise-based):
Make sure the option "async" is enabled on this instance to receive promises.
index.search("John").then(function(result){
});
Alternatively ES6:
async function search(query){
const result = await index.search(query);
}
Custom Search
Pass custom options for each query:
index.search({
query: "John",
limit: 1000,
threshold: 5,
depth: 3,
callback: function(results){
}
});
The same from above could also be written as:
index.search("John", {
limit: 1000,
threshold: 5,
depth: 3
}, function(results){
});
Suggestions
Get also suggestions for a query:
index.search({
query: "John Doe",
suggest: true
});
When suggestion is enabled, all results will be filled up (until the limit, default 1000) with similar matches, ordered by relevance.
Phonetic suggestions are currently not supported; for that purpose use an encoder and tokenizer which provide similar functionality. Suggestions come into play when a query contains multiple words. Assume a query contains 3 words: when the index matches just 2 of the 3 words, you would normally get no results, but with suggestion enabled you will also get results where 2 of 3 words matched, as well as where 1 of 3 words matched (depending on the limit), also sorted by relevance.
Note: It is planned to improve this feature and provide more flexibility.
Update item from an index
Index.update(id, string)
index.update(10025, "Road Runner");
Remove item from an index
Index.remove(id)
index.remove(10025);
Reset index
index.clear();
Destroy the index
index.destroy();
Re-Initialize the index
Index.init(<options>)
Initialize (with same options):
index.init();
Initialize with new options:
index.init({
});
Re-initialization will also destroy the old index.
Add custom matcher
FlexSearch.registerMatcher({REGEX: REPLACE})
Add global matchers for all instances:
FlexSearch.registerMatcher({
'ä': 'a',
'ó': 'o',
'[ûúù]': 'u'
});
Add private matchers for a specific instance:
index.addMatcher({
'ä': 'a',
'ó': 'o',
'[ûúù]': 'u'
});
Add custom encoder
Assign a custom encoder by passing a function during index creation/initialization:
var index = new FlexSearch({
encode: function(str){
return str;
}
});
The encoder function gets a string as a parameter and has to return the modified string.
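As a sketch, a custom encoder could normalize umlauts before indexing. The character map below is an illustrative assumption for this example, not a built-in:

```javascript
// Minimal custom encoder sketch: lowercase the input and replace
// German umlauts/eszett with ASCII equivalents.
function umlautEncoder(str) {
  var map = { "ä": "a", "ö": "o", "ü": "u", "ß": "ss" };
  return str.toLowerCase().replace(/[äöüß]/g, function(c) {
    return map[c];
  });
}

umlautEncoder("Björn Müßig"); // -> "bjorn mussig"
```

Such a function could then be passed as the encode option shown above.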
Call a custom encoder directly:
var encoded = index.encode("sample text");
Register a global encoder
FlexSearch.registerEncoder(name, encoder)
Global encoders can be shared/used by all instances.
FlexSearch.registerEncoder("whitespace", function(str){
return str.replace(/\s/g, "");
});
Initialize index and assign a global encoder:
var index = new FlexSearch({ encode: "whitespace" });
Call a global encoder directly:
var encoded = FlexSearch.encode("whitespace", "sample text");
Mix/Extend multiple encoders
FlexSearch.registerEncoder('mixed', function(str){
str = this.encode("icase", str);
str = this.encode("whitespace", str);
return str;
});
Add custom tokenizer
A tokenizer splits words into components or chunks.
Define a private custom tokenizer during creation/initialization:
var index = new FlexSearch({
tokenize: function(str){
return str.split(/\s-\//g);
}
});
The tokenizer function gets a string as a parameter and has to return an array of strings (parts).
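For illustration, a tokenizer that splits on whitespace, hyphens and slashes would behave like this (the delimiter set is an assumption for this example):

```javascript
// Example tokenizer sketch: split on whitespace, hyphens and slashes,
// then drop empty parts.
function customTokenizer(str) {
  return str.split(/[\s\/-]+/).filter(Boolean);
}

customTokenizer("front-end/back-end code");
// -> ["front", "end", "back", "end", "code"]
```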
Add language-specific stemmer and/or filter
Stemmer: several linguistic mutations of the same word (e.g. "run" and "running")
Filter: a blacklist of words to be filtered out from indexing at all (e.g. "and", "to" or "be")
Assign a private custom stemmer or filter during creation/initialization:
var index = new FlexSearch({
stemmer: {
"ational": "ate",
"tional": "tion",
"enci": "ence",
"ing": ""
},
filter: [
"in",
"into",
"is",
"isn't",
"it",
"it's"
]
});
Or assign stemmer/filters globally to a language:
Stemmers are passed as an object (key-value pairs), filters as an array.
FlexSearch.registerLanguage("us", {
stemmer: { },
filter: [ ]
});
Or use some pre-defined stemmer or filter of your preferred languages:
<html>
<head>
<script src="js/flexsearch.min.js"></script>
<script src="js/lang/en.min.js"></script>
<script src="js/lang/de.min.js"></script>
</head>
...
Now you can assign built-in stemmer during creation/initialization:
var index_en = new FlexSearch({
stemmer: "en",
filter: "en"
});
var index_de = new FlexSearch({
stemmer: "de",
filter: [ ]
});
In Node.js you just have to require the language pack files to make them available:
require("flexsearch.js");
require("lang/en.js");
require("lang/de.js");
It is also possible to compile language packs into the build as follows:
node compile SUPPORT_LANG_EN=true SUPPORT_LANG_DE=true
Right-To-Left Support
Set the tokenizer at least to "reverse" or "full" when using RTL.
Just set the field "rtl" to true and use a compatible tokenizer:
var index = FlexSearch.create({
encode: "icase",
tokenize: "reverse",
rtl: true
});
CJK Word Break (Chinese, Japanese, Korean)
Set a custom tokenizer which fits your needs, e.g.:
var index = FlexSearch.create({
encode: false,
tokenize: function(str){
return str.replace(/[\x00-\x7F]/g, "").split("");
}
});
You can also pass a custom encoder function to apply some linguistic transformations.
index.add(0, "一个单词");
var results = index.search("单词");
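To see what the tokenizer above actually produces, here is the same function standalone:

```javascript
// The CJK tokenizer from above: strip ASCII characters, then split the
// remaining string into single characters.
function cjkTokenizer(str) {
  return str.replace(/[\x00-\x7F]/g, "").split("");
}

cjkTokenizer("abc一个单词"); // -> ["一", "个", "单", "词"]
```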
Get info about an index
This feature is available in DEBUG mode.
index.info();
Returns information e.g.:
{
"id": 0,
"memory": 10000,
"items": 500,
"sequences": 3000,
"matchers": 0,
"chars": 3500,
"cache": false,
"matcher": 0,
"worker": false,
"threshold": 7,
"depth": 3,
"contextual": true
}
Index Documents (Field-Search)
The Document Descriptor
Assume the document is an array of data like:
var docs = [{
id: 0,
title: "Title",
cat: "Category",
content: "Body"
},{
id: 1,
title: "Title",
cat: "Category",
content: "Body"
}];
Provide a document descriptor doc when initializing a new index, e.g. related to the example above:
var index = new FlexSearch({
tokenize: "strict",
depth: 3,
doc: {
id: "id",
field: "content"
}
});
The above example will just index the field "content". To index multiple fields, pass an array:
var index = new FlexSearch({
doc: {
id: "id",
field: [
"title",
"cat",
"content"
]
}
});
Complex Objects
Assume the document array looks more complex (has nested branches etc.), e.g.:
var docs = [{
data:{
id: 0,
title: "Foo",
body: {
content: "Foobar"
}
}
},{
data:{
id: 1,
title: "Bar",
body: {
content: "Foobar"
}
}
}];
Then use the colon-separated notation "root:child:child" to define the hierarchy within the document descriptor:
var index = new FlexSearch({
doc: {
id: "data:id",
field: [
"data:title",
"data:body:content"
]
}
});
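For clarity, the colon notation simply addresses nested properties. A hypothetical resolver (not part of the FlexSearch API, for illustration only) would look like:

```javascript
// Resolve a colon-separated path like "data:body:content" against an
// object by walking the nested properties one key at a time.
function resolvePath(obj, path) {
  var keys = path.split(":");
  for (var i = 0; i < keys.length && obj != null; i++) {
    obj = obj[keys[i]];
  }
  return obj;
}

var doc = { data: { id: 1, body: { content: "Foobar" } } };
resolvePath(doc, "data:body:content"); // -> "Foobar"
```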
Add/Update/Remove Documents to/from the Index
Just pass the document array (or a single object) to the index:
index.add(docs);
Update index with a single object or an array of objects:
index.update({
data:{
id: 0,
title: "Foo",
body: {
content: "Bar"
}
}
});
Remove a single object or an array of objects from the index:
index.remove(docs);
When the id is known, you can also simply remove by id (faster):
index.remove(id);
Field-Search
The search gives you several options when using documents.
The colon notation also applies when searching.
This will search through all indexed fields:
var results = index.search("body");
This will search a specific field:
var results = index.search({
field: "title",
query: "foobar"
});
var results = index.search({
field: "data:body:content",
query: "foobar"
});
This could also be written as:
var results = index.search("foobar", {
field: "data:body:content",
});
Search the same query on multiple fields:
var results = index.search({
query: "foobar",
field: ["title", "body"]
});
Could also be written as:
var results = index.search("foobar", {
field: ["title", "body"]
});
Search different queries on multiple fields:
var results = index.search([{
field: "title",
query: "foo"
},{
field: "body",
query: "bar"
}]);
Boost scoring on specific fields:
var results = index.search([{
field: "title",
query: "foo",
boost: 2
},{
field: "body",
query: "bar",
boost: 0.5
}]);
Find / Where
When indexing documents, you are also able to get results by specific attributes.
The colon notation also applies when using "where" and "find".
Find a document by an attribute
index.find("cat", "comedy");
Same as:
index.find({"cat": "comedy"});
To get by ID, you can also use short form:
index.find(1);
Find by a custom function:
index.find(function(item){
return item.cat === "comedy";
});
Find documents by multiple attributes
Get just the first result:
index.find({
cat: "comedy",
year: "2018"
});
Get all matched results:
index.where({
cat: "comedy",
year: "2018"
});
Get all results and set a limit:
index.where({
cat: "comedy",
year: "2018"
}, 100);
Get all by a custom function:
index.where(function(item){
return item.cat === "comedy";
});
Combine fuzzy search with a where-clause
Add some content, e.g.:
index.add([{
id: 0,
title: "Foobar",
cat: "adventure",
content: "Body"
},{
id: 1,
title: "Title",
cat: "comedy",
content: "Foobar"
}]);
Using search and also apply a where-clause:
index.search("foo", {
field: [
"title",
"body"
],
where: {
"cat": "comedy"
},
limit: 10
});
Tags
IMPORTANT NOTICE: This feature will be removed due to the lack of scaling and redundancy.
Tagging is pretty much the same as adding an index to a database column. Using where on an indexed/tagged attribute improves performance drastically, but at the cost of additional memory.
The colon notation also applies to tags.
Add one single tag to the index:
var index = new FlexSearch({
doc: {
id: "id",
field: ["title", "content"],
tag: "cat"
}
});
Or add multiple tags to the index:
var index = new FlexSearch({
doc: {
id: "id",
field: ["title", "content"],
tag: ["cat", "year"]
}
});
Add some content:
index.add([{
id: 0,
title: "Foobar",
cat: "adventure",
year: "2018",
content: "Body"
},{
id: 1,
title: "Title",
cat: "comedy",
year: "2018",
content: "Foobar"
}]);
Find all documents by an attribute:
index.where({"cat": "comedy"}, 10);
Since the attribute "cat" was tagged (has its own index) this expression performs extremely fast. This is actually the fastest way to retrieve results from documents.
Search documents and also apply a where-clause:
index.search("foo", {
field: [
"title",
"content"
],
where: {
"cat": "comedy"
},
limit: 10
});
For comparison, the same expression without the where-clause has pretty much the same performance. On the other hand, using a where-clause on a property without a tag has an additional cost.
Custom Sort
The colon notation also applies to a custom sort.
Sort by an attribute:
var results = index.search("John", {
limit: 100,
sort: "data:title"
});
Sort by a custom function:
var results = index.search("John", {
limit: 100,
sort: function(a, b){
return (
a.id < b.id ? -1 : (
a.id > b.id ? 1 : 0
));
}
});
Chaining
Simply chain methods like:
var index = FlexSearch.create()
.addMatcher({'â': 'a'})
.add(0, 'foo')
.add(1, 'bar');
index.remove(0).update(1, 'foo').add(2, 'foobar');
Enable Contextual Scoring
Create an index and just set the limit of relevance as "depth":
var index = new FlexSearch({
encode: "icase",
tokenize: "strict",
threshold: 7,
depth: 3
});
Only the tokenizer "strict" is currently supported by the contextual index.
The contextual index requires an additional amount of memory, depending on the depth.
Try to use the lowest depth and the highest threshold that fit your needs.
It is possible to modify the values for threshold and depth during search (see custom search). The restriction is that the threshold can only be raised, while the depth can only be lowered.
Enable Auto-Balanced Cache
Create index and just set a limit of cache entries:
var index = new FlexSearch({
profile: "score",
cache: 10000
});
When passing a number as the limit, the cache automatically balances stored entries according to their popularity.
When just using "true", the cache is unbounded and actually performs 2-3 times faster (because the balancer does not have to run).
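The balancing idea can be sketched as follows. This is a conceptual illustration only, not FlexSearch's actual cache implementation: entries count their hits, and once the limit is reached, the least popular entry is evicted.

```javascript
// Conceptual sketch of a popularity-balanced cache (not FlexSearch internals).
function BalancedCache(limit) {
  this.limit = limit;
  this.store = {}; // key -> { value, hits }
  this.size = 0;
}

BalancedCache.prototype.set = function(key, value) {
  if (!(key in this.store)) {
    if (this.size >= this.limit) this.evict();
    this.size++;
  }
  this.store[key] = { value: value, hits: 0 };
};

BalancedCache.prototype.get = function(key) {
  var entry = this.store[key];
  if (!entry) return undefined;
  entry.hits++; // the popularity counter drives eviction order
  return entry.value;
};

BalancedCache.prototype.evict = function() {
  var coldest, min = Infinity;
  for (var key in this.store) {
    if (this.store[key].hits < min) {
      min = this.store[key].hits;
      coldest = key;
    }
  }
  delete this.store[coldest];
  this.size--;
};

var cache = new BalancedCache(2);
cache.set("a", "A");
cache.set("b", "B");
cache.get("a");      // "a" gains popularity
cache.set("c", "C"); // evicts "b" (fewest hits)
```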
WebWorker Sharding (Browser only)
Each worker gets its own dedicated memory and runs in its own dedicated thread without blocking the UI while processing. Especially for larger indexes, web workers improve speed and available memory a lot. A FlexSearch index was tested with a 250 Mb text file including 10 million words.
When the index isn't big enough, it is faster not to use web workers.
Create index and just set the count of parallel threads:
var index = new FlexSearch({
encode: "icase",
tokenize: "full",
async: true,
worker: 4
});
Adding items to worker index as usual (async enabled):
index.add(10025, "John Doe");
Perform search and simply pass in callback like:
index.search("John Doe", function(results){
});
Or use promises accordingly:
index.search("John Doe").then(function(results){
});
Options
FlexSearch is highly customizable. Making use of the right options can really improve your results, as well as memory economy and query time.
| Option | Values | Description |
|---|---|---|
| profile | "memory", "speed", "match", "score", "balance", "fast" | The configuration profile. Choose your preference. |
| tokenize | "strict", "forward", "reverse", "full", function() | The indexing mode (tokenizer). Choose one of the built-ins or pass a custom tokenizer function. |
| encode | false, "icase", "simple", "advanced", "extra", "balance", function() | The encoding type. Choose one of the built-ins or pass a custom encoding function. |
| cache | false, true, {number} | Enable/disable the cache and/or set the capacity of cached entries. When passing a number as the limit, the cache automatically balances stored entries according to their popularity. Note: When just using "true", the cache has no limits and is actually 2-3 times faster (because the balancer does not have to run). |
| async | true, false | Enable/disable asynchronous processing. Each job will be queued for non-blocking processing. Recommended when using WebWorkers. |
| worker | false, {number} | Enable/disable workers and set the count of running worker threads. |
| depth | false, {number} | Enable/disable contextual indexing and set the contextual distance of relevance. |
| threshold | false, {number} | Enable/disable the threshold of minimum relevance all results should have. Note: It is also possible to set a lower threshold for indexing and pass a higher value when calling index.search(options). |
| resolution | {number} | Sets the scoring resolution (default: 9). |
| stemmer | false, {string}, {function} | Disable, or pass in a language shorthand flag (ISO-3166) or a custom object. |
| filter | false, {string}, {function} | Disable, or pass in a language shorthand flag (ISO-3166) or a custom array. |
| rtl | true, false | Enables Right-To-Left encoding. |
Tokenizer
The tokenizer affects the required memory, as well as query time and the flexibility of partial matches. Try to choose the uppermost of these tokenizers which fits your needs:
| Option | Description | Example | Memory Factor (n = length of word) |
|---|---|---|---|
| "strict" | index whole words | foobar | * 1 |
| "forward" | incrementally index words in forward direction | foobar | * n |
| "reverse" | incrementally index words in both directions | foobar | * 2n - 1 |
| "full" | index every possible combination | foobar | * n * (n - 1) |
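The partial tokens each mode indexes can be sketched as an enumeration in plain JavaScript (illustrative only; the exact internals may differ):

```javascript
// Enumerate the partials a given tokenizer mode would index for one word.
function partials(word, mode) {
  var out = [], i, j;
  if (mode === "strict") return [word];
  if (mode === "forward") {
    for (i = 1; i <= word.length; i++) out.push(word.substring(0, i));
  } else if (mode === "reverse") {
    // forward prefixes plus suffixes (both directions)
    for (i = 1; i <= word.length; i++) out.push(word.substring(0, i));
    for (i = 1; i < word.length; i++) out.push(word.substring(i));
  } else if (mode === "full") {
    // every possible substring
    for (i = 0; i < word.length; i++)
      for (j = i + 1; j <= word.length; j++)
        out.push(word.substring(i, j));
  }
  return out;
}

partials("foo", "strict");  // -> ["foo"]
partials("foo", "forward"); // -> ["f", "fo", "foo"]
partials("foo", "reverse"); // -> ["f", "fo", "foo", "oo", "o"]
```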
Phonetic Encoding
The encoding affects the required memory, as well as query time and phonetic matches. Try to choose the uppermost of these encoders which fits your needs, or pass in a custom encoder:
| Option | Description | False Positives | Compression |
|---|---|---|---|
| false | Turn off encoding | no | no |
| "icase" (default) | Case-insensitive encoding | no | no |
| "simple" | Phonetic normalizations | no | ~ 7% |
| "advanced" | Phonetic normalizations + literal transformations | no | ~ 35% |
| "extra" | Phonetic normalizations + Soundex transformations | yes | ~ 60% |
| function() | Pass a custom encoder: function(string):string | | |
Comparison (Matching)
Reference String: "Björn-Phillipp Mayer"
| Query | icase | simple | advanced | extra |
|---|---|---|---|---|
| björn | yes | yes | yes | yes |
| björ | yes | yes | yes | yes |
| bjorn | no | yes | yes | yes |
| bjoern | no | no | yes | yes |
| philipp | no | no | yes | yes |
| filip | no | no | yes | yes |
| björnphillip | no | yes | yes | yes |
| meier | no | no | yes | yes |
| björn meier | no | no | yes | yes |
| meier fhilip | no | no | yes | yes |
| byorn mair | no | no | no | yes |
| (false positives) | no | no | no | yes |
Memory Usage
The required memory for the index depends on several options:
| Encoding | Memory usage per ~ 100,000 indexed words |
|---|---|
| false | 260 kb |
| "icase" (default) | 210 kb |
| "simple" | 190 kb |
| "advanced" | 150 kb |
| "extra" | 90 kb |

| Mode | Multiplied with: (n = average length of indexed words) |
|---|---|
| "strict" | * 1 |
| "forward" | * n |
| "reverse" | * 2n - 1 |
| "full" | * n * (n - 1) |

| Contextual Index | Multiply the sum above with: |
|---|---|
| | * (depth * 2 + 1) |
Adding, removing or updating existing items has a similar complexity.
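As a rough back-of-the-envelope helper, the tables above can be combined into simple arithmetic (illustrative only, not an exact measurement):

```javascript
// Estimate index memory (kb per ~100,000 words) from the encoder baseline,
// the tokenizer mode, the average word length n and the contextual depth,
// per the factor tables above.
function estimateMemory(encoderKb, mode, n, depth) {
  var factor =
    mode === "forward" ? n :
    mode === "reverse" ? 2 * n - 1 :
    mode === "full"    ? n * (n - 1) :
    1; // "strict"
  var contextual = depth ? depth * 2 + 1 : 1;
  return encoderKb * factor * contextual;
}

// "icase" (210 kb baseline), "strict" tokenizer, depth 2:
estimateMemory(210, "strict", 7, 2); // -> 1050
```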
Compare Memory Consumption
The book "Gulliver's Travels" (Swift Jonathan 1726) was used for this test.
Presets
You can pass a preset during creation/initialization. Presets represent the following settings:
"default": Standard profile
{
encode: "icase",
tokenize: "forward",
resolution: 9
}
"memory": Memory-optimized profile
{
encode: "extra",
tokenize: "strict",
threshold: 0,
resolution: 1
}
"speed": Speed-optimized profile
{
encode: "icase",
tokenize: "strict",
threshold: 1,
resolution: 3,
depth: 2
}
"match": Matching-tolerant profile
{
encode: "extra",
tokenize: "full",
threshold: 1,
resolution: 3
}
"score": Relevance-optimized profile
{
encode: "extra",
tokenize: "strict",
threshold: 1,
resolution: 9,
depth: 4
}
"balance": Most-balanced profile
{
encode: "balance",
tokenize: "strict",
threshold: 0,
resolution: 3,
depth: 3
}
"fast": Absolute fastest profile
{
encode: "icase",
threshold: 8,
resolution: 9,
depth: 1
}
Compare these presets:
Performance Guide
Methods to retrieve results, sorted from fastest to slowest:

1. index.find(id) -> doc
2. index.where({field: string}) -> Array<doc> (with a tag on the same field)
3. index.search(query) -> Array<id> (when just adding id and content to the index, no documents)
4. index.search(query) -> Array<doc> (when using documents)
5. index.search(query, { where }) -> Array<doc> (when using documents and a where-clause)
6. index.where({field: [string, string]}) -> Array<doc> (when a tag was set on one of two fields)
7. index.where({field: string}) -> Array<doc> (when no tag was set on this field)
Methods to modify the index, sorted from fastest to slowest:

1. index.add(id, string)
2. index.add(docs)
3. index.remove(id)
4. index.remove(docs)
5. index.update(id, string)
6. index.update(docs)
Performance Checklist:

- Using just id-content pairs for the index performs faster than using docs
- An additional where-clause in index.search() has a significant cost
- When adding multiple fields of documents to the index, try to set the lowest possible preset for each field
- Make sure the auto-balanced cache is enabled and has a meaningful value
- Using index.where() to find documents is very slow when not using a tagged field
- Getting a document by ID via index.find(id) is extremely fast
- Do not enable async or worker when the index does not require it
- Use numeric IDs (the datatype length of IDs influences memory consumption significantly)
- Check whether you can activate the contextual index by setting depth to a minimal meaningful value and the tokenizer to "strict"
- Pass a limit when searching (lower values perform better)
- Pass a minimum threshold when searching (higher values perform better)
Best Practices
Split Complexity
Whenever you can, try to divide content by categories and add each to its own index, e.g.:
var action = new FlexSearch();
var adventure = new FlexSearch();
var comedy = new FlexSearch();
This way you can also provide different settings for each category. This is actually the fastest way to perform a fuzzy search.
To make this workaround more extendable you can use a short helper:
var index = {};
function add(id, cat, content){
(index[cat] || (
index[cat] = new FlexSearch
)).add(id, content);
}
function search(cat, query){
return index[cat] ?
index[cat].search(query) : [];
}
Add content to the index:
add(1, "action", "Movie Title");
add(2, "adventure", "Movie Title");
add(3, "comedy", "Movie Title");
Perform queries:
var results = search("action", "movie title");
Splitting indexes by categories improves performance significantly.
Use numeric IDs
It is recommended to use numeric id values as references when adding content to the index. The byte length of the passed ids influences memory consumption significantly. If this is not possible, consider using an index table to map ids to numeric indexes; this becomes especially important when using the contextual index on a large amount of content.
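One way to sketch such an id table in plain JavaScript (the helper names are hypothetical):

```javascript
// Map external string keys to compact numeric ids and back.
var idTable = [];  // numeric id -> external key
var idLookup = {}; // external key -> numeric id

function toNumericId(key) {
  if (key in idLookup) return idLookup[key];
  var id = idTable.length;
  idTable.push(key);
  idLookup[key] = id;
  return id;
}

toNumericId("user:a8f3"); // -> 0
toNumericId("user:17bc"); // -> 1
idTable[0];               // -> "user:a8f3"
```

The numeric id returned by toNumericId(key) could then be passed to index.add, and idTable used to map ids from search results back to the original keys.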
Export/Import Index
index.export() returns a serialized dump as a string.
index.import(string) takes a serialized dump as a string and loads it into the index.
Assuming you have one or several indexes:
var feeds_2017 = new FlexSearch();
var feeds_2018 = new FlexSearch();
var feeds_2019 = new FlexSearch();
Export indexes, e.g. to the local storage:
localStorage.setItem("feeds_2017", feeds_2017.export());
localStorage.setItem("feeds_2018", feeds_2018.export());
localStorage.setItem("feeds_2019", feeds_2019.export());
Import indexes, e.g. from the local storage:
feeds_2017.import(localStorage.getItem("feeds_2017"));
feeds_2018.import(localStorage.getItem("feeds_2018"));
feeds_2019.import(localStorage.getItem("feeds_2019"));
Debug
Do not use DEBUG in production builds.
If you run into issues, you can temporarily set the DEBUG flag to true at the top of flexsearch.js:
DEBUG = true;
This enables console logging of several processes. Just open the browser's console to see this information.
Profiler Stats
Do not use PROFILER in production builds.
To collect some performance statistics of your indexes, you need to temporarily set the PROFILER flag to true at the top of flexsearch.js:
PROFILER = true;
This enables profiling of several processes.
An array of all profiles is available on:
window.stats;
You can also just open the browser's console and enter this line to get the stats.
The index of the array corresponds to the index.id.
Get stats from a specific index:
index.stats;
The returned stats payload is divided into several categories. Each category provides its own statistic values.
Profiler Stats Properties
| Property | Description |
|---|---|
| time | The sum of time (ms) the process takes (lower is better) |
| count | How often the process was called |
| ops | Average operations per second (higher is better) |
| nano | Average cost (ns) per operation/call (lower is better) |
Custom Builds
Full Build:
npm run build
Compact Build:
npm run build-compact
Light Build:
npm run build-light
Build Language Packs:
npm run build-lang
Custom Build:
npm run build-custom SUPPORT_WORKER=true SUPPORT_ASYNC=true
Alternatively you can also use:
node compile SUPPORT_WORKER=true
The custom build will be saved to flexsearch.custom.xxxxx.js (the "xxxxx" is a hash based on the used build flags).
Supported Build Flags
| Flag | Values |
|---|---|
| DEBUG | true, false |
| PROFILER | true, false |
| SUPPORT_ENCODER (built-in encoders) | true, false |
| SUPPORT_DOCUMENTS | true, false |
| SUPPORT_WHERE | true, false |
| SUPPORT_WORKER | true, false |
| SUPPORT_CACHE | true, false |
| SUPPORT_ASYNC | true, false |
| SUPPORT_PRESETS | true, false |
| SUPPORT_SERIALIZE | true, false |
| Language Flags (includes stemmer and filter) | |
| SUPPORT_LANG_EN | true, false |
| SUPPORT_LANG_DE | true, false |
| Compiler Flags | |
| LANGUAGE_OUT | ECMASCRIPT3, ECMASCRIPT5, ECMASCRIPT5_STRICT, ECMASCRIPT6, ECMASCRIPT6_STRICT, ECMASCRIPT_2015, ECMASCRIPT_2017, STABLE |
Changelog
Copyright 2019 Nextapps GmbH
Released under the Apache 2.0 License