minisearch

fun with fulltext search

0.1.17

Version published: 6 years ago

Weekly downloads: 254K; increased by 2.88%

Maintainers: 1

Created: 6 years ago

What is minisearch?

MiniSearch is a lightweight full-text search engine for JavaScript. It is designed to be simple to use and efficient, which makes it suitable for both client-side and server-side applications. MiniSearch lets you index documents and run search queries against them, with features such as customizable tokenization, custom term processing (for example normalization or stemming), and field-based search.

What are minisearch's main functionalities?

Indexing Documents

This feature allows you to index a collection of documents. You specify which fields to index and which fields to store in the search results. The `addAll` method is used to add multiple documents to the index.

const MiniSearch = require('minisearch')

let miniSearch = new MiniSearch({
  fields: ['title', 'text'], // fields to index for full-text search
  storeFields: ['title'] // fields to return with search results
})

let documents = [
  { id: 1, title: 'Moby Dick', text: 'Call me Ishmael. Some years ago...' },
  { id: 2, title: 'Pride and Prejudice', text: 'It is a truth universally acknowledged...' },
  // more documents...
]

miniSearch.addAll(documents)

Performing Searches

Once documents are indexed, you can perform search queries on them. The `search` method returns a list of documents that match the query, sorted by relevance.

let results = miniSearch.search('Ishmael')
console.log(results)

Customizing Tokenization

MiniSearch allows you to customize the tokenization process. In this example, the `tokenize` function splits the text into tokens based on whitespace.

let miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  tokenize: (string, _fieldName) => string.split(/\s+/)
})

Stemming and Stop Words

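You can also customize how terms are processed and filter out stop words. In this example, common stop words are dropped by a custom `tokenize` function so they never reach the index, and the remaining terms are converted to lowercase by `processTerm`.

const stopWords = new Set(['the', 'is', 'and'])

let miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  // drop stop words (compared case-insensitively) while tokenizing
  tokenize: (string) => string.split(/\s+/).filter(word => !stopWords.has(word.toLowerCase())),
  // normalize the remaining terms to lowercase
  processTerm: (term) => term.toLowerCase()
})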

MiniSearch

MiniSearch is a tiny but powerful in-memory fulltext search engine for JavaScript. It is respectful of resources, and it can comfortably run both in Node and in the browser.

Try out the demo application.

Use case

MiniSearch addresses use cases where full-text search features are needed (e.g. prefix search, fuzzy search, boosting of fields), but the data to be indexed can fit locally in the process memory. While you may not index the whole Wikipedia with it, there are surprisingly many use cases that are served well by MiniSearch. By storing the index in local memory, MiniSearch can work offline, and can process queries quickly, without network latency.

A prominent use case is search-as-you-type in web and mobile applications, where keeping the index on the client side enables a fast and reactive UI and removes the need to make requests to a search server.

Features

Memory-efficient index, designed to support memory-constrained use cases like mobile browsers.
Exact, prefix, and fuzzy search
Auto-suggestion engine, for auto-completion of search queries
Documents can be added and removed from the index at any time
Simple API, providing building blocks to build specific solutions
Zero external dependencies, small and well tested code-base

Installation

With npm:

npm install --save minisearch

With yarn:

yarn add minisearch

Then require or import it in your project.

Usage

Basic usage

const MiniSearch = require('minisearch')

// A collection of documents for our examples
const documents = [
  { id: 1, title: 'Moby Dick', text: 'Call me Ishmael. Some years ago...' },
  { id: 2, title: 'Zen and the Art of Motorcycle Maintenance', text: 'I can see by my watch...' },
  { id: 3, title: 'Neuromancer', text: 'The sky above the port was...' },
  { id: 4, title: 'Zen and the Art of Archery', text: 'At first sight it must seem...' },
  // ...and more
]

let miniSearch = new MiniSearch({ fields: ['title', 'text'] })

// Index all documents
miniSearch.addAll(documents)

// Search with default options
let results = miniSearch.search('zen art motorcycle')
// => [ { id: 2, score: 2.77258, match: { ... } }, { id: 4, score: 1.38629, match: { ... } } ]
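The results shown above carry only the document id, a score, and match details, so displaying a hit usually means looking the original document up again. The following is a minimal sketch of that lookup, assuming the result shape printed in the comment above. The results array is hard-coded from that comment rather than produced by a live search, and byId and titlesFor are hypothetical helper names, not part of MiniSearch.

```javascript
const documents = [
  { id: 1, title: 'Moby Dick', text: 'Call me Ishmael. Some years ago...' },
  { id: 2, title: 'Zen and the Art of Motorcycle Maintenance', text: 'I can see by my watch...' },
  { id: 3, title: 'Neuromancer', text: 'The sky above the port was...' },
  { id: 4, title: 'Zen and the Art of Archery', text: 'At first sight it must seem...' }
]

// Hard-coded stand-in for the value returned by miniSearch.search('zen art motorcycle')
const results = [
  { id: 2, score: 2.77258 },
  { id: 4, score: 1.38629 }
]

// Build an id -> document lookup once, then resolve each hit to its title
const byId = new Map(documents.map((doc) => [doc.id, doc]))
const titlesFor = (hits) => hits.map((hit) => byId.get(hit.id).title)

console.log(titlesFor(results))
// => [ 'Zen and the Art of Motorcycle Maintenance', 'Zen and the Art of Archery' ]
```

Building the Map once keeps each lookup constant-time, which matters when results are re-rendered on every keystroke.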

Search options

MiniSearch supports several options for more advanced search behavior:

// Search only specific fields
miniSearch.search('zen', { fields: ['title'] })

// Boost some fields (here "title")
miniSearch.search('zen', { boost: { title: 2 } })

// Prefix search (so that 'moto' will match 'motorcycle')
miniSearch.search('moto', { prefix: true })

// Fuzzy search, in this example, with a max edit distance of 0.2 * term length,
// rounded to nearest integer. The misspelled 'ismael' will match 'ishmael'.
miniSearch.search('ismael', { fuzzy: 0.2 })

// You can set the default search options upon initialization
miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  searchOptions: {
    boost: { title: 2 },
    fuzzy: 0.2
  }
})
miniSearch.addAll(documents)

// It will now by default perform fuzzy search and boost "title":
miniSearch.search('zen and motorcycles')
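To make the fuzzy threshold concrete, here is a small sketch of the arithmetic described in the comment above: the maximum edit distance is 0.2 times the term length, rounded to the nearest integer. It assumes the length in question is that of the query term and uses a plain Levenshtein distance. The function names are hypothetical, and this is not MiniSearch's internal implementation, which is not shown in this document.

```javascript
// Maximum allowed edit distance for a term, given a fractional fuzzy value
function maxEditDistance (term, fuzzy) {
  return Math.round(fuzzy * term.length)
}

// Classic Levenshtein distance, keeping only one row of the table at a time
function editDistance (a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
    }
    prev = curr
  }
  return prev[b.length]
}

// True when indexedTerm is within the fuzzy threshold of queryTerm
const fuzzyMatches = (queryTerm, indexedTerm, fuzzy) =>
  editDistance(queryTerm, indexedTerm) <= maxEditDistance(queryTerm, fuzzy)

console.log(maxEditDistance('ismael', 0.2)) // => 1
console.log(fuzzyMatches('ismael', 'ishmael', 0.2)) // => true
console.log(fuzzyMatches('zen', 'art', 0.2)) // => false
```

Under this reading, short terms tolerate very little: a three-letter term with fuzzy set to 0.2 allows at most one edit.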

Auto suggestions

MiniSearch can suggest search queries given an incomplete query:

miniSearch.autoSuggest('zen ar')
// => [ { suggestion: 'zen archery art', terms: [ 'zen', 'archery', 'art' ], score: 1.73332 },
//      { suggestion: 'zen art', terms: [ 'zen', 'art' ], score: 1.21313 } ]

The autoSuggest method takes the same options as the search method, so you can get suggestions for misspelled words using fuzzy search:

miniSearch.autoSuggest('neromancer', { fuzzy: 0.2 })
// => [ { suggestion: 'neuromancer', terms: [ 'neuromancer' ], score: 1.03998 } ]

Tokenization

By default, documents and queries are tokenized by splitting on non-word characters. No stop-word list is applied, but single-character words are excluded. The tokenization logic can be easily changed by passing a custom tokenizer function as the tokenize option:

let stopWords = new Set(['and', 'or', 'to', 'in', 'a', 'the', /* ...and more */ ])

// Tokenize splitting by space and apply a stop-word list
let miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  tokenize: (string) => string.split(/\s+/).filter(word => !stopWords.has(word))
})
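The behavior described in this section can be approximated in plain JavaScript, which is handy for checking what a tokenizer produces before passing it as the tokenize option. This is a rough sketch only: tokenizeLikeDefault is a hypothetical name that mimics the described default (split on non-word characters, drop single-character words) and is not the library's actual code. Note that \W is ASCII-based, so accented letters count as separators here.

```javascript
// Approximation of the described default: split on non-word characters
// and drop empty or single-character tokens
const tokenizeLikeDefault = (string) =>
  string.split(/\W+/).filter((word) => word.length > 1)

// Whitespace tokenizer with a stop-word list, compared case-insensitively
const stopWords = new Set(['and', 'or', 'to', 'in', 'a', 'the', 'of'])
const tokenizeWithStopWords = (string) =>
  string.split(/\s+/).filter((word) => !stopWords.has(word.toLowerCase()))

console.log(tokenizeLikeDefault('Call me Ishmael. Some years ago...'))
// => [ 'Call', 'me', 'Ishmael', 'Some', 'years', 'ago' ]

console.log(tokenizeWithStopWords('Zen and the Art of Archery'))
// => [ 'Zen', 'Art', 'Archery' ]
```

Lowercasing before the stop-word check means capitalized stop words such as "The" at the start of a title are filtered too.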

Term processing

Terms are downcased by default. No stemming is performed. To customize how the terms are processed upon indexing or searching, for example to normalize them or to apply stemming, the processTerm option can be used:

// The g flag replaces every occurrence, not only the first one
const removeAccents = (term) =>
  term.replace(/[àá]/g, 'a')
      .replace(/[èé]/g, 'e')
      .replace(/[ìí]/g, 'i')
      .replace(/[òó]/g, 'o')
      .replace(/[ùú]/g, 'u')

// Perform custom term processing (here removing accents)
let miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  processTerm: (term) => removeAccents(term.toLowerCase())
})
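A hand-written replacement table only covers the characters it lists. As an alternative sketch, standard Unicode normalization can strip most diacritics in one pass: String.prototype.normalize('NFD') decomposes an accented letter into its base letter plus combining marks, and the marks in the range U+0300 to U+036F can then be removed. The names stripDiacritics and processTerm below are hypothetical; the resulting function could be passed as the processTerm option in the same way as the example above.

```javascript
// Decompose accented characters, then drop the combining diacritical marks
const stripDiacritics = (term) =>
  term.normalize('NFD').replace(/[\u0300-\u036f]/g, '')

// Candidate value for the processTerm option: lowercase, then strip accents
const processTerm = (term) => stripDiacritics(term.toLowerCase())

console.log(processTerm('Café')) // => 'cafe'
console.log(processTerm('Crème Brûlée')) // => 'creme brulee'
```

Letters that are not composed from a base letter and a combining mark, such as 'ø' or 'ß', are left unchanged by this approach and would still need explicit handling.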

Refer to the API documentation for details about configuration options and methods.

Browser compatibility

MiniSearch natively supports all modern browsers implementing JavaScript standards, but requires a polyfill when used in Internet Explorer, as it makes use of Object.entries, Array.prototype.includes, and Array.from. @babel/polyfill is one such polyfill that can be used to provide those functions.

Package last updated on 24 Oct 2018
