You're Invited:Meet the Socket Team at BlackHat and DEF CON in Las Vegas, Aug 4-6.RSVP →

Book a Demo Install Sign in

daq-proc

Package Overview

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

daq-proc

Simple document processor to make search running in the browser and node.js a little better. Supports 50+ languages. Removes stopwords (smaller index and less irrelevant hits), extract keywords to filter on and prepares ngrams for auto-complete functional

8.0.0

latest

Source

npm

Version published: 2 years ago

Weekly downloads: 8

Maintainers: 1

Weekly downloads

Created: 6 years ago

Source

daq-proc

Simple document and query processor to makes search running in the browser and node.js a little better. Removes stopwords (smaller index and less irrelevant hits), extract keywords to filter on and prepares ngrams for auto-complete functionality.

Demo

document processor. It showcases the document processor end. Just add some words and figure it out.
query processor. Showcases hit highlighting and truncating text if needed. Possible to turn fuzzy matching on/off.

This library is not creating anything new, but just packaging 6 libraries that goes well togehter into one browser distribution file. Also showing how it may be usefull through tests and the interactive demo.

Libraries that daq-proc is depending on

cheerio - Here specifically used to extract text from all- or parts of some HTML.
eklem-headline-parser - Determines the most relevant keywords in a headline by considering article context
hit-highlighter - Higlighting hits from a query in a result item.
leven-match - Calculating Levenshtein match between words in two arrays within given distance. Good for fuzzy matching.
ngraminator - Generate n-grams.
stopword - Removes stopwords from an array of words. To keep your index small and remove all words without a scent of information and/or remove stopwords from the query, making the search engine work less hard to find relevant results.
words'n'numbers - Extract words and optionally numbers from a string of text into arrays. Arrays that can be fed to stopword, eklem-headline-parser, leven-match, ngraminator and hit-highlighter.

Browser

Example - document processing side

<script src="https://cdn.jsdelivr.net/npm/daq-proc/dist/daq-proc.umd.min.js"></script>

<script>
  // exposing the underlying libraries in a transparent way
  const {
    load,
    removeStopwords, _123, afr, ara, hye, eus, ben, bre, bul, cat, zho, hrv, ces, dan, nld, eng, epo, est, fin, fra, glg, deu, ell, guj, hau, heb, hin, hun, ind, gle, ita, jpn, kor, kur, lat, lav, lit, lgg, lggNd, msa, mar, mya, nob, fas, pol, por, porBr, panGu, ron, rus, slk, slv, som, sot, spa, swa, swe, tha, tgl, tur, urd, ukr, vie, yor, zul,
    extract, words, numbers, emojis, tags, usernames, email,
    ngraminator,
    findKeywords,
    highlight,
    levenMatch
} = dqp

  // input
  const headlineString = 'Document and query processing for the browser!'
  const bodyString = 'Yay! The day is here =) We now have document and query processing for the browser. It is mostly packaging 4 modules together in a browser distribution file. The modules are words-n-numbers, stopword, ngraminator and eklem-headline-parser'

  // extracting word arrays
  let headlineArray = extract(headlineString, {regex: [words, numbers], toLowercase: true})
  let bodyArray = extract(bodyString, {regex: [words, numbers], toLowercase: true})
  console.log('Word arrays: ')
  console.dir(headlineArray)
  console.dir(bodyArray)

  // removing stopwords
  let headlineStopped = removeStopwords(headlineArray)
  let bodyStopped = removeStopwords(bodyArray)
  console.log('Stopword removed arrays: ')
  console.dir(headlineStopped)
  console.dir(bodyStopped)

  // n-grams
  let headlineNgrams = ngraminator(headlineStopped, [2,3,4])
  let bodyNgrams = ngraminator(bodyStopped, [2,3,4])
  console.log('Ngram arrays: ')
  console.dir(headlineNgrams)
  console.dir(bodyNgrams)

  // calculating important keywords
  let keywords = findKeywords(headlineStopped, bodyStopped, 5)
  console.log('Keyword array: ')
  console.dir(keywords)
</script>

Example - Query side

<script src="https://cdn.jsdelivr.net/npm/daq-proc/dist/daq-proc.umd.min.js"></script>

<script>
  // exposing the underlying libraries in a transparent way
  const {
    highlight,
    levenMatch
  } = dqp

  const query = ['interesting', 'words']
  const searchResult = ['some', 'interesting', 'words', 'to', 'remember']

  highlight(query, searchResult)
  // returns:
  // 'some <span class="highlighted">interesting words</span> to remember'

  const index = ['return', 'all', 'word', 'matches', 'between', 'two', 'arrays', 'within', 'given', 'levenshtein', 'distance', 'intended', 'use', 'is', 'to', 'words', 'in', 'a', 'query', 'that', 'has', 'an', 'index', 'good', 'for', 'autocomplete', 'type', 'functionality,', 'and', 'some', 'cases', 'also', 'searching']
  const query = ['qvery', 'words', 'levensthein']

  levenMatch(query, index, {distance: 2})
  // returns:
  //[ [ 'query' ], [ 'word', 'words' ], [ 'levenshtein' ] ]
</script>

Node.js

It's fully possible to use on Node.js too. The tests are both for Node.js and the browser. It's only wrapping 6 libraries for the ease of use in the browser, but could come in handy for i.e. simple crawler scenarios.

Something missing?

Create an issue so we can discuss =).

Keywords

document-processor

nlp

FAQs

What is daq-proc?

Is daq-proc popular?

Is daq-proc well maintained?

Package last updated on 19 Apr 2023

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

daq-proc

daq-proc

Demo

Libraries that daq-proc is depending on

Browser

Example - document processing side

Example - Query side

Node.js

Something missing?

Keywords

Related posts

Introducing License Overlays: Smarter License Management for Real-World Code

Introducing Rust Support in Socket