Simple document and query processor that makes search running in the browser and Node.js a little better. Supports 50+ languages. Removes stopwords (smaller index and fewer irrelevant hits), extracts keywords to filter on, and prepares n-grams for auto-complete functionality.
This library does not create anything new; it just packages 6 libraries that go well together into one browser distribution file. It also shows how it may be useful through tests and the interactive demo.
cheerio - Here specifically used to extract text from all or parts of some HTML.
eklem-headline-parser - Determines the most relevant keywords in a headline by considering article context.
hit-highlighter - Highlighting hits from a query in a result item.
leven-match - Calculating Levenshtein match between words in two arrays within given distance. Good for fuzzy matching.
ngraminator - Generate n-grams.
stopword - Removes stopwords from an array of words. To keep your index small and remove all words without a scent of information and/or remove stopwords from the query, making the search engine work less hard to find relevant results.
words'n'numbers - Extract words and optionally numbers from a string of text into arrays. Arrays that can be fed to stopword, eklem-headline-parser, leven-match, ngraminator and hit-highlighter.
<script src="https://cdn.jsdelivr.net/npm/daq-proc/dist/daq-proc.umd.min.js"></script>
<script>
// exposing the underlying libraries in a transparent way
const {
load,
removeStopwords, _123, afr, ara, hye, eus, ben, bre, bul, cat, zho, hrv, ces, dan, nld, eng, epo, est, fin, fra, glg, deu, ell, guj, hau, heb, hin, hun, ind, gle, ita, jpn, kor, kur, lat, lav, lit, lgg, lggNd, msa, mar, mya, nob, fas, pol, por, porBr, panGu, ron, rus, slk, slv, som, sot, spa, swa, swe, tha, tgl, tur, urd, ukr, vie, yor, zul,
extract, words, numbers, emojis, tags, usernames, email,
ngraminator,
findKeywords,
highlight,
levenMatch
} = dqp
// input
const headlineString = 'Document and query processing for the browser!'
const bodyString = 'Yay! The day is here =) We now have document and query processing for the browser. It is mostly packaging 4 modules together in a browser distribution file. The modules are words-n-numbers, stopword, ngraminator and eklem-headline-parser'
// extracting word arrays
let headlineArray = extract(headlineString, {regex: [words, numbers], toLowercase: true})
let bodyArray = extract(bodyString, {regex: [words, numbers], toLowercase: true})
console.log('Word arrays: ')
console.dir(headlineArray)
console.dir(bodyArray)
// removing stopwords
let headlineStopped = removeStopwords(headlineArray)
let bodyStopped = removeStopwords(bodyArray)
console.log('Stopword removed arrays: ')
console.dir(headlineStopped)
console.dir(bodyStopped)
// n-grams
let headlineNgrams = ngraminator(headlineStopped, [2,3,4])
let bodyNgrams = ngraminator(bodyStopped, [2,3,4])
console.log('Ngram arrays: ')
console.dir(headlineNgrams)
console.dir(bodyNgrams)
// calculating important keywords
let keywords = findKeywords(headlineStopped, bodyStopped, 5)
console.log('Keyword array: ')
console.dir(keywords)
</script>
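To make the n-gram step above a little more concrete, here is a minimal sketch in plain JavaScript of what generating n-grams from a word array involves. The helper `simpleNgrams` is hypothetical and only for illustration; it is not part of daq-proc or ngraminator, and the real library may order or shape its output differently.

```javascript
// Hypothetical helper, not part of daq-proc or ngraminator:
// builds word n-grams for each requested length.
function simpleNgrams (wordArray, lengths) {
  const result = []
  for (const n of lengths) {
    // slide a window of n words across the array
    for (let i = 0; i + n <= wordArray.length; i++) {
      result.push(wordArray.slice(i, i + n))
    }
  }
  return result
}

// example input: a word array with stopwords already removed
const stoppedWords = ['document', 'query', 'processing', 'browser']
console.dir(simpleNgrams(stoppedWords, [2, 3]))
```

Short n-grams like these are what an auto-complete feature can match a partially typed query against.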
<script src="https://cdn.jsdelivr.net/npm/daq-proc/dist/daq-proc.umd.min.js"></script>
<script>
// exposing the underlying libraries in a transparent way
const {
highlight,
levenMatch
} = dqp
const query = ['interesting', 'words']
const searchResult = ['some', 'interesting', 'words', 'to', 'remember']
highlight(query, searchResult)
// returns:
// 'some <span class="highlighted">interesting words</span> to remember'
const index = ['return', 'all', 'word', 'matches', 'between', 'two', 'arrays', 'within', 'given', 'levenshtein', 'distance', 'intended', 'use', 'is', 'to', 'words', 'in', 'a', 'query', 'that', 'has', 'an', 'index', 'good', 'for', 'autocomplete', 'type', 'functionality,', 'and', 'some', 'cases', 'also', 'searching']
// a separate, misspelled query for fuzzy matching
const fuzzyQuery = ['qvery', 'words', 'levensthein']
levenMatch(fuzzyQuery, index, {distance: 2})
// returns:
//[ [ 'query' ], [ 'word', 'words' ], [ 'levenshtein' ] ]
</script>
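The fuzzy matching idea can be sketched without the library. The following is a minimal illustration in plain JavaScript, assuming a classic dynamic-programming Levenshtein distance; the helpers `levenshtein` and `fuzzyMatch` are hypothetical names and are not the leven-match implementation, which may differ in options and output details.

```javascript
// Hypothetical helpers for illustration, not the leven-match implementation.
// Classic dynamic-programming Levenshtein distance between two strings.
function levenshtein (a, b) {
  // prev[j] holds the distance between the first i-1 chars of a and first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      )
    }
    prev = curr
  }
  return prev[b.length]
}

// For each query word, collect the index words within maxDistance edits.
function fuzzyMatch (queryWords, indexWords, maxDistance) {
  return queryWords.map(q =>
    indexWords.filter(w => levenshtein(q, w) <= maxDistance)
  )
}

console.dir(fuzzyMatch(['qvery', 'words'], ['query', 'word', 'words', 'index'], 1))
// prints: [ [ 'query' ], [ 'word', 'words' ] ]
```

A small maximum distance keeps the matches tight; larger values tolerate more typos but return more unrelated words.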
It is fully possible to use it on Node.js too. The tests run on both Node.js and the browser. It only wraps 6 libraries for ease of use in the browser, but could come in handy for, e.g., simple crawler scenarios.