Futzy is a configurable fuzzy matching library for tools which search over structured input. Unlike many other fuzzy matching libraries which match arbitrary characters, Futzy matches based on tokens. Tokenization unlocks meaningful performance improvements (especially for very large datasets), and provides more accurate search results (at the cost of a small number of keypresses).
Futzy breaks strings in the provided dataset into tokens. Tokens are case-insensitive and must be alphanumeric. Non-alphanumeric characters are not indexed. Results are sorted based on relevance, where "relevance" prioritizes strings which have matching tokens earlier in the string, and strings where matching tokens have the fewest possible tokens between them.
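The tokenization rule described above can be sketched as follows. This is a minimal illustration, not the library's actual tokenizer: the function name tokenize is hypothetical, and it assumes "alphanumeric" means ASCII letters and digits, with any run of other characters acting as a separator.

```javascript
// Hypothetical helper for illustration; not Futzy's own tokenizer.js.
// Assumption: only ASCII letters and digits count as alphanumeric.
function tokenize(input) {
  return input
    .toLowerCase() // tokens are case-insensitive
    .split(/[^a-z0-9]+/) // any run of non-alphanumerics separates tokens
    .filter(Boolean); // drop empty strings left by leading/trailing separators
}

console.log(tokenize('xxx.abc.DEF')); // [ 'xxx', 'abc', 'def' ]
console.log(tokenize('fo ba za')); // [ 'fo', 'ba', 'za' ]
```

Under this sketch the separators themselves are discarded, which mirrors the statement that non-alphanumeric characters are not indexed.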
Performance is achieved in a few ways:
const {Index} = require('futzy');

const testCorpus = [
  "abc.def",
  "xxxx.yyyy",
  "xxx.abc.yyy.zzz.def",
  "xxx.abc.yyy.zzz",
  "xxx.abc.yyy",
  "xxx.abc.defg",
];

const index = new Index(testCorpus, {});
const results = index.search('a d');
/*
results ==
[
  "abc.def",
  "xxx.abc.defg",
  "xxx.abc.yyy.zzz.def",
]
*/
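The ordering of the results above is consistent with a simple two-part relevance score: the position of the first matching token, then the number of unmatched tokens between the first and last match. The sketch below reproduces that ordering without using Futzy itself. Everything in it is an assumption made for illustration: the names matchPositions and rankedSearch are hypothetical, query tokens are assumed to match as prefixes, the match is greedy (leftmost) rather than the tightest possible, and the way the two criteria are combined is a guess.

```javascript
// Illustrative only; Futzy's real scoring may differ.
const tokenize = (s) => s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Returns the token positions matched by each query token (greedy, left to
// right), or null if the candidate does not contain them all in order.
function matchPositions(queryTokens, candidate) {
  const tokens = tokenize(candidate);
  const positions = [];
  let next = 0;
  for (const q of queryTokens) {
    while (next < tokens.length && !tokens[next].startsWith(q)) next++;
    if (next === tokens.length) return null;
    positions.push(next++);
  }
  return positions;
}

function rankedSearch(corpus, query) {
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return [];
  return corpus
    .map((text) => ({text, positions: matchPositions(queryTokens, text)}))
    .filter((m) => m.positions !== null)
    .map(({text, positions}) => ({
      text,
      first: positions[0], // earlier first match ranks higher
      // unmatched tokens between the first and last matched token
      gap: positions[positions.length - 1] - positions[0] - (positions.length - 1),
    }))
    .sort((a, b) => a.first - b.first || a.gap - b.gap)
    .map((m) => m.text);
}

const corpus = [
  'abc.def',
  'xxxx.yyyy',
  'xxx.abc.yyy.zzz.def',
  'xxx.abc.yyy.zzz',
  'xxx.abc.yyy',
  'xxx.abc.defg',
];
console.log(rankedSearch(corpus, 'a d'));
// [ 'abc.def', 'xxx.abc.defg', 'xxx.abc.yyy.zzz.def' ]
```

Here "abc.def" wins because its first match is at token 0, and "xxx.abc.defg" beats "xxx.abc.yyy.zzz.def" because its matched tokens are adjacent.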
The Index class takes two parameters:
An Array<string>, which is the dataset
An options object

The options object may have these members:
performRawSearch (Default false): If there are fewer results than resultLimit, strings in the dataset that contain the search query as a substring are appended to the results, up to resultLimit.
performRawSearchWhenNoResults (Default true): Only performs the performRawSearch behavior if there would otherwise be no results. This is useful as a backup to support weird queries.
resultLimit (Default 20): The maximum number of strings to return in the results. Keeping this value small improves performance for large datasets.

Input is first tokenized. E.g., fo ba za becomes ['fo', 'ba', 'za']. Each string is tested. To be a plausible result, the string must contain each of the provided tokens, in order. For example, the following input strings would match:
foo.bar.zap
fomo is bad but zalgo does not care
FOO BAR ZAP
fo ba za
The following input strings would not match:
football zap
foo zap bar
f oo bar zap
foobarzap
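The matching rule illustrated by these examples can be sketched as an in-order token check. This is an illustration under stated assumptions rather than Futzy's implementation: the name matchesInOrder is hypothetical, and the sketch assumes each query token must be a prefix of a distinct token in the candidate string, which is inferred from the examples above.

```javascript
// Illustrative only; inferred from the documented examples.
const tokenize = (s) => s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

function matchesInOrder(query, candidate) {
  const tokens = tokenize(candidate);
  let next = 0; // index of the next candidate token that may be consumed
  return tokenize(query).every((q) => {
    // Skip forward to the first remaining token that starts with q.
    while (next < tokens.length && !tokens[next].startsWith(q)) next++;
    // Consume that token if one was found; otherwise the match fails.
    return next++ < tokens.length;
  });
}

console.log(matchesInOrder('fo ba za', 'foo.bar.zap')); // true
console.log(matchesInOrder('fo ba za', 'foo zap bar')); // false (wrong order)
console.log(matchesInOrder('fo ba za', 'foobarzap')); // false (single token)
```

Because the candidate is scanned once from left to right, the check is linear in the number of tokens in the candidate string.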
This algorithm is useful for the following types of UIs:
git ls-files)
Any dataset which has a reasonable cardinality of tokens (relative to the number of strings), where each string is cleanly divided into tokens, is likely a good fit for Futzy.
tokenizer.js, and contributions are welcome.
Additionally, the following performance issues are known, though they are generally only problematic with large datasets:
FAQs
A fuzzy string matching library
We found that futzy demonstrated an unhealthy version release cadence and level of project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.