Research
Security News
Malicious npm Package Targets Solana Developers and Hijacks Funds
A malicious npm package targets Solana developers, rerouting funds in 2% of transactions to a hardcoded address.
github.com/opencoff/go-bbhash
A library to create, query and serialize/de-serialize minimal perfect hash functions over very large key sets.
This is an implementation of this paper. It is in part inspired by Damien Gryski's Boomphf - this implementation differs from Boomphf in one significant way - this library adds an efficient serialization & deserialization API.
The library exposes the following types:
BBHash
: Represents an instance of a minimal perfect hash
function as described in the paper above.DBWriter
: Used to construct a constant database of key-value
pairs - where the lookup of a given key is done in constant time
using BBHash
. Essentially, this type serializes a collection
of key-value pairs using BBHash
as the underlying index.DBReader
: Used for looking up key-values from a previously
constructed (serialized) database.NOTE Minimal Perfect Hash functions take a fixed input and generate a mapping to lookup the items in constant time. In particular, they are NOT a replacement for a traditional hash-table; i.e., it may yield false-positives when queried using keys not present during construction. In concrete terms:
Let S = {k0, k1, ... kn} be your input key set.
If H: S -> {0, .. n} is a minimal perfect hash function, then H(kx) for kx NOT in S may yield an integer result (indicating that kx was successfully "looked up").
Thus, if users of BBHash
are unsure of the input being passed to such a
Lookup()
function, they should add an additional comparison against
the actual key to verify. Look at dbreader.go:Find()
for an
example.
Like any other golang library: go get github.com/opencoff/go-bbhash
.
There is a working example of the DBWriter
and DBReader
interfaces in the
file example/mphdb.go. This example demonstrates the following functionality:
First, lets run some tests and make sure bbhash is working fine:
$ git clone https://github.com/opencoff/go-bbhash
$ cd go-bbhash
$ make test
Now, lets build and run the example program:
$ make
$ ./mphdb -h
There is a helper python script to generate a very large text file of
hostnames and IP addresses: genhosts.py
. You can run it like so:
$ python ./example/genhosts.py 192.168.0.0/16 > a.txt
The above example generates 65535 hostnames and corresponding IP addresses; each of the IP addresses is sequentially drawn from the subnet.
NOTE If you use a "/8" subnet mask you will generate a lot of data (~430MB in size).
Once you have the input generated, you can feed it to the example
program above to generate
a MPH DB:
$ ./mphdb foo.db a.txt
$ ./mphdb -V foo.db
It is possible that "mphdb" fails to construct a DB and complains of gamma being too small. In that case, try increasing "g" like so:
$ ./mphdb -g 2.75 foo.db a.txt
Assuming you have read your keys, hashed them into uint64
, this is how you can use the library:
bb, err := bbhash.New(2.0, keys)
if err != nil { panic(err) }
// Now, call Find() with each key to gets its unique mapping.
// Note: Find() returns values in the range closed-interval [1, len(keys)]
for i, k := range keys {
j := bb.Find(k)
fmt.Printf("%d: %#x maps to %d\n", i, k, j)
}
One can construct an on-disk constant-time lookup using BBHash
as
the underlying indexing mechanism. Such a DB is useful in situations
where the key/value pairs are NOT changed frequently; i.e.,
read-dominant workloads. The typical pattern in such situations is
to build the constant-DB once for efficient retrieval and do
lookups multiple times.
For example, let us suppose that file a.txt and b.csv have lots of key,value pairs. We will build a constant DB using this.
wr, err := bbhash.NewDBWriter("file.db")
if err != nil { panic(err) }
// add a.txt and a.csv to this db
// txt file delimited by white space;
// first token is the key, second token is the value
n, err := wr.AddTextFile("a.txt", " \t")
if err != nil { panic(err) }
fmt.Printf("a.txt: %d records added\n", n)
// CSV file - comma delimited
// lines starting with '#' are considered comments
// field 0 is the key; and field 1 is the value.
// The first line is assumed to be a header and ignored.
n, err := wr.AddCSVFile("b.csv", ',', '#', 0, 1)
if err != nil { panic(err) }
fmt.Printf("b.csv: %d records added\n", n)
// Now, freeze the DB and write to disk.
// We will use a larger "gamma" value to increase chances of
// finding a minimal perfect hash function.
err = wr.Freeze(3.0)
if err != nil { panic(err) }
Now, file.db
has the key/value pairs from the two input files
stored in an efficient format for constant-time retrieval.
Continuing the above example, suppose that you want to use the constructed DB for repeated lookups of various keys and retrieve their corresponding values:
// read 'file.db' and cache upto 10,000
// records in memory.
rd, err := bbhash.NewDBReader("file.db", 10000)
if err != nil { panic(err) }
Now, given a key k
, we can use rd
to lookup the corresponding
value:
val, err := rd.Find(k)
if err != nil {
if err == bbhash.ErrNoKey {
fmt.Printf("Key %x is not in the DB\n", k)
} else {
fmt.Printf("Error: %s\n", err)
}
}
fmt.Printf("Key %x => Value %x\n", k, val)
For constructing the BBHash, keys are uint64
; the DBWriter
implementation uses Zi Long Tan's superfast hash function to
transform arbitary bytes to uint64.
The perfect-hash index for each key is "1" based (i.e., it is in the closed
interval [1, len(keys)]
.
GPL v2.0
FAQs
Unknown package
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
Security News
A malicious npm package targets Solana developers, rerouting funds in 2% of transactions to a hardcoded address.
Security News
Research
Socket researchers have discovered malicious npm packages targeting crypto developers, stealing credentials and wallet data using spyware delivered through typosquats of popular cryptographic libraries.
Security News
Socket's package search now displays weekly downloads for npm packages, helping developers quickly assess popularity and make more informed decisions.