Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More →

bloom-filters

Package Overview

Dependencies

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

bloom-filters

JS implementation of probabilistic data structures: Bloom Filter (and its derived), HyperLogLog, Count-Min Sketch, Top-K and MinHash

1.0.0
Source
npm

Version published: 5 years ago

Weekly downloads: 89K; decreased by-51.54%

Maintainers: 2

Weekly downloads

Created: 8 years ago

What is bloom-filters?

The 'bloom-filters' npm package provides data structures and algorithms for probabilistic data structures, specifically Bloom filters, which are used to test whether an element is a member of a set. These structures are highly space-efficient and are useful for applications where a small rate of false positives is acceptable.

What are bloom-filters's main functionalities?

Creating a Bloom Filter

This feature allows you to create a Bloom filter with a specified size and number of hash functions. You can add elements to the filter and check for their existence. Note that false positives are possible, but false negatives are not.

const { BloomFilter } = require('bloom-filters');
const filter = new BloomFilter(10, 4);
filter.add('apple');
console.log(filter.has('apple')); // true
console.log(filter.has('banana')); // false

Serializing and Deserializing a Bloom Filter

This feature allows you to serialize a Bloom filter to JSON and later deserialize it. This is useful for saving the state of a filter and reloading it later.

const { BloomFilter } = require('bloom-filters');
const filter = new BloomFilter(10, 4);
filter.add('apple');
const json = JSON.stringify(filter.save());
const loadedFilter = BloomFilter.fromJSON(JSON.parse(json));
console.log(loadedFilter.has('apple')); // true

Counting Bloom Filter

This feature provides a Counting Bloom Filter, which allows for the removal of elements. This is achieved by maintaining a count of the number of times an element has been added.

const { CountingBloomFilter } = require('bloom-filters');
const filter = new CountingBloomFilter(10, 4);
filter.add('apple');
filter.add('apple');
console.log(filter.has('apple')); // true
filter.remove('apple');
console.log(filter.has('apple')); // true
filter.remove('apple');
console.log(filter.has('apple')); // false

Other packages similar to bloom-filters

Bloom-Filters

Keywords: bloom, filter, bloom filter, probabilistic, datastructure

JS implementation of probabilistic data structures: Bloom Filter (and its derived), HyperLogLog, Count-Min Sketch, Top-K and MinHash

Use non-cryptographic hash internally since (v0.7.0) XXHASH

Breaking API changes from the 0.7.1 to the 0.8.0 version.

Online documentation

Installation

npm install bloom-filters --save

Data structures

Classic Bloom Filter

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not.

Reference: Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426. (Full text article)

Methods:

add(element: string) -> void: add an element into the filter.
has(element: string) -> boolean': Test an element for membership, returning False if the element is definitively not in the filter and True is the element might be in the filter.
equals(other: BloomFilter) -> boolean: Test if two filters are equals.
rate() -> number: compute the filter's false positive rate (or error rate).

const { BloomFilter } = require('bloom-filters')
// create a Bloom Filter with a size of 10 and 4 hash functions
let filter = new BloomFilter(10, 4)
// insert data
filter.add('alice')
filter.add('bob')

// lookup for some data
console.log(filter.has('bob')) // output: true
console.log(filter.has('daniel')) // output: false

// print the error rate
console.log(filter.rate())

// alternatively, create a bloom filter optimal for a number of items and a desired error rate
const items = ['alice', 'bob']
const errorRate = 0.04 // 4 % error rate
filter = BloomFilter.create(items.length, errorRate)

// or create a bloom filter optimal for a collections of items and a desired error rate
filter = BloomFilter.from(items, errorRate)

Partitioned Bloom Filter

A Partitioned Bloom Filter is a variation of a classic Bloom Filter.

This filter works by partitioning the M-sized bit array into k slices of size m = M/k bits, k = nb of hash functions in the filter. Each hash function produces an index over m for its respective slice. Thus, each element is described by exactly k bits, meaning the distribution of false positives is uniform across all elements.

Be careful, as a Partitioned Bloom Filter have much higher collison risks that a classic Bloom Filter on small sets of data.

Reference: Chang, F., Feng, W. C., & Li, K. (2004, March). Approximate caches for packet classification. In INFOCOM 2004. Twenty-third AnnualJoint Conference of the IEEE Computer and Communications Societies (Vol. 4, pp. 2196-2207). IEEE. (Full text article)

Methods:

add(element: string) -> void: add an element into the filter.
has(element: string) -> boolean': Test an element for membership, returning False if the element is definitively not in the filter and True is the element might be in the filter.
equals(other: PartitionedBloomFilter) -> boolean: Test if two filters are equals.
rate() -> number: compute the filter's false positive rate (or error rate).

const { PartitionedBloomFilter } = require('bloom-filters')

// create a PartitionedBloomFilter of size 10, with 5 hash functions and a load factor of 0.5
const filter = new PartitionedBloomFilter(10, 5, 0.5)

// add some value in the filter
filter.add('alice')
filter.add('bob')

// lookup for some data
console.log(filter.has('bob')) // output: true
console.log(filter.has('daniel')) // output: false

// now use it like a classic bloom filter!
// ...

// alternatively, create a PartitionedBloomFilter optimal for a number of items and a desired error rate
const items = ['alice', 'bob']
const errorRate = 0.04 // 4 % error rate
filter = PartitionedBloomFilter.create(items.length, errorRate)

// or create a PartitionedBloomFilter optimal for a collections of items and a desired error rate
filter = PartitionedBloomFilter.from(items, errorRate)

Cuckoo Filter

Cuckoo filters improve on Bloom filters by supporting deletion, limited counting, and bounded False positive rate with similar storage efficiency as a standard Bloom Filter.

Reference: Fan, B., Andersen, D. G., Kaminsky, M., & Mitzenmacher, M. D. (2014, December). Cuckoo filter: Practically better than bloom. In Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies (pp. 75-88). ACM. (Full text article)

Methods:

add(element: string) -> void: add an element into the filter.
remove(element: string) -> boolean: delete an element from the filter, returning True if the deletion was a success and False otherwise.
has(element: string) -> boolean': Test an element for membership, returning False if the element is definitively not in the filter and True is the element might be in the filter.
equals(other: CuckooFilter) -> boolean: Test if two filters are equals.
rate() -> number: compute the filter's false positive rate (or error rate).

const { CuckooFilter } = require('bloom-filters')

// create a Cuckoo Filter with size = 15, fingerprint length = 3 and bucket size = 2
const filter = new CuckooFilter(15, 3, 2)
filter.add('alice')
filter.add('bob')

// lookup for some data
console.log(filter.has('bob')) // output: true
console.log(filter.has('daniel')) // output: false

// remove something
filter.remove('bob')
console.log(filter.has('bob')) // output: false

// alternatively, create a Cuckoo Filter optimal for a number of items and a desired error rate
const items = ['alice', 'bob']
const errorRate = 0.04 // 4 % error rate
filter = CuckooFilter.create(items.length, errorRate)

// or create a Cuckoo Filter optimal for a collections of items and a desired error rate
filter = CuckooFilter.from(items, errorRate)

WARNING: The error rate cannot be higher than 1 * 10^-18. After this, You will get an error saying that the fingerprint length is higher than the hash length.

Counting Bloom Filter

A Counting Bloom filter works in a similar manner as a regular Bloom filter; however, it is able to keep track of insertions and deletions. In a counting Bloom filter, each entry in the Bloom filter is a small counter associated with a basic Bloom filter bit.

Reference: F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “An Improved Construction for Counting Bloom Filters,” in 14th Annual European Symposium on Algorithms, LNCS 4168, 2006

Methods:

add(element: string) -> void: add an element into the filter.
remove(element: string) -> boolean: delete an element from the filter, returning True if the deletion was a success and False otherwise.
has(element: string) -> boolean': Test an element for membership, returning False if the element is definitively not in the filter and True is the element might be in the filter.
equals(other: CountingBloomFilter) -> boolean: Test if two filters are equals.
rate() -> number: compute the filter's false positive rate (or error rate).

const CountingBloomFilter = require('bloom-filters').CountingBloomFilter;

// create a Bloom Filter with capacity = 15 and 4 hash functions
let filter = new CountingBloomFilter(15, 4);

// add some value in the filter
filter.add('alice');
filter.add('bob');
filter.add('carole');

// remove some value
filter.remove('carole');

// lookup for some data
console.log(filter.has('bob')); // output: true
console.log(filter.has('carole')); // output: false
console.log(filter.has('daniel')); // output: false

// print false positive rate (around 0.1)
console.log(filter.rate());

// alternatively, create a Counting Bloom Filter optimal for a number of items and a desired error rate
const items = ['alice', 'bob']
const errorRate = 0.04 // 4 % error rate
filter = CountingBloomFilter.create(items.length, errorRate)

// or create a Counting Bloom Filter optimal for a collections of items and a desired error rate
filter = CountingBloomFilter.from(items, errorRate)

Count Min Sketch

The Count Min Sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions.

Reference: Cormode, G., & Muthukrishnan, S. (2005). An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1), 58-75. (Full text article)

Methods:

update(element: string, count = 1) -> void: add count occurences of an element into the sketch.
count(element: string) -> number: estimate the number of occurences of an element.
merge(other: CountMinSketch) -> CountMinSketch: merge with occurences of two sketches.
equals(other: CountMinSketch) -> boolean: Test if two sketchs are equals.
clone(): CountMinSketch: Clone the sketch.

const { CountMinSketch } = require('bloom-filters')

// create a new Count Min sketch with 2048 columns and 1 row
const sketch = new CountMinSketch(2048, 1)

// push some occurrences in the sketch
sketch.update('alice')
sketch.update('alice')
sketch.update('bob')

// count occurrences
console.log(sketch.count('alice')) // output: 2
console.log(sketch.count('bob')) // output: 1
console.log(sketch.count('daniel')) // output: 0

// alternatively, create a Count Min sketch optimal for a target error rate and probability of accuracy
const items = ['alice', 'bob']
const errorRate = 0.04 // 4 % error rate
const accuracy = 0.99 // 99% accuracy
sketch = CountMinSketch.create(errorRate, accuracy)

// or create a Count Min Sketch optimal for a collections of items, 
// a target error rate and probability of accuracy
sketch = CountMinSketch.from(items, errorRate, accuracy)

Invertible Bloom Filters

An Invertible Bloom Filters (IBLT), also called Invertible Bloom Lookup Table, is a space-efficient and probabilistic data-structure for solving the set-difference problem efficiently without the use of logs or other prior context. It computes the set difference with communication proportional to the size of the difference between the sets being compared. They can simultaneously calculate D(A−B) and D(B−A) using O(d) space. This data structure encodes sets in a fashion that is similar in spirit to Tornado codes’ construction, in that it randomly combines elements using the XOR function.

Reference: Eppstein, D., Goodrich, M. T., Uyeda, F., & Varghese, G. (2011). What's the difference?: efficient set reconciliation without prior context. ACM SIGCOMM Computer Communication Review, 41(4), 218-229. full-text article

WARNING: An IBLT only accepts Buffer as inputs. If you are using bloom-filters in a Web browser, you might consider using the feros/buffer package, which provides a polyfill for Buffer in a browser.

Methods

add(element: Buffer) -> void: add an element into the filter.
remove(element: Buffer) -> void: delete an element from the filter, returning True if the deletion was a success and False otherwise.
has(element: Buffer) -> boolean': Test an element for membership, returning False if the element is definitively not in the filter and True is the element might be in the filter.
equals(other: InvertibleBloomFilter) -> boolean: Test if two filters are equals.
substract(remote: InvertibleBloomFilter): peform the XOR substraction of two IBLTs.
decode() -> {additional: Buffer[], missing: Buffer[]} : decode an IBLT.
listEntries() -> Generator<Buffer, number, void>: list all entries in the IBLT using a Generator.

const { InvertibleBloomFilter } = require('bloom-filters')

const hashcount = 3
const size = 50
const iblt = new InvertibleBloomFilter(size, hashcount)

// push some data in the IBLT
iblt.add(Buffer.from('alice'))
iblt.add(Buffer.from('42'))
iblt.add(Buffer.from('help'))
iblt.add(Buffer.from('meow'))
iblt.add(Buffer.from('json'))

console.log(ilbt.has(Buffer.from('alice'))) // output: true
console.log(ilbt.has(Buffer.from('daniel'))) // output: false

iblt.remove(Buffer.from('alice'))
console.log(ilbt.has(Buffer.from('alice'))) // output: false

// Now, let's demonstrate the decoding power of IBLT!
const remote = new InvertibleBloomFilter(size, hashcount)
remote.add(Buffer.from('alice'))
remote.add(Buffer.from('car'))
remote.add(Buffer.from('meow'))
remote.add(Buffer.from('help'))

// decode the difference between the two filters
const result = iblt.substract(remote).decode()

console.log(`Did we successfully decode the subtracted iblts? ${result.success}. Why? $${result.reason}`)
console.log(`Elements of iblt missing elements from remote: ${result.additional}`)
console.log(`Elements of remote missing elements from iblt: ${result.missing}`)

// alternatively, create an IBLT optimal for a number of items and a desired error rate
const items = [Buffer.from('alice'), Buffer.from('bob')]
const errorRate = 0.04 // 4 % error rate
filter = InvertibleBloomFilter.create(items.length, errorRate)

// or create an IBLT optimal for a collections of items and a desired error rate
filter = InvertibleBloomFilter.from(items, errorRate)

Tuning the IBLT We recommend to use at least a hashcount of 3 and an alpha of 1.5 for at least 50 differences, which equals to 1.5*50 = 75 cells. Then, if you insert a huge number of values in there, the decoding will work (whatever the number of differences less than 50) but testing the presence of a value is still probabilistic, based on the number of elements inserted (Even for the functions like listEntries). For more details, you should read the seminal research paper on IBLTs (full-text article).

Export and import

All data structures exposed by this package can be exported and imported to/from JSON:

Use the method saveAsJSON() to export any data structures into a JSON object.
Use the static method fromJSON(json) to load a data structure from a JSON object.

const { BloomFilter } = require('bloom-filters')

const filter = new BloomFilter(15, 0.01)
filter.add('alice')

// export a bloom filter to JSON
const exported = filter.saveAsJSON()

// do something with the JSON object (save it as file, send it to a server, etc)
// ...

// import the same filter from its JSON export
const importedFilter = BloomFilter.fromJSON(exported)
console.log(filter.has('alice')) // output: true
console.log(filter.has('bob')) // output: false

Every hash function is seeded

By default every hash function is seeded with an internal seed which is equal to 0x1234567890. If you want to change it:

const { BloomFilter } = require('bloom-filter')
const bl = new BloomFilter(...)
console.log(bl.seed) // 78187493520
bl.seed = 0xABCD
console.log(bl.seed) // 43981

Documentation

See documentation online or generate it in directory doc/ with: npm run doc

Tests

Running with Mocha + Chai

# run tests
npm test

References

Classic Bloom Filter: Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426.
Partitioned Bloom Filter: Chang, F., Feng, W. C., & Li, K. (2004, March). Approximate caches for packet classification. In INFOCOM 2004. Twenty-third AnnualJoint Conference of the IEEE Computer and Communications Societies (Vol. 4, pp. 2196-2207). IEEE.
Cuckoo Filter: Fan, B., Andersen, D. G., Kaminsky, M., & Mitzenmacher, M. D. (2014, December). Cuckoo filter: Practically better than bloom. In Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies (pp. 75-88). ACM.
Counting Bloom Filter: F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, An Improved Construction for Counting Bloom Filters, in 14th Annual European Symposium on Algorithms, LNCS 4168, 2006, pp.
Count Min Sketch: Cormode, G., & Muthukrishnan, S. (2005). An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1), 58-75.
Invertible Bloom Filters: Eppstein, D., Goodrich, M. T., Uyeda, F., & Varghese, G. (2011). What's the difference?: efficient set reconciliation without prior context. ACM SIGCOMM Computer Communication Review, 41(4), 218-229.

Changelog

v1.0.0: Rework the whole library using TypeScript, unify the API and fix the documentation.

v0.8.0: Fix some issues with the cuckoo filter (performances). Fix the global API. It allows now to customize each Filter. If you want to use the old API, use the .create() or .from() functions to match the old api.

v0.7.1: Add the Counting Bloom Filter.

v0.7.0 Move to XXHASH for hashing elements in the library. One property has been added into the exported json _seed which is used to seed every hash of every elements. Update Invertible Bloom Filters with #add, #has, #delete, #listEntries, #substract, #Static.decode methods. Updated the way to get distinct indices which could have collisions in many cases.

v0.6.1 Add Invertible Bloom Filters (only #encode/#substract/#Static.decode methods)

License

MIT License

Keywords

FAQs

What is bloom-filters?

Is bloom-filters popular?

Is bloom-filters well maintained?

Package last updated on 23 Mar 2020

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

bloom-filters

What is bloom-filters?

What are bloom-filters's main functionalities?

Other packages similar to bloom-filters

Bloom-Filters

Table of contents

Installation

Data structures

Classic Bloom Filter

Partitioned Bloom Filter

Cuckoo Filter

Counting Bloom Filter

Count Min Sketch

Invertible Bloom Filters

Export and import

Every hash function is seeded

Documentation

Tests

References

Changelog

License

Keywords

Related posts

bloom-filters

What is bloom-filters?

What are bloom-filters's main functionalities?

Other packages similar to bloom-filters

bloomfilter

bloomxx

bloom-filter-cpp

Bloom-Filters

Table of contents

Installation

Data structures

Classic Bloom Filter

Partitioned Bloom Filter

Cuckoo Filter

Counting Bloom Filter

Count Min Sketch

Invertible Bloom Filters

Export and import

Every hash function is seeded

Documentation

Tests

References

Changelog

License

Keywords

Related posts

Node.js Implements Stricter Policies for Semver-Major Pull Requests Ahead of Release Deadlines

Roblox Developers Targeted with npm Packages Infected with Skuld Infostealer and Blank Grabber

vlt Debuts New JavaScript Package Manager and Serverless Registry at NodeConf EU