Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

bloom-filters

Package Overview
Dependencies
Maintainers
2
Versions
28
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

bloom-filters - npm Package Compare versions

Comparing version 1.2.0 to 1.3.0

dist/sketch/min-hash-factory.d.ts

2

dist/api.d.ts

@@ -7,4 +7,6 @@ export { default as BloomFilter } from './bloom/bloom-filter';

export { default as TopK } from './sketch/topk';
export { MinHash } from './sketch/min-hash';
export { default as MinHashFactory } from './sketch/min-hash-factory';
export { default as CuckooFilter } from './cuckoo/cuckoo-filter';
export { default as InvertibleBloomFilter } from './iblt/invertible-bloom-lookup-tables';
export { default as Cell } from './iblt/cell';

@@ -38,2 +38,6 @@ /* file : api.ts

exports.TopK = topk_1.default;
var min_hash_1 = require("./sketch/min-hash");
exports.MinHash = min_hash_1.MinHash;
var min_hash_factory_1 = require("./sketch/min-hash-factory");
exports.MinHashFactory = min_hash_factory_1.default;
var cuckoo_filter_1 = require("./cuckoo/cuckoo-filter");

@@ -40,0 +44,0 @@ exports.CuckooFilter = cuckoo_filter_1.default;

2

package.json
{
"name": "bloom-filters",
"version": "1.2.0",
"version": "1.3.0",
"description": "JS implementation of probabilistic data structures: Bloom Filter (and its derived), HyperLogLog, Count-Min Sketch, Top-K and MinHash",

@@ -5,0 +5,0 @@ "main": "dist/api.js",

@@ -24,2 +24,3 @@ # Bloom-Filters

* [HyperLogLog](#hyperloglog)
* [MinHash](#minhash)
* [Top-K](#top-k)

@@ -291,2 +292,45 @@ * [Invertible Bloom Filters](#invertible-bloom-filters)

### MinHash
**MinHash** (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are.
The goal of MinHash is to estimate the *Jaccard similarity coefficient*, a commonly used indicator of the similarity between two sets, without explicitly computing the intersection and union of the two sets.
It does so by computing fixed sized signatures for a set of numbers using randomly generated hash functions.
❗️**WARNINGS**❗
* A `MinHash` class only accepts `numbers` (integers and floats) as inputs.
* Two MinHash can be compared **only if they share the same set of randomly generated hash functions**. To ease the creation of MinHash sets, we introduce a `MinHashFactory` class that is able to create MinHash structures that *share the same set of hash functions*. We recommend most users **to rely on the factory**, but the `MinHash` class remains importable for advanced usage.
**Reference:** Andrei Z. Broder, *"On the resemblance and containment of documents"*, in Compression and Complexity of Sequences: Proceedings (1997).
([Full text article](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.779&rep=rep1&type=pdf))
**Methods**:
* `add(element: number) -> void`: add a new element to the set.
* `bulkLoad(elements: number[]) -> void`: add several new elements to the set in an efficient manner.
* `isEmpty() -> boolean`: test if the signature of the MinHash is empty.
* `compareWith(other: MinHash) -> number`: estimate the Jaccard similarity coefficient with another MinHash set.
```javascript
const { MinHashFactory } = require('bloom-filters')
// create the MinHashFactory, to create several comparable MinHash sets
// it uses 10 random hash functions and expect to see a maximum value of 999
const sketch = new MinHasFactory(10, 999)
// create two empty MinHash
const fistSet = factory.create()
const secondSet = factory.create()
// push some occurrences in the first set
fistSet.add(1)
fistSet.add(2)
// the MinHash also supports bulk loading
secondSet.bulkLoad([1, 3, 4])
// estimate the jaccard similarity between the two sets
const jaccardSim = fistSet.compareWith(secondSet)
console.Log(`The estimated Jaccard similarity is ${jaccardSim}`)
```
### Top-K

@@ -330,7 +374,7 @@

❗️**WARNING**❗️ An IBLT only accepts [`Buffer`](https://nodejs.org/api/buffer.html) as inputs. If you are using `bloom-filters` in a Web browser, you might consider using the [`feros/buffer`](https://www.npmjs.com/package/buffer) package, which provides a polyfill for `Buffer` in a browser.
**Reference:** Eppstein, D., Goodrich, M. T., Uyeda, F., & Varghese, G. (2011). *What's the difference?: efficient set reconciliation without prior context.* ACM SIGCOMM Computer Communication Review, 41(4), 218-229.
([Full text article](http://www.sysnet.ucsd.edu/sysnet/miscpapers/EppGooUye-SIGCOMM-11.pdf))
**WARNING**: An IBLT only accepts [`Buffer`](https://nodejs.org/api/buffer.html) as inputs. If you are using `bloom-filters` in a Web browser, you might consider using the [`feros/buffer`](https://www.npmjs.com/package/buffer) package, which provides a polyfill for `Buffer` in a browser.
**Methods**

@@ -447,2 +491,3 @@ * `add(element: Buffer) -> void`: add an element into the filter.

* [HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf): Philippe Flajolet, Éric Fusy, Olivier Gandouet and Frédéric Meunier (2007). *"Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm"*. Discrete Mathematics and Theoretical Computer Science Proceedings.
* [MinHash](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.779&rep=rep1&type=pdf): Andrei Z. Broder, *"On the resemblance and containment of documents"*, in Compression and Complexity of Sequences: Proceedings (1997).
* [Invertible Bloom Filters](http://www.sysnet.ucsd.edu/sysnet/miscpapers/EppGooUye-SIGCOMM-11.pdf): Eppstein, D., Goodrich, M. T., Uyeda, F., & Varghese, G. (2011). *What's the difference?: efficient set reconciliation without prior context.* ACM SIGCOMM Computer Communication Review, 41(4), 218-229.

@@ -452,15 +497,14 @@

**v1.1.0**: Added the HyperLogLog sketch.
| **Version** | **Release date** | **Major changes** |
|---|---|---|
| `v1.3.0` | 10/04/2020 | Added the MinHash set |
| `v1.2.0` | 08/04/2020 | Add the TopK class |
| `v1.1.0` | 03/04/2020 | Add the HyperLogLog sketch |
| `v1.0.0` | 23/03/2020 | Rework the whole library using TypeScript, unify the API and fix the documentation |
| `v0.8.0` | 11/11/2019 | Fix some issues with the cuckoo filter (performances). Fix the global API. It allows now to customize each Filter. If you want to use the old API, use the `.create()` or `.from()` functions to match the old api. |
| `v0.7.1` | 11/09/2019 | Add the Counting Bloom Filter |
| `v0.7.0` | 01/07/2019 | Move to [XXHASH](https://cyan4973.github.io/xxHash/) for hashing elements in the library. One property has been added into the exported json `_seed` which is used to seed every hash of every elements. Update Invertible Bloom Filters with #add, #has, #delete, #listEntries, #substract, #Static.decode methods. Updated the way to get distinct indices which could have collisions in many cases. |
| `v0.6.1` | 18/06/2019 | Add Invertible Bloom Filters (only #encode/#substract/#Static.decode methods) |
**v1.0.0**: Rework the whole library using TypeScript, unify the API and fix the documentation.
**v0.8.0**: Fix some issues with the cuckoo filter (performances). Fix the global API. It allows now to customize each Filter. If you want to use the old API, use the `.create()` or `.from()` functions to match the old api.
**v0.7.1**: Add the Counting Bloom Filter.
**v0.7.0** Move to [XXHASH](https://cyan4973.github.io/xxHash/) for hashing elements in the library. One property has been added into the exported json `_seed` which is used to seed every hash of every elements. Update Invertible Bloom Filters with #add, #has, #delete, #listEntries, #substract, #Static.decode methods. Updated the way to get distinct indices which could have collisions in many cases.
**v0.6.1** Add Invertible Bloom Filters (only #encode/#substract/#Static.decode methods)
## License
[MIT License](https://github.com/Callidon/bloom-filters/blob/master/LICENSE)

@@ -33,4 +33,6 @@ /* file : api.ts

export { default as TopK } from './sketch/topk'
export { MinHash } from './sketch/min-hash'
export { default as MinHashFactory } from './sketch/min-hash-factory'
export { default as CuckooFilter } from './cuckoo/cuckoo-filter'
export { default as InvertibleBloomFilter } from './iblt/invertible-bloom-lookup-tables'
export { default as Cell } from './iblt/cell'
SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc