bloom-filters
Advanced tools
Comparing version 1.2.0 to 1.3.0
@@ -7,4 +7,6 @@ export { default as BloomFilter } from './bloom/bloom-filter'; | ||
export { default as TopK } from './sketch/topk'; | ||
export { MinHash } from './sketch/min-hash'; | ||
export { default as MinHashFactory } from './sketch/min-hash-factory'; | ||
export { default as CuckooFilter } from './cuckoo/cuckoo-filter'; | ||
export { default as InvertibleBloomFilter } from './iblt/invertible-bloom-lookup-tables'; | ||
export { default as Cell } from './iblt/cell'; |
@@ -38,2 +38,6 @@ /* file : api.ts | ||
exports.TopK = topk_1.default; | ||
var min_hash_1 = require("./sketch/min-hash"); | ||
exports.MinHash = min_hash_1.MinHash; | ||
var min_hash_factory_1 = require("./sketch/min-hash-factory"); | ||
exports.MinHashFactory = min_hash_factory_1.default; | ||
var cuckoo_filter_1 = require("./cuckoo/cuckoo-filter"); | ||
@@ -40,0 +44,0 @@ exports.CuckooFilter = cuckoo_filter_1.default; |
{ | ||
"name": "bloom-filters", | ||
"version": "1.2.0", | ||
"version": "1.3.0", | ||
"description": "JS implementation of probabilistic data structures: Bloom Filter (and its derived), HyperLogLog, Count-Min Sketch, Top-K and MinHash", | ||
@@ -5,0 +5,0 @@ "main": "dist/api.js", |
@@ -24,2 +24,3 @@ # Bloom-Filters | ||
* [HyperLogLog](#hyperloglog) | ||
* [MinHash](#minhash) | ||
* [Top-K](#top-k) | ||
@@ -291,2 +292,45 @@ * [Invertible Bloom Filters](#invertible-bloom-filters) | ||
### MinHash | ||
**MinHash** (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. | ||
The goal of MinHash is to estimate the *Jaccard similarity coefficient*, a commonly used indicator of the similarity between two sets, without explicitly computing the intersection and union of the two sets. | ||
It does so by computing fixed sized signatures for a set of numbers using randomly generated hash functions. | ||
❗️**WARNINGS**❗ | ||
* A `MinHash` class only accepts `numbers` (integers and floats) as inputs. | ||
* Two MinHash can be compared **only if they share the same set of randomly generated hash functions**. To ease the creation of MinHash sets, we introduce a `MinHashFactory` class that is able to create MinHash structures that *share the same set of hash functions*. We recommend most users **to rely on the factory**, but the `MinHash` class remains importable for advanced usage. | ||
**Reference:** Andrei Z. Broder, *"On the resemblance and containment of documents"*, in Compression and Complexity of Sequences: Proceedings (1997). | ||
([Full text article](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.779&rep=rep1&type=pdf)) | ||
**Methods**: | ||
* `add(element: number) -> void`: add a new element to the set. | ||
* `bulkLoad(elements: number[]) -> void`: add several new elements to the set in an efficient manner. | ||
* `isEmpty() -> boolean`: test if the signature of the MinHash is empty. | ||
* `compareWith(other: MinHash) -> number`: estimate the Jaccard similarity coefficient with another MinHash set. | ||
```javascript | ||
const { MinHashFactory } = require('bloom-filters') | ||
// create the MinHashFactory, to create several comparable MinHash sets | ||
// it uses 10 random hash functions and expect to see a maximum value of 999 | ||
const sketch = new MinHasFactory(10, 999) | ||
// create two empty MinHash | ||
const fistSet = factory.create() | ||
const secondSet = factory.create() | ||
// push some occurrences in the first set | ||
fistSet.add(1) | ||
fistSet.add(2) | ||
// the MinHash also supports bulk loading | ||
secondSet.bulkLoad([1, 3, 4]) | ||
// estimate the jaccard similarity between the two sets | ||
const jaccardSim = fistSet.compareWith(secondSet) | ||
console.Log(`The estimated Jaccard similarity is ${jaccardSim}`) | ||
``` | ||
### Top-K | ||
@@ -330,7 +374,7 @@ | ||
❗️**WARNING**❗️ An IBLT only accepts [`Buffer`](https://nodejs.org/api/buffer.html) as inputs. If you are using `bloom-filters` in a Web browser, you might consider using the [`feros/buffer`](https://www.npmjs.com/package/buffer) package, which provides a polyfill for `Buffer` in a browser. | ||
**Reference:** Eppstein, D., Goodrich, M. T., Uyeda, F., & Varghese, G. (2011). *What's the difference?: efficient set reconciliation without prior context.* ACM SIGCOMM Computer Communication Review, 41(4), 218-229. | ||
([Full text article](http://www.sysnet.ucsd.edu/sysnet/miscpapers/EppGooUye-SIGCOMM-11.pdf)) | ||
**WARNING**: An IBLT only accepts [`Buffer`](https://nodejs.org/api/buffer.html) as inputs. If you are using `bloom-filters` in a Web browser, you might consider using the [`feros/buffer`](https://www.npmjs.com/package/buffer) package, which provides a polyfill for `Buffer` in a browser. | ||
**Methods** | ||
@@ -447,2 +491,3 @@ * `add(element: Buffer) -> void`: add an element into the filter. | ||
* [HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf): Philippe Flajolet, Éric Fusy, Olivier Gandouet and Frédéric Meunier (2007). *"Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm"*. Discrete Mathematics and Theoretical Computer Science Proceedings. | ||
* [MinHash](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.779&rep=rep1&type=pdf): Andrei Z. Broder, *"On the resemblance and containment of documents"*, in Compression and Complexity of Sequences: Proceedings (1997). | ||
* [Invertible Bloom Filters](http://www.sysnet.ucsd.edu/sysnet/miscpapers/EppGooUye-SIGCOMM-11.pdf): Eppstein, D., Goodrich, M. T., Uyeda, F., & Varghese, G. (2011). *What's the difference?: efficient set reconciliation without prior context.* ACM SIGCOMM Computer Communication Review, 41(4), 218-229. | ||
@@ -452,15 +497,14 @@ | ||
**v1.1.0**: Added the HyperLogLog sketch. | ||
| **Version** | **Release date** | **Major changes** | | ||
|---|---|---| | ||
| `v1.3.0` | 10/04/2020 | Added the MinHash set | | ||
| `v1.2.0` | 08/04/2020 | Add the TopK class | | ||
| `v1.1.0` | 03/04/2020 | Add the HyperLogLog sketch | | ||
| `v1.0.0` | 23/03/2020 | Rework the whole library using TypeScript, unify the API and fix the documentation | | ||
| `v0.8.0` | 11/11/2019 | Fix some issues with the cuckoo filter (performances). Fix the global API. It allows now to customize each Filter. If you want to use the old API, use the `.create()` or `.from()` functions to match the old api. | | ||
| `v0.7.1` | 11/09/2019 | Add the Counting Bloom Filter | | ||
| `v0.7.0` | 01/07/2019 | Move to [XXHASH](https://cyan4973.github.io/xxHash/) for hashing elements in the library. One property has been added into the exported json `_seed` which is used to seed every hash of every elements. Update Invertible Bloom Filters with #add, #has, #delete, #listEntries, #substract, #Static.decode methods. Updated the way to get distinct indices which could have collisions in many cases. | | ||
| `v0.6.1` | 18/06/2019 | Add Invertible Bloom Filters (only #encode/#substract/#Static.decode methods) | | ||
**v1.0.0**: Rework the whole library using TypeScript, unify the API and fix the documentation. | ||
**v0.8.0**: Fix some issues with the cuckoo filter (performances). Fix the global API. It allows now to customize each Filter. If you want to use the old API, use the `.create()` or `.from()` functions to match the old api. | ||
**v0.7.1**: Add the Counting Bloom Filter. | ||
**v0.7.0** Move to [XXHASH](https://cyan4973.github.io/xxHash/) for hashing elements in the library. One property has been added into the exported json `_seed` which is used to seed every hash of every elements. Update Invertible Bloom Filters with #add, #has, #delete, #listEntries, #substract, #Static.decode methods. Updated the way to get distinct indices which could have collisions in many cases. | ||
**v0.6.1** Add Invertible Bloom Filters (only #encode/#substract/#Static.decode methods) | ||
## License | ||
[MIT License](https://github.com/Callidon/bloom-filters/blob/master/LICENSE) |
@@ -33,4 +33,6 @@ /* file : api.ts | ||
export { default as TopK } from './sketch/topk' | ||
export { MinHash } from './sketch/min-hash' | ||
export { default as MinHashFactory } from './sketch/min-hash-factory' | ||
export { default as CuckooFilter } from './cuckoo/cuckoo-filter' | ||
export { default as InvertibleBloomFilter } from './iblt/invertible-bloom-lookup-tables' | ||
export { default as Cell } from './iblt/cell' |
License Policy Violation
LicenseThis package is not allowed per your license policy. Review the package's license to ensure compliance.
Found 1 instance in 1 package
License Policy Violation
LicenseThis package is not allowed per your license policy. Review the package's license to ensure compliance.
Found 1 instance in 1 package
435170
77
9866
506