
v3.3.6+incompatible •

Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximate set-membership queries. While Bloom filters are well-known space-efficient data structures for answering queries like "is item x in a set?", they do not support deletion. Their variants that enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, so a cuckoo filter can use less space than a conventional Bloom filter in applications that require low false positive rates (< 3%). For details about the algorithm and citations, see the article "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf). Note: This implementation uses a static bucket size of 4 fingerprints and a fingerprint size of 1 byte, based on my understanding of an optimal bucket/fingerprint/size ratio from the aforementioned paper.
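The insert, lookup and delete behaviour described above can be illustrated with a toy filter. This is a minimal sketch, not the package's code: the names `Filter`, `NewFilter`, `locate` and `alt`, the FNV hash, and the relocation limit of 500 are all assumptions made for illustration. Only the 4-slot buckets and 1-byte fingerprints come from the description.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

const (
	bucketSize = 4   // fingerprints per bucket, as in the package note
	maxKicks   = 500 // assumed relocation limit; not taken from the package
)

// bucket holds 1-byte fingerprints; 0 marks an empty slot.
type bucket [bucketSize]byte

func (b *bucket) put(fp byte) bool { return b.swap(0, fp) }
func (b *bucket) del(fp byte) bool { return b.swap(fp, 0) }

// swap replaces the first slot equal to old with repl.
func (b *bucket) swap(old, repl byte) bool {
	for k := range b {
		if b[k] == old {
			b[k] = repl
			return true
		}
	}
	return false
}

func (b *bucket) has(fp byte) bool {
	for _, v := range b {
		if v == fp {
			return true
		}
	}
	return false
}

// Filter is a toy cuckoo filter; the bucket count must be a power of two.
type Filter struct{ buckets []bucket }

func NewFilter(n int) *Filter { return &Filter{buckets: make([]bucket, n)} }

func hash64(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

// alt maps a bucket index to its partner; applying it twice returns i.
func (f *Filter) alt(i uint64, fp byte) uint64 {
	return (i ^ hash64([]byte{fp})) & uint64(len(f.buckets)-1)
}

// locate returns a non-zero fingerprint and the two candidate buckets.
func (f *Filter) locate(item []byte) (fp byte, i1, i2 uint64) {
	h := hash64(item)
	fp = byte((h>>56)%255) + 1
	i1 = h & uint64(len(f.buckets)-1)
	return fp, i1, f.alt(i1, fp)
}

func (f *Filter) Insert(item []byte) bool {
	fp, i1, i2 := f.locate(item)
	if f.buckets[i1].put(fp) || f.buckets[i2].put(fp) {
		return true
	}
	i := i1
	for n := 0; n < maxKicks; n++ {
		k := rand.Intn(bucketSize)
		fp, f.buckets[i][k] = f.buckets[i][k], fp // evict a random resident
		i = f.alt(i, fp)                          // move the evicted one to its partner
		if f.buckets[i].put(fp) {
			return true
		}
	}
	return false // treated as full; the last evicted fingerprint is dropped
}

func (f *Filter) Lookup(item []byte) bool {
	fp, i1, i2 := f.locate(item)
	return f.buckets[i1].has(fp) || f.buckets[i2].has(fp)
}

func (f *Filter) Delete(item []byte) bool {
	fp, i1, i2 := f.locate(item)
	return f.buckets[i1].del(fp) || f.buckets[i2].del(fp)
}

func main() {
	f := NewFilter(1024)
	f.Insert([]byte("apple"))
	fmt.Println(f.Lookup([]byte("apple"))) // true
	f.Delete([]byte("apple"))
	fmt.Println(f.Lookup([]byte("apple"))) // false: the only fingerprint was removed
}
```

Because only the fingerprint is stored, the partner bucket has to be computable from the current index and the fingerprint alone, which is why `alt` uses an XOR with the fingerprint's hash.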

v0.0.0-20220411075957-e3b120b3f5fb •

v4.1.0+incompatible •

v4.0.1+incompatible •

Package chunker implements Content Defined Chunking (CDC) based on a rolling Rabin Checksum. The function RandomPolynomial() returns a new random polynomial of degree 53 for use with the chunker. The degree 53 is chosen because it is the largest prime below 64-8 = 56, so that the top 8 bits of a uint64 can be used for optimising calculations in the chunker. A random polynomial is chosen by selecting 64 random bits, masking away bits 64..54, and setting bit 53 to one (otherwise the polynomial is not of the desired degree) and bit 0 to one (otherwise the polynomial is trivially reducible), so that 51 bits are chosen at random. This process is repeated until Irreducible() returns true, then this polynomial is returned. If this doesn't happen after 1 million tries, the function returns an error. The probability of selecting an irreducible polynomial at random is about 7.5% ((2^53-2)/53 / 2^51), so the probability that no irreducible polynomial has been found after 100 tries is lower than 0.04%. During development the results have been verified using the computational discrete algebra system GAP, which can be obtained from the website at http://www.gap-system.org/. For filtering a given list of polynomials in hexadecimal coefficient notation, the following script can be used: All irreducible polynomials from the list are written to the output. An introduction to Rabin Fingerprints/Checksums can be found in the following articles: Michael O. Rabin (1981): "Fingerprinting by Random Polynomials" http://www.xmailserver.org/rabin.pdf Ross N. Williams (1993): "A Painless Guide to CRC Error Detection Algorithms" http://www.zlib.net/crc_v3.txt Andrei Z. Broder (1993): "Some Applications of Rabin's Fingerprinting Method" http://www.xmailserver.org/rabin_apps.pdf Shuhong Gao and Daniel Panario (1997): "Tests and Constructions of Irreducible Polynomials over Finite Fields" http://www.math.clemson.edu/~sgao/papers/GP97a.pdf Andrew Kadatch, Bob Jenkins (2007): "Everything we know about CRC but afraid to forget" http://crcutil.googlecode.com/files/crc-doc.1.0.pdf
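The masking step and the failure probability quoted above can be checked with a short sketch. It does not use the chunker package: `candidate`, `degree`, `randomCandidate` and `missProbability` are hypothetical helpers, and the irreducibility test itself is left out. The hit rate is taken directly from the formula in the description.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"math"
	"math/bits"
)

// candidate turns 64 random bits into a degree-53 candidate polynomial.
func candidate(r uint64) uint64 {
	r &= (1 << 54) - 1 // mask away bits 54 and above
	r |= (1 << 53) | 1 // force degree 53 and a non-zero constant term
	return r
}

// degree returns the position of the highest set bit.
func degree(p uint64) int { return bits.Len64(p) - 1 }

// randomCandidate draws 64 random bits and shapes them into a candidate.
func randomCandidate() (uint64, error) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return 0, err
	}
	return candidate(binary.LittleEndian.Uint64(buf[:])), nil
}

// missProbability is the chance that `tries` independent draws all fail,
// using the hit rate (2^53-2)/53 / 2^51 stated in the description.
func missProbability(tries int) float64 {
	hit := (math.Pow(2, 53) - 2) / 53 / math.Pow(2, 51)
	return math.Pow(1-hit, float64(tries))
}

func main() {
	p, err := randomCandidate()
	if err != nil {
		fmt.Println("no randomness available:", err)
		return
	}
	fmt.Printf("candidate %#x has degree %d\n", p, degree(p))
	fmt.Printf("chance of 100 misses in a row: %.4f%%\n", 100*missProbability(100))
}
```

A real generator would loop over `randomCandidate` and stop at the first candidate that passes an irreducibility test.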

v0.4.0 •

Package simhash implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. Simhash fingerprints have the property that similar documents have similar fingerprints. Therefore, the Hamming distance between two fingerprints will be small if the documents are similar.
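How a 64-bit simhash and its Hamming distance fit together can be sketched as follows. This is an illustration under assumptions, not the package's API: features are plain lowercase words with equal weight, each hashed with FNV-1a, and the names `simhash` and `hamming` are made up here.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
	"strings"
)

// simhash builds a 64-bit fingerprint: every word votes +1 or -1 on each
// bit position according to its own hash, and the sign of the tally
// decides the fingerprint bit.
func simhash(doc string) uint64 {
	var tally [64]int
	for _, word := range strings.Fields(strings.ToLower(doc)) {
		h := fnv.New64a()
		h.Write([]byte(word))
		sum := h.Sum64()
		for i := 0; i < 64; i++ {
			if sum&(1<<uint(i)) != 0 {
				tally[i]++
			} else {
				tally[i]--
			}
		}
	}
	var fp uint64
	for i, t := range tally {
		if t > 0 {
			fp |= 1 << uint(i)
		}
	}
	return fp
}

// hamming counts the bit positions in which two fingerprints differ.
func hamming(a, b uint64) int { return bits.OnesCount64(a ^ b) }

func main() {
	a := simhash("the quick brown fox jumps over the lazy dog")
	b := simhash("the quick brown fox jumped over the lazy dog")
	c := simhash("an entirely unrelated sentence about package managers")
	fmt.Println("near-duplicate distance:", hamming(a, b))
	fmt.Println("unrelated distance:     ", hamming(a, c))
}
```

A caller would typically treat two documents as near-duplicates when the distance falls under a small threshold chosen for the corpus.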

v0.0.0-20151007195837-79f94a1100d6 •

v0.0.0-20210908011315-3e6d59d1cb98 •

v0.0.0-20210908011315-3e6d59d1cb98 •

Package stopwords allows you to customize the list of stopwords. Package stopwords implements the Levenshtein Distance algorithm to evaluate the difference between 2 strings. Package stopwords implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. Package stopwords contains various algorithms for text comparison (Simhash, Levenshtein).
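The Levenshtein distance mentioned above counts the single-character insertions, deletions and substitutions needed to turn one string into another. A minimal sketch of the standard two-row dynamic programme follows; the function names are illustrative and not taken from the stopwords package.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b, compared rune
// by rune so that multi-byte characters count as one edit.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1) // distances for the previous row
	curr := make([]int, len(rb)+1) // distances for the row being filled
	for j := range prev {
		prev[j] = j // turning "" into rb[:j] takes j insertions
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // turning ra[:i] into "" takes i deletions
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
	fmt.Println(levenshtein("flaw", "lawn"))      // 2
}
```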

v1.0.0 •

v2.0.0-beta.3+incompatible •

v1.1.1 •

Package gochroma provides a high-level API to the acoustic fingerprinting library chromaprint.

v0.0.0-20211004000611-a294aa5ccab6 •

Package ja3 provides JA3 Client Fingerprinting for the Go language by looking at the TLS Client Hello packets. Basic usage: ja3 takes in TCP payload data as a []byte and computes the corresponding JA3 string and digest.
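The relationship between the JA3 string and its digest can be sketched without parsing any packets. The example below assumes the Client Hello fields have already been extracted into a hypothetical `clientHello` struct and does not call the ja3 package. It follows the published JA3 layout (version, ciphers, extensions, curves and point formats as decimal values, dash-separated within a field and comma-separated between fields, then MD5) and skips details such as GREASE filtering.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// clientHello is a hypothetical container for already-parsed fields.
type clientHello struct {
	Version      uint16
	Ciphers      []uint16
	Extensions   []uint16
	Curves       []uint16
	PointFormats []uint16
}

// joinDash renders values in decimal, separated by "-".
func joinDash(vals []uint16) string {
	parts := make([]string, len(vals))
	for i, v := range vals {
		parts[i] = strconv.Itoa(int(v))
	}
	return strings.Join(parts, "-")
}

// ja3String assembles the five comma-separated JA3 fields.
func ja3String(h clientHello) string {
	return strings.Join([]string{
		strconv.Itoa(int(h.Version)),
		joinDash(h.Ciphers),
		joinDash(h.Extensions),
		joinDash(h.Curves),
		joinDash(h.PointFormats),
	}, ",")
}

// ja3Digest is the hex-encoded MD5 of the JA3 string.
func ja3Digest(s string) string {
	sum := md5.Sum([]byte(s))
	return hex.EncodeToString(sum[:])
}

func main() {
	h := clientHello{
		Version:      769,
		Ciphers:      []uint16{47, 53},
		Extensions:   []uint16{0, 10, 11},
		Curves:       []uint16{23, 24},
		PointFormats: []uint16{0},
	}
	s := ja3String(h)
	fmt.Println(s) // 769,47-53,0-10-11,23-24,0
	fmt.Println(ja3Digest(s))
}
```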

v1.0.1 •

Package fingerprint provides functionality to calculate, compare and analyse acoustic fingerprints of raw audio data. According to Wikipedia, an acoustic fingerprint is a condensed digital summary, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in an audio database. Installation: you should also install a package containing an implementation of a fingerprinting algorithm. Currently only bindings to the chromaprint library are supported.

v0.0.0-20140803133125-29397256b7ff •

Package simhash implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. Simhash fingerprints have the property that similar documents have similar fingerprints. Therefore, the Hamming distance between two fingerprints will be small if the documents are similar.

v0.0.0-20170904020510-9ecaca7b509c •

Package chunker implements Content Defined Chunking (CDC) based on a rolling Rabin Checksum. The function RandomPolynomial() returns a new random polynomial of degree 53 for use with the chunker. The degree 53 is chosen because it is the largest prime below 64-8 = 56, so that the top 8 bits of a uint64 can be used for optimising calculations in the chunker. A random polynomial is chosen by selecting 64 random bits, masking away bits 64..54, and setting bit 53 to one (otherwise the polynomial is not of the desired degree) and bit 0 to one (otherwise the polynomial is trivially reducible), so that 51 bits are chosen at random. This process is repeated until Irreducible() returns true, then this polynomial is returned. If this doesn't happen after 1 million tries, the function returns an error. The probability of selecting an irreducible polynomial at random is about 7.5% ((2^53-2)/53 / 2^51), so the probability that no irreducible polynomial has been found after 100 tries is lower than 0.04%. During development the results have been verified using the computational discrete algebra system GAP, which can be obtained from the website at http://www.gap-system.org/. For filtering a given list of polynomials in hexadecimal coefficient notation, the following script can be used: All irreducible polynomials from the list are written to the output. An introduction to Rabin Fingerprints/Checksums can be found in the following articles: Michael O. Rabin (1981): "Fingerprinting by Random Polynomials" http://www.xmailserver.org/rabin.pdf Ross N. Williams (1993): "A Painless Guide to CRC Error Detection Algorithms" http://www.zlib.net/crc_v3.txt Andrei Z. Broder (1993): "Some Applications of Rabin's Fingerprinting Method" http://www.xmailserver.org/rabin_apps.pdf Shuhong Gao and Daniel Panario (1997): "Tests and Constructions of Irreducible Polynomials over Finite Fields" http://www.math.clemson.edu/~sgao/papers/GP97a.pdf Andrew Kadatch, Bob Jenkins (2007): "Everything we know about CRC but afraid to forget" http://crcutil.googlecode.com/files/crc-doc.1.0.pdf

v0.0.0-20181014151217-fe64bd25879f •

v1.1.1 •

v0.0.0-20210823080343-88bd7d5ca04f •

Package static provides a handler for static file serving with cache control and automatic fingerprinting.
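Fingerprinting a static file usually means embedding a hash of its content in the served name, so that the URL changes whenever the content does and the response can be cached for a long time. The sketch below shows that general idea only. `fingerprintName` and `withLongCache` are hypothetical helpers, and the 12-character SHA-256 tag and the Cache-Control value are assumptions rather than the behaviour of the static package.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"path"
	"strings"
)

// fingerprintName inserts a short content hash before the extension,
// e.g. "app.css" becomes "app.<12 hex chars>.css".
func fingerprintName(name string, content []byte) string {
	sum := sha256.Sum256(content)
	tag := hex.EncodeToString(sum[:])[:12]
	ext := path.Ext(name)
	return strings.TrimSuffix(name, ext) + "." + tag + ext
}

// withLongCache marks every response as cacheable for a year, which is
// safe only when the URL changes together with the content.
func withLongCache(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "public, max-age=31536000, immutable")
		next.ServeHTTP(w, r)
	})
}

func main() {
	css := []byte("body { margin: 0 }")
	fmt.Println(fingerprintName("app.css", css))
	// Any edit to the content yields a different name, so stale cached
	// copies are never requested again.
	fmt.Println(fingerprintName("app.css", append(css, '\n')))
}
```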

v1.0.0 •

v0.0.0-20210829165900-5ef3f6c0be48 •

v0.0.0-20230109184621-aead687028ef •

v0.0.0-20210908011315-3e6d59d1cb98 •