Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%). For details about the algorithm and citations please use this article: "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) Note: This implementation uses a a static bucket size of 4 fingerprints and a fingerprint size of 1 byte based on my understanding of an optimal bucket/fingerprint/size ratio from the aforementioned paper.
Package gochroma provides a high-level API to the acoustic fingerprinting library chromaprint.
Package stopwords allows you to customize the list of stopwords Package stopwords implements the Levenshtein Distance algorithm to evaluate the diference between 2 strings Package stopwords implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. Package stopwords contains various algorithms of text comparison (Simhash, Levenshtein)
Package virgil is the pure Go implementation of Virgil Security compatible SDK Right now it supports only ed25519 keys and signatures and curve25519 key exchange As for symmetric crypto, it's AES256-GCM Hashes used are SHA-384 for signature and SHA-256 for fingerprints
Package perceptive implements perceptual hash algorithms for comparing images. Perceptual hash algorithms are a family of comparable hash functions which generate distinct (but not unique) fingerprints, these fingerprints are then comparable. Perceptual hash algorithms are mainly used for detecting duplicates of the same files, in a way that standard and cryptographic hashes generally fail. The following perceptual hash algorithms are implemented: - Average Hash (Ahash) - Fast but generates a huge number of false positives. - Difference Hash (Dhash) - Fast and very few false positives. Below are some examples on how to use the library: You can also use the perceptual hash algorithms directly, this is good if you want to store the hashes in a database or some look up table: When performing a Hamming distance on two hashes from Ahash or Dhash, the distance output has the following meaning: - A distance of 0 means that the images are likely the same. - A distance between 1-10 indicates the images are likely a variation of each other. - A distance greater than 10 indicates the images are likely different.
Package perceptive implements perceptual hash algorithms for comparing images. Perceptual hash algorithms are a family of comparable hash functions which generate distinct (but not unique) fingerprints, these fingerprints are then comparable. Perceptual hash algorithms are mainly used for detecting duplicates of the same files, in a way that standard and cryptographic hashes generally fail. The following perceptual hash algorithms are implemented: - Average Hash (Ahash) - Fast but generates a huge number of false positives. - Difference Hash (Dhash) - Fast and very few false positives. Below are some examples on how to use the library: You can also use the perceptual hash algorithms directly, this is good if you want to store the hashes in a database or some look up table: When performing a Hamming distance on two hashes from Ahash or Dhash, the distance output has the following meaning: - A distance of 0 means that the images are likely the same. - A distance between 1-10 indicates the images are likely a variation of each other. - A distance greater than 10 indicates the images are likely different.
simhash package implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. simhash fingerprints have the property that similar documents will have a similar fingerprint. Therefore, the hamming distance between two fingerprints will be small if the documents are similar
Package chunker implements Content Defined Chunking (CDC) based on a rolling Rabin Checksum. The function RandomPolynomial() returns a new random polynomial of degree 53 for use with the chunker. The degree 53 is chosen because it is the largest prime below 64-8 = 56, so that the top 8 bits of an uint64 can be used for optimising calculations in the chunker. A random polynomial is chosen selecting 64 random bits, masking away bits 64..54 and setting bit 53 to one (otherwise the polynomial is not of the desired degree) and bit 0 to one (otherwise the polynomial is trivially reducible), so that 51 bits are chosen at random. This process is repeated until Irreducible() returns true, then this polynomials is returned. If this doesn't happen after 1 million tries, the function returns an error. The probability for selecting an irreducible polynomial at random is about 7.5% ( (2^53-2)/53 / 2^51), so the probability that no irreducible polynomial has been found after 100 tries is lower than 0.04%. During development the results have been verified using the computational discrete algebra system GAP, which can be obtained from the website at http://www.gap-system.org/. For filtering a given list of polynomials in hexadecimal coefficient notation, the following script can be used: All irreducible polynomials from the list are written to the output. An introduction to Rabin Fingerprints/Checksums can be found in the following articles: Michael O. Rabin (1981): "Fingerprinting by Random Polynomials" http://www.xmailserver.org/rabin.pdf Ross N. Williams (1993): "A Painless Guide to CRC Error Detection Algorithms" http://www.zlib.net/crc_v3.txt Andrei Z. Broder (1993): "Some Applications of Rabin's Fingerprinting Method" http://www.xmailserver.org/rabin_apps.pdf Shuhong Gao and Daniel Panario (1997): "Tests and Constructions of Irreducible Polynomials over Finite Fields" http://www.math.clemson.edu/~sgao/papers/GP97a.pdf Andrew Kadatch, Bob Jenkins (2007): "Everything we know about CRC but afraid to forget" http://crcutil.googlecode.com/files/crc-doc.1.0.pdf
simhash package implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. simhash fingerprints have the property that similar documents will have a similar fingerprint. Therefore, the hamming distance between two fingerprints will be small if the documents are similar for standalone test, change package to main and the next func def to, func main() {
This implements Rabin fingerprinting using a fixed irreducible 64-bit polynomial. This implements Rabin fingerprinting using a fixed irreducible 64-bit polynomial.
Package rabin provides functions to calculate Rabin Fingerprints, using the Adler32 prime by default The algorithm used is: Where POW is (PRIME ** windowsize) % MOD However, the modulus operations can be turned into a bit mask if MOD is a power of two. So this implementation takes a shift instead of an integer.
© 2022 OSINT Fingerprint Project Contributors © 2022 OSINT Fingerprint Project Contributors
Package casper implements H2O's 1 CASPer (cache-aware server push) in golang. It wraps go's standard server push method and maintains a fingerprint of browser caches and decides to push or cancel. The fingerprint is generated by using golomb-coded sets (compressed encoding of Bloom filter). Below is a simple example of usage.
Package ja3 provides JA3 Client Fingerprinting for the Go language by looking at the TLS Client Hello packets. Basic Usage ja3 takes in TCP payload data as a []byte and computes the corresponding JA3 string and digest.
Package assets provides asset compilation, concatenation and fingerprinting.
Package stopwords allows you to customize the list of stopwords Package stopwords implements the Levenshtein Distance algorithm to evaluate the diference between 2 strings Package stopwords implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. Package stopwords contains various algorithms of text comparison (Simhash, Levenshtein)
implements the Levenshtein Distance algorithm to evaluate the diference between 2 strings implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. stopwords package removes most frequent words from a text content. It can be used to improve the accuracy of SimHash algo for example. It uses a list of most frequent words used in various languages : arabic, bulgarian, czech, danish, english, finnish, french, german, hungarian, italian, japanese, latvian, norwegian, persian, polish, It contains various algorithms of text comparisons (Simhash, Levenshtein)
Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%). For details about the algorithm and citations please use this article: "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) Note: This implementation uses a a static bucket size of 4 fingerprints and a fingerprint size of 1 byte based on my understanding of an optimal bucket/fingerprint/size ratio from the aforementioned paper.
Package chunker implements Content Defined Chunking (CDC) based on a rolling Rabin Checksum. The function RandomPolynomial() returns a new random polynomial of degree 53 for use with the chunker. The degree 53 is chosen because it is the largest prime below 64-8 = 56, so that the top 8 bits of an uint64 can be used for optimising calculations in the chunker. A random polynomial is chosen selecting 64 random bits, masking away bits 64..54 and setting bit 53 to one (otherwise the polynomial is not of the desired degree) and bit 0 to one (otherwise the polynomial is trivially reducible), so that 51 bits are chosen at random. This process is repeated until Irreducible() returns true, then this polynomials is returned. If this doesn't happen after 1 million tries, the function returns an error. The probability for selecting an irreducible polynomial at random is about 7.5% ( (2^53-2)/53 / 2^51), so the probability that no irreducible polynomial has been found after 100 tries is lower than 0.04%. During development the results have been verified using the computational discrete algebra system GAP, which can be obtained from the website at http://www.gap-system.org/. For filtering a given list of polynomials in hexadecimal coefficient notation, the following script can be used: All irreducible polynomials from the list are written to the output. An introduction to Rabin Fingerprints/Checksums can be found in the following articles: Michael O. Rabin (1981): "Fingerprinting by Random Polynomials" http://www.xmailserver.org/rabin.pdf Ross N. Williams (1993): "A Painless Guide to CRC Error Detection Algorithms" http://www.zlib.net/crc_v3.txt Andrei Z. Broder (1993): "Some Applications of Rabin's Fingerprinting Method" http://www.xmailserver.org/rabin_apps.pdf Shuhong Gao and Daniel Panario (1997): "Tests and Constructions of Irreducible Polynomials over Finite Fields" http://www.math.clemson.edu/~sgao/papers/GP97a.pdf Andrew Kadatch, Bob Jenkins (2007): "Everything we know about CRC but afraid to forget" http://crcutil.googlecode.com/files/crc-doc.1.0.pdf
Package chunker implements Content Defined Chunking (CDC) based on a rolling Rabin Checksum. The function RandomPolynomial() returns a new random polynomial of degree 53 for use with the chunker. The degree 53 is chosen because it is the largest prime below 64-8 = 56, so that the top 8 bits of an uint64 can be used for optimising calculations in the chunker. A random polynomial is chosen selecting 64 random bits, masking away bits 64..54 and setting bit 53 to one (otherwise the polynomial is not of the desired degree) and bit 0 to one (otherwise the polynomial is trivially reducible), so that 51 bits are chosen at random. This process is repeated until Irreducible() returns true, then this polynomials is returned. If this doesn't happen after 1 million tries, the function returns an error. The probability for selecting an irreducible polynomial at random is about 7.5% ( (2^53-2)/53 / 2^51), so the probability that no irreducible polynomial has been found after 100 tries is lower than 0.04%. During development the results have been verified using the computational discrete algebra system GAP, which can be obtained from the website at http://www.gap-system.org/. For filtering a given list of polynomials in hexadecimal coefficient notation, the following script can be used: All irreducible polynomials from the list are written to the output. An introduction to Rabin Fingerprints/Checksums can be found in the following articles: Michael O. Rabin (1981): "Fingerprinting by Random Polynomials" http://www.xmailserver.org/rabin.pdf Ross N. Williams (1993): "A Painless Guide to CRC Error Detection Algorithms" http://www.zlib.net/crc_v3.txt Andrei Z. Broder (1993): "Some Applications of Rabin's Fingerprinting Method" http://www.xmailserver.org/rabin_apps.pdf Shuhong Gao and Daniel Panario (1997): "Tests and Constructions of Irreducible Polynomials over Finite Fields" http://www.math.clemson.edu/~sgao/papers/GP97a.pdf Andrew Kadatch, Bob Jenkins (2007): "Everything we know about CRC but afraid to forget" http://crcutil.googlecode.com/files/crc-doc.1.0.pdf
Package chunker implements Content Defined Chunking (CDC) based on a rolling Rabin Checksum. The function RandomPolynomial() returns a new random polynomial of degree 53 for use with the chunker. The degree 53 is chosen because it is the largest prime below 64-8 = 56, so that the top 8 bits of an uint64 can be used for optimising calculations in the chunker. A random polynomial is chosen selecting 64 random bits, masking away bits 64..54 and setting bit 53 to one (otherwise the polynomial is not of the desired degree) and bit 0 to one (otherwise the polynomial is trivially reducible), so that 51 bits are chosen at random. This process is repeated until Irreducible() returns true, then this polynomials is returned. If this doesn't happen after 1 million tries, the function returns an error. The probability for selecting an irreducible polynomial at random is about 7.5% ( (2^53-2)/53 / 2^51), so the probability that no irreducible polynomial has been found after 100 tries is lower than 0.04%. During development the results have been verified using the computational discrete algebra system GAP, which can be obtained from the website at http://www.gap-system.org/. For filtering a given list of polynomials in hexadecimal coefficient notation, the following script can be used: All irreducible polynomials from the list are written to the output. An introduction to Rabin Fingerprints/Checksums can be found in the following articles: Michael O. Rabin (1981): "Fingerprinting by Random Polynomials" http://www.xmailserver.org/rabin.pdf Ross N. Williams (1993): "A Painless Guide to CRC Error Detection Algorithms" http://www.zlib.net/crc_v3.txt Andrei Z. Broder (1993): "Some Applications of Rabin's Fingerprinting Method" http://www.xmailserver.org/rabin_apps.pdf Shuhong Gao and Daniel Panario (1997): "Tests and Constructions of Irreducible Polynomials over Finite Fields" http://www.math.clemson.edu/~sgao/papers/GP97a.pdf Andrew Kadatch, Bob Jenkins (2007): "Everything we know about CRC but afraid to forget" http://crcutil.googlecode.com/files/crc-doc.1.0.pdf
Package meekserver is the server transport plugin for the meek pluggable transport. It acts as an HTTP server, keeps track of session ids, and forwards received data to a local OR port. Sample usage in torrc: Using your own TLS certificate: Plain HTTP usage: The server runs in HTTPS mode by default, getting certificates from Let's Encrypt automatically. The server opens an auxiliary ACME listener on port 80 in order for the automatic certificates to work. If you have your own certificate, use the --cert and --key options. Use --disable-tls option to run with plain HTTP. Package meekserver provides an implementation of the Meek circumvention protocol. Only a client implementation is provided, and no effort is made to normalize the TLS fingerprint. It borrows quite liberally from the real meek-client code.
Package chunker implements Content Defined Chunking (CDC) based on a rolling Rabin Checksum. The function RandomPolynomial() returns a new random polynomial of degree 53 for use with the chunker. The degree 53 is chosen because it is the largest prime below 64-8 = 56, so that the top 8 bits of an uint64 can be used for optimising calculations in the chunker. A random polynomial is chosen selecting 64 random bits, masking away bits 64..54 and setting bit 53 to one (otherwise the polynomial is not of the desired degree) and bit 0 to one (otherwise the polynomial is trivially reducible), so that 51 bits are chosen at random. This process is repeated until Irreducible() returns true, then this polynomials is returned. If this doesn't happen after 1 million tries, the function returns an error. The probability for selecting an irreducible polynomial at random is about 7.5% ( (2^53-2)/53 / 2^51), so the probability that no irreducible polynomial has been found after 100 tries is lower than 0.04%. During development the results have been verified using the computational discrete algebra system GAP, which can be obtained from the website at http://www.gap-system.org/. For filtering a given list of polynomials in hexadecimal coefficient notation, the following script can be used: All irreducible polynomials from the list are written to the output. An introduction to Rabin Fingerprints/Checksums can be found in the following articles: Michael O. Rabin (1981): "Fingerprinting by Random Polynomials" http://www.xmailserver.org/rabin.pdf Ross N. Williams (1993): "A Painless Guide to CRC Error Detection Algorithms" http://www.zlib.net/crc_v3.txt Andrei Z. Broder (1993): "Some Applications of Rabin's Fingerprinting Method" http://www.xmailserver.org/rabin_apps.pdf Shuhong Gao and Daniel Panario (1997): "Tests and Constructions of Irreducible Polynomials over Finite Fields" http://www.math.clemson.edu/~sgao/papers/GP97a.pdf Andrew Kadatch, Bob Jenkins (2007): "Everything we know about CRC but afraid to forget" http://crcutil.googlecode.com/files/crc-doc.1.0.pdf
Package chunker implements Content Defined Chunking (CDC) based on a rolling Rabin Checksum. The function RandomPolynomial() returns a new random polynomial of degree 53 for use with the chunker. The degree 53 is chosen because it is the largest prime below 64-8 = 56, so that the top 8 bits of an uint64 can be used for optimising calculations in the chunker. A random polynomial is chosen selecting 64 random bits, masking away bits 64..54 and setting bit 53 to one (otherwise the polynomial is not of the desired degree) and bit 0 to one (otherwise the polynomial is trivially reducible), so that 51 bits are chosen at random. This process is repeated until Irreducible() returns true, then this polynomials is returned. If this doesn't happen after 1 million tries, the function returns an error. The probability for selecting an irreducible polynomial at random is about 7.5% ( (2^53-2)/53 / 2^51), so the probability that no irreducible polynomial has been found after 100 tries is lower than 0.04%. During development the results have been verified using the computational discrete algebra system GAP, which can be obtained from the website at http://www.gap-system.org/. For filtering a given list of polynomials in hexadecimal coefficient notation, the following script can be used: All irreducible polynomials from the list are written to the output. An introduction to Rabin Fingerprints/Checksums can be found in the following articles: Michael O. Rabin (1981): "Fingerprinting by Random Polynomials" http://www.xmailserver.org/rabin.pdf Ross N. Williams (1993): "A Painless Guide to CRC Error Detection Algorithms" http://www.zlib.net/crc_v3.txt Andrei Z. Broder (1993): "Some Applications of Rabin's Fingerprinting Method" http://www.xmailserver.org/rabin_apps.pdf Shuhong Gao and Daniel Panario (1997): "Tests and Constructions of Irreducible Polynomials over Finite Fields" http://www.math.clemson.edu/~sgao/papers/GP97a.pdf Andrew Kadatch, Bob Jenkins (2007): "Everything we know about CRC but afraid to forget" http://crcutil.googlecode.com/files/crc-doc.1.0.pdf
Package chunker implements Content Defined Chunking (CDC) based on a rolling Rabin Checksum. The function RandomPolynomial() returns a new random polynomial of degree 53 for use with the chunker. The degree 53 is chosen because it is the largest prime below 64-8 = 56, so that the top 8 bits of an uint64 can be used for optimising calculations in the chunker. A random polynomial is chosen selecting 64 random bits, masking away bits 64..54 and setting bit 53 to one (otherwise the polynomial is not of the desired degree) and bit 0 to one (otherwise the polynomial is trivially reducible), so that 51 bits are chosen at random. This process is repeated until Irreducible() returns true, then this polynomials is returned. If this doesn't happen after 1 million tries, the function returns an error. The probability for selecting an irreducible polynomial at random is about 7.5% ( (2^53-2)/53 / 2^51), so the probability that no irreducible polynomial has been found after 100 tries is lower than 0.04%. During development the results have been verified using the computational discrete algebra system GAP, which can be obtained from the website at http://www.gap-system.org/. For filtering a given list of polynomials in hexadecimal coefficient notation, the following script can be used: All irreducible polynomials from the list are written to the output. An introduction to Rabin Fingerprints/Checksums can be found in the following articles: Michael O. Rabin (1981): "Fingerprinting by Random Polynomials" http://www.xmailserver.org/rabin.pdf Ross N. Williams (1993): "A Painless Guide to CRC Error Detection Algorithms" http://www.zlib.net/crc_v3.txt Andrei Z. Broder (1993): "Some Applications of Rabin's Fingerprinting Method" http://www.xmailserver.org/rabin_apps.pdf Shuhong Gao and Daniel Panario (1997): "Tests and Constructions of Irreducible Polynomials over Finite Fields" http://www.math.clemson.edu/~sgao/papers/GP97a.pdf Andrew Kadatch, Bob Jenkins (2007): "Everything we know about CRC but afraid to forget" http://crcutil.googlecode.com/files/crc-doc.1.0.pdf
Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%). For details about the algorithm and citations please use this article: "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) Note: This implementation uses a a static bucket size of 4 fingerprints and a fingerprint size of 1 byte based on my understanding of an optimal bucket/fingerprint/size ratio from the aforementioned paper.
Package synapse is a wrapper library for the Synapse API (https://docs.synapsefi.com) Instantiate client Enable logging & turn off developer mode (developer mode is true by default) Register Fingerprint Set an `IDEMPOTENCY_KEY` (for `POST` requests only) Submit optional query parameters
Package pwnedkeys looks up Certificates, Certificate requests, Keys, etc in the pwnedkeys.com database. Lookup is done using the SubjectPublicKeyInfo (SPKI) associated with a key. The SPKI fingerprint of a key (or certificate) is the all-lowercase hex-encoded SHA-256 hash of the DER-encoded form of the subjectPublicKeyInfo ASN.1 structure representing a given public key.
Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%). "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)
Package stopwords implements the Levenshtein Distance algorithm to evaluate the diference between 2 strings Package stopwords implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. Package stopwords contains various algorithms of text comparison (Simhash, Levenshtein)
Package cuckoofilter provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%). For details about the algorithm and citations please use this article: "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) Note: This implementation uses a a static bucket size of 4 fingerprints and a fingerprint size of 1 byte based on my understanding of an optimal bucket/fingerprint/size ratio from the aforementioned paper.
A dynamic and extensible music library organizer Demlo is a music library organizer. It can encode, fix case, change folder hierarchy according to tags or file properties, tag from an online database, copy covers while ignoring duplicates or those below a quality threshold, and much more. It makes it possible to manage your libraries uniformly and dynamically. You can write your own rules to fit your needs best. Demlo aims at being as lightweight and portable as possible. Its major runtime dependency is the transcoder FFmpeg. The scripts are written in Lua for portability and speed while allowing virtually unlimited extensibility. Usage: For usage options, see: First Demlo creates a list of all input files. When a folder is specified, all files matching the extensions from the 'extensions' variable will be appended to the list. Identical files are appended only once. Next all files get analyzed: - The audio file details (tags, stream properties, format properties, etc.) are stored into the 'input' variable. The 'output' variable gets its default values from 'input', or from an index file if specified from command-line. If no index has been specified and if an attached cuesheet is found, all cuesheet details are appended accordingly. Cuesheet tags override stream tags, which override format tags. Finally, still without index, tags can be retrieved from Internet if the command-line option is set. - If a prescript has been specified, it gets executed. It makes it possible to adjust the input values and global variables before running the other scripts. - The scripts, if any, get executed in the lexicographic order of their basename. The 'output' variable is transformed accordingly. Scripts may contain rules such as defining a new file name, new tags, new encoding properties, etc. You can use conditions on input values to set the output properties, which makes it virtually possible to process a full music library in one single run. - If a postscript has been specified, it gets executed. It makes it possible to adjust the output of the script for the current run only. - Demlo makes some last-minute tweaking if need be: it adjusts the bitrate, the path, the encoding parameters, and so on. - A preview of changes is displayed. - When applying changes, the covers get copied if required and the audio file gets processed: tags are modified as specified, the file is re-encoded if required, and the output is written to the appropriate folder. When destination already exists, the 'exist' action is executed. The program's default behaviour can be changed from the user configuration file. (See the 'Files' section for a template.) Most command-line flags default value can be changed. The configuration file is loaded on startup, before parsing the command-line options. Review the default value of the CLI flags with 'demlo -h'. If you wish to use no configuration file, set the environment variable DEMLORC to ".". Scripts can contain any safe Lua code. Some functions like 'os.execute' are not available for security reasons. It is not possible to print to the standard output/error unless running in debug mode and using the 'debug' function. See the 'sandbox.go' file for a list of allowed functions and variables. Lua patterns are replaced by Go regexps. See https://github.com/google/re2/wiki/Syntax. Scripts have no requirements at all. However, to be useful, they should set values of the 'output' table detailed in the 'Variables' section. You can use the full power of the Lua to set the variables dynamically. For instance: 'input' and 'output' are both accessible from any script. All default functions and variables (excluding 'output') are reset on every script call to enforce consistency. Local variables are lost from one script call to another. Global variables are preserved. Use this feature to pass data like options or new functions. 'output' structure consistency is guaranteed at the start of every script. Demlo will only extract the fields with the right type as described in the 'Variables' section. Warning: Do not abuse of global variables, especially when processing non-fixed size data (e.g. tables). Data could grow big and slow down the program. By default, when the destination exists, Demlo will append a suffix to the output destination. This behaviour can be changed from the 'exist' action specified by the user. Demlo comes with a few default actions. The 'exist' action works just like scripts with the following differences: - Any change to 'output.path' will be skipped. - An additional variable is accessible from the action: 'existinfo' holds the file details of the existing files in the same fashion as 'input'. This allows for comparing the input file and the existing destination. The writing rules can be tweaked the following way: Word of caution: overwriting breaks Demlo's rule of not altering existing files. It can lead to undesired results if the overwritten file is also part of the (yet to be processed) input. The overwrite capability can be useful when syncing music libraries however. The user scripts should be generic. Therefore they may not properly handle some uncommon input values. Tweak the input with temporary overrides from command-line. The prescript and postscript defined on command-line will let you run arbitrary code that is run before and after all other scripts, respectively. Use global variables to transfer data and parameters along. If the prescript and postscript end up being too long, consider writing a demlo script. You can also define shell aliases or use wrapper scripts as convenience. The 'input' table describes the file: Bitrate is in bits per seconds (bps). That is, for 320 kbps you would specify The 'time' is the modification time of the file. It holds the sec seconds and nsec nanoseconds since January 1, 1970 UTC. The entry 'streams' and 'format' are as returned by It gives access to most metadata that FFmpeg can return. For instance, to get the duration of the track in seconds, query the variable 'input.format.duration'. Since there may be more than one stream (covers, other data), the first audio stream is assumed to be the music stream. For convenience, the index of the music stream is stored in 'audioindex'. The tags returned by FFmpeg are found in streams, format and in the cuesheet. To make tag queries easier, all tags are stored in the 'tags' table, with the following precedence: You can remove a tag by setting it to 'nil' or the empty string. This is equivalent, except that 'nil' saves some memory during the process. The 'output' table describes the transformation to apply to the file: The 'parameters' array holds the CLI parameters passed to FFmpeg. It can be anything supported by FFmpeg, although this variable is supposed to hold encoding information. See the 'Examples' section. The 'embeddedcovers', 'externalcovers' and 'onlinecover' variables are detailed in the 'Covers' section. The 'write' variable is covered in the 'Existing destination' section. The 'rmsrc' variable is a boolean: when true, Demlo removes the source file after processing. This can speed up the process when not re-encoding. This option is ignored for multi-track files. For convenience, the following shortcuts are provided: Demlo provides some non-standard Lua functions to ease scripting. Display a message on stderr if debug mode is on. Return lowercase string without non-alphanumeric characters nor leading zeros. Return the relation coefficient of the two input strings. The result is a float in 0.0...1.0, 0.0 means no relation at all, 1.0 means identical strings. A format is a container in FFmpeg's terminology. 'output.parameters' contains CLI flags passed to FFmpeg. They are meant to set the stream codec, the bitrate, etc. If 'output.parameters' is {'-c:a', 'copy'} and the format is identical, then taglib will be used instead of FFmpeg. Use this rule from a (post)script to disable encoding by setting the same format and the copy parameters. This speeds up the process. The official scripts are usually very smart at guessing the right values. They might make mistakes however. If you are unsure, you can (and you are advised to) preview the results before proceeding. The 'diff' preview is printed to stderr. A JSON preview of the changes is printed to stdout if stdout is redirected. The initial values of the 'output' table can be completed with tags fetched from the MusicBrainz database. Audio files are fingerprinted for the queries, so even with initially wrong file names and tags, the right values should still be retrieved. The front album cover can also be retrieved. Proxy parameters will be fetched automatically from the 'http_proxy' and 'https_proxy' environment variables. As this process requires network access it can be quite slow. Nevertheless, Demlo is specifically optimized for albums, so that network queries are used for only one track per album, when possible. Some tracks can be released on different albums: Demlo tries to guess it from the tags, but if the tags are wrong there is no way to know which one it is. There is a case where the selection can be controlled: let's assume we have tracks A, B and C from the same album Z. A and B were also released in album Y, whereas C was release in Z only. Tags for A will be checked online; let's assume it gets tagged to album Y. B will use A details, so album Y too. Then C does not match neither A's nor B's album, so another online query will be made and it will be tagged to album Z. This is slow and does not yield the expected result. Now let's call Tags for C will be queried online, and C will be tagged to Z. Then both A and B will match album Z so they will be tagged using C details, which is the desired result. Conclusion: when using online tagging, the first argument should be the lesser known track of the album. Demlo can set the output variables according to the values set in a text file before calling the script. The input values are ignored as well as online tagging, but it is still possible to access the input table from scripts. This 'index' file is formatted in JSON. It corresponds to what Demlo outputs when printing the JSON preview. This is valid JSON except for the missing beginning and the missing end. It makes it possible to concatenate and to append to existing index files. Demlo will automatically complete the missing parts so that it becomes valid JSON. The index file is useful when you want to edit tags manually: You can redirect the output to a file, edit the content manually with your favorite text editor, then run Demlo again with the index as argument. See the 'Examples' section. This feature can also be used to interface Demlo with other programs. Demlo can manage embedded covers as well as external covers. External covers are queried from files matching known extensions in the file's folder. Embedded covers are queried from static video streams in the file. Covers are accessed from The embedded covers are indexed numerically by order of appearance in the streams. The first cover will be at index 1 and so on. This is not necessarily the index of the stream. 'inputcover' is the following structure: 'format' is the picture format. FFmpeg makes a distinction between format and codec, but it is not useful for covers. The name of the format is specified by Demlo, not by FFmpeg. Hence the 'jpeg' name, instead of 'mjpeg' as FFmpeg puts it. 'width' and 'height' hold the size in pixels. 'checksum' can be used to identify files uniquely. For performance reasons, only a partial checksum is performed. This variable is typically used for skipping duplicates. Cover transformations are specified in 'outputcover' has the following structure: The format is specified by FFmpeg this time. See the comments on 'format' for 'inputcover'. 'parameters' is used in the same fashion as 'output.parameters'. User configuration: This must be a Lua file. See the 'demlorc' file provided with this package for an exhaustive list of options. Folder containing the official scripts: User script folder: Create this folder and add your own scripts inside. This folder takes precedence over the system folder, so scripts with the same name will be found in the user folder first. The following examples will not proceed unless the '-p' command-line option is true. Important: you _must_ use single quotes for the runtime Lua command to prevent expansion. Inside the Lua code, use double quotes for strings and escape single quotes. Show default options: Preview changes made by the default scripts: Use 'alternate' script if found in user or system script folder (user folder first): Add the Lua file to the list of scripts. This feature is convenient if you want to write scripts that are too complex to fit on the command-line, but not generic enough to fit the user or system script folders. Remove all script from the list, then add '30-case' and '60-path' scripts. Note that '30-case' will be run before '60-path'. Do not use any script but '60-path'. The file content is unchanged and the file is renamed to a dynamically computed destination. Demlo performs an instant rename if destination is on the same device. Otherwise it copies the file and removes the source. Use the default scripts (if set in configuration file), but do not re-encode: Set 'artist' to the value of 'composer', and 'title' to be preceded by the new value of 'artist', then apply the default script. Do not re-encode. Order in runtime script matters. Mind the double quotes. Set track number to first number in input file name: Use the default scripts but keep original value for the 'artist' tag: 1) Preview default scripts transformation and save it to an index. 2) Edit file to fix any potential mistake. 3) Run Demlo over the same files using the index information only. Same as above but generate output filename according to the custom '61-rename' script. The numeric prefix is important: it ensures that '61-rename' will be run after all the default tag related scripts and after '60-path'. Otherwise, if a change in tags would occur later on, it would not affect the renaming script. Retrieve tags from Internet: Same as above but for a whole album, and saving the result to an index: Only download the cover for the album corresponding to the track. Use 'rmsrc' to avoid duplicating the audio file. Change tags inplace with entries from MusicBrainz: Set tags to titlecase while casing AC-DC correctly: To easily switch between formats from command-line, create one script per format (see 50-encoding.lua), e.g. ogg.lua and flac.lua. Then Add support for non-default formats from CLI: Overwrite existing destination if input is newer: ffmpeg(1), ffprobe(1), http://www.lua.org/pil/contents.html
simhash package implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. simhash fingerprints have the property that similar documents will have a similar fingerprint. Therefore, the hamming distance between two fingerprints will be small if the documents are similar
Package stopwords allows you to customize the list of stopwords Package stopwords implements the Levenshtein Distance algorithm to evaluate the diference between 2 strings Package stopwords implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. Package stopwords contains various algorithms of text comparison (Simhash, Levenshtein)
Package sshfp implements a ssh.HostKeyCallback for resolving SSH host key fingerprints using DNS The most basic resolver is created as follows (without error checking):
Package stopwords allows you to customize the list of stopwords Package stopwords implements the Levenshtein Distance algorithm to evaluate the diference between 2 strings Package stopwords implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. Package stopwords contains various algorithms of text comparison (Simhash, Levenshtein)
Package stopwords allows you to customize the list of stopwords Package stopwords implements the Levenshtein Distance algorithm to evaluate the diference between 2 strings Package stopwords implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. Package stopwords contains various algorithms of text comparison (Simhash, Levenshtein)
Package synapse is a wrapper library for the Synapse API (https://docs.synapsefi.com) Instantiate client Enable logging & turn off developer mode (developer mode is true by default) Register Fingerprint Set an `IDEMPOTENCY_KEY` (for `POST` requests only) Submit optional query parameters
Package ja3 provides JA3 Client Fingerprinting for the Go language by looking at the TLS Client Hello packets. Basic Usage ja3 takes in TCP payload data as a []byte and computes the corresponding JA3 string and digest.
dedup scans files or directories and calculates fingerprint hashes for them based on their contents. Originally written on a plane from SFO->EWR on 7-23-15 in about an hour. Based on an idea I had been mulling in my mind for years. Without the -d (directory) switch dedup recursively scans the supplied directories and files in depth first order and records the hash of each file in a map of slices keyed by the hash. After the scan is complete, the resulting map is iterated, and if any of the slices have a length of more than 1, then all the files on that slice are all duplicates of each other. If -d switch is supplied the hashes of files in each directory are themselves recursively hashed and the resulting fingerprints for each directory (but not the files) are recorded in the map. Again, if the length of any slice is more than 1, then the entire directory is duplicated. The -d switch works with more than two directories, but sometimes not as well. If the -r switch is supplied, reverses the sense of the program and files or directories that ARE NOT duplicated are printed. When the map is scanned, any slices with a length different than the number of supplied directories are printed as these represent missing files. This allows directories to be easily compared and more than two can easily be compared. Even cooler is that the program works even if files or directories have been renamed. Without a switch to print, no output is generated. The -p prints out the pathnames of duplicated or missing files o directories. The -ps prints a summary of the number of files or dir that were duplicated and now much space they take up. The F, S, H, L, and N switches print the fingerprint, size, human readable size, hash chain length, and number of roots respectively. Examples % dedup -p ~/Desktop % dedup -d -p dir1 dir2 % dedup -d -r -p dir1 dir2 The hash used is the asehash from the Go runtime. It's fast and passes smhahser. The map of slices is not the most memory efficient representation and at some point it probably makes sense to switch to a cuckoo hash table.
Taken from $GOROOT/src/pkg/net/http/chunked needed to write https responses to client. Package goproxy provides a customizable HTTP proxy, supporting hijacking HTTPS connection. The intent of the proxy, is to be usable with reasonable amount of traffic yet, customizable and programable. The proxy itself is simply an `net/http` handler. Typical usage is Adding a header to each request For printing the content type of all incoming responses note that we used the ProxyCtx context variable here. It contains the request and the response (Req and Resp, Resp is nil if unavailable) of this specific client interaction with the proxy. To print the content type of all responses from a certain url, we'll add a ReqCondition to the OnResponse function: We can write the condition ourselves, conditions can be set on request and on response Caution! If you give a RespCondition to the OnRequest function, you'll get a run time panic! It doesn't make sense to read the response, if you still haven't got it! Finally, we have convenience function to throw a quick response we close the body of the original repsonse, and return a new 403 response with a short message. Example use cases: 1. https://github.com/elazarl/goproxy/tree/master/examples/goproxy-avgsize To measure the average size of an Html served in your site. One can ask all the QA team to access the website by a proxy, and the proxy will measure the average size of all text/html responses from your host. 2. [not yet implemented] All requests to your web servers should be directed through the proxy, when the proxy will detect html pieces sent as a response to AJAX request, it'll send a warning email. 3. https://github.com/elazarl/goproxy/blob/master/examples/goproxy-httpdump/ Generate a real traffic to your website by real users using through proxy. Record the traffic, and try it again for more real load testing. 4. https://github.com/elazarl/goproxy/tree/master/examples/goproxy-no-reddit-at-worktime Will allow browsing to reddit.com between 8:00am and 17:00pm 5. https://github.com/elazarl/goproxy/tree/master/examples/goproxy-jquery-version Will warn if multiple versions of jquery are used in the same domain. 6. https://github.com/elazarl/goproxy/blob/master/examples/goproxy-upside-down-ternet/ Modifies image files in an HTTP response via goproxy's image extension found in ext/. Support code for TLS camouflage using uTLS. The goal is: provide an http.RoundTripper abstraction that retains the features of http.Transport (e.g., persistent connections and HTTP/2 support), while making TLS connections using uTLS in place of crypto/tls. The challenge is: while http.Transport provides a DialTLS hook, setting it to non-nil disables automatic HTTP/2 support in the client. Most of the uTLS fingerprints contain an ALPN extension containing "h2"; i.e., they declare support for HTTP/2. If the server also supports HTTP/2, then uTLS may negotiate an HTTP/2 connection without the http.Transport knowing it, which leads to an HTTP/1.1 client speaking to an HTTP/2 server, a protocol error. The code here uses an idea adapted from meek_lite in obfs4proxy: https://gitlab.com/yawning/obfs4/commit/4d453dab2120082b00bf6e63ab4aaeeda6b8d8a3 Instead of setting DialTLS on an http.Transport and exposing it directly, we expose a wrapper type, UTLSRoundTripper, that contains within it either an http.Transport or an http2.Transport. The first time a caller calls RoundTrip on the wrapper, we initiate a uTLS connection (bootstrapConn), then peek at the ALPN-negotiated protocol: if "h2", create an internal http2.Transport; otherwise, create an internal http.Transport. In either case, set DialTLS on the created Transport to a function that dials using uTLS. As a special case, the first time the DialTLS callback is called, it reuses bootstrapConn (the one made to peek at the ALPN), rather than make a new connection. Subsequent calls to RoundTripper on the wrapper just pass the requests though the previously created http.Transport or http2.Transport. We assume that in future RoundTrips, the ALPN-negotiated protocol will remain the same as it was in the initial RoundTrip. At this point it is the http.Transport or http2.Transport calling DialTLS, not us, so we can't dynamically swap the underlying transport based on the ALPN. https://bugs.torproject.org/29077 https://github.com/refraction-networking/utls/issues/16
Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%). For details about the algorithm and citations please use this article: "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) Note: This implementation uses a a static bucket size of 4 fingerprints and a fingerprint size of 1 byte based on my understanding of an optimal bucket/fingerprint/size ratio from the aforementioned paper.
A dynamic and extensible music library organizer Demlo is a music library organizer. It can encode, fix case, change folder hierarchy according to tags or file properties, tag from an online database, copy covers while ignoring duplicates or those below a quality threshold, and much more. It makes it possible to manage your libraries uniformly and dynamically. You can write your own rules to fit your needs best. Demlo aims at being as lightweight and portable as possible. Its major runtime dependency is the transcoder FFmpeg. The scripts are written in Lua for portability and speed while allowing virtually unlimited extensibility. Usage: For usage options, see: First Demlo creates a list of all input files. When a folder is specified, all files matching the extensions from the 'extensions' variable will be appended to the list. Identical files are appended only once. Next all files get analyzed: - The audio file details (tags, stream properties, format properties, etc.) are stored into the 'input' variable. The 'output' variable gets its default values from 'input', or from an index file if specified from command-line. If no index has been specified and if an attached cuesheet is found, all cuesheet details are appended accordingly. Cuesheet tags override stream tags, which override format tags. Finally, still without index, tags can be retrieved from Internet if the command-line option is set. - If a prescript has been specified, it gets executed. It makes it possible to adjust the input values and global variables before running the other scripts. - The scripts, if any, get executed in the lexicographic order of their basename. The 'output' variable is transformed accordingly. Scripts may contain rules such as defining a new file name, new tags, new encoding properties, etc. You can use conditions on input values to set the output properties, which makes it virtually possible to process a full music library in one single run. - If a postscript has been specified, it gets executed. It makes it possible to adjust the output of the script for the current run only. - Demlo makes some last-minute tweaking if need be: it adjusts the bitrate, the path, the encoding parameters, and so on. - A preview of changes is displayed. - When applying changes, the covers get copied if required and the audio file gets processed: tags are modified as specified, the file is re-encoded if required, and the output is written to the appropriate folder. When destination already exists, the 'exist' action is executed. The program's default behaviour can be changed from the user configuration file. (See the 'Files' section for a template.) Most command-line flags default value can be changed. The configuration file is loaded on startup, before parsing the command-line options. Review the default value of the CLI flags with 'demlo -h'. If you wish to use no configuration file, set the environment variable DEMLORC to ".". Scripts can contain any safe Lua code. Some functions like 'os.execute' are not available for security reasons. It is not possible to print to the standard output/error unless running in debug mode and using the 'debug' function. See the 'sandbox.go' file for a list of allowed functions and variables. Lua patterns are replaced by Go regexps. See https://github.com/google/re2/wiki/Syntax. Scripts have no requirements at all. However, to be useful, they should set values of the 'output' table detailed in the 'Variables' section. You can use the full power of the Lua to set the variables dynamically. For instance: 'input' and 'output' are both accessible from any script. All default functions and variables (excluding 'output') are reset on every script call to enforce consistency. Local variables are lost from one script call to another. Global variables are preserved. Use this feature to pass data like options or new functions. 'output' structure consistency is guaranteed at the start of every script. Demlo will only extract the fields with the right type as described in the 'Variables' section. Warning: Do not abuse of global variables, especially when processing non-fixed size data (e.g. tables). Data could grow big and slow down the program. By default, when the destination exists, Demlo will append a suffix to the output destination. This behaviour can be changed from the 'exist' action specified by the user. Demlo comes with a few default actions. The 'exist' action works just like scripts with the following differences: - Any change to 'output.path' will be skipped. - An additional variable is accessible from the action: 'existinfo' holds the file details of the existing files in the same fashion as 'input'. This allows for comparing the input file and the existing destination. The writing rules can be tweaked the following way: Word of caution: overwriting breaks Demlo's rule of not altering existing files. It can lead to undesired results if the overwritten file is also part of the (yet to be processed) input. The overwrite capability can be useful when syncing music libraries however. The user scripts should be generic. Therefore they may not properly handle some uncommon input values. Tweak the input with temporary overrides from command-line. The prescript and postscript defined on command-line will let you run arbitrary code that is run before and after all other scripts, respectively. Use global variables to transfer data and parameters along. If the prescript and postscript end up being too long, consider writing a demlo script. You can also define shell aliases or use wrapper scripts as convenience. The 'input' table describes the file: Bitrate is in bits per seconds (bps). That is, for 320 kbps you would specify The 'time' is the modification time of the file. It holds the sec seconds and nsec nanoseconds since January 1, 1970 UTC. The entry 'streams' and 'format' are as returned by It gives access to most metadata that FFmpeg can return. For instance, to get the duration of the track in seconds, query the variable 'input.format.duration'. Since there may be more than one stream (covers, other data), the first audio stream is assumed to be the music stream. For convenience, the index of the music stream is stored in 'audioindex'. The tags returned by FFmpeg are found in streams, format and in the cuesheet. To make tag queries easier, all tags are stored in the 'tags' table, with the following precedence: You can remove a tag by setting it to 'nil' or the empty string. This is equivalent, except that 'nil' saves some memory during the process. The 'output' table describes the transformation to apply to the file: The 'parameters' array holds the CLI parameters passed to FFmpeg. It can be anything supported by FFmpeg, although this variable is supposed to hold encoding information. See the 'Examples' section. The 'embeddedcovers', 'externalcovers' and 'onlinecover' variables are detailed in the 'Covers' section. The 'write' variable is covered in the 'Existing destination' section. The 'rmsrc' variable is a boolean: when true, Demlo removes the source file after processing. This can speed up the process when not re-encoding. This option is ignored for multi-track files. For convenience, the following shortcuts are provided: Demlo provides some non-standard Lua functions to ease scripting. Display a message on stderr if debug mode is on. Return lowercase string without non-alphanumeric characters nor leading zeros. Return the relation coefficient of the two input strings. The result is a float in 0.0...1.0, 0.0 means no relation at all, 1.0 means identical strings. A format is a container in FFmpeg's terminology. 'output.parameters' contains CLI flags passed to FFmpeg. They are meant to set the stream codec, the bitrate, etc. If 'output.parameters' is {'-c:a', 'copy'} and the format is identical, then taglib will be used instead of FFmpeg. Use this rule from a (post)script to disable encoding by setting the same format and the copy parameters. This speeds up the process. The official scripts are usually very smart at guessing the right values. They might make mistakes however. If you are unsure, you can (and you are advised to) preview the results before proceeding. The 'diff' preview is printed to stderr. A JSON preview of the changes is printed to stdout if stdout is redirected. The initial values of the 'output' table can be completed with tags fetched from the MusicBrainz database. Audio files are fingerprinted for the queries, so even with initially wrong file names and tags, the right values should still be retrieved. The front album cover can also be retrieved. Proxy parameters will be fetched automatically from the 'http_proxy' and 'https_proxy' environment variables. As this process requires network access it can be quite slow. Nevertheless, Demlo is specifically optimized for albums, so that network queries are used for only one track per album, when possible. Some tracks can be released on different albums: Demlo tries to guess it from the tags, but if the tags are wrong there is no way to know which one it is. There is a case where the selection can be controlled: let's assume we have tracks A, B and C from the same album Z. A and B were also released in album Y, whereas C was release in Z only. Tags for A will be checked online; let's assume it gets tagged to album Y. B will use A details, so album Y too. Then C does not match neither A's nor B's album, so another online query will be made and it will be tagged to album Z. This is slow and does not yield the expected result. Now let's call Tags for C will be queried online, and C will be tagged to Z. Then both A and B will match album Z so they will be tagged using C details, which is the desired result. Conclusion: when using online tagging, the first argument should be the lesser known track of the album. Demlo can set the output variables according to the values set in a text file before calling the script. The input values are ignored as well as online tagging, but it is still possible to access the input table from scripts. This 'index' file is formatted in JSON. It corresponds to what Demlo outputs when printing the JSON preview. This is valid JSON except for the missing beginning and the missing end. It makes it possible to concatenate and to append to existing index files. Demlo will automatically complete the missing parts so that it becomes valid JSON. The index file is useful when you want to edit tags manually: You can redirect the output to a file, edit the content manually with your favorite text editor, then run Demlo again with the index as argument. See the 'Examples' section. This feature can also be used to interface Demlo with other programs. Demlo can manage embedded covers as well as external covers. External covers are queried from files matching known extensions in the file's folder. Embedded covers are queried from static video streams in the file. Covers are accessed from The embedded covers are indexed numerically by order of appearance in the streams. The first cover will be at index 1 and so on. This is not necessarily the index of the stream. 'inputcover' is the following structure: 'format' is the picture format. FFmpeg makes a distinction between format and codec, but it is not useful for covers. The name of the format is specified by Demlo, not by FFmpeg. Hence the 'jpeg' name, instead of 'mjpeg' as FFmpeg puts it. 'width' and 'height' hold the size in pixels. 'checksum' can be used to identify files uniquely. For performance reasons, only a partial checksum is performed. This variable is typically used for skipping duplicates. Cover transformations are specified in 'outputcover' has the following structure: The format is specified by FFmpeg this time. See the comments on 'format' for 'inputcover'. 'parameters' is used in the same fashion as 'output.parameters'. User configuration: This must be a Lua file. See the 'demlorc' file provided with this package for an exhaustive list of options. Folder containing the official scripts: User script folder: Create this folder and add your own scripts inside. This folder takes precedence over the system folder, so scripts with the same name will be found in the user folder first. The following examples will not proceed unless the '-p' command-line option is true. Important: you _must_ use single quotes for the runtime Lua command to prevent expansion. Inside the Lua code, use double quotes for strings and escape single quotes. Show default options: Preview changes made by the default scripts: Use 'alternate' script if found in user or system script folder (user folder first): Add the Lua file to the list of scripts. This feature is convenient if you want to write scripts that are too complex to fit on the command-line, but not generic enough to fit the user or system script folders. Remove all script from the list, then add '30-case' and '60-path' scripts. Note that '30-case' will be run before '60-path'. Do not use any script but '60-path'. The file content is unchanged and the file is renamed to a dynamically computed destination. Demlo performs an instant rename if destination is on the same device. Otherwise it copies the file and removes the source. Use the default scripts (if set in configuration file), but do not re-encode: Set 'artist' to the value of 'composer', and 'title' to be preceded by the new value of 'artist', then apply the default script. Do not re-encode. Order in runtime script matters. Mind the double quotes. Set track number to first number in input file name: Use the default scripts but keep original value for the 'artist' tag: 1) Preview default scripts transformation and save it to an index. 2) Edit file to fix any potential mistake. 3) Run Demlo over the same files using the index information only. Same as above but generate output filename according to the custom '61-rename' script. The numeric prefix is important: it ensures that '61-rename' will be run after all the default tag related scripts and after '60-path'. Otherwise, if a change in tags would occur later on, it would not affect the renaming script. Retrieve tags from Internet: Same as above but for a whole album, and saving the result to an index: Only download the cover for the album corresponding to the track. Use 'rmsrc' to avoid duplicating the audio file. Change tags inplace with entries from MusicBrainz: Set tags to titlecase while casing AC-DC correctly: To easily switch between formats from command-line, create one script per format (see 50-encoding.lua), e.g. ogg.lua and flac.lua. Then Add support for non-default formats from CLI: Overwrite existing destination if input is newer: ffmpeg(1), ffprobe(1), http://www.lua.org/pil/contents.html
Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%). For details about the algorithm and citations please use this article: "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) Note: This implementation uses a a static bucket size of 4 fingerprints and a fingerprint size of 1 byte based on my understanding of an optimal bucket/fingerprint/size ratio from the aforementioned paper.
Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%). For details about the algorithm and citations please use this article: "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) Note: This implementation uses a a static bucket size of 4 fingerprints and a fingerprint size of 1 byte based on my understanding of an optimal bucket/fingerprint/size ratio from the aforementioned paper.
© 2022 OSINT Fingerprint Project Contributors
© 2022 OSINT Fingerprint Project Contributors
simhash package implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. simhash fingerprints have the property that similar documents will have a similar fingerprint. Therefore, the hamming distance between two fingerprints will be small if the documents are similar