Package fingerprint provides functionality to calculate, compare and analyse acoustic fingerprints of raw audio data. According to Wikipedia, an acoustic fingerprint is a condensed digital summary, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in an audio database.

Installation

You should also install a package containing the implementation of a fingerprinting algorithm. Currently only bindings to the chromaprint library are supported.

Usage
Package static provides a handler for static file serving with cache control and automatic fingerprinting.
Package comet implements a BM25-based full-text search index.

WHAT IS BM25?

BM25 (Best Matching 25) is a probabilistic ranking function used to estimate the relevance of documents to a given search query. It is one of the most widely used ranking functions in information retrieval.

HOW BM25 WORKS:

For a given query Q with terms {t1, t2, ..., tn} and document D:

1. Tokenizes and normalizes both query and documents using UAX#29 word segmentation
2. For each query term, calculates:
3. Final score is the sum of (IDF × TF) for all query terms

KEY PARAMETERS:

TIME COMPLEXITY:

MEMORY REQUIREMENTS:

- Stores inverted index (term -> docIDs) using roaring bitmaps for compression
- Stores term frequencies (term -> docID -> count)
- Stores document lengths and tokens (not full text)
- Much more memory efficient than storing full document text

GUARANTEES & TRADE-OFFS:

✓ Pros:

✗ Cons:

WHEN TO USE:

Use BM25 index when:

1. You need full-text search with relevance ranking
2. You want fast keyword-based search
3. Memory efficiency is important (vs storing full text)
4. You have your own document store and just need search

Package comet provides a high-performance hybrid vector search library for Go.

Comet combines multiple indexing strategies and search modalities into a unified, efficient package. It supports semantic search (vector embeddings), full-text search (BM25), metadata filtering, and hybrid search with score fusion.

Comet is built for developers who want to understand how vector databases work from the inside out. It provides production-ready implementations of modern vector search algorithms with comprehensive documentation and examples.

Create a vector index and perform similarity search:

Comet provides five vector index implementations, each with different tradeoffs:

FlatIndex: Brute-force exhaustive search with 100% recall. Best for small datasets (<10K vectors) or when perfect accuracy is required.
HNSWIndex: Hierarchical graph-based search with 95-99% recall and O(log n) performance. Best for most production workloads (10K-10M vectors).

IVFIndex: Inverted file index using k-means clustering with 85-95% recall. Best for large datasets (>100K vectors) with moderate accuracy requirements.

PQIndex: Product quantization for massive memory compression (10-500x smaller) with 70-85% recall. Best for memory-constrained environments.

IVFPQIndex: Combines IVF and PQ for maximum scalability with 70-90% recall. Best for billion-scale datasets.

Three distance metrics are supported:

Euclidean (L2): Measures absolute spatial distance. Use when magnitude matters.

L2Squared: Squared Euclidean distance (faster, preserves ordering). Use for better performance when only relative distances matter.

Cosine: Measures angular similarity, independent of magnitude. Use for normalized vectors like text embeddings.

BM25-based full-text search with Unicode tokenization:

Fast filtering using Roaring Bitmaps and Bit-Sliced Indexes:

Combine vector, text, and metadata search with score fusion:

When combining results from multiple search modalities, different fusion strategies are available:

WeightedSumFusion: Linear combination with configurable weights

ReciprocalRankFusion: Rank-based fusion (scale-independent, recommended)

MaxFusion/MinFusion: Simple maximum or minimum across modalities

All indexes support persistence:

HNSW parameters for tuning search quality:

IVF parameters for tuning speed/accuracy:

All indexes are safe for concurrent use. Multiple goroutines can search simultaneously while one goroutine adds or removes vectors.

Document Search: Use vector embeddings for semantic search in documentation, knowledge bases, or content management systems.

Product Recommendations: Combine product image embeddings with metadata filters for personalized recommendations.

Question Answering: Use hybrid search (vector + BM25) for retrieval-augmented generation (RAG) systems.
Duplicate Detection: Use high-recall vector search to find near-duplicate documents or images.

Multi-modal Search: Combine text, image embeddings, and structured metadata for comprehensive search experiences.

Choose the right index type:

Use appropriate distance metrics:

Batch operations:

Training indexes:

Metadata filtering:

For detailed API documentation, see the godoc comments on each type and function. For more examples and use cases, visit: https://github.com/wizenheimer/comet

Package comet implements a k-Nearest Neighbors (kNN) flat index for similarity search.

WHAT IS A FLAT INDEX?

A flat index is the most naive and simple approach to similarity search. The term "flat" indicates that vectors are stored without any compression or transformation - they are stored "as-is" in their original form. This is also known as brute-force or exhaustive search.

HOW kNN WORKS:

For a given query vector Q, the algorithm:

1. Calculates the distance from Q to EVERY vector in the dataset
2. Sorts all distances
3. Returns the k vectors with the smallest distances

TIME COMPLEXITY:

MEMORY REQUIREMENTS:

- 4 bytes per float32 component
- Total per vector: 4 * d bytes (where d is the dimensionality)
- No compression, so memory scales linearly with dataset size

GUARANTEES & TRADE-OFFS:

✓ Pros:

✗ Cons:

WHEN TO USE:

Use flat index only when:

1. Dataset size or embedding dimensionality is relatively small
2. You MUST have 100% accuracy (e.g., fingerprint matching, security applications)
3. Speed is not a critical concern

Package comet implements HNSW (Hierarchical Navigable Small World).

WHAT IS HNSW?

HNSW is a state-of-the-art graph-based algorithm for approximate nearest neighbor search. It builds a multi-layered graph where search is O(log n) - incredibly fast!
Layer 2: Few nodes, long-range connections (highways)
Layer 1: More nodes, medium-range connections (state roads)
Layer 0: All nodes, short-range connections (local streets)

Search starts at the top layer and descends, getting more refined at each level!

PERFORMANCE:

TIME COMPLEXITY:

Package comet implements a hybrid search index that combines vector, text, and metadata search.

WHAT IS HYBRIDSEARCHINDEX?

HybridSearchIndex is a facade that provides a unified interface over three specialized indexes:

1. VectorIndex: For semantic similarity search using vector embeddings
2. TextIndex: For keyword-based BM25 full-text search
3. MetadataIndex: For filtering by structured metadata attributes

HOW IT WORKS:

The index maintains three separate indexes internally and coordinates search across them. When searching, it follows this flow:

1. Apply metadata filters first (if any) to get candidate document IDs
2. Pass candidate IDs to vector and/or text search for relevance ranking
3. Combine results from multiple search modes using score aggregation

SEARCH MODES:

- Vector-only: Semantic similarity search using embeddings
- Text-only: Keyword-based BM25 search
- Metadata-only: Pure filtering without ranking
- Hybrid: Combine any or all of the above with score aggregation

WHEN TO USE:

Use HybridSearchIndex when:

1. You need to combine multiple search modalities
2. You want to filter by metadata before expensive vector search
3. You need both semantic and keyword-based search
4. You want a simple unified API instead of managing multiple indexes

Package comet implements a k-Nearest Neighbors (kNN) IVF index for similarity search.

WHAT IS AN IVF INDEX?

IVF (Inverted File Index) is a partitioning-based approximate nearest neighbor search algorithm. It divides the vector space into Voronoi cells using k-means clustering, then searches only the nearest cells instead of scanning all vectors.

HOW IVF WORKS:

Training Phase:

1. Run k-means on training vectors to learn nlist cluster centroids
2. These centroids define Voronoi partitions of the vector space

Indexing Phase:

1. For each vector, find its nearest centroid
2. Add the vector to that centroid's inverted list

Search Phase:

1. Find the nprobe nearest centroids to the query vector
2. Search only the vectors in those nprobe inverted lists
3. Return the top-k nearest neighbors from candidates

TIME COMPLEXITY:

MEMORY REQUIREMENTS:

- Vectors: 4 × n × dim bytes (stored as-is)
- Centroids: 4 × nlist × dim bytes
- Lists: negligible overhead (just pointers)
- Total: ~4 × (n + nlist) × dim bytes

ACCURACY VS SPEED TRADEOFF:

- nprobe = 1: Fastest, lowest recall (~30-50%)
- nprobe = sqrt(nlist): Good balance (~70-90% recall)
- nprobe = nlist: Same as flat search (100% recall)

CHOOSING NLIST:

Rule of thumb: nlist = sqrt(n) or nlist = 4*sqrt(n)

- For 1M vectors: nlist = 1,000 to 4,000
- For 100K vectors: nlist = 316 to 1,264
- For 10K vectors: nlist = 100 to 400

WHEN TO USE IVF:

Use IVF when:

1. Dataset is large (>10K vectors)
2. You can tolerate ~90-95% recall (not 100%)
3. You want 10-100x speedup over flat search
4. Memory usage is not a primary concern

DON'T use IVF when:

1. Dataset is small (<10K vectors) - use flat index
2. You need 100% recall - use flat index
3. Memory is very limited - use PQ or IVFPQ

Package comet implements IVFPQ (Inverted File with Product Quantization).

WHAT IS IVFPQ?

IVFPQ combines IVF (scope reduction) with PQ (compression) to create one of the most powerful similarity search algorithms. It's the workhorse of large-scale vector search systems.

RESIDUAL VECTORS

IVFPQ encodes RESIDUALS (vector - centroid) instead of original vectors. This dramatically improves compression quality because:

PERFORMANCE:

TIME COMPLEXITY:

Package comet implements a metadata filtering index for vector search.

WHAT IS A METADATA INDEX?

A metadata index enables fast filtering of documents based on structured metadata attributes before performing expensive vector similarity searches.
This dramatically improves search performance by reducing the candidate set.

HOW IT WORKS:

The index uses two specialized data structures:

1. Roaring Bitmaps: For categorical fields (strings, booleans)
2. Bit-Sliced Index (BSI): For numeric fields (integers, floats)

QUERY TYPES:

- Equality: field = value
- Inequality: field != value
- Comparisons: field > value, field >= value, field < value, field <= value
- Range: field BETWEEN min AND max
- Set membership: field IN (val1, val2, val3)
- Set exclusion: field NOT IN (val1, val2)
- Existence: field EXISTS, field NOT EXISTS

TIME COMPLEXITY:

MEMORY REQUIREMENTS:

- Roaring bitmaps: Highly compressed, typically 1-10% of uncompressed size
- BSI: ~64 bits per numeric value (compressed with roaring)
- Much more efficient than traditional B-tree indexes for high-cardinality data

GUARANTEES & TRADE-OFFS:

✓ Pros:

✗ Cons:

WHEN TO USE:

Use metadata index when:

1. Pre-filtering documents before vector search
2. Need to filter by structured attributes (price, date, category, etc.)
3. Working with large datasets (100K+ documents)
4. Need sub-millisecond filter performance

Package comet implements Product Quantization (PQ) for similarity search.

WHAT IS PRODUCT QUANTIZATION?

PQ is a lossy compression technique that dramatically reduces memory usage for vector storage while enabling approximate similarity search. It achieves compression ratios of 10-500x by dividing vectors into subspaces and quantizing each independently.

THE CORE IDEA - DIVIDE AND COMPRESS:

Instead of storing full high-dimensional vectors:

1. Divide each vector into M equal-sized subvectors (subspaces)
2. Learn a codebook of K centroids for each subspace via k-means
3. Encode each subvector with the ID of its nearest centroid
4. Store only these compact codes instead of original vectors

COMPRESSION EXAMPLE:

Original: 768 dims × 4 bytes = 3,072 bytes
PQ (M=8, K=256): 8 subspaces × 1 byte = 8 bytes
Compression: 384x smaller!
TIME COMPLEXITY:

WHEN TO USE PQ:

- Dataset too large for RAM
- Can tolerate 85-95% recall
- L2 or inner product metric
- Want massive compression
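The divide-and-compress idea above can be sketched in a few lines of Go. This is a minimal illustration, not comet's actual API: the codebooks are assumed to be already trained by k-means, and the `codebooks[m][k]` layout is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// encodePQ compresses a vector into M one-byte codes, one per subspace.
// codebooks[m] holds up to 256 centroids for subspace m, each of
// length len(vec)/M. Each subvector is replaced by the index of its
// nearest centroid, so an entire subvector collapses into one byte.
func encodePQ(vec []float32, codebooks [][][]float32) []byte {
	m := len(codebooks)
	sub := len(vec) / m
	codes := make([]byte, m)
	for i := 0; i < m; i++ {
		subvec := vec[i*sub : (i+1)*sub]
		best, bestDist := 0, math.MaxFloat64
		for k, centroid := range codebooks[i] {
			var d float64
			for j := range subvec {
				diff := float64(subvec[j] - centroid[j])
				d += diff * diff
			}
			if d < bestDist {
				best, bestDist = k, d
			}
		}
		codes[i] = byte(best)
	}
	return codes
}

func main() {
	// Toy example: 4-dim vectors, M=2 subspaces, 2 centroids per subspace.
	codebooks := [][][]float32{
		{{0, 0}, {1, 1}}, // centroids for dims 0-1
		{{0, 0}, {5, 5}}, // centroids for dims 2-3
	}
	codes := encodePQ([]float32{0.9, 1.1, 4.8, 5.2}, codebooks)
	fmt.Println(codes) // [1 1]: both subvectors snap to centroid 1
}
```

With M=8 and K=256, the same loop turns a 3,072-byte vector into the 8-byte code shown in the compression example above.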
Package podfingerprint computes the fingerprint of a set of pods. "Pods" is meant in the kubernetes sense: https://kubernetes.io/docs/concepts/workloads/pods/ but for the purposes of this package, a Pod is identified by just its namespace + name pair. A "fingerprint" is a compact, unique representation of this set of pods. Any given unordered set of pods with the same elements will yield the same fingerprint, regardless of the order in which the pods are enumerated. The fingerprint is not actually unique, because it is implemented using a hash function, but collisions are expected to be extremely rare. Note that this package will *NOT* restrict itself to cryptographically secure hash functions, so you should NOT use the fingerprint in security-sensitive contexts.
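One simple way to get such an order-independent fingerprint (a sketch of the idea, not necessarily how this package computes it) is to sort the namespace/name pairs before feeding them to a non-cryptographic hash:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// fingerprint returns an order-independent hash of a set of pods, each
// identified by its "namespace/name" pair. Sorting first makes the
// result independent of enumeration order; FNV-1a is fast but NOT
// cryptographically secure, matching the package's caveat.
func fingerprint(pods []string) uint64 {
	sorted := append([]string(nil), pods...)
	sort.Strings(sorted)
	h := fnv.New64a()
	for _, p := range sorted {
		h.Write([]byte(p))
		h.Write([]byte{0}) // separator avoids ambiguous concatenations
	}
	return h.Sum64()
}

func main() {
	a := fingerprint([]string{"default/web", "kube-system/dns"})
	b := fingerprint([]string{"kube-system/dns", "default/web"})
	fmt.Println(a == b) // true: enumeration order does not matter
}
```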
Package ja3 provides JA3 Client Fingerprinting for the Go language by looking at the TLS Client Hello packets.

Basic Usage

ja3 takes in TCP payload data as a []byte and computes the corresponding JA3 string and digest.
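The JA3 digest itself is an MD5 over a comma-separated string of five Client Hello fields (TLS version, cipher suites, extensions, elliptic curves, point formats, each list joined with dashes). The following sketch shows only that final step; parsing the fields out of the raw TCP payload, which is what this package actually does, is omitted, and `ja3Digest` is a hypothetical helper name.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"strconv"
	"strings"
)

// ja3Digest builds the JA3 string from already-parsed Client Hello
// fields and returns its MD5 hex digest.
func ja3Digest(version uint16, ciphers, extensions, curves, pointFormats []uint16) string {
	join := func(vals []uint16) string {
		parts := make([]string, len(vals))
		for i, v := range vals {
			parts[i] = strconv.Itoa(int(v))
		}
		return strings.Join(parts, "-")
	}
	// Field order is fixed by the JA3 specification.
	ja3 := fmt.Sprintf("%d,%s,%s,%s,%s",
		version, join(ciphers), join(extensions), join(curves), join(pointFormats))
	return fmt.Sprintf("%x", md5.Sum([]byte(ja3)))
}

func main() {
	// 771 = TLS 1.2; the other values are illustrative only.
	digest := ja3Digest(771,
		[]uint16{4865, 4866}, []uint16{0, 10}, []uint16{29, 23}, []uint16{0})
	fmt.Println(digest) // 32-character hex JA3 fingerprint
}
```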
Package chunker implements Content Defined Chunking (CDC) based on a rolling Rabin Checksum.

The function RandomPolynomial() returns a new random polynomial of degree 53 for use with the chunker. The degree 53 is chosen because it is the largest prime below 64-8 = 56, so that the top 8 bits of an uint64 can be used for optimising calculations in the chunker.

A random polynomial is chosen by selecting 64 random bits, masking away bits 64..54 and setting bit 53 to one (otherwise the polynomial is not of the desired degree) and bit 0 to one (otherwise the polynomial is trivially reducible), so that 51 bits are chosen at random. This process is repeated until Irreducible() returns true, then this polynomial is returned. If this doesn't happen after 1 million tries, the function returns an error. The probability of selecting an irreducible polynomial at random is about 7.5% ( (2^53-2)/53 / 2^51 ), so the probability that no irreducible polynomial has been found after 100 tries is lower than 0.04%.

During development the results have been verified using the computational discrete algebra system GAP, which can be obtained from the website at http://www.gap-system.org/. For filtering a given list of polynomials in hexadecimal coefficient notation, the following script can be used: All irreducible polynomials from the list are written to the output.

An introduction to Rabin Fingerprints/Checksums can be found in the following articles:

Michael O. Rabin (1981): "Fingerprinting by Random Polynomials" http://www.xmailserver.org/rabin.pdf

Ross N. Williams (1993): "A Painless Guide to CRC Error Detection Algorithms" http://www.zlib.net/crc_v3.txt

Andrei Z. Broder (1993): "Some Applications of Rabin's Fingerprinting Method" http://www.xmailserver.org/rabin_apps.pdf

Shuhong Gao and Daniel Panario (1997): "Tests and Constructions of Irreducible Polynomials over Finite Fields" http://www.math.clemson.edu/~sgao/papers/GP97a.pdf

Andrew Kadatch, Bob Jenkins (2007): "Everything we know about CRC but afraid to forget" http://crcutil.googlecode.com/files/crc-doc.1.0.pdf
Package fingerprint provides methods for working with SVG files produced by the aaronland/fingerprint tool.
Package meekserver is the server transport plugin for the meek pluggable transport. It acts as an HTTP server, keeps track of session ids, and forwards received data to a local OR port.

Sample usage in torrc:

Using your own TLS certificate:

Plain HTTP usage:

The server runs in HTTPS mode by default, getting certificates from Let's Encrypt automatically. The server opens an auxiliary ACME listener on port 80 in order for the automatic certificates to work. If you have your own certificate, use the --cert and --key options. Use the --disable-tls option to run with plain HTTP.

Package meekserver provides an implementation of the Meek circumvention protocol. Only a client implementation is provided, and no effort is made to normalize the TLS fingerprint. It borrows quite liberally from the real meek-client code.
Package synapse is a wrapper library for the Synapse API (https://docs.synapsefi.com)

Instantiate client

Enable logging & turn off developer mode (developer mode is true by default)

Register Fingerprint

Set an `IDEMPOTENCY_KEY` (for `POST` requests only)

Submit optional query parameters
Package simhash implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. Simhash fingerprints have the property that similar documents will have similar fingerprints; therefore, the hamming distance between two fingerprints will be small if the documents are similar. For a standalone test, change the package to `main` and the next func def to func main() {
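The similarity comparison mentioned above reduces to the Hamming distance between two 64-bit fingerprints, which is a popcount of their XOR. A minimal sketch, independent of this package's actual API:

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance counts the bit positions at which two 64-bit simhash
// fingerprints differ. XOR leaves a 1 exactly where the bits disagree,
// and OnesCount64 counts those 1s. Small distances mean similar documents.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	a := uint64(0b1011_0010)
	b := uint64(0b1011_0110)
	fmt.Println(hammingDistance(a, b)) // 1: the fingerprints differ in one bit
}
```

A typical duplicate detector would then treat fingerprints within some small threshold (e.g. distance <= 3) as near-duplicates.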
Package virgil is the pure Go implementation of a Virgil Security compatible SDK. Right now it supports only ed25519 keys and signatures and curve25519 key exchange. For symmetric crypto it uses AES256-GCM. The hashes used are SHA-384 for signatures and SHA-256 for fingerprints.
Package meeklite provides an implementation of the Meek circumvention protocol. Only a client implementation is provided, and no effort is made to normalize the TLS fingerprint. It borrows quite liberally from the real meek-client code.