Package log provides the standard logging framework for cybozu products. As this is a framework rather than a library, most features are hard-coded and not customizable. cybozu/log is a structured logger: every log entry consists of mandatory and optional fields. Mandatory fields are: To help development, logs go to standard error by default; this can be changed to any io.Writer. Logs are formatted by a Formatter. The following built-in formatters are available. The standard field names are defined as constants in this package; for example, "secret" is defined as FnSecret. Field data can be of any type, though the following types are recommended: The framework automatically redirects Go's standard log output to the default logger provided by this framework.
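A minimal usage sketch, assuming the conventional cybozu-go/log API in which the leveled functions take a message plus a map of optional fields and the default logger exposes SetOutput; check the package for the exact signatures:

	package main

	import (
		"os"

		"github.com/cybozu-go/log"
	)

	func main() {
		// Logs go to standard error by default; switch the default logger
		// to standard output (any io.Writer works).
		logger := log.DefaultLogger()
		logger.SetOutput(os.Stdout)

		// Optional fields are passed as a map; standard field names are
		// provided as Fn* constants such as log.FnSecret.
		logger.Info("user signed in", map[string]interface{}{
			"user_id":    42,
			log.FnSecret: true,
		})
	}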
Package redblacktree provides a pure Go implementation of a red-black tree as described by Thomas H. Cormen et al. in their seminal Algorithms book (3rd ed.). This data structure is not safe for concurrent use by multiple goroutines.
Package blobloom implements blocked Bloom filters. Blocked Bloom filters are an approximate set data structure: if a key has been added to a filter, a lookup of that key returns true, but if the key has not been added, there is a non-zero probability that the lookup still returns true (a false positive). False negatives are impossible: if the lookup for a key returns false, that key has not been added. In this package, keys are represented exclusively as hashes. Client code is responsible for supplying a 64-bit hash value. Compared to standard Bloom filters, blocked Bloom filters use the CPU cache more efficiently. A blocked Bloom filter is an array of ordinary Bloom filters of fixed size BlockBits (the blocks). The lower half of the hash selects the block to use. To achieve the same false positive rate (FPR) as a standard Bloom filter, a blocked Bloom filter requires more memory. For an FPR of at most 2e-6 (two in a million), it uses ~20% more memory. At 1e-10, the space required is double that of a standard Bloom filter. For more details, see the 2010 paper by Putze, Sanders and Singler, https://algo2.iti.kit.edu/documents/cacheefficientbloomfilters-jea.pdf.
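A short usage sketch, assuming the filter exposes Add and Has methods that take the caller-supplied 64-bit hash; the import path and the NewOptimized/Config constructor shown here are assumptions to check against the package. The standard library's hash/maphash supplies the hashes:

	package main

	import (
		"fmt"
		"hash/maphash"

		"github.com/greatroar/blobloom"
	)

	func main() {
		// Keys are represented exclusively as 64-bit hashes.
		seed := maphash.MakeSeed()
		hash := func(key string) uint64 {
			var h maphash.Hash
			h.SetSeed(seed)
			h.WriteString(key)
			return h.Sum64()
		}

		// Assumed constructor: size the filter for an expected number of
		// keys and a target false positive rate.
		f := blobloom.NewOptimized(blobloom.Config{
			Capacity: 100_000,
			FPRate:   1e-4,
		})

		f.Add(hash("alice"))
		fmt.Println(f.Has(hash("alice"))) // true
		fmt.Println(f.Has(hash("bob")))   // false (with high probability)
	}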
Package textrank is an implementation of the TextRank algorithm in Go with extendable features (automatic summarization, phrase extraction). It supports multithreading via goroutines. The package is under the MIT Licence. If there were a program that could continuously rank the words, phrases and sentences of book-sized texts on multiple threads, that was open to modification through objects, written in a simple, secure, statically typed language, and that was very well documented... Now, here it is. - Find the most important phrases. - Find the most important words. - Find the most important N sentences. - Importance by phrase weights. - Importance by word occurrence. - Find the first N sentences, starting from the Xth sentence. - Find sentences by phrase chains ordered by position in the text. - Access to the whole ranked data. - Support for more languages. - The weighting algorithm can be modified by an interface implementation. - The parser can be modified by an interface implementation. - Multi-thread support. Find the most important phrases: This is the most basic and simplest usage of textrank (see the sketch after this entry). All possible pre-defined finder queries: After ranking, the graph contains a lot of valuable data, and the textrank package provides functions to retrieve that data from the graph. The GetRank function gives access to the graph, and every piece of data can be retrieved from this structure. Adding text continuously: It is possible to add more text after other texts have already been added. The Ranking function can merge these multiple texts and recalculate the weights and all related data. Using a different algorithm to rank text: Two algorithms are implemented, and it is possible to write a custom algorithm against the Algorithm interface and use it instead of the defaults. Using multiple graphs: Graph IDs exist because it is possible to run multiple independent text ranking processes. Using different non-English languages: English is used by default, but it is possible to add any language. To use other languages a stop-word list is required, which you can find at https://github.com/stopwords-iso Asynchronous usage by goroutines: It is thread safe. Independent graphs can receive texts at the same time and can also be extended with more text at the same time.
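A minimal sketch of the basic "find the most important phrases" flow described above; the constructor and finder names follow the package's README as I recall them and should be treated as assumptions:

	package main

	import (
		"fmt"

		"github.com/DavidBelicza/TextRank"
	)

	func main() {
		// Build a ranker with the default English language, rule set and
		// weighting algorithm (all replaceable via interfaces).
		tr := textrank.NewTextRank()
		rule := textrank.NewDefaultRule()
		language := textrank.NewDefaultLanguage()
		algorithm := textrank.NewDefaultAlgorithm()

		text := "Cats are lovely animals. Dogs are lovely animals too. " +
			"Lovely animals make people happy."

		// Parse the raw text into the graph, then rank it.
		tr.Populate(text, language, rule)
		tr.Ranking(algorithm)

		// Retrieve the phrases ordered by weight (field names assumed).
		for _, phrase := range textrank.FindPhrases(tr) {
			fmt.Println(phrase.Left, phrase.Right, phrase.Weight)
		}
	}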
Package deque implements a very fast and efficient general purpose queue/stack/deque data structure that is specifically optimized for use by microservices and serverless services running in production environments.
Package mafsa implements Minimal Acyclic Finite State Automata (MA-FSA) in a space-optimized way as described by Daciuk, Mihov, Watson, and Watson in their paper, "Incremental Construction of Minimal Acyclic Finite-State Automata" (2000). It also implements Minimal Perfect Hashing (MPH) as described by Lucchesi and Kowaltowski in their paper, "Applications of Finite Automata Representing Large Vocabularies" (1992). Unscientifically speaking, this package lets you store large amounts of strings (with Unicode) in memory so that membership queries, prefix lookups, and fuzzy searches are fast. And because minimal perfect hashing is included, you can associate each entry in the tree with more data used by your application. See the README or the end of this documentation for a brief tutorial. MA-FSA structures are a specific type of Deterministic Acyclic Finite State Automaton (DAFSA) which fold equivalent state transitions into each other starting from the suffix of each entry. Typical construction algorithms involve building out the entire tree first, then minimizing the completed tree. However, the method described in the paper above allows the tree to be minimized after every word insertion, provided the insertions are performed in lexicographical order, which drastically reduces memory usage compared to regular prefix trees ("tries"). The goal of this package is to provide a simple, useful, and correct implementation of MA-FSA. Though more complex algorithms exist for removal of items and unordered insertion, these features may be outside the scope of this package. Membership queries are on the order of O(n), where n is the length of the input string, so effectively O(1) with respect to the number of entries. It is advisable to keep n small since long entries without much in common, especially in the beginning or end of the string, will quickly overrun the optimizations that are available. In those cases, n-gram implementations might be preferable, though these will use more CPU. This package provides two kinds of MA-FSA implementations. One, the BuildTree, facilitates the construction of an optimized tree and allows ordered insertions. The other, MinTree, is effectively read-only but uses significantly less memory and is ideal for production environments where only reads will be occurring. Usually your build process will be separate from your production application, which will make heavy use of reading the structure. To use this package, create a BuildTree and insert your items in lexicographical order: The tree is now compressed to a minimum number of nodes and is ready to be saved. In your production application, then, you can read the file into a MinTree directly: The mt variable is a *MinTree which has the same data as the original BuildTree, but without all the extra "scaffolding" that was required for adding new elements. The package provides some basic read mechanisms.
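A brief sketch of the build-then-read workflow described above, assuming New, Insert, Finish, Save and Load as I recall them from the package README; the membership query on MinTree is likewise an assumption to verify against the package's read methods:

	package main

	import (
		"fmt"

		"github.com/smartystreets/mafsa"
	)

	func main() {
		// Build phase: insertions must be in lexicographical order.
		bt := mafsa.New()
		for _, word := range []string{"cities", "city", "pities", "pity"} {
			if err := bt.Insert(word); err != nil {
				panic(err)
			}
		}
		bt.Finish() // minimize the tree

		if err := bt.Save("words.mafsa"); err != nil {
			panic(err)
		}

		// Read phase (typically in a separate production process):
		mt, err := mafsa.Load("words.mafsa")
		if err != nil {
			panic(err)
		}
		fmt.Println(mt.Contains("city")) // assumed membership query; true
	}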
Package btree implements in-memory B-Trees of arbitrary degree. btree implements an in-memory B-Tree for use as an ordered data structure. It is not meant for persistent storage solutions. It has a flatter structure than an equivalent red-black or other binary tree, which in some cases yields better memory usage and/or performance. See some discussion on the matter here: Note, though, that this project is in no way related to the C++ B-Tree implementation written about there. Within this tree, each node contains a slice of items and a (possibly nil) slice of children. For basic numeric values or raw structs, this can cause efficiency differences when compared to equivalent C++ template code that stores values in arrays within the node: These issues don't tend to matter, though, when working with strings or other heap-allocated structures, since C++-equivalent structures also must store pointers and also distribute their values across the heap. This implementation is designed to be a drop-in replacement for gollrb.LLRB trees (http://github.com/petar/gollrb), an excellent and probably the most widely used ordered tree implementation in the Go ecosystem currently. Its functions, therefore, exactly mirror those of llrb.LLRB where possible. Unlike gollrb, though, we currently don't support storing multiple equivalent values.
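A small usage sketch based on the classic google/btree API (New, ReplaceOrInsert, Get, Has, AscendRange and the built-in Int item type); the degree chosen here is arbitrary:

	package main

	import (
		"fmt"

		"github.com/google/btree"
	)

	func main() {
		// Create a B-Tree of degree 32; items must implement Less.
		tr := btree.New(32)

		for i := 0; i < 10; i++ {
			tr.ReplaceOrInsert(btree.Int(i))
		}

		fmt.Println(tr.Len())              // 10
		fmt.Println(tr.Get(btree.Int(3)))  // 3
		fmt.Println(tr.Has(btree.Int(42))) // false

		// In-order iteration over a sub-range.
		tr.AscendRange(btree.Int(2), btree.Int(5), func(i btree.Item) bool {
			fmt.Println(i)
			return true
		})
	}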
Package stl implements functions to read, write, and transform files in the Stereolithography/Surface Tessellation Language (.stl) file format used in 3D modelling. The format specification was taken from http://www.ennex.com/~fabbers/StL.asp, found at http://en.wikipedia.org/wiki/STL_%28file_format%29. While STL stores the data in single precision 32 bit floating point numbers, the stl package does all calculations beyond simple addition in double precision 64 bit (float64). Usage example: everything that operates on a model is defined as a method of Solid. Note that the STL format has two variants: a human-readable ASCII variant, and a more compact and precise binary variant which is preferable. The Solid.BinaryHeader field and the Triangle.Attributes fields will be empty after reading, as these are not part of the ASCII format. The Solid.Name field is read from the first line after "solid ". It is not checked against the name at the end of the file after "endsolid ". The stl package will also not cope with Unicode byte order marks, which some text editors might automatically place at the beginning of a file. The Solid.BinaryHeader field is filled with all 80 bytes of header data. Then, ReadFile will try to fill solid.Name with an ASCII string read from the header data from the first byte until a \0 or a non-ASCII character is detected. As always when you do linear transformations on floating point numbers, you get numerical errors. So you should expect a vertex rotated by 360° not to end up at exactly the original coordinates, but instead just very close to them. As the error is usually far smaller than the available precision of 3D printing applications, this is not an issue in most cases. You can implement the Writer interface to directly write into your own data structures. This way you can use the CopyFile and CopyAll functions.
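A rough sketch of a read-transform-write flow; only ReadFile is named in the text above, so the import path and the Scale and WriteFile methods are assumptions to verify against the package:

	package main

	import (
		"log"

		"github.com/hschendel/stl"
	)

	func main() {
		// Read either the ASCII or the binary variant.
		solid, err := stl.ReadFile("part.stl")
		if err != nil {
			log.Fatal(err)
		}

		// Assumed transformation method: everything that operates on a
		// model is defined as a method of Solid.
		solid.Scale(0.5)

		// Assumed write method; calculations use float64 internally, but
		// the file stores 32-bit floats.
		if err := solid.WriteFile("part-scaled.stl"); err != nil {
			log.Fatal(err)
		}
	}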
Package utter implements a deep pretty printer for Go data structures to aid data snapshotting. A quick overview of the additional features utter provides over the built-in printing facilities for Go data types is as follows: The approach utter allows for dumping Go data structures is less flexible than its parent tool. It has just a: This section demonstrates how to quickly get started with utter. See the sections below for further details on formatting and configuration options. To dump a variable with full newlines, indentation, type, and pointer information use Dump, Fdump, or Sdump: Configuration of utter is handled by fields in the ConfigState type. For convenience, all of the top-level functions use a global state available via the utter.Config global. It is also possible to create a ConfigState instance that provides methods equivalent to the top-level functions. This allows concurrent use of different configuration options. See the ConfigState documentation for more details. The following configuration options are available:
- Indent: string to use for each indentation level for the Dump functions. It is a single space by default. A popular alternative is "\t".
- NumericWidth: number of columns to use when dumping a numeric slice or array (including bool). Zero specifies all entries on one line.
- StringWidth: number of columns to use when dumping a string slice or array. Zero specifies all entries on one line.
- BytesWidth: number of byte columns to use when dumping byte slices and arrays.
- CommentBytes: specifies whether ASCII comment annotations are attached to byte slice and array dumps.
- CommentPointers: specifies whether pointer information will be added as comments.
- IgnoreUnexported: specifies that unexported fields should be ignored.
- ElideType: specifies that type information defined by context should not be printed in a dump.
- SortKeys: specifies that map keys should be sorted before being printed. Use this to have a more deterministic, diffable output. Note that only native types (bool, int, uint, floats, uintptr and string) are supported, with other types sorted according to the reflect.Value.String() output, which guarantees display stability. Natural map order is used by default.
Simply call utter.Dump with a list of variables you want to dump (see the sketch after this entry). You may also call utter.Fdump if you would prefer to output to an arbitrary io.Writer; for example, to dump to standard error. A third option is to call utter.Sdump to get the formatted output as a string: See the Dump example for details on the setup of the types and variables being shown here. Byte (and uint8) arrays and slices are displayed uniquely, similar to the hexdump -C command as shown.
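A short usage sketch; Dump, Fdump, Sdump and the utter.Config global are named in the text above, so only the sample data here is made up:

	package main

	import (
		"os"

		"github.com/kortschak/utter"
	)

	type point struct {
		X, Y int
		Tags []string
	}

	func main() {
		p := &point{X: 1, Y: 2, Tags: []string{"a", "b"}}

		// Sort map keys and use tab indentation for deterministic, diffable output.
		utter.Config.SortKeys = true
		utter.Config.Indent = "\t"

		utter.Dump(p)             // write to standard output
		utter.Fdump(os.Stderr, p) // write to an arbitrary io.Writer
		s := utter.Sdump(p)       // capture the dump as a string
		_ = s
	}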
Package amt provides a reference implementation of the IPLD AMT (Array Mapped Trie) used in the Filecoin blockchain. The AMT algorithm is similar to a HAMT https://en.wikipedia.org/wiki/Hash_array_mapped_trie but instead presents an array-like interface where the indexes themselves form the mapping to nodes in the trie structure. An AMT is suitable for storing sparse array data, as a minimal number of intermediate nodes are required to address a small number of entries even when their indexes span a large distance. An AMT is also a suitable means of storing non-sparse array data as required, with a small amount of storage and algorithmic overhead required to handle mapping that assumes that some elements within any range of data may not be present. The AMT algorithm produces a tree-like graph, with a single root node addressing a collection of child nodes which connect downward toward leaf nodes which store the actual entries. No terminal entries are stored in intermediate elements of the tree, unlike in a HAMT. We can divide up the AMT tree structure into "levels" or "heights", where a height of zero contains the terminal elements, and the maximum height of the tree contains the single root node. Intermediate nodes are used to span across the range of indexes. Any AMT instance uses a fixed "width" that is consistent across the tree's nodes. An AMT's "bitWidth" dictates the width, or maximum branching factor (arity), of the AMT's nodes by determining how many bits of the original index are used to determine the index at any given level. A bitWidth of 3 (the default for this implementation) can generate indexes in the range of 0 to (2^3)-1=7, i.e. a "width" of 8. In practice, this means that an AMT with a bitWidth of 3 has a branching factor of _between 1 and 8_ for any node in the structure. Considering the minimal case: a minimal AMT contains a single node which serves as both the root and the leaf node and can hold zero or more elements (an empty AMT is possible, although a special-case, and consists of a zero-length root). This minimal AMT can store array indexes from 0 to width-1 (i.e. 0 to 7 for the default bitWidth of 3) without requiring the addition of additional nodes. Attempts to add indexes beyond width-1 will result in additional nodes being added and a tree structure in order to address the new elements. The minimal AMT node is said to have a height of 0. Every node in an AMT has a height that indicates its distance from the leaf nodes. All leaf nodes have a height of 0. The height of the root node dictates the overall height of the entire AMT. In the case of the minimal AMT, this is 0. Elements are stored in a compacted form within nodes; they are "position-mapped" by a bitmap field that is stored with the node. The bitmap is a simple byte array, where each bit represents an element of the data that can be stored in the node. With a width of 8, the bitmap is a single byte and up to 8 elements can be stored in the node. The data array of a node _only stores elements that are present in that node_, so the array is commonly shorter than the maximum width. An empty AMT is a special-case where the single node can have zero elements, therefore a zero-length data array and a bitmap of `0x00`. In all other cases, the data array must have between 1 and width elements. Determining the position of an index within the data array requires counting the number of set bits within the bitmap up to the element we are concerned with.
If the bitmap has bits 2, 4 and 6 set, we can see that only 3 of the bits are set so our data array should hold 3 elements. To address index 4, we know that the first element will be index 2 and therefore the second will hold index 4. This format allows us to store only the elements that are set in the node. Overflow beyond the single node AMT by adding an index beyond width-1 requires an increase in height in order to address all elements. If an element in the range of width to (width*2)-1 is added, a single additional height is required which will result in a new root node which is used to address two consecutive leaf nodes. Because we have an arity of up to width at any node, the addition of indexes in the range of 0 to (width^2)-1 will still require only the addition of a single additional height above the leaf nodes, i.e. height 1. From the width of an AMT we can derive the maximum range of indexes that can be contained by an AMT at any given `height` with the formula width^(height+1)-1. e.g. an AMT with a width of 8 and a height of 2 can address indexes 0 to 8^(2+1)-1=511. Incrementing the height multiplies the range of indexes that can be contained within that structure by a factor of width. Nodes above height 0 (non-leaf nodes) do not contain terminal elements, but instead, their data array contains links to child nodes. The index compaction using the bitmap is the same as for leaf nodes, so each non-leaf node only stores as many links as it has child nodes. Because additional height is required to address larger indexes, even a single-element AMT will require more than one node where the index is greater than the width of the AMT. For a width of 8, indexes 8 to 63 require a height of 1, indexes 64 to 511 require a height of 2, indexes 512 to 4095 require a height of 3, etc. Retrieving elements from the AMT requires extracting only the portion of the requested index that is required at each height to determine the position in the data array to navigate into. When traversing through the tree, we only need to select from indexes 0 to width-1. To do this, we take log2(width) bits from the index to form a number that is between 0 and width-1. e.g. for a width of 8, we only need 3 bits to form a number between 0 and 7, so we only consume 3 bits per level of the AMT as we traverse. A simple method to calculate this at any height in the AMT (assuming bitWidth of 3, i.e. a width of 8) is: 1. Calculate the maximum number of nodes (not entries) that may be present in a sub-tree rooted at the current height. width^height provides this number. e.g. at height 0, only 1 node can be present, but at height 3, we may have a tree of up to 512 nodes (storing up to 8^(3+1)=4096 entries). 2. Divide the index by this number to find the index for this height. e.g. an index of 3 at height 0 will be 3/1=3, or an index of 20 at height 1 will be 20/8=2. 3. If we are at height 0, the element we want is at the data index, position-mapped via the bitmap. 4. If we are above height 0, we need to navigate to the child element at the index we calculated, position-mapped via the bitmap. When traversing to the child, we discard the upper portion of the index that we no longer need. This can be achieved by a mod operation against the number-of-nodes value. e.g. an index of 20 at height 1 requires navigation to the element at position 2, when moving to that element (which is height 0), we truncate the index with 20%8=4, at height 0 this index will be the index in our data array (position-mapped via the bitmap).
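An illustrative sketch (not this package's API) of the two calculations just described: extracting the per-height index from a full index, and position-mapping it through the bitmap with a popcount:

	package main

	import (
		"fmt"
		"math/bits"
	)

	const bitWidth = 3          // default for this implementation
	const width = 1 << bitWidth // 8

	// indexAtHeight returns which of the node's width slots the full index
	// falls into at the given height (0 = leaf level).
	func indexAtHeight(index uint64, height uint) uint64 {
		nodesBelow := uint64(1)
		for i := uint(0); i < height; i++ {
			nodesBelow *= width
		}
		return (index / nodesBelow) % width
	}

	// dataPosition converts a slot index into a position in the compacted
	// data array by counting the set bits below it in the bitmap.
	func dataPosition(bitmap uint8, slot uint64) (int, bool) {
		if bitmap&(1<<slot) == 0 {
			return 0, false // slot not present in this node
		}
		mask := uint8(1<<slot) - 1
		return bits.OnesCount8(bitmap & mask), true
	}

	func main() {
		// Bitmap with bits 2, 4 and 6 set: the data array holds 3 elements.
		bitmap := uint8(1<<2 | 1<<4 | 1<<6)
		pos, ok := dataPosition(bitmap, 4)
		fmt.Println(pos, ok) // 1 true: index 4 is the second element

		// Index 20 at height 1 selects slot 20/8 = 2; at height 0 the
		// remaining index is 20 % 8 = 4.
		fmt.Println(indexAtHeight(20, 1)) // 2
		fmt.Println(indexAtHeight(20, 0)) // 4
	}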
In this way, each sub-tree root consumes a small slice, log2(width) bits long, of the original index. Adding new elements to an AMT may require up to 3 steps: 1. Increasing the height to accommodate a new index if the current height is not sufficient to address the new index. Increasing the height requires turning the current root node into an intermediate and adding a new root which links to the old (repeated until the required height is reached). 2. Adding any missing intermediate and leaf nodes that are required to address the new index. Depending on the density of existing indexes, this may require the addition of up to height-1 new nodes to connect the root to the required leaf. Sparse indexes will mean large gaps in the tree that will need filling to address new, equally sparse, indexes. 3. Setting the element at the leaf node in the appropriate position in the data array and setting the appropriate bit in the bitmap. Removing elements requires a reversal of this process. Any empty node (other than the case of a completely empty AMT) must be removed and its parent should have its child link removed. This removal may recurse up the tree to remove many unnecessary intermediate nodes. The root node may also be removed if the current height is no longer necessary to contain the range of indexes still in the AMT. This can be easily determined if _only_ the first bit of the root's bitmap is set, meaning only the left-most link is present; that child becomes the new root node (repeated until the new root has more than the first bit set or reaches height 0, the single-node case). See https://github.com/ipld/specs/blob/master/data-structures/hashmap.md for a description of a HAMT algorithm. And https://github.com/ipld/specs/blob/master/data-structures/vector.md for a description of a similar algorithm to an AMT that doesn't support internal node compression and therefore doesn't support sparse arrays. Unlike a HAMT, the AMT algorithm doesn't benefit from randomness introduced by a hash algorithm. Therefore, when an AMT is used in cases where user input can influence indexes, larger-than-necessary tree structures may present risks, as well as the challenge imposed by having a strict upper limit on the indexes addressable by the AMT. A width of 8, using 64-bit integers for indexing, allows for a tree height of up to 64/log2(8)=21 (i.e. a width of 8 has a bitWidth of 3, dividing the 64 bits of the uint into 21 separate per-height indexes). Careful placement of indexes could create extremely sub-optimal forms with large heights connecting leaf nodes that are sparsely packed. The overhead of the large number of intermediate nodes required to connect leaf nodes in AMTs that contain high indexes can be abused to create perverse forms that contain large numbers of nodes to store a minimal number of elements. Minimal node counts result where indexes are all in the lower range. The optimal case for an AMT is contiguous index values starting from zero. As larger indexes are introduced that span beyond the current maximum, more nodes are required to address the new elements _and_ the existing lower-index elements. Consider a case where a width=8 AMT is only addressing indexes less than 8 and requiring a single height. The introduction of a single index within 8 of the maximum 64-bit unsigned integer range will require the new root to have a height of 21 and have enough connecting nodes between it and both the existing elements and the new upper index.
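An illustrative calculation (again, not the package API) of the height required to address a given index, matching the examples above (indexes 8 to 63 need height 1, 64 to 511 need height 2, and so on):

	package main

	import "fmt"

	const width = 8 // bitWidth of 3

	// heightForIndex returns the minimum tree height needed so that
	// width^(height+1) > index, i.e. the index is addressable.
	// (Overflow for indexes near the top of the uint64 range is ignored
	// in this sketch.)
	func heightForIndex(index uint64) uint {
		height := uint(0)
		max := uint64(width) // width^(height+1)
		for index >= max {
			height++
			max *= width
		}
		return height
	}

	func main() {
		fmt.Println(heightForIndex(7))    // 0
		fmt.Println(heightForIndex(8))    // 1
		fmt.Println(heightForIndex(63))   // 1
		fmt.Println(heightForIndex(64))   // 2
		fmt.Println(heightForIndex(4095)) // 3
	}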
This pattern of behavior may be acceptable if there is significant density of entries under a particular maximum index. There is a direct relationship between the sparseness of index values and the number of nodes required to address the entries. This should be the key consideration when determining whether an AMT is a suitable data structure for a given application.
Package goics is a toolkit for encoding and decoding ics/iCal/iCalendar files. This is a work-in-progress project that will try to incorporate as many exceptions and variants of the format as possible. It is a toolkit because the user has to define the way the data is needed. The idea is built around something similar to the consumer/provider pattern. Say we want to decode a stream of vevents from a .ics file into a custom type Events. Our custom type will need to implement the ICalConsumer interface, where the type picks up data from the format. The decoding process will be something like this: I chose this model because this format is a pain, and I also don't much like the reflect package. For encoding objects to the iCal format, something similar has to be done: the object emitting elements for the encoder will have to implement the ICalEmiter interface, returning a Component structure to be encoded. This was done because every package could require encoding its values and keys in its own way. Just for encoding time, I found more than three types of lines. Componenter is an interface implemented by every Component that can be encoded to iCal. Properties have to be stored as strings; the conversion from the origin type to the string format must be done in the emitter. There are some helpers for date conversion, and in the future I will add more for encoding params in the string, and also for handling lists and recurrent events. A simple, non-functional example used for testing:
Package ogdl is used to process OGDL, the Ordered Graph Data Language. OGDL is a textual format to write trees or graphs of text, where indentation and spaces define the structure. Here is an example: The language is simple, both in its textual representation and in its number of productions (the specification rules), allowing for compact implementations. OGDL character streams are normally formed by Unicode characters, and encoded as UTF-8 strings, but any encoding that is ASCII transparent is compatible with the specification. See the full spec at http://ogdl.org. To install this package just do: If we have a text file 'config.ogdl' containing: then, will print If the timeout parameter is not present, the default value (60) will be assigned to 'to'. The default value is optional, but be aware that Int64() will return 0 in case the parameter doesn't exist. The configuration file can be written in a more concise way: The package includes a template processor. It takes an arbitrary input stream with some variables in it, and produces an output stream with the variables resolved out of a Graph object which acts as context. For example (given the previous config file): string(b) is then: Some rules are followed:
Package gkvlite provides a simple, ordered, ACID, key-value persistence library. It provides persistent, immutable data structure abstractions.
Package tfortools provides a set of functions that are designed to make it easier for developers to add template based scripting to their command line tools. Command line tools written in Go often allow users to specify a template script to tailor the output of the tool to their specific needs. This can be useful both when visually inspecting the data and also when invoking command line tools in scripts. The best example of this is go list, which allows users to pass a template script to extract interesting information about Go packages. For example, prints all the imports of the current package. The aim of this package is to make it easier for developers to add template scripting support to their tools and easier for users of these tools to extract the information they need. It does this by augmenting the templating language provided by the standard library package text/template in two ways: 1. It auto generates descriptions of the data structures passed as input to a template script for use in help messages. This ensures that help usage information is always up to date with the source code. 2. It provides a suite of convenience functions to make it easy for script writers to extract the data they need. There are functions for sorting, selecting rows and columns, and generating nicely formatted tables. For example, if a program passed a slice of structs containing stock data to a template script, we could use the following script to extract the names of the 3 stocks with the highest trade volume. The output might look something like this: The functions head, sort, tables and col are provided by this package.
Package etable provides a DataTable structure (also known as a DataFrame) which is a collection of columnar data all having the same number of rows. Each column is an etensor.Tensor. The following sub-packages are included: * bitslice is a Go slice of bytes []byte that has methods for setting individual bits, as if it were a slice of bools, while being 8x more memory efficient. This is used for encoding null entries in etensor, and as a Tensor of bool / bits there as well, and is generally very useful for binary (boolean) data. * etensor is the emer implementation of a Tensor (n-dimensional array) object. etensor.Tensor is an interface that applies to many different type-specific instances, such as etensor.Float32. A tensor is just an etensor.Shape plus a slice holding the specific data type. Our tensor is based directly on the [Apache Arrow](https://github.com/apache/arrow/tree/master/go) project's tensor, and it fully interoperates with it. Arrow tensors are designed to be read-only, and we needed some extra support to make our etable.Table work well, so we had to roll our own. Our tensors also interoperate fully with Gonum's 2D-specific Matrix type for the 2D case, and can use the gonum/floats and stats routines for raw arithmetic etc. * etable is our Go version of DataTable from C++ emergent, which is widely useful for holding input patterns to present to the network, and logs of output from the network, among many other uses. An etable.Table is a collection of etensor.Tensor columns that are all aligned along the outer-most *row* dimension. Index-based indirection is supported via optional args, but we do not take on the burden of ensuring full updating of the indexes across all operations, which greatly simplifies things. The etable.Table should interoperate with the under-development gonum DataFrame structure among others. The use of this data structure is always optional and orthogonal to the core network algorithm code -- in Python the pandas library has a suitable DataFrame structure that can be used instead.
Package muhash provides an implementation of a Multiplicative Hash, a cryptographic data structure that allows you to have a rolling hash function that you can add and remove elements from, without the need to re-serialize and re-hash the whole data set.
Package latest provides concurrent data structures that waste the least possible work on state data.
zebrapack is a code generation tool for creating methods to serialize and de-serialize Go data structures to and from ZebraPack (a schema-based serialization format that is derived from MessagePack2). This package is targeted at the `go generate` tool. To use it, include the following directive in a go source file with types requiring source generation: The go generate tool should set the proper environment variables for the generator to execute without any command-line flags. However, the following options are supported, if you need them (See zebrapack -h): For more information, please read README.md, and the wiki at github.com/glycerine/zebrapack
greenpack is a code generation tool for creating methods to serialize and de-serialize Go data structures to and from Greenpack (a schema-based serialization format that is derived from MessagePack2). This package is targeted at the `go generate` tool. To use it, include the following directive in a go source file with types requiring source generation: The go generate tool should set the proper environment variables for the generator to execute without any command-line flags. However, the following options are supported, if you need them (See greenpack -h): For more information, please read README.md, and the wiki at github.com/glycerine/greenpack
Package arrow provides an implementation of Apache Arrow. Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and inter-process communication. The fundamental data structure in Arrow is an Array, which holds a sequence of values of the same type. An array consists of memory holding the data and an additional validity bitmap that indicates if the corresponding entry in the array is valid (not null). If the array has no null entries, it is possible to omit this bitmap. This example shows how to create a FixedSizeList array. The resulting array should be: This example shows how one can slice an array. The initial (float64) array is: and the sub-slice is: This example demonstrates creating an array, sourcing the values and null bitmaps directly from byte slices. The null count is set to UnknownNullCount, instructing the array to calculate the null count from the bitmap when NullN is called. This example shows how to create a List array. The resulting array should be: This example demonstrates how to build an array of int64 values using a builder and Append. Whilst convenient for small arrays, this approach is less practical for large ones. This example shows how to create a Struct array. The resulting array should be:
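A sketch of the builder-based construction mentioned above; the import paths follow the pre-v6 module layout and may differ by Arrow release, so treat them as assumptions:

	package main

	import (
		"fmt"

		"github.com/apache/arrow/go/arrow/array"
		"github.com/apache/arrow/go/arrow/memory"
	)

	func main() {
		pool := memory.NewGoAllocator()

		// Build an int64 array one value at a time, with one null entry.
		b := array.NewInt64Builder(pool)
		defer b.Release()

		b.Append(1)
		b.Append(2)
		b.AppendNull()
		b.Append(4)

		arr := b.NewInt64Array()
		defer arr.Release()

		fmt.Println(arr.Len(), arr.NullN()) // 4 1
		for i := 0; i < arr.Len(); i++ {
			if arr.IsNull(i) {
				fmt.Println("null")
				continue
			}
			fmt.Println(arr.Value(i))
		}
	}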
Package messaging provides the logic and data structures that the services will need to communicate with each other over AMQP (as implemented by RabbitMQ).
Package goadder contains a collection of thread-safe, concurrent data structures for reading and writing numeric int64 counters, inspired by OpenJDK9 LongAdder. Besides JDKAdder, a ported version of the OpenJDK9 LongAdder, the package also provides other alternatives for various use cases.
Package gocroaring is a wrapper for CRoaring in Go. It provides a fast compressed bitmap data structure. See http://roaringbitmap.org for details.
Package dhall implements routines for deserializing and evaluating Dhall configuration into Go data structures. For more on the Dhall language, see https://dhall-lang.org/ If you have the following struct: And a file called foo.dhall containing the following Dhall configuration: You can deserialize the Dhall into the struct as follows (error handling skipped for brevity): This version supports Dhall standard 17.0.0, except that it doesn't support `using` directives.
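A sketch of the struct/file/deserialize flow the text alludes to; the struct, the file contents, and the UnmarshalFile function name follow dhall-golang's encoding/json-like style as I recall it and are assumptions to verify:

	package main

	import (
		"fmt"

		"github.com/philandstuff/dhall-golang/v6"
	)

	// Hypothetical target struct for the example.
	type Config struct {
		Name string
		Port int
	}

	func main() {
		// foo.dhall might contain: { Name = "service", Port = 8080 }
		var cfg Config
		if err := dhall.UnmarshalFile("foo.dhall", &cfg); err != nil {
			panic(err)
		}
		fmt.Println(cfg.Name, cfg.Port)
	}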
Package desync implements data structures, protocols and features of https://github.com/systemd/casync in order to allow support for additional platforms and improve performance by way of concurrency and caching. Supports the following casync data structures: catar archives, caibx/caidx index files, castr stores (local or remote). See desync/cmd for reference implementations of the available features.
Package errcode facilitates standardized API error codes. The goal is that clients can reliably understand errors by checking against immutable error codes. This godoc documents usage. For broader context, see https://github.com/pingcap/errcode/tree/master/README.md Error codes are represented as strings by CodeStr (see CodeStr documentation). This package is designed to have few opinions and be a starting point for how you want to do errors in your project. The main requirement is to satisfy the ErrorCode interface by attaching a Code to an Error. See the documentation of ErrorCode. Additional optional interfaces HasClientData, HasOperation, Causer, and StackTracer are provided for extensibility in creating structured error data representations. Hierarchies are supported: a Code can point to a parent. This is used in the HTTPCode implementation to inherit HTTP codes found with MetaDataFromAncestors. The hierarchy is present in the Code's string representation with a dot separation. A few generic top-level error codes are provided (see the variables section of the doc). You are encouraged to create your own error codes customized to your application rather than solely using generic errors. See NewJSONFormat for an opinion on how to send back meta data about errors with the error data to a client. JSONFormat includes a body of response data (the "data field") that is by default the data from the Error serialized to JSON. Stack traces are automatically added by NewInternalErr and show up as the Stack field in JSONFormat. Errors can be grouped with Combine() and ungrouped via Errors(), which show up as the Others field in JSONFormat. To extract any ErrorCodes from an error, use CodeChain(). This extracts error codes without information loss (using ChainContext).
Package bitset provides bitset implementations for bit packing binary values into pointers and bytes. A bitset, while logically equivalent to a []bool, is often preferable over a []bool due to the space and time efficiency of bit packing binary values. They are typically more space efficient than a []bool since (although implementation specific) bools are typically byte sized. Using a bitset in place of a []bool can therefore result in roughly an 8x reduction in memory usage. While bitsets introduce bit-shifting overhead for gets and sets that is unnecessary for a []bool, they may still be more performant than a []bool because the smaller data structure is more cache friendly. This package contains three bitset implementations: Pointers for efficiency, Bytes for situations where bitsets must be serialized or deserialized, and Sparse for when memory efficiency is the most important factor when working with sparse datasets.
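An illustrative sketch of the bit-packing idea itself (not this package's API): packing booleans into a []byte with bit-shifting gets and sets:

	package main

	import "fmt"

	// bytesBitset packs one boolean per bit, eight per byte.
	type bytesBitset []byte

	func newBytesBitset(n int) bytesBitset {
		return make(bytesBitset, (n+7)/8)
	}

	func (b bytesBitset) set(i int, v bool) {
		if v {
			b[i/8] |= 1 << (uint(i) % 8)
			return
		}
		b[i/8] &^= 1 << (uint(i) % 8)
	}

	func (b bytesBitset) get(i int) bool {
		return b[i/8]&(1<<(uint(i)%8)) != 0
	}

	func main() {
		bs := newBytesBitset(100) // 13 bytes instead of 100 bytes for a []bool
		bs.set(42, true)
		fmt.Println(bs.get(42), bs.get(43)) // true false
	}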
File Structures 2 This is a follow up to my crufty http://github.com/timtadh/file-structures work. That system has some endemic problems: 1. It uses the read/write interface to files. This means that it needs to do block management and cache management. In theory this can be very fast but it is also very challenging in Go. 2. Because it uses read/write it has to do buffer management. Largely, the system punts on this problem and allows Go to handle the buffer management through the normal memory management system. This doesn't work especially well for the use case of file-structures. File Structures 2 is an experiment to bring memory mapped IO to the world of Go. The hypotheses are: 1. The operating system is good at page management generally. While we know more about how to manage the structure of B+Trees, VarChar stores, and Linear Hash tables than the OS does, there is no indication that from Go you can achieve better performance. Therefore, I hypothesize that leaving it to the OS will lead to a smaller working set and a faster data structure in general. 2. You can make memory mapping performant in Go. There are many challenges here, the biggest of which is that there are no dynamically sized array types in Go. The size of the array is part of the type; you have to use slices. This creates complications when hooking up structures which contain slices to mmap allocated blocks of memory. I hypothesize that this repository can achieve good (enough) performance here. The major components of this project: 1. fmap - a memory mapped file interface. Part C, part Go. Uses cgo. 2. bptree - a B+ Tree with duplicate key support (fixed size keys, variable length values) written on top of fmap. 3. slice - used by fmap and bptree to completely violate the memory and type safety of Go. 4. errors - just a simple error package which maintains a stack trace with every error.
Package lookslike is used to validate JSON-like nested map data structures against a set of expectations. Its key features are allowing custom, function-defined validators for values, and allowing the composition of multiple validation specs. See the example below for more details. Most key functions include detailed examples of their use within this documentation.
Package squalor provides SQL utility routines, such as validation of models against table schemas, marshalling and unmarshalling of model structs into table rows and programmatic construction of SQL statements. While squalor uses the database/sql package, the SQL it utilizes is MySQL specific (e.g. REPLACE, INSERT ON DUPLICATE KEY UPDATE, etc). Given a simple table definition: And a model for a row of the table: The BindModel method will validate that the model and the table definition are compatible. For example, it would be an error for the User.ID field to have the type string. The table definition is loaded from the database allowing custom checks such as verifying the existence of indexes. While fields are often primitive types such as int64 or string, it is sometimes desirable to use a custom type. This can be accomplished by having the type implement the sql.Scanner and driver.Valuer interfaces. For example, you might create a GzippedText type which automatically compresses the value when it is written to the database and uncompresses it when it is read: The Insert method is used to insert one or more rows in a table. You can pass either a struct or a pointer to a struct. Multiple rows can be inserted at once. Doing so offers convenience and performance. When multiple rows are batch inserted the returned error corresponds to the first SQL error encountered which may make it impossible to determine which row caused the error. If the rows correspond to different tables the order of insertion into the tables is undefined. If you care about the order of insertion into multiple tables and determining which row is causing an insertion error, structure your calls to Insert appropriately (i.e. insert into a single table or insert a single row). After a successful insert on a table with an auto increment primary key, the auto increment will be set back in the corresponding field of the object. For example, if the user table was defined as: Then we could create a new user by doing: The Replace method replaces a row in a table, either inserting the row if it doesn't exist, or deleting and then inserting the row. See the MySQL docs for the difference between REPLACE, UPDATE and INSERT ON DUPLICATE KEY UPDATE (Upsert). The Update method updates a row in a table, returning the number of rows modified. The Upsert method inserts or updates a row. The Get method retrieves a single row by primary key and binds the result columns to a struct. The Delete method deletes rows by primary key. See the documentation for DB.Delete for performance limitations when batch deleting multiple rows from a table with a primary key composed of multiple columns. To support optimistic locking with a column storing the version number, one field in a model object can be marked to serve as the lock. Modifying the example above: Now, the Update method will ensure that the object has not been concurrently modified when writing, by constraining the update by the version number. If the update is successful, the version number will be both incremented on the model (in-memory), as well as in the database. Programmatic construction of SQL queries prevents SQL injection attacks while keeping query construction both readable and similar to SQL itself. Support is provided for most of the SQL DML (data manipulation language), though the constructed queries are targeted to MySQL. DB provides wrappers for the sql.Query and sql.QueryRow interfaces.
DB provides wrappers for the sql.Query and sql.QueryRow interfaces. The Rows struct returned from Query has an additional StructScan method for binding the columns of a row result to a struct. QueryRow can be used to easily query and scan a single value: In addition to the convenience of inserting, deleting and updating table rows using a struct, attention has been paid to performance. Since squalor carefully constructs the SQL queries for insertion, deletion and update, it eschews the use of placeholders in favor of properly escaping values in the query. With the Go MySQL drivers, this saves a roundtrip to the database for each query because queries with placeholders must be prepared before being executed. Marshalling and unmarshalling of data from structs utilize reflection, but care is taken to minimize the use of reflection in order to improve performance. For example, reflection data for models is cached when the model is bound to a table. Batch operations are utilized when possible. The Delete, Insert, Replace, Update and Upsert operations will perform multiple operations per SQL statement. This can provide an order of magnitude speed improvement over performing the mutations one row at a time due to minimizing the network overhead.
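A sketch of the query wrappers described above, reusing the User model from the previous example; the StructScan and QueryRow usage follows the behavior described in this doc, while the exact argument handling of db.Query is an assumption:

    // Query several rows and bind each to a struct via StructScan.
    rows, err := db.Query("SELECT * FROM users WHERE name LIKE ?", "g%")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
    for rows.Next() {
        var u User
        if err := rows.StructScan(&u); err != nil {
            log.Fatal(err)
        }
        fmt.Println(u.ID, u.Name)
    }

    // Query and scan a single value.
    var count int
    if err := db.QueryRow("SELECT COUNT(*) FROM users").Scan(&count); err != nil {
        log.Fatal(err)
    }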
Package dataexchange provides the API client, operations, and parameter types for AWS Data Exchange. AWS Data Exchange is a service that makes it easy for AWS customers to exchange data in the cloud. You can use the AWS Data Exchange APIs to create, update, manage, and access file-based data sets in the AWS Cloud. As a subscriber, you can view and access the data sets that you have an entitlement to through a subscription. You can use the APIs to download or copy your entitled data sets to Amazon Simple Storage Service (Amazon S3) for use across a variety of AWS analytics and machine learning services. As a provider, you can create and manage your data sets that you would like to publish to a product. Packaging and providing your data sets as products requires a few steps to determine eligibility. For more information, visit the AWS Data Exchange User Guide. A data set is a collection of data that can be changed or updated over time. Data sets can be updated using revisions, which represent a new version or incremental change to a data set. A revision contains one or more assets. An asset in AWS Data Exchange is a piece of data that can be stored as an Amazon S3 object, Redshift datashare, API Gateway API, AWS Lake Formation data permission, or Amazon S3 data access. The asset can be a structured data file, an image file, or some other data file. Jobs are asynchronous import or export operations used to create or copy assets.
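For orientation, a minimal sketch of constructing the client with aws-sdk-go-v2 and calling one representative operation; ListDataSets is used here as an example and the field names on the output are taken from the generated types:

    import (
        "context"
        "log"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/dataexchange"
    )

    func main() {
        ctx := context.Background()

        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            log.Fatal(err)
        }
        client := dataexchange.NewFromConfig(cfg)

        // List data sets owned by, or entitled to, this account.
        out, err := client.ListDataSets(ctx, &dataexchange.ListDataSetsInput{})
        if err != nil {
            log.Fatal(err)
        }
        for _, ds := range out.DataSets {
            log.Println(aws.ToString(ds.Name))
        }
    }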
Package rencode is a Go implementation of https://github.com/aresch/rencode. The rencode logic is similar to bencode (https://en.wikipedia.org/wiki/Bencode). For complex, heterogeneous data structures with many small elements, r-encodings take up significantly less space than b-encodings. Example of encoder construction and use: You can use either specific methods to encode one of the supported types, or the interface-generic Encode() method. Example of decoder construction: The DecodeNext() method can be used to decode the next value from the rencode stream; however, this method returns an interface{}, while usually a specific type is expected. In such cases, it is advised to use the Scan() method instead, which accepts a pointer to any of the supported types. Example: Only the following types are supported: The rencode.List and rencode.Dictionary types implement Python-like features and can store values and keys of the simpler types enumerated above.
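A hedged sketch of round-tripping a few values; the NewEncoder/NewDecoder constructors and the exact Scan argument types are assumptions based on the project README, so verify them against the package:

    import (
        "bytes"
        "fmt"
        "log"

        "github.com/gdm85/go-rencode"
    )

    func main() {
        var buf bytes.Buffer

        // Encode values with the interface-generic Encode method.
        enc := rencode.NewEncoder(&buf)
        for _, v := range []interface{}{int64(42), "hello world"} {
            if err := enc.Encode(v); err != nil {
                log.Fatal(err)
            }
        }

        // Decode them back with Scan, passing a pointer to the expected type.
        dec := rencode.NewDecoder(&buf)
        var n int64
        if err := dec.Scan(&n); err != nil {
            log.Fatal(err)
        }
        var s string
        if err := dec.Scan(&s); err != nil {
            log.Fatal(err)
        }
        fmt.Println(n, s)
    }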
Package siphash implements the SipHash-64 and SipHash-128 pseudo-random functions with the recommended parameters c = 2 and d = 4. SipHash computes a message authentication code (MAC) from a variable-length message and a 128-bit secret key. SipHash was designed to be efficient, even for short inputs, with performance comparable to non-cryptographic hash functions. SipHash cannot be used as a cryptographic hash function: neither SipHash-64 nor SipHash-128 is strongly collision resistant. SipHash was designed to defend against hash-flooding DoS attacks. SipHash-64 can be used as a hashing scheme within hash maps or other key-value data structures. SipHash-128 can be used to compute a 128-bit authentication tag for messages.
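A hedged sketch of computing a 64-bit SipHash tag under a fixed 128-bit key; the import path and the Sum64 signature (message plus a *[16]byte key) are assumptions based on common Go SipHash APIs, so check the package's actual declarations:

    import (
        "crypto/rand"
        "fmt"
        "log"

        "github.com/aead/siphash"
    )

    func main() {
        // 128-bit secret key; in real code, load it from a secret store.
        var key [16]byte
        if _, err := rand.Read(key[:]); err != nil {
            log.Fatal(err)
        }

        msg := []byte("hello world")
        // Assumed helper: Sum64(msg []byte, key *[16]byte) uint64.
        tag := siphash.Sum64(msg, &key)
        fmt.Printf("%x\n", tag)
    }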
Package merkletree implements a Merkle Tree capable of storing arbitrary content. A Merkle Tree is a hash tree that provides an efficient way to verify that the contents of a data set are present and untampered with. At its core, a Merkle Tree is a list of items representing the data that should be verified. Each of these items is inserted into a leaf node and a tree of hashes is constructed bottom up, using a hash of each node's left and right children's hashes. This means that the root node will effectively be a hash of all other nodes (hashes) in the tree. This property allows the tree to be reproduced, and thus verified, based on the hash of the root node of the tree. The benefit of the tree structure is that verifying any single content entry in the tree requires only log2(n) steps in the worst case. Creating a new merkletree requires that the type the tree will be constructed from implements the Content interface. A slice of Content items should be created and then passed to the NewTree method. t represents the Merkle Tree and can be verified and manipulated with the API methods described below.
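A sketch of implementing the Content interface and building a tree; the NewTree, MerkleRoot, and VerifyContent names follow the shape of the package README and should be treated as assumptions:

    import (
        "crypto/sha256"
        "log"

        "github.com/cbergoon/merkletree"
    )

    // TestContent implements merkletree.Content.
    type TestContent struct {
        x string
    }

    // CalculateHash returns the SHA-256 hash of the content.
    func (t TestContent) CalculateHash() ([]byte, error) {
        h := sha256.New()
        if _, err := h.Write([]byte(t.x)); err != nil {
            return nil, err
        }
        return h.Sum(nil), nil
    }

    // Equals reports whether two pieces of content are identical.
    func (t TestContent) Equals(other merkletree.Content) (bool, error) {
        return t.x == other.(TestContent).x, nil
    }

    func main() {
        list := []merkletree.Content{
            TestContent{x: "Hello"},
            TestContent{x: "Hi"},
            TestContent{x: "Hey"},
        }

        t, err := merkletree.NewTree(list)
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("root: %x", t.MerkleRoot())

        // Prove that a single entry is present and untampered with.
        ok, err := t.VerifyContent(list[0])
        if err != nil {
            log.Fatal(err)
        }
        log.Println("verified:", ok)
    }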
Package xkafka provides consumers & producers to work efficiently with Kafka. `Message` is the core data structure for xkafka. `xkafka` comes with implementations that support:

## Consumer

The processing mode is determined by the xkafka.Concurrency option. By default, the consumer is initialized with `enable.auto.offset.store=false`. The offset is "stored" after the message is processed. The offset is "committed" based on the `auto.commit.interval.ms` option. It is important to understand the difference between "store" and "commit": the offset is "stored" in the consumer's memory and is "committed" to Kafka. The offset is "stored" after the message is processed and the `message.Status` is Success or Skip. The stored offsets will be automatically committed, unless the `ManualCommit` option is enabled.

### Error Handling

By default, xkafka.Consumer will stop processing, commit the last stored offset, and exit if there is a Kafka error or if the handler returns an error. Errors can be handled by using one or more of the following options:

- within the handler implementation
- using error handling & retry middlewares
- through the catch-all xkafka.ErrorHandler option

xkafka.ErrorHandler is called for every error that is not handled by the handler or the middlewares. It is also called for errors returned by the underlying Kafka client.

### Sequential Processing

Sequential processing is the default mode. It is the same as xkafka.Concurrency(1).

### Async Processing

Async processing is enabled by setting xkafka.Concurrency to a value greater than 1. The consumer will use a pool of goroutines to process messages concurrently. Offsets are stored and committed in the order that the messages are received.

### Manual Commit

By default, the consumer will automatically commit the offset based on the `auto.commit.interval.ms` option, asynchronously in the background. The consumer can be configured to commit the offset manually by setting the `xkafka.EnableManualCommit` option to true. When ManualCommit is enabled, the consumer will synchronously commit the offset after each message is processed.

NOTE: Enabling ManualCommit adds an overhead to each message. It is recommended to use ManualCommit only when necessary.
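A rough, hypothetical sketch of wiring a concurrent consumer with a catch-all error handler; the Concurrency and ErrorHandler options are mentioned above, while HandlerFunc, NewConsumer, the broker/topic options, and Run are assumptions that should be checked against the package:

    handler := xkafka.HandlerFunc(func(ctx context.Context, msg *xkafka.Message) error {
        // Process the message; the returned error (or the message status)
        // drives offset storage and the error-handling options above.
        return nil
    })

    consumer, err := xkafka.NewConsumer(
        "orders-consumer", // hypothetical consumer name
        handler,
        xkafka.Concurrency(10), // async processing with a pool of 10 goroutines
        xkafka.ErrorHandler(func(err error) error {
            log.Println("kafka error:", err) // catch-all for unhandled errors
            return nil                       // returning nil keeps the consumer running
        }),
    )
    if err != nil {
        log.Fatal(err)
    }
    if err := consumer.Run(context.Background()); err != nil {
        log.Fatal(err)
    }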
Package json implements semantic processing of JSON as specified in RFC 8259. JSON is a simple data interchange format that can represent primitive data types such as booleans, strings, and numbers, in addition to structured data types such as objects and arrays. Marshal and Unmarshal encode and decode Go values to/from JSON text contained within a []byte. MarshalWrite and UnmarshalRead operate on JSON text by writing to or reading from an io.Writer or io.Reader. MarshalEncode and UnmarshalDecode operate on JSON text by encoding to or decoding from a jsontext.Encoder or jsontext.Decoder. Options may be passed to each of the marshal or unmarshal functions to configure the semantic behavior of marshaling and unmarshaling (i.e., alter how JSON data is understood as Go data and vice versa). jsontext.Options may also be passed to the marshal or unmarshal functions to configure the syntactic behavior of encoding or decoding. The data types of JSON are mapped to/from the data types of Go based on the closest logical equivalent between the two type systems. For example, a JSON boolean corresponds with a Go bool, a JSON string corresponds with a Go string, a JSON number corresponds with a Go int, uint or float, a JSON array corresponds with a Go slice or array, and a JSON object corresponds with a Go struct or map. See the documentation on Marshal and Unmarshal for a comprehensive list of how the JSON and Go type systems correspond. Arbitrary Go types can customize their JSON representation by implementing MarshalerV1, MarshalerV2, UnmarshalerV1, or UnmarshalerV2. This provides authors of Go types with control over how their types are serialized as JSON. Alternatively, users can implement functions that match MarshalFuncV1, MarshalFuncV2, UnmarshalFuncV1, or UnmarshalFuncV2 to specify the JSON representation for arbitrary types. This provides callers of JSON functionality with control over how any arbitrary type is serialized as JSON. A Go struct is naturally represented as a JSON object, where each Go struct field corresponds with a JSON object member. When marshaling, all Go struct fields are recursively encoded in depth-first order as JSON object members except those that are ignored or omitted. When unmarshaling, JSON object members are recursively decoded into the corresponding Go struct fields. Object members that do not match any struct fields, also known as “unknown members”, are ignored by default or rejected if RejectUnknownMembers is specified. The representation of each struct field can be customized in the "json" struct field tag, where the tag is a comma separated list of options. As a special case, if the entire tag is `json:"-"`, then the field is ignored with regard to its JSON representation. The first option is the JSON object name override for the Go struct field. If the name is not specified, then the Go struct field name is used as the JSON object name. JSON names containing commas or quotes, or names identical to "" or "-", can be specified using a single-quoted string literal, where the syntax is identical to the Go grammar for a double-quoted string literal, but instead uses single quotes as the delimiters. By default, unmarshaling uses case-sensitive matching to identify the Go struct field associated with a JSON object name. 
After the name, the following tag options are supported:

omitzero: When marshaling, the "omitzero" option specifies that the struct field should be omitted if the field value is zero, as determined by the "IsZero() bool" method if present, otherwise based on whether the field is the zero Go value. This option has no effect when unmarshaling.

omitempty: When marshaling, the "omitempty" option specifies that the struct field should be omitted if the field value would have been encoded as a JSON null, empty string, empty object, or empty array. This option has no effect when unmarshaling.

string: The "string" option specifies that StringifyNumbers be set when marshaling or unmarshaling a struct field value. This causes numeric types to be encoded as a JSON number within a JSON string, and to be decoded from either a JSON number or a JSON string containing a JSON number. This extra level of encoding is often necessary since many JSON parsers cannot precisely represent 64-bit integers.

nocase: When unmarshaling, the "nocase" option specifies that if the JSON object name does not exactly match the JSON name for any of the struct fields, then it attempts to match the struct field using a case-insensitive match that also ignores dashes and underscores. If multiple fields match, the first declared field in breadth-first order takes precedence. This takes precedence even if MatchCaseInsensitiveNames is set to false. This cannot be specified together with the "strictcase" option.

strictcase: When unmarshaling, the "strictcase" option specifies that the JSON object name must exactly match the JSON name for the struct field. This takes precedence even if MatchCaseInsensitiveNames is set to true. This cannot be specified together with the "nocase" option.

inline: The "inline" option specifies that the JSON representable content of this field type is to be promoted as if it were specified in the parent struct. It is the JSON equivalent of Go struct embedding. A Go embedded field is implicitly inlined unless an explicit JSON name is specified. The inlined field must be a Go struct (that does not implement any JSON methods), jsontext.Value, map[string]T, or an unnamed pointer to such types. When marshaling, inlined fields from a pointer type are omitted if the pointer is nil. Inlined fields of type jsontext.Value and map[string]T are called “inlined fallbacks” as they can represent all possible JSON object members not directly handled by the parent struct. Only one inlined fallback field may be specified in a struct, while many non-fallback fields may be specified. This option must not be specified with any other option (including the JSON name).

unknown: The "unknown" option is a specialized variant of the inlined fallback to indicate that this Go struct field contains any number of unknown JSON object members. The field type must be a jsontext.Value, map[string]T, or an unnamed pointer to such types. If DiscardUnknownMembers is specified when marshaling, the contents of this field are ignored. If RejectUnknownMembers is specified when unmarshaling, any unknown object members are rejected regardless of whether an inlined fallback with the "unknown" option exists. This option must not be specified with any other option (including the JSON name).

format: The "format" option specifies a format flag used to specialize the formatting of the field value.
The option is a key-value pair specified as "format:value", where the value must be either a literal consisting of letters and numbers (e.g., "format:RFC3339") or a single-quoted string literal (e.g., "format:'2006-01-02'"). The interpretation of the format flag is determined by the struct field type.

The "omitzero" and "omitempty" options are mostly semantically identical. The former is defined in terms of the Go type system, while the latter is defined in terms of the JSON type system. Consequently, they behave differently in some circumstances. For example, only a nil slice or map is omitted under "omitzero", while an empty slice or map is omitted under "omitempty" regardless of nilness. The "omitzero" option is useful for types with a well-defined zero value (e.g., net/netip.Addr) or types that have an IsZero method (e.g., time.Time).

Every Go struct corresponds to a list of JSON representable fields, which is constructed by performing a breadth-first search over all struct fields (excluding unexported or ignored fields), where the search recursively descends into inlined structs. The set of non-inlined fields in a struct must have unique JSON names. If multiple fields all have the same JSON name, then the one at the shallowest depth takes precedence and the other fields at deeper depths are excluded from the list of JSON representable fields. If multiple fields at the shallowest depth have the same JSON name, but exactly one is explicitly tagged with a JSON name, then that field takes precedence and all others are excluded from the list. This is analogous to Go visibility rules for struct field selection with embedded struct types. Marshaling or unmarshaling a non-empty struct without any JSON representable fields results in a SemanticError. Unexported fields must not have any `json` tags except for `json:"-"`.

Unmarshal matches JSON object names with Go struct fields using a case-sensitive match, but can be configured to use a case-insensitive match with the "nocase" option. This permits unmarshaling from inputs that use naming conventions such as camelCase, snake_case, or kebab-case. By default, JSON object names for Go struct fields are derived from the Go field name, but may be specified in the `json` tag. Due to JSON's heritage in JavaScript, the most common naming convention used for JSON object names is camelCase. The "format" tag option can be used to alter the formatting of certain types.

JSON objects can be inlined within a parent object similar to how Go structs can be embedded within a parent struct. The inlining rules are similar to those of Go embedding, but operate upon the JSON namespace.

Go struct fields can be omitted from the output depending on either the input Go value or the output JSON encoding of the value. The "omitzero" option omits a field if it is the zero Go value or implements an "IsZero() bool" method that reports true. The "omitempty" option omits a field if it encodes as an empty JSON value, which we define as a JSON null or an empty JSON string, object, or array. In many cases, the behavior of "omitzero" and "omitempty" is equivalent. If both provide the desired effect, then using "omitzero" is preferred.

The exact order of a JSON object can be preserved through the use of a specialized type that implements MarshalerV2 and UnmarshalerV2.

Some Go types have a custom JSON representation where the implementation is delegated to some external package. Consequently, the "json" package will not know how to use that external implementation.
For example, the google.golang.org/protobuf/encoding/protojson package implements JSON for all google.golang.org/protobuf/proto.Message types. WithMarshalers and WithUnmarshalers can be used to configure "json" and "protojson" to cooperate. When implementing HTTP endpoints, it is common to be operating with an io.Reader and an io.Writer. The MarshalWrite and UnmarshalRead functions assist in operating on such input/output types. UnmarshalRead reads the entirety of the io.Reader to ensure that io.EOF is encountered without any unexpected bytes after the top-level JSON value. If a type implements encoding.TextMarshaler and/or encoding.TextUnmarshaler, then the MarshalText and UnmarshalText methods are used to encode/decode the value to/from a JSON string. Due to version skew, the set of JSON object members known at compile time may differ from the set of members encountered at execution time. As such, it may be useful to have finer-grained handling of unknown members. This package supports preserving, rejecting, or discarding such members.
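A short sketch tying together several of the struct tag options described above, using the Marshal and Unmarshal functions from this package; the import path and the field names are illustrative assumptions:

    import (
        "fmt"
        "log"
        "time"

        // Assumed import path for the v2 json experiment; adjust as needed.
        "github.com/go-json-experiment/json"
    )

    type Profile struct {
        Name     string         `json:"name"`
        Birthday time.Time      `json:"birthday,omitzero,format:RFC3339"` // omitted when zero; RFC 3339 formatting
        Count    int64          `json:"count,string"`                     // JSON number inside a JSON string
        Tags     []string       `json:"tags,omitempty"`                   // omitted when it encodes as an empty array
        Unknown  map[string]any `json:",unknown"`                         // inlined fallback for unknown members
    }

    func main() {
        b, err := json.Marshal(Profile{Name: "gopher", Count: 42})
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(b)) // {"name":"gopher","count":"42"}

        var p Profile
        if err := json.Unmarshal(b, &p); err != nil {
            log.Fatal(err)
        }
    }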
Package gocql implements a fast and robust Cassandra driver for the Go programming language.

Pass a list of initial node IP addresses to NewCluster to create a new cluster configuration. A port can be specified as part of each address. It is recommended to use the value set in the Cassandra config for broadcast_address or listen_address, an IP address rather than a domain name. This is because events from Cassandra will use the configured IP address, which is used to index connected hosts. If the domain name specified resolves to more than one IP address, then the driver may connect multiple times to the same host, and will not mark the node as down or up based on events. Then you can customize more options (see ClusterConfig).

The driver tries to automatically detect the protocol version to use if not set, but you might want to set the protocol version explicitly, as it's not defined which version will be used in certain situations (for example, during an upgrade of the cluster, when some of the nodes support a different set of protocol versions than other nodes). The driver advertises the module name and version in the STARTUP message, so servers are able to detect the version. If you use a replace directive in go.mod, the driver will send information about the replacement module instead.

When ready, create a session from the configuration. Don't forget to Close the session once you are done with it.

The CQL protocol uses a SASL-based authentication mechanism and so consists of an exchange of server challenges and client responses. The details of the exchanged messages depend on the authenticator used. To use authentication, set ClusterConfig.Authenticator or ClusterConfig.AuthProvider. PasswordAuthenticator is provided for username/password authentication.

It is possible to secure traffic between the client and server with TLS. To use TLS, set the ClusterConfig.SslOpts field. SslOptions embeds *tls.Config so you can set that directly. There are also helpers to load keys/certificates from files. Warning: due to historical reasons, SslOptions is insecure by default, so you need to set EnableHostVerification to true if no Config is set. Most users should set SslOptions.Config to a *tls.Config. SslOptions and Config.InsecureSkipVerify interact as follows:

To route queries to the local DC first, use DCAwareRoundRobinPolicy. For example, if the datacenter you want to primarily connect to is called dc1 (as configured in the database): The driver can route queries to nodes that hold data replicas based on the partition key (preferring the local DC). Note that TokenAwareHostPolicy can take options such as gocql.ShuffleReplicas and gocql.NonLocalReplicasFallback. We recommend running with a token-aware host policy in production for maximum performance. The driver can only use token-aware routing for queries where all partition key columns are query parameters; for example, pass the partition key as a bind parameter rather than embedding its value in the query string. The DCAwareRoundRobinPolicy can be replaced with RackAwareRoundRobinPolicy, which takes two parameters, datacenter and rack. Instead of dividing hosts into two tiers (local datacenter and remote datacenters), it divides hosts into three (the local rack, the rest of the local datacenter, and everything else). RackAwareRoundRobinPolicy can be combined with TokenAwareHostPolicy in the same way as DCAwareRoundRobinPolicy.
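For example, a typical cluster configuration with password authentication, a token-aware and DC-aware host selection policy, and session creation looks roughly like this (keyspace, credentials, and addresses are placeholders):

    cluster := gocql.NewCluster("192.168.1.1:9042", "192.168.1.2:9042")
    cluster.Keyspace = "example"
    cluster.Consistency = gocql.Quorum
    cluster.ProtoVersion = 4
    cluster.Authenticator = gocql.PasswordAuthenticator{
        Username: "user",
        Password: "password",
    }
    // Prefer replicas in the local datacenter (dc1), routed by partition key.
    cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(
        gocql.DCAwareRoundRobinPolicy("dc1"),
    )

    session, err := cluster.CreateSession()
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()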
Create queries with Session.Query. Query values must not be reused between different executions and must not be modified after starting execution of the query. To execute a query without reading results, use Query.Exec. A single row can be read by calling Query.Scan. Multiple rows can be read using Iter.Scanner. See Example for a complete example.

The driver automatically prepares DML queries (SELECT/INSERT/UPDATE/DELETE/BATCH statements) and maintains a cache of prepared statements. The CQL protocol does not support preparing other query types. When using CQL protocol >= 4, it is possible to use gocql.UnsetValue as the bound value of a column. This will cause the database to ignore writing the column. The main advantage is the ability to keep the same prepared statement even when you don't want to update some fields, where before you needed to make another prepared statement.

Session is safe to use from multiple goroutines, so to execute multiple concurrent queries, just execute them from several worker goroutines. Gocql provides a synchronous-looking API (as recommended for Go APIs), while the queries are executed asynchronously at the protocol level.

Null values are unmarshalled as the zero value of the type. If you need to distinguish, for example, between a text column being null and an empty string, you can unmarshal into a *string variable instead of a string. See Example_nulls for a full example.

The driver reuses the backing memory of slices when unmarshalling. This is an optimization so that a buffer does not need to be allocated for every processed row. However, you need to be careful when storing the slices in other memory structures. When you want to save the data for later use, pass a new slice every time. A common pattern is to declare the slice variable within the scanner loop.

The driver supports paging of results with automatic prefetch; see ClusterConfig.PageSize, Session.SetPrefetch, Query.PageSize, and Query.Prefetch. It is also possible to control the paging manually with Query.PageState (this disables automatic prefetch). Manual paging is useful if you want to store the page state externally, for example in a URL, to allow users to browse pages of a result. You might want to sign/encrypt the paging state when exposing it externally since it contains data from primary keys.

Paging state is specific to the CQL protocol version and the exact query used. It is meant as opaque state that should not be modified. If you send paging state from a different query or protocol version, then the behaviour is not defined (you might get unexpected results or an error from the server). For example, do not send paging state returned by a node using protocol version 3 to a node using protocol version 4. Also, when using protocol version 4, paging state between Cassandra 2.2 and 3.0 is incompatible (https://issues.apache.org/jira/browse/CASSANDRA-10880). The driver does not check whether the paging state is from the same protocol version/statement. You might want to validate this yourself, as it could be a problem if you store paging state externally. For example, if you store paging state in a URL, the URLs might become broken when you upgrade your cluster.

Call Query.PageState(nil) to fetch just the first page of the query results. Pass the page state returned by Iter.PageState to Query.PageState of a subsequent query to get the next page. If the length of the slice returned by Iter.PageState is zero, there are no more pages available (or an error occurred). Using too low a value for PageSize will negatively affect performance; a value below 100 is probably too low. While Cassandra currently returns exactly PageSize items in a page (except for the last page), the protocol authors explicitly reserved the right to return a smaller or larger number of items in a page for performance reasons, so don't rely on a page having the exact count of items. See Example_paging for an example of manual paging.
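A brief sketch of the basic query patterns described above (Query.Exec, Query.Scan, and Iter.Scanner with a fresh slice per row), assuming an open session and a hypothetical users table:

    id := gocql.TimeUUID()

    // Execute a query without reading results.
    if err := session.Query(`INSERT INTO users (id, name) VALUES (?, ?)`,
        id, "alice").Exec(); err != nil {
        log.Fatal(err)
    }

    // Read a single row.
    var name string
    if err := session.Query(`SELECT name FROM users WHERE id = ? LIMIT 1`,
        id).Scan(&name); err != nil {
        log.Fatal(err)
    }

    // Read multiple rows; declare slices inside the loop so the driver
    // does not reuse their backing memory across iterations.
    scanner := session.Query(`SELECT id, emails FROM users`).Iter().Scanner()
    for scanner.Next() {
        var (
            rowID  gocql.UUID
            emails []string
        )
        if err := scanner.Scan(&rowID, &emails); err != nil {
            log.Fatal(err)
        }
        fmt.Println(rowID, emails)
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }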
There are certain situations when you don't know the list of columns in advance, mainly when the query is supplied by the user. Iter.Columns, Iter.RowData, Iter.MapScan and Iter.SliceMap can be used to handle this case. See Example_dynamicColumns.

The CQL protocol supports sending batches of DML statements (INSERT/UPDATE/DELETE) and so does gocql. Use Session.NewBatch to create a new batch and then fill in the details of the individual queries. Then execute the batch with Session.ExecuteBatch (a short sketch appears at the end of this block). Logged batches ensure atomicity: either all or none of the operations in the batch will succeed, but they incur overhead to ensure this property. Unlogged batches don't have the overhead of logged batches, but don't guarantee atomicity. Updates of counters are handled specially by Cassandra, so batches of counter updates have to use the CounterBatch type. A counter batch can only contain statements to update counters.

For unlogged batches it is recommended to send only single-partition batches (i.e. all statements in the batch should involve only a single partition). A multi-partition batch needs to be split by the coordinator node and re-sent to the correct nodes. With single-partition batches you can send the batch directly to the node for the partition without incurring the additional network hop.

It is also possible to pass an entire BEGIN BATCH .. APPLY BATCH statement to Query.Exec. There are differences in how those are executed: a BEGIN BATCH statement passed to Query.Exec is prepared as a whole in a single statement, while Session.ExecuteBatch prepares the individual statements in the batch. If you have variable-length batches using the same statement, using Session.ExecuteBatch is more efficient. See Example_batch for an example.

Query.ScanCAS or Query.MapScanCAS can be used to execute a single-statement lightweight transaction (an INSERT/UPDATE .. IF statement) and read its result. See the example for Query.MapScanCAS. Multiple-statement lightweight transactions can be executed as a logged batch that contains at least one conditional statement. All the conditions must return true for the batch to be applied. You can use Session.ExecuteBatchCAS and Session.MapExecuteBatchCAS when executing the batch to learn about the result of the LWT. See the example for Session.MapExecuteBatchCAS.

Queries can be marked as idempotent. Marking a query as idempotent tells the driver that the query can be executed multiple times without affecting its result. Non-idempotent queries are not eligible for retries or speculative execution. Idempotent queries are retried in case of errors based on the configured RetryPolicy. If the query is an LWT and the configured RetryPolicy additionally implements the LWTRetryPolicy interface, then the policy will be cast to LWTRetryPolicy and used this way.

Queries can be retried even before they fail by setting a SpeculativeExecutionPolicy. The policy can cause the driver to retry on a different node if the query is taking longer than a specified delay, even before the driver receives an error or timeout from the server. When a query is speculatively executed, the original execution is still executing. The two parallel executions of the query race to return a result; the first result received will be returned.
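Returning to batches, a small sketch of an unlogged, single-partition batch built with Session.NewBatch and executed with Session.ExecuteBatch (table and column names are placeholders):

    pk := "order-42"

    b := session.NewBatch(gocql.UnloggedBatch)
    // Both statements share the same partition key, so the coordinator
    // does not need to split the batch across nodes.
    b.Query(`INSERT INTO events (pk, seq, payload) VALUES (?, ?, ?)`, pk, 1, "created")
    b.Query(`INSERT INTO events (pk, seq, payload) VALUES (?, ?, ?)`, pk, 2, "updated")
    if err := session.ExecuteBatch(b); err != nil {
        log.Fatal(err)
    }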
UDTs can be marshaled to and unmarshaled from a map[string]interface{}, a Go struct, or a type implementing the UDTUnmarshaler, UDTMarshaler, Unmarshaler or Marshaler interfaces. For structs, the cql tag can be used to specify the CQL field name to be mapped to a struct field (a short sketch appears at the end of this section). See Example_userDefinedTypesMap, Example_userDefinedTypesStruct, ExampleUDTMarshaler, ExampleUDTUnmarshaler. It is possible to provide observer implementations that could be used to gather metrics.

The CQL protocol also supports tracing of queries. When enabled, the database will write information about internal events that happened during execution of the query. You can use Query.Trace to request tracing and receive the session ID that the database used to store the trace information in the system_traces.sessions and system_traces.events tables. NewTraceWriter returns an implementation of Tracer that writes the events to a writer. Gathering trace information might be essential for debugging and optimizing queries, but writing traces has overhead, so this feature should not be used on production systems with very high load unless you know what you are doing.

Example_batch demonstrates how to execute a batch of statements. Example_dynamicColumns demonstrates how to handle a dynamic column list. Example_marshalerUnmarshaler demonstrates how to implement a Marshaler and Unmarshaler. Example_nulls demonstrates how to distinguish between null and zero value when needed. Null values are unmarshalled as the zero value of the type. If you need to distinguish, for example, between a text column being null and an empty string, you can unmarshal into a *string field. Example_paging demonstrates how to manually fetch pages and use page state. See also the package documentation about paging. Example_set demonstrates how to use sets. Example_userDefinedTypesMap demonstrates how to work with user-defined types as maps. See also Example_userDefinedTypesStruct and the examples for UDTMarshaler and UDTUnmarshaler if you want to map to structs. Example_userDefinedTypesStruct demonstrates how to work with user-defined types as structs. See also the examples for UDTMarshaler and UDTUnmarshaler if you need more control/better performance.
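As a closing sketch of the struct mapping for UDTs mentioned above, a user-defined address type (hypothetical schema) can be scanned into a struct whose fields carry cql tags:

    // CREATE TYPE address (street text, city text, zip text)
    // CREATE TABLE users (id uuid PRIMARY KEY, address frozen<address>)
    type Address struct {
        Street string `cql:"street"`
        City   string `cql:"city"`
        Zip    string `cql:"zip"`
    }

    var addr Address
    if err := session.Query(`SELECT address FROM users WHERE id = ?`,
        id).Scan(&addr); err != nil {
        log.Fatal(err)
    }
    fmt.Println(addr.City)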