Package log provides the standard logging framework for cybozu products. As this is a framework rather than a library, most features are hard-coded and not customizable. cybozu/log is a structured logger: every log entry consists of mandatory and optional fields. Mandatory fields are: To help development, logs go to standard error by default; this can be changed to any io.Writer. Logs are formatted by a Formatter. The following built-in formatters are available. The standard field names are defined as constants in this package; for example, "secret" is defined as FnSecret. Field data can be of any type, though the following types are recommended: The framework automatically redirects Go's standard log output to the default logger provided by this framework.
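A minimal usage sketch, assuming the conventional cybozu-go/log API in which the leveled functions take a message plus a map of optional fields and the default logger exposes SetOutput; check the package for the exact signatures:

	package main

	import (
		"os"

		"github.com/cybozu-go/log"
	)

	func main() {
		// Logs go to standard error by default; switch the default logger
		// to standard output (any io.Writer works).
		logger := log.DefaultLogger()
		logger.SetOutput(os.Stdout)

		// Optional fields are passed as a map; standard field names are
		// provided as Fn* constants such as log.FnSecret.
		logger.Info("user signed in", map[string]interface{}{
			"user_id":    42,
			log.FnSecret: true,
		})
	}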
Package redblacktree provides a pure Go implementation of a red-black tree as described by Thomas H. Cormen et al. in their seminal Algorithms book (3rd ed.). This data structure is not safe for concurrent use by multiple goroutines.
Package blobloom implements blocked Bloom filters. Blocked Bloom filters are an approximate set data structure: if a key has been added to a filter, a lookup of that key returns true, but if the key has not been added, there is a non-zero probability that the lookup still returns true (a false positive). False negatives are impossible: if the lookup for a key returns false, that key has not been added. In this package, keys are represented exclusively as hashes. Client code is responsible for supplying a 64-bit hash value. Compared to standard Bloom filters, blocked Bloom filters use the CPU cache more efficiently. A blocked Bloom filter is an array of ordinary Bloom filters of fixed size BlockBits (the blocks). The lower half of the hash selects the block to use. To achieve the same false positive rate (FPR) as a standard Bloom filter, a blocked Bloom filter requires more memory. For an FPR of at most 2e-6 (two in a million), it uses ~20% more memory. At 1e-10, the space required is double that of a standard Bloom filter. For more details, see the 2010 paper by Putze, Sanders and Singler, https://algo2.iti.kit.edu/documents/cacheefficientbloomfilters-jea.pdf.
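A short usage sketch, assuming the filter exposes Add and Has methods that take the caller-supplied 64-bit hash; the import path and the NewOptimized/Config constructor shown here are assumptions to check against the package. The standard library's hash/maphash supplies the hashes:

	package main

	import (
		"fmt"
		"hash/maphash"

		"github.com/greatroar/blobloom"
	)

	func main() {
		// Keys are represented exclusively as 64-bit hashes.
		seed := maphash.MakeSeed()
		hash := func(key string) uint64 {
			var h maphash.Hash
			h.SetSeed(seed)
			h.WriteString(key)
			return h.Sum64()
		}

		// Assumed constructor: size the filter for an expected number of
		// keys and a target false positive rate.
		f := blobloom.NewOptimized(blobloom.Config{
			Capacity: 100_000,
			FPRate:   1e-4,
		})

		f.Add(hash("alice"))
		fmt.Println(f.Has(hash("alice"))) // true
		fmt.Println(f.Has(hash("bob")))   // false (with high probability)
	}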
Package textrank is an implementation of the TextRank algorithm in Go with extendable features (automatic summarization, phrase extraction). It supports multithreading via goroutines. The package is under the MIT Licence. If there were a program that could continuously rank the words, phrases and sentences of book-sized texts on multiple threads, that was open to modification through objects, written in a simple, secure, statically typed language, and that was very well documented... Now, here it is. - Find the most important phrases. - Find the most important words. - Find the most important N sentences. - Importance by phrase weights. - Importance by word occurrence. - Find the first N sentences, starting from the Xth sentence. - Find sentences by phrase chains ordered by position in the text. - Access to the whole ranked data. - Support for more languages. - The weighting algorithm can be modified by an interface implementation. - The parser can be modified by an interface implementation. - Multi-thread support. Find the most important phrases: This is the most basic and simplest usage of textrank (see the sketch after this entry). All possible pre-defined finder queries: After ranking, the graph contains a lot of valuable data, and the textrank package provides functions to retrieve that data from the graph. The GetRank function gives access to the graph, and every piece of data can be retrieved from this structure. Adding text continuously: It is possible to add more text after other texts have already been added. The Ranking function can merge these multiple texts and recalculate the weights and all related data. Using a different algorithm to rank text: Two algorithms are implemented, and it is possible to write a custom algorithm against the Algorithm interface and use it instead of the defaults. Using multiple graphs: Graph IDs exist because it is possible to run multiple independent text ranking processes. Using different non-English languages: English is used by default, but it is possible to add any language. To use other languages a stop-word list is required, which you can find at https://github.com/stopwords-iso Asynchronous usage by goroutines: It is thread safe. Independent graphs can receive texts at the same time and can also be extended with more text at the same time.
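A minimal sketch of the basic "find the most important phrases" flow described above; the constructor and finder names follow the package's README as I recall them and should be treated as assumptions:

	package main

	import (
		"fmt"

		"github.com/DavidBelicza/TextRank"
	)

	func main() {
		// Build a ranker with the default English language, rule set and
		// weighting algorithm (all replaceable via interfaces).
		tr := textrank.NewTextRank()
		rule := textrank.NewDefaultRule()
		language := textrank.NewDefaultLanguage()
		algorithm := textrank.NewDefaultAlgorithm()

		text := "Cats are lovely animals. Dogs are lovely animals too. " +
			"Lovely animals make people happy."

		// Parse the raw text into the graph, then rank it.
		tr.Populate(text, language, rule)
		tr.Ranking(algorithm)

		// Retrieve the phrases ordered by weight (field names assumed).
		for _, phrase := range textrank.FindPhrases(tr) {
			fmt.Println(phrase.Left, phrase.Right, phrase.Weight)
		}
	}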
Package deque implements a very fast and efficient general purpose queue/stack/deque data structure that is specifically optimized for use by microservices and serverless services running in production environments.
Package mafsa implements Minimal Acyclic Finite State Automata (MA-FSA) in a space-optimized way as described by Daciuk, Mihov, Watson, and Watson in their paper, "Incremental Construction of Minimal Acyclic Finite-State Automata" (2000). It also implements Minimal Perfect Hashing (MPH) as described by Lucchesi and Kowaltowski in their paper, "Applications of Finite Automata Representing Large Vocabularies" (1992). Unscientifically speaking, this package lets you store large amounts of strings (with Unicode) in memory so that membership queries, prefix lookups, and fuzzy searches are fast. And because minimal perfect hashing is included, you can associate each entry in the tree with more data used by your application. See the README or the end of this documentation for a brief tutorial. MA-FSA structures are a specific type of Deterministic Acyclic Finite State Automaton (DAFSA) which fold equivalent state transitions into each other starting from the suffix of each entry. Typical construction algorithms involve building out the entire tree first, then minimizing the completed tree. However, the method described in the paper above allows the tree to be minimized after every word insertion, provided the insertions are performed in lexicographical order, which drastically reduces memory usage compared to regular prefix trees ("tries"). The goal of this package is to provide a simple, useful, and correct implementation of MA-FSA. Though more complex algorithms exist for removal of items and unordered insertion, these features may be outside the scope of this package. Membership queries are on the order of O(n), where n is the length of the input string, so effectively O(1) with respect to the number of entries. It is advisable to keep n small since long entries without much in common, especially in the beginning or end of the string, will quickly overrun the optimizations that are available. In those cases, n-gram implementations might be preferable, though these will use more CPU. This package provides two kinds of MA-FSA implementations. One, the BuildTree, facilitates the construction of an optimized tree and allows ordered insertions. The other, MinTree, is effectively read-only but uses significantly less memory and is ideal for production environments where only reads will be occurring. Usually your build process will be separate from your production application, which will make heavy use of reading the structure. To use this package, create a BuildTree and insert your items in lexicographical order: The tree is now compressed to a minimum number of nodes and is ready to be saved. In your production application, then, you can read the file into a MinTree directly: The mt variable is a *MinTree which has the same data as the original BuildTree, but without all the extra "scaffolding" that was required for adding new elements. The package provides some basic read mechanisms.
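A brief sketch of the build-then-read workflow described above, assuming New, Insert, Finish, Save and Load as I recall them from the package README; the membership query on MinTree is likewise an assumption to verify against the package's read methods:

	package main

	import (
		"fmt"

		"github.com/smartystreets/mafsa"
	)

	func main() {
		// Build phase: insertions must be in lexicographical order.
		bt := mafsa.New()
		for _, word := range []string{"cities", "city", "pities", "pity"} {
			if err := bt.Insert(word); err != nil {
				panic(err)
			}
		}
		bt.Finish() // minimize the tree

		if err := bt.Save("words.mafsa"); err != nil {
			panic(err)
		}

		// Read phase (typically in a separate production process):
		mt, err := mafsa.Load("words.mafsa")
		if err != nil {
			panic(err)
		}
		fmt.Println(mt.Contains("city")) // assumed membership query; true
	}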
Package btree implements in-memory B-Trees of arbitrary degree. btree implements an in-memory B-Tree for use as an ordered data structure. It is not meant for persistent storage solutions. It has a flatter structure than an equivalent red-black or other binary tree, which in some cases yields better memory usage and/or performance. See some discussion on the matter here: Note, though, that this project is in no way related to the C++ B-Tree implementation written about there. Within this tree, each node contains a slice of items and a (possibly nil) slice of children. For basic numeric values or raw structs, this can cause efficiency differences when compared to equivalent C++ template code that stores values in arrays within the node: These issues don't tend to matter, though, when working with strings or other heap-allocated structures, since C++-equivalent structures also must store pointers and also distribute their values across the heap. This implementation is designed to be a drop-in replacement for gollrb.LLRB trees (http://github.com/petar/gollrb), an excellent and probably the most widely used ordered tree implementation in the Go ecosystem currently. Its functions, therefore, exactly mirror those of llrb.LLRB where possible. Unlike gollrb, though, we currently don't support storing multiple equivalent values.
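A small usage sketch based on the classic google/btree API (New, ReplaceOrInsert, Get, Has, AscendRange and the built-in Int item type); the degree chosen here is arbitrary:

	package main

	import (
		"fmt"

		"github.com/google/btree"
	)

	func main() {
		// Create a B-Tree of degree 32; items must implement Less.
		tr := btree.New(32)

		for i := 0; i < 10; i++ {
			tr.ReplaceOrInsert(btree.Int(i))
		}

		fmt.Println(tr.Len())              // 10
		fmt.Println(tr.Get(btree.Int(3)))  // 3
		fmt.Println(tr.Has(btree.Int(42))) // false

		// In-order iteration over a sub-range.
		tr.AscendRange(btree.Int(2), btree.Int(5), func(i btree.Item) bool {
			fmt.Println(i)
			return true
		})
	}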
Package stl implements functions to read, write, and transform files in the Stereolithography/Surface Tessellation Language (.stl) file format used in 3D modelling. The format specification was taken from http://www.ennex.com/~fabbers/StL.asp, found at http://en.wikipedia.org/wiki/STL_%28file_format%29. While STL stores the data in single precision 32 bit floating point numbers, the stl package does all calculations beyond simple addition in double precision 64 bit (float64). Usage example: everything that operates on a model is defined as a method of Solid. Note that the STL format has two variants: a human-readable ASCII variant, and a more compact and precise binary variant which is preferable. The Solid.BinaryHeader field and the Triangle.Attributes fields will be empty after reading, as these are not part of the ASCII format. The Solid.Name field is read from the first line after "solid ". It is not checked against the name at the end of the file after "endsolid ". The stl package will also not cope with Unicode byte order marks, which some text editors might automatically place at the beginning of a file. The Solid.BinaryHeader field is filled with all 80 bytes of header data. Then, ReadFile will try to fill solid.Name with an ASCII string read from the header data from the first byte until a \0 or a non-ASCII character is detected. As always when you do linear transformations on floating point numbers, you get numerical errors. So you should expect a vertex rotated by 360° not to end up at exactly the original coordinates, but instead just very close to them. As the error is usually far smaller than the available precision of 3D printing applications, this is not an issue in most cases. You can implement the Writer interface to directly write into your own data structures. This way you can use the CopyFile and CopyAll functions.
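A rough sketch of a read-transform-write flow; only ReadFile is named in the text above, so the import path and the Scale and WriteFile methods are assumptions to verify against the package:

	package main

	import (
		"log"

		"github.com/hschendel/stl"
	)

	func main() {
		// Read either the ASCII or the binary variant.
		solid, err := stl.ReadFile("part.stl")
		if err != nil {
			log.Fatal(err)
		}

		// Assumed transformation method: everything that operates on a
		// model is defined as a method of Solid.
		solid.Scale(0.5)

		// Assumed write method; calculations use float64 internally, but
		// the file stores 32-bit floats.
		if err := solid.WriteFile("part-scaled.stl"); err != nil {
			log.Fatal(err)
		}
	}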
Package utter implements a deep pretty printer for Go data structures to aid data snapshotting. A quick overview of the additional features utter provides over the built-in printing facilities for Go data types is as follows: The approach utter allows for dumping Go data structures is less flexible than its parent tool. It has just a: This section demonstrates how to quickly get started with utter. See the sections below for further details on formatting and configuration options. To dump a variable with full newlines, indentation, type, and pointer information use Dump, Fdump, or Sdump: Configuration of utter is handled by fields in the ConfigState type. For convenience, all of the top-level functions use a global state available via the utter.Config global. It is also possible to create a ConfigState instance that provides methods equivalent to the top-level functions. This allows concurrent use of different configuration options. See the ConfigState documentation for more details. The following configuration options are available:
- Indent: string to use for each indentation level for the Dump functions. It is a single space by default. A popular alternative is "\t".
- NumericWidth: number of columns to use when dumping a numeric slice or array (including bool). Zero specifies all entries on one line.
- StringWidth: number of columns to use when dumping a string slice or array. Zero specifies all entries on one line.
- BytesWidth: number of byte columns to use when dumping byte slices and arrays.
- CommentBytes: specifies whether ASCII comment annotations are attached to byte slice and array dumps.
- CommentPointers: specifies whether pointer information will be added as comments.
- IgnoreUnexported: specifies that unexported fields should be ignored.
- ElideType: specifies that type information defined by context should not be printed in a dump.
- SortKeys: specifies that map keys should be sorted before being printed. Use this to have a more deterministic, diffable output. Note that only native types (bool, int, uint, floats, uintptr and string) are supported, with other types sorted according to the reflect.Value.String() output, which guarantees display stability. Natural map order is used by default.
Simply call utter.Dump with a list of variables you want to dump (see the sketch after this entry). You may also call utter.Fdump if you would prefer to output to an arbitrary io.Writer; for example, to dump to standard error. A third option is to call utter.Sdump to get the formatted output as a string: See the Dump example for details on the setup of the types and variables being shown here. Byte (and uint8) arrays and slices are displayed uniquely, similar to the hexdump -C command as shown.
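A short usage sketch; Dump, Fdump, Sdump and the utter.Config global are named in the text above, so only the sample data here is made up:

	package main

	import (
		"os"

		"github.com/kortschak/utter"
	)

	type point struct {
		X, Y int
		Tags []string
	}

	func main() {
		p := &point{X: 1, Y: 2, Tags: []string{"a", "b"}}

		// Sort map keys and use tab indentation for deterministic, diffable output.
		utter.Config.SortKeys = true
		utter.Config.Indent = "\t"

		utter.Dump(p)             // write to standard output
		utter.Fdump(os.Stderr, p) // write to an arbitrary io.Writer
		s := utter.Sdump(p)       // capture the dump as a string
		_ = s
	}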
Package amt provides a reference implementation of the IPLD AMT (Array Mapped Trie) used in the Filecoin blockchain. The AMT algorithm is similar to a HAMT https://en.wikipedia.org/wiki/Hash_array_mapped_trie but instead presents an array-like interface where the indexes themselves form the mapping to nodes in the trie structure. An AMT is suitable for storing sparse array data, as a minimal number of intermediate nodes are required to address a small number of entries even when their indexes span a large distance. An AMT is also a suitable means of storing non-sparse array data as required, with a small amount of storage and algorithmic overhead required to handle mapping that assumes that some elements within any range of data may not be present. The AMT algorithm produces a tree-like graph, with a single root node addressing a collection of child nodes which connect downward toward leaf nodes which store the actual entries. No terminal entries are stored in intermediate elements of the tree, unlike in a HAMT. We can divide up the AMT tree structure into "levels" or "heights", where a height of zero contains the terminal elements, and the maximum height of the tree contains the single root node. Intermediate nodes are used to span across the range of indexes. Any AMT instance uses a fixed "width" that is consistent across the tree's nodes. An AMT's "bitWidth" dictates the width, or maximum branching factor (arity), of the AMT's nodes by determining how many bits of the original index are used to determine the index at any given level. A bitWidth of 3 (the default for this implementation) can generate indexes in the range of 0 to (2^3)-1=7, i.e. a "width" of 8. In practice, this means that an AMT with a bitWidth of 3 has a branching factor of _between 1 and 8_ for any node in the structure. Considering the minimal case: a minimal AMT contains a single node which serves as both the root and the leaf node and can hold zero or more elements (an empty AMT is possible, although a special-case, and consists of a zero-length root). This minimal AMT can store array indexes from 0 to width-1 (i.e. 0 to 7 for the default bitWidth of 3) without requiring the addition of additional nodes. Attempts to add indexes beyond width-1 will result in additional nodes being added and a tree structure in order to address the new elements. The minimal AMT node is said to have a height of 0. Every node in an AMT has a height that indicates its distance from the leaf nodes. All leaf nodes have a height of 0. The height of the root node dictates the overall height of the entire AMT. In the case of the minimal AMT, this is 0. Elements are stored in a compacted form within nodes; they are "position-mapped" by a bitmap field that is stored with the node. The bitmap is a simple byte array, where each bit represents an element of the data that can be stored in the node. With a width of 8, the bitmap is a single byte and up to 8 elements can be stored in the node. The data array of a node _only stores elements that are present in that node_, so the array is commonly shorter than the maximum width. An empty AMT is a special-case where the single node can have zero elements, therefore a zero-length data array and a bitmap of `0x00`. In all other cases, the data array must have between 1 and width elements. Determining the position of an index within the data array requires counting the number of set bits within the bitmap up to the element we are concerned with.
If the bitmap has bits 2, 4 and 6 set, we can see that only 3 of the bits are set so our data array should hold 3 elements. To address index 4, we know that the first element will be index 2 and therefore the second will hold index 4. This format allows us to store only the elements that are set in the node. Overflow beyond the single node AMT by adding an index beyond width-1 requires an increase in height in order to address all elements. If an element in the range of width to (width*2)-1 is added, a single additional height is required which will result in a new root node which is used to address two consecutive leaf nodes. Because we have an arity of up to width at any node, the addition of indexes in the range of 0 to (width^2)-1 will still require only the addition of a single additional height above the leaf nodes, i.e. height 1. From the width of an AMT we can derive the maximum range of indexes that can be contained by an AMT at any given `height` with the formula width^(height+1)-1. e.g. an AMT with a width of 8 and a height of 2 can address indexes 0 to 8^(2+1)-1=511. Incrementing the height multiplies the range of indexes that can be contained within that structure by a factor of width. Nodes above height 0 (non-leaf nodes) do not contain terminal elements, but instead, their data array contains links to child nodes. The index compaction using the bitmap is the same as for leaf nodes, so each non-leaf node only stores as many links as it has child nodes. Because additional height is required to address larger indexes, even a single-element AMT will require more than one node where the index is greater than the width of the AMT. For a width of 8, indexes 8 to 63 require a height of 1, indexes 64 to 511 require a height of 2, indexes 512 to 4095 require a height of 3, etc. Retrieving elements from the AMT requires extracting only the portion of the requested index that is required at each height to determine the position in the data array to navigate into. When traversing through the tree, we only need to select from indexes 0 to width-1. To do this, we take log2(width) bits from the index to form a number that is between 0 and width-1. e.g. for a width of 8, we only need 3 bits to form a number between 0 and 7, so we only consume 3 bits per level of the AMT as we traverse. A simple method to calculate this at any height in the AMT (assuming bitWidth of 3, i.e. a width of 8) is: 1. Calculate the maximum number of nodes (not entries) that may be present in a sub-tree rooted at the current height. width^height provides this number. e.g. at height 0, only 1 node can be present, but at height 3, we may have a tree of up to 512 nodes (storing up to 8^(3+1)=4096 entries). 2. Divide the index by this number to find the index for this height. e.g. an index of 3 at height 0 will be 3/1=3, or an index of 20 at height 1 will be 20/8=2. 3. If we are at height 0, the element we want is at the data index, position-mapped via the bitmap. 4. If we are above height 0, we need to navigate to the child element at the index we calculated, position-mapped via the bitmap. When traversing to the child, we discard the upper portion of the index that we no longer need. This can be achieved by a mod operation against the number-of-nodes value. e.g. an index of 20 at height 1 requires navigation to the element at position 2, when moving to that element (which is height 0), we truncate the index with 20%8=4, at height 0 this index will be the index in our data array (position-mapped via the bitmap).
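An illustrative sketch (not this package's API) of the two calculations just described: extracting the per-height index from a full index, and position-mapping it through the bitmap with a popcount:

	package main

	import (
		"fmt"
		"math/bits"
	)

	const bitWidth = 3          // default for this implementation
	const width = 1 << bitWidth // 8

	// indexAtHeight returns which of the node's width slots the full index
	// falls into at the given height (0 = leaf level).
	func indexAtHeight(index uint64, height uint) uint64 {
		nodesBelow := uint64(1)
		for i := uint(0); i < height; i++ {
			nodesBelow *= width
		}
		return (index / nodesBelow) % width
	}

	// dataPosition converts a slot index into a position in the compacted
	// data array by counting the set bits below it in the bitmap.
	func dataPosition(bitmap uint8, slot uint64) (int, bool) {
		if bitmap&(1<<slot) == 0 {
			return 0, false // slot not present in this node
		}
		mask := uint8(1<<slot) - 1
		return bits.OnesCount8(bitmap & mask), true
	}

	func main() {
		// Bitmap with bits 2, 4 and 6 set: the data array holds 3 elements.
		bitmap := uint8(1<<2 | 1<<4 | 1<<6)
		pos, ok := dataPosition(bitmap, 4)
		fmt.Println(pos, ok) // 1 true: index 4 is the second element

		// Index 20 at height 1 selects slot 20/8 = 2; at height 0 the
		// remaining index is 20 % 8 = 4.
		fmt.Println(indexAtHeight(20, 1)) // 2
		fmt.Println(indexAtHeight(20, 0)) // 4
	}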
In this way, each sub-tree root consumes a small slice, log2(width) bits long, of the original index. Adding new elements to an AMT may require up to 3 steps: 1. Increasing the height to accommodate a new index if the current height is not sufficient to address the new index. Increasing the height requires turning the current root node into an intermediate and adding a new root which links to the old (repeated until the required height is reached). 2. Adding any missing intermediate and leaf nodes that are required to address the new index. Depending on the density of existing indexes, this may require the addition of up to height-1 new nodes to connect the root to the required leaf. Sparse indexes will mean large gaps in the tree that will need filling to address new, equally sparse, indexes. 3. Setting the element at the leaf node in the appropriate position in the data array and setting the appropriate bit in the bitmap. Removing elements requires a reversal of this process. Any empty node (other than the case of a completely empty AMT) must be removed and its parent should have its child link removed. This removal may recurse up the tree to remove many unnecessary intermediate nodes. The root node may also be removed if the current height is no longer necessary to contain the range of indexes still in the AMT. This can be easily determined if _only_ the first bit of the root's bitmap is set, meaning only the left-most link is present; that child becomes the new root node (repeated until the new root has more than the first bit set or reaches height 0, the single-node case). See https://github.com/ipld/specs/blob/master/data-structures/hashmap.md for a description of a HAMT algorithm. And https://github.com/ipld/specs/blob/master/data-structures/vector.md for a description of a similar algorithm to an AMT that doesn't support internal node compression and therefore doesn't support sparse arrays. Unlike a HAMT, the AMT algorithm doesn't benefit from randomness introduced by a hash algorithm. Therefore, when an AMT is used in cases where user input can influence indexes, larger-than-necessary tree structures may present risks, as well as the challenge imposed by having a strict upper limit on the indexes addressable by the AMT. A width of 8, using 64-bit integers for indexing, allows for a tree height of up to 64/log2(8)=21 (i.e. a width of 8 has a bitWidth of 3, dividing the 64 bits of the uint into 21 separate per-height indexes). Careful placement of indexes could create extremely sub-optimal forms with large heights connecting leaf nodes that are sparsely packed. The overhead of the large number of intermediate nodes required to connect leaf nodes in AMTs that contain high indexes can be abused to create perverse forms that contain large numbers of nodes to store a minimal number of elements. Minimal node counts result where indexes are all in the lower range. The optimal case for an AMT is contiguous index values starting from zero. As larger indexes are introduced that span beyond the current maximum, more nodes are required to address the new elements _and_ the existing lower-index elements. Consider a case where a width=8 AMT is only addressing indexes less than 8 and requiring a single height. The introduction of a single index within 8 of the maximum 64-bit unsigned integer range will require the new root to have a height of 21 and have enough connecting nodes between it and both the existing elements and the new upper index.
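An illustrative calculation (again, not the package API) of the height required to address a given index, matching the examples above (indexes 8 to 63 need height 1, 64 to 511 need height 2, and so on):

	package main

	import "fmt"

	const width = 8 // bitWidth of 3

	// heightForIndex returns the minimum tree height needed so that
	// width^(height+1) > index, i.e. the index is addressable.
	// (Overflow for indexes near the top of the uint64 range is ignored
	// in this sketch.)
	func heightForIndex(index uint64) uint {
		height := uint(0)
		max := uint64(width) // width^(height+1)
		for index >= max {
			height++
			max *= width
		}
		return height
	}

	func main() {
		fmt.Println(heightForIndex(7))    // 0
		fmt.Println(heightForIndex(8))    // 1
		fmt.Println(heightForIndex(63))   // 1
		fmt.Println(heightForIndex(64))   // 2
		fmt.Println(heightForIndex(4095)) // 3
	}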
This pattern of behavior may be acceptable if there is significant density of entries under a particular maximum index. There is a direct relationship between the sparseness of index values and the number of nodes required to address the entries. This should be the key consideration when determining whether an AMT is a suitable data structure for a given application.
Package goics is a toolkit for encoding and decoding ics/iCal/iCalendar files. This is a work-in-progress project that will try to incorporate as many exceptions and variants of the format as possible. It is a toolkit because the user has to define the way the data is needed. The idea is built around something similar to the consumer/provider pattern. Say we want to decode a stream of vevents from a .ics file into a custom type Events. Our custom type will need to implement the ICalConsumer interface, where the type picks up data from the format. The decoding process will be something like this: I chose this model because this format is a pain, and I also don't much like the reflect package. For encoding objects to the iCal format, something similar has to be done: the object emitting elements for the encoder will have to implement the ICalEmiter interface, returning a Component structure to be encoded. This was done because every package could require encoding its values and keys in its own way. Just for encoding time, I found more than three types of lines. Componenter is an interface implemented by every Component that can be encoded to iCal. Properties have to be stored as strings; the conversion from the origin type to the string format must be done in the emitter. There are some helpers for date conversion, and in the future I will add more for encoding params in the string, and also for handling lists and recurrent events. A simple, non-functional example used for testing:
Package ogdl is used to process OGDL, the Ordered Graph Data Language. OGDL is a textual format to write trees or graphs of text, where indentation and spaces define the structure. Here is an example: The language is simple, both in its textual representation and in its number of productions (the specification rules), allowing for compact implementations. OGDL character streams are normally formed by Unicode characters, and encoded as UTF-8 strings, but any encoding that is ASCII transparent is compatible with the specification. See the full spec at http://ogdl.org. To install this package just do: If we have a text file 'config.ogdl' containing: then, will print If the timeout parameter is not present, the default value (60) will be assigned to 'to'. The default value is optional, but be aware that Int64() will return 0 in case the parameter doesn't exist. The configuration file can be written in a more concise way: The package includes a template processor. It takes an arbitrary input stream with some variables in it, and produces an output stream with the variables resolved out of a Graph object which acts as context. For example (given the previous config file): string(b) is then: Some rules are followed:
Package gkvlite provides a simple, ordered, ACID, key-value persistence library. It provides persistent, immutable data structure abstractions.
Package tfortools provides a set of functions that are designed to make it easier for developers to add template based scripting to their command line tools. Command line tools written in Go often allow users to specify a template script to tailor the output of the tool to their specific needs. This can be useful both when visually inspecting the data and also when invoking command line tools in scripts. The best example of this is go list, which allows users to pass a template script to extract interesting information about Go packages. For example, prints all the imports of the current package. The aim of this package is to make it easier for developers to add template scripting support to their tools and easier for users of these tools to extract the information they need. It does this by augmenting the templating language provided by the standard library package text/template in two ways: 1. It auto generates descriptions of the data structures passed as input to a template script for use in help messages. This ensures that help usage information is always up to date with the source code. 2. It provides a suite of convenience functions to make it easy for script writers to extract the data they need. There are functions for sorting, selecting rows and columns, and generating nicely formatted tables. For example, if a program passed a slice of structs containing stock data to a template script, we could use the following script to extract the names of the 3 stocks with the highest trade volume. The output might look something like this: The functions head, sort, tables and col are provided by this package.
Package etable provides a DataTable structure (also known as a DataFrame) which is a collection of columnar data all having the same number of rows. Each column is an etensor.Tensor. The following sub-packages are included: * bitslice is a Go slice of bytes []byte that has methods for setting individual bits, as if it were a slice of bools, while being 8x more memory efficient. This is used for encoding null entries in etensor, and as a Tensor of bool / bits there as well, and is generally very useful for binary (boolean) data. * etensor is the emer implementation of a Tensor (n-dimensional array) object. etensor.Tensor is an interface that applies to many different type-specific instances, such as etensor.Float32. A tensor is just an etensor.Shape plus a slice holding the specific data type. Our tensor is based directly on the [Apache Arrow](https://github.com/apache/arrow/tree/master/go) project's tensor, and it fully interoperates with it. Arrow tensors are designed to be read-only, and we needed some extra support to make our etable.Table work well, so we had to roll our own. Our tensors also interoperate fully with Gonum's 2D-specific Matrix type for the 2D case, and can use the gonum/floats and stats routines for raw arithmetic etc. * etable is our Go version of DataTable from C++ emergent, which is widely useful for holding input patterns to present to the network, and logs of output from the network, among many other uses. An etable.Table is a collection of etensor.Tensor columns that are all aligned along the outer-most *row* dimension. Index-based indirection is supported via optional args, but we do not take on the burden of ensuring full updating of the indexes across all operations, which greatly simplifies things. The etable.Table should interoperate with the under-development gonum DataFrame structure among others. The use of this data structure is always optional and orthogonal to the core network algorithm code -- in Python the pandas library has a suitable DataFrame structure that can be used instead.
Package muhash provides an implementation of a Multiplicative Hash, a cryptographic data structure that allows you to have a rolling hash function that you can add and remove elements from, without the need to re-serialize and re-hash the whole data set.
Package latest provides concurrent data structures that waste the least possible work on state data.
zebrapack is a code generation tool for creating methods to serialize and de-serialize Go data structures to and from ZebraPack (a schema-based serialization format that is derived from MessagePack2). This package is targeted at the `go generate` tool. To use it, include the following directive in a go source file with types requiring source generation: The go generate tool should set the proper environment variables for the generator to execute without any command-line flags. However, the following options are supported, if you need them (See zebrapack -h): For more information, please read README.md, and the wiki at github.com/glycerine/zebrapack
greenpack is a code generation tool for creating methods to serialize and de-serialize Go data structures to and from Greenpack (a schema-based serialization format that is derived from MessagePack2). This package is targeted at the `go generate` tool. To use it, include the following directive in a go source file with types requiring source generation: The go generate tool should set the proper environment variables for the generator to execute without any command-line flags. However, the following options are supported, if you need them (See greenpack -h): For more information, please read README.md, and the wiki at github.com/glycerine/greenpack
Package arrow provides an implementation of Apache Arrow. Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and inter-process communication. The fundamental data structure in Arrow is an Array, which holds a sequence of values of the same type. An array consists of memory holding the data and an additional validity bitmap that indicates if the corresponding entry in the array is valid (not null). If the array has no null entries, it is possible to omit this bitmap. This example shows how to create a FixedSizeList array. The resulting array should be: This example shows how one can slice an array. The initial (float64) array is: and the sub-slice is: This example demonstrates creating an array, sourcing the values and null bitmaps directly from byte slices. The null count is set to UnknownNullCount, instructing the array to calculate the null count from the bitmap when NullN is called. This example shows how to create a List array. The resulting array should be: This example demonstrates how to build an array of int64 values using a builder and Append. Whilst convenient for small arrays, this approach is less practical for large ones. This example shows how to create a Struct array. The resulting array should be:
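A sketch of the builder-based construction mentioned above; the import paths follow the pre-v6 module layout and may differ by Arrow release, so treat them as assumptions:

	package main

	import (
		"fmt"

		"github.com/apache/arrow/go/arrow/array"
		"github.com/apache/arrow/go/arrow/memory"
	)

	func main() {
		pool := memory.NewGoAllocator()

		// Build an int64 array one value at a time, with one null entry.
		b := array.NewInt64Builder(pool)
		defer b.Release()

		b.Append(1)
		b.Append(2)
		b.AppendNull()
		b.Append(4)

		arr := b.NewInt64Array()
		defer arr.Release()

		fmt.Println(arr.Len(), arr.NullN()) // 4 1
		for i := 0; i < arr.Len(); i++ {
			if arr.IsNull(i) {
				fmt.Println("null")
				continue
			}
			fmt.Println(arr.Value(i))
		}
	}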
Package messaging provides the logic and data structures that the services will need to communicate with each other over AMQP (as implemented by RabbitMQ).
Package goadder contains a collection of thread-safe, concurrent data structures for reading and writing numeric int64 counters, inspired by OpenJDK9 LongAdder. Besides JDKAdder, a ported version of the OpenJDK9 LongAdder, the package also provides other alternatives for various use cases.
Package gocroaring is a wrapper for CRoaring in Go. It provides a fast compressed bitmap data structure. See http://roaringbitmap.org for details.
Package dhall implements routines for deserializing and evaluating Dhall configuration into Go data structures. For more on the Dhall language, see https://dhall-lang.org/ If you have the following struct: And a file called foo.dhall containing the following Dhall configuration: You can deserialize the Dhall into the struct as follows (error handling skipped for brevity): This version supports Dhall standard 17.0.0, except that it doesn't support `using` directives.
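A sketch of the struct/file/deserialize flow the text alludes to; the struct, the file contents, and the UnmarshalFile function name follow dhall-golang's encoding/json-like style as I recall it and are assumptions to verify:

	package main

	import (
		"fmt"

		"github.com/philandstuff/dhall-golang/v6"
	)

	// Hypothetical target struct for the example.
	type Config struct {
		Name string
		Port int
	}

	func main() {
		// foo.dhall might contain: { Name = "service", Port = 8080 }
		var cfg Config
		if err := dhall.UnmarshalFile("foo.dhall", &cfg); err != nil {
			panic(err)
		}
		fmt.Println(cfg.Name, cfg.Port)
	}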
Package desync implements data structures, protocols and features of https://github.com/systemd/casync in order to allow support for additional platforms and improve performance by way of concurrency and caching. Supports the following casync data structures: catar archives, caibx/caidx index files, castr stores (local or remote). See desync/cmd for reference implementations of the available features.
Package errcode facilitates standardized API error codes. The goal is that clients can reliably understand errors by checking against immutable error codes. This godoc documents usage. For broader context, see https://github.com/pingcap/errcode/tree/master/README.md Error codes are represented as strings by CodeStr (see CodeStr documentation). This package is designed to have few opinions and be a starting point for how you want to do errors in your project. The main requirement is to satisfy the ErrorCode interface by attaching a Code to an Error. See the documentation of ErrorCode. Additional optional interfaces HasClientData, HasOperation, Causer, and StackTracer are provided for extensibility in creating structured error data representations. Hierarchies are supported: a Code can point to a parent. This is used in the HTTPCode implementation to inherit HTTP codes found with MetaDataFromAncestors. The hierarchy is present in the Code's string representation with a dot separation. A few generic top-level error codes are provided (see the variables section of the doc). You are encouraged to create your own error codes customized to your application rather than solely using generic errors. See NewJSONFormat for an opinion on how to send back meta data about errors with the error data to a client. JSONFormat includes a body of response data (the "data field") that is by default the data from the Error serialized to JSON. Stack traces are automatically added by NewInternalErr and show up as the Stack field in JSONFormat. Errors can be grouped with Combine() and ungrouped via Errors(), which show up as the Others field in JSONFormat. To extract any ErrorCodes from an error, use CodeChain(). This extracts error codes without information loss (using ChainContext).
Package bitset provides bitset implementations for bit packing binary values into pointers and bytes. A bitset, while logically equivalent to a []bool, is often preferable over a []bool due to the space and time efficiency of bit packing binary values. They are typically more space efficient than a []bool since (although implementation specific) bools are typically byte sized. Using a bitset in place of a []bool can therefore result in roughly an 8x reduction in memory usage. While bitsets introduce bit-shifting overhead for gets and sets that is unnecessary for a []bool, they may still be more performant than a []bool because the smaller data structure is more cache friendly. This package contains three bitset implementations: Pointers for efficiency, Bytes for situations where bitsets must be serialized or deserialized, and Sparse for when memory efficiency is the most important factor when working with sparse datasets.
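An illustrative sketch of the bit-packing idea itself (not this package's API): packing booleans into a []byte with bit-shifting gets and sets:

	package main

	import "fmt"

	// bytesBitset packs one boolean per bit, eight per byte.
	type bytesBitset []byte

	func newBytesBitset(n int) bytesBitset {
		return make(bytesBitset, (n+7)/8)
	}

	func (b bytesBitset) set(i int, v bool) {
		if v {
			b[i/8] |= 1 << (uint(i) % 8)
			return
		}
		b[i/8] &^= 1 << (uint(i) % 8)
	}

	func (b bytesBitset) get(i int) bool {
		return b[i/8]&(1<<(uint(i)%8)) != 0
	}

	func main() {
		bs := newBytesBitset(100) // 13 bytes instead of 100 bytes for a []bool
		bs.set(42, true)
		fmt.Println(bs.get(42), bs.get(43)) // true false
	}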
File Structures 2 This is a follow up to my crufty http://github.com/timtadh/file-structures work. That system has some endemic problems: 1. It uses the read/write interface to files. This means that it needs to do block management and cache management. In theory this can be very fast but it is also very challenging in Go. 2. Because it uses read/write it has to do buffer management. Largely, the system punts on this problem and allows Go to handle the buffer management through the normal memory management system. This doesn't work especially well for the use case of file-structures. File Structures 2 is an experiment to bring memory mapped IO to the world of Go. The hypotheses are: 1. The operating system is good at page management generally. While we know more about how to manage the structure of B+Trees, VarChar stores, and Linear Hash tables than the OS does, there is no indication that from Go you can achieve better performance. Therefore, I hypothesize that leaving it to the OS will lead to a smaller working set and a faster data structure in general. 2. You can make memory mapping performant in Go. There are many challenges here, the biggest of which is that there are no dynamically sized array types in Go. The size of the array is part of the type; you have to use slices. This creates complications when hooking up structures which contain slices to mmap allocated blocks of memory. I hypothesize that this repository can achieve good (enough) performance here. The major components of this project: 1. fmap - a memory mapped file interface. Part C, part Go. Uses cgo. 2. bptree - a B+ Tree with duplicate key support (fixed size keys, variable length values) written on top of fmap. 3. slice - used by fmap and bptree to completely violate the memory and type safety of Go. 4. errors - just a simple error package which maintains a stack trace with every error.
Package lookslike is used to validate JSON-like nested map data structures against a set of expectations. Its key features are allowing custom, function-defined validators for values, and allowing the composition of multiple validation specs. See the example below for more details. Most key functions include detailed examples of their use within this documentation.
Package squalor provides SQL utility routines, such as validation of models against table schemas, marshalling and unmarshalling of model structs into table rows and programmatic construction of SQL statements. While squalor uses the database/sql package, the SQL it utilizes is MySQL specific (e.g. REPLACE, INSERT ON DUPLICATE KEY UPDATE, etc). Given a simple table definition: And a model for a row of the table: The BindModel method will validate that the model and the table definition are compatible. For example, it would be an error for the User.ID field to have the type string. The table definition is loaded from the database allowing custom checks such as verifying the existence of indexes. While fields are often primitive types such as int64 or string, it is sometimes desirable to use a custom type. This can be accomplished by having the type implement the sql.Scanner and driver.Valuer interfaces. For example, you might create a GzippedText type which automatically compresses the value when it is written to the database and uncompresses it when it is read: The Insert method is used to insert one or more rows in a table. You can pass either a struct or a pointer to a struct. Multiple rows can be inserted at once. Doing so offers convenience and performance. When multiple rows are batch inserted the returned error corresponds to the first SQL error encountered which may make it impossible to determine which row caused the error. If the rows correspond to different tables the order of insertion into the tables is undefined. If you care about the order of insertion into multiple tables and determining which row is causing an insertion error, structure your calls to Insert appropriately (i.e. insert into a single table or insert a single row). After a successful insert on a table with an auto increment primary key, the auto increment will be set back in the corresponding field of the object. For example, if the user table was defined as: Then we could create a new user by doing: The Replace method replaces a row in a table, either inserting the row if it doesn't exist, or deleting and then inserting the row. See the MySQL docs for the difference between REPLACE, UPDATE and INSERT ON DUPLICATE KEY UPDATE (Upsert). The Update method updates a row in a table, returning the number of rows modified. The Upsert method inserts or updates a row. The Get method retrieves a single row by primary key and binds the result columns to a struct. The Delete method deletes rows by primary key. See the documentation for DB.Delete for performance limitations when batch deleting multiple rows from a table with a primary key composed of multiple columns. To support optimistic locking with a column storing the version number, one field in a model object can be marked to serve as the lock. Modifying the example above: Now, the Update method will ensure that the object has not been concurrently modified when writing, by constraining the update by the version number. If the update is successful, the version number will be both incremented on the model (in-memory), as well as in the database. Programmatic construction of SQL queries prevents SQL injection attacks while keeping query construction both readable and similar to SQL itself. Support is provided for most of the SQL DML (data manipulation language), though the constructed queries are targeted to MySQL. DB provides wrappers for the sql.Query and sql.QueryRow interfaces.
DB provides wrappers for the sql.Query and sql.QueryRow interfaces. The Rows struct returned from Query has an additional StructScan method for binding the columns of a row result to a struct. QueryRow can be used to easily query and scan a single value: In addition to the convenience of inserting, deleting and updating table rows using a struct, attention has been paid to performance. Since squalor carefully constructs the SQL queries for insertion, deletion and update, it eschews the use of placeholders in favor of properly escaping values in the query. With the Go MySQL drivers, this saves a roundtrip to the database for each query because queries with placeholders must be prepared before being executed. Marshalling and unmarshalling of data from structs utilize reflection, but care is taken to minimize the use of reflection in order to improve performance. For example, reflection data for models is cached when the model is bound to a table. Batch operations are utilized when possible. The Delete, Insert, Replace, Update and Upsert operations will perform multiple operations per SQL statement. This can provide an order of magnitude speed improvement over performing the mutations one row at a time due to minimizing the network overhead.
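A sketch of the query wrappers described above, reusing the User model from the previous example; the StructScan and QueryRow usage follows the behavior described in this doc, while the exact argument handling of db.Query is an assumption:

    // Query several rows and bind each to a struct via StructScan.
    rows, err := db.Query("SELECT * FROM users WHERE name LIKE ?", "g%")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
    for rows.Next() {
        var u User
        if err := rows.StructScan(&u); err != nil {
            log.Fatal(err)
        }
        fmt.Println(u.ID, u.Name)
    }

    // Query and scan a single value.
    var count int
    if err := db.QueryRow("SELECT COUNT(*) FROM users").Scan(&count); err != nil {
        log.Fatal(err)
    }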
Package dataexchange provides the API client, operations, and parameter types for AWS Data Exchange. AWS Data Exchange is a service that makes it easy for AWS customers to exchange data in the cloud. You can use the AWS Data Exchange APIs to create, update, manage, and access file-based data sets in the AWS Cloud. As a subscriber, you can view and access the data sets that you have an entitlement to through a subscription. You can use the APIs to download or copy your entitled data sets to Amazon Simple Storage Service (Amazon S3) for use across a variety of AWS analytics and machine learning services. As a provider, you can create and manage your data sets that you would like to publish to a product. Packaging and providing your data sets as products requires a few steps to determine eligibility. For more information, visit the AWS Data Exchange User Guide. A data set is a collection of data that can be changed or updated over time. Data sets can be updated using revisions, which represent a new version or incremental change to a data set. A revision contains one or more assets. An asset in AWS Data Exchange is a piece of data that can be stored as an Amazon S3 object, Redshift datashare, API Gateway API, AWS Lake Formation data permission, or Amazon S3 data access. The asset can be a structured data file, an image file, or some other data file. Jobs are asynchronous import or export operations used to create or copy assets.
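For orientation, a minimal sketch of constructing the client with aws-sdk-go-v2 and calling one representative operation; ListDataSets is used here as an example and the field names on the output are taken from the generated types:

    import (
        "context"
        "log"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/dataexchange"
    )

    func main() {
        ctx := context.Background()

        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            log.Fatal(err)
        }
        client := dataexchange.NewFromConfig(cfg)

        // List data sets owned by, or entitled to, this account.
        out, err := client.ListDataSets(ctx, &dataexchange.ListDataSetsInput{})
        if err != nil {
            log.Fatal(err)
        }
        for _, ds := range out.DataSets {
            log.Println(aws.ToString(ds.Name))
        }
    }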
Package rencode is a Go implementation of https://github.com/aresch/rencode. The rencode logic is similar to bencode (https://en.wikipedia.org/wiki/Bencode). For complex, heterogeneous data structures with many small elements, r-encodings take up significantly less space than b-encodings. Example of encoder construction and use: You can use either specific methods to encode one of the supported types, or the interface-generic Encode() method. Example of decoder construction: The DecodeNext() method can be used to decode the next value from the rencode stream; however, this method returns an interface{}, while usually a specific type is expected. In such cases, it is advised to use the Scan() method instead, which accepts a pointer to any of the supported types. Example: Only the following types are supported: The rencode.List and rencode.Dictionary types implement Python-like features and can store values and keys of the simpler types enumerated above.
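A hedged sketch of round-tripping a few values; the NewEncoder/NewDecoder constructors and the exact Scan argument types are assumptions based on the project README, so verify them against the package:

    import (
        "bytes"
        "fmt"
        "log"

        "github.com/gdm85/go-rencode"
    )

    func main() {
        var buf bytes.Buffer

        // Encode values with the interface-generic Encode method.
        enc := rencode.NewEncoder(&buf)
        for _, v := range []interface{}{int64(42), "hello world"} {
            if err := enc.Encode(v); err != nil {
                log.Fatal(err)
            }
        }

        // Decode them back with Scan, passing a pointer to the expected type.
        dec := rencode.NewDecoder(&buf)
        var n int64
        if err := dec.Scan(&n); err != nil {
            log.Fatal(err)
        }
        var s string
        if err := dec.Scan(&s); err != nil {
            log.Fatal(err)
        }
        fmt.Println(n, s)
    }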
Package siphash implements the SipHash-64 and SipHash-128 pseudo-random functions with the recommended parameters c = 2 and d = 4. SipHash computes a message authentication code (MAC) from a variable-length message and a 128-bit secret key. SipHash was designed to be efficient, even for short inputs, with performance comparable to non-cryptographic hash functions. SipHash cannot be used as a cryptographic hash function: neither SipHash-64 nor SipHash-128 is strongly collision resistant. SipHash was designed to defend against hash-flooding DoS attacks. SipHash-64 can be used as a hashing scheme within hash maps or other key-value data structures. SipHash-128 can be used to compute a 128-bit authentication tag for messages.
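A hedged sketch of computing a 64-bit SipHash tag under a fixed 128-bit key; the import path and the Sum64 signature (message plus a *[16]byte key) are assumptions based on common Go SipHash APIs, so check the package's actual declarations:

    import (
        "crypto/rand"
        "fmt"
        "log"

        "github.com/aead/siphash"
    )

    func main() {
        // 128-bit secret key; in real code, load it from a secret store.
        var key [16]byte
        if _, err := rand.Read(key[:]); err != nil {
            log.Fatal(err)
        }

        msg := []byte("hello world")
        // Assumed helper: Sum64(msg []byte, key *[16]byte) uint64.
        tag := siphash.Sum64(msg, &key)
        fmt.Printf("%x\n", tag)
    }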
Package merkletree implements a Merkle Tree capable of storing arbitrary content. A Merkle Tree is a hash tree that provides an efficient way to verify that the contents of a data set are present and untampered with. At its core, a Merkle Tree is a list of items representing the data that should be verified. Each of these items is inserted into a leaf node and a tree of hashes is constructed bottom up, using a hash of each node's left and right children's hashes. This means that the root node will effectively be a hash of all other nodes (hashes) in the tree. This property allows the tree to be reproduced, and thus verified, based on the hash of the root node of the tree. The benefit of the tree structure is that verifying any single content entry in the tree requires only log2(n) steps in the worst case. Creating a new merkletree requires that the type the tree will be constructed from implements the Content interface. A slice of Content items should be created and then passed to the NewTree method. t represents the Merkle Tree and can be verified and manipulated with the API methods described below.
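A sketch of implementing the Content interface and building a tree; the NewTree, MerkleRoot, and VerifyContent names follow the shape of the package README and should be treated as assumptions:

    import (
        "crypto/sha256"
        "log"

        "github.com/cbergoon/merkletree"
    )

    // TestContent implements merkletree.Content.
    type TestContent struct {
        x string
    }

    // CalculateHash returns the SHA-256 hash of the content.
    func (t TestContent) CalculateHash() ([]byte, error) {
        h := sha256.New()
        if _, err := h.Write([]byte(t.x)); err != nil {
            return nil, err
        }
        return h.Sum(nil), nil
    }

    // Equals reports whether two pieces of content are identical.
    func (t TestContent) Equals(other merkletree.Content) (bool, error) {
        return t.x == other.(TestContent).x, nil
    }

    func main() {
        list := []merkletree.Content{
            TestContent{x: "Hello"},
            TestContent{x: "Hi"},
            TestContent{x: "Hey"},
        }

        t, err := merkletree.NewTree(list)
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("root: %x", t.MerkleRoot())

        // Prove that a single entry is present and untampered with.
        ok, err := t.VerifyContent(list[0])
        if err != nil {
            log.Fatal(err)
        }
        log.Println("verified:", ok)
    }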
Package xkafka provides consumers & producers to work efficiently with Kafka. `Message` is the core data structure for xkafka. `xkafka` comes with implementations that support:

## Consumer

The processing mode is determined by the xkafka.Concurrency option. By default, the consumer is initialized with `enable.auto.offset.store=false`. The offset is "stored" after the message is processed. The offset is "committed" based on the `auto.commit.interval.ms` option. It is important to understand the difference between "store" and "commit": the offset is "stored" in the consumer's memory and is "committed" to Kafka. The offset is "stored" after the message is processed and the `message.Status` is Success or Skip. The stored offsets will be automatically committed, unless the `ManualCommit` option is enabled.

### Error Handling

By default, xkafka.Consumer will stop processing, commit the last stored offset, and exit if there is a Kafka error or if the handler returns an error. Errors can be handled by using one or more of the following options:

- within the handler implementation
- using error handling & retry middlewares
- through the catch-all xkafka.ErrorHandler option

xkafka.ErrorHandler is called for every error that is not handled by the handler or the middlewares. It is also called for errors returned by the underlying Kafka client.

### Sequential Processing

Sequential processing is the default mode. It is the same as xkafka.Concurrency(1).

### Async Processing

Async processing is enabled by setting xkafka.Concurrency to a value greater than 1. The consumer will use a pool of goroutines to process messages concurrently. Offsets are stored and committed in the order that the messages are received.

### Manual Commit

By default, the consumer will automatically commit the offset based on the `auto.commit.interval.ms` option, asynchronously in the background. The consumer can be configured to commit the offset manually by setting the `xkafka.EnableManualCommit` option to true. When ManualCommit is enabled, the consumer will synchronously commit the offset after each message is processed.

NOTE: Enabling ManualCommit adds an overhead to each message. It is recommended to use ManualCommit only when necessary.
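A rough, hypothetical sketch of wiring a concurrent consumer with a catch-all error handler; the Concurrency and ErrorHandler options are mentioned above, while HandlerFunc, NewConsumer, the broker/topic options, and Run are assumptions that should be checked against the package:

    handler := xkafka.HandlerFunc(func(ctx context.Context, msg *xkafka.Message) error {
        // Process the message; the returned error (or the message status)
        // drives offset storage and the error-handling options above.
        return nil
    })

    consumer, err := xkafka.NewConsumer(
        "orders-consumer", // hypothetical consumer name
        handler,
        xkafka.Concurrency(10), // async processing with a pool of 10 goroutines
        xkafka.ErrorHandler(func(err error) error {
            log.Println("kafka error:", err) // catch-all for unhandled errors
            return nil                       // returning nil keeps the consumer running
        }),
    )
    if err != nil {
        log.Fatal(err)
    }
    if err := consumer.Run(context.Background()); err != nil {
        log.Fatal(err)
    }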
Package json implements semantic processing of JSON as specified in RFC 8259. JSON is a simple data interchange format that can represent primitive data types such as booleans, strings, and numbers, in addition to structured data types such as objects and arrays. Marshal and Unmarshal encode and decode Go values to/from JSON text contained within a []byte. MarshalWrite and UnmarshalRead operate on JSON text by writing to or reading from an io.Writer or io.Reader. MarshalEncode and UnmarshalDecode operate on JSON text by encoding to or decoding from a jsontext.Encoder or jsontext.Decoder. Options may be passed to each of the marshal or unmarshal functions to configure the semantic behavior of marshaling and unmarshaling (i.e., alter how JSON data is understood as Go data and vice versa). jsontext.Options may also be passed to the marshal or unmarshal functions to configure the syntactic behavior of encoding or decoding. The data types of JSON are mapped to/from the data types of Go based on the closest logical equivalent between the two type systems. For example, a JSON boolean corresponds with a Go bool, a JSON string corresponds with a Go string, a JSON number corresponds with a Go int, uint or float, a JSON array corresponds with a Go slice or array, and a JSON object corresponds with a Go struct or map. See the documentation on Marshal and Unmarshal for a comprehensive list of how the JSON and Go type systems correspond. Arbitrary Go types can customize their JSON representation by implementing MarshalerV1, MarshalerV2, UnmarshalerV1, or UnmarshalerV2. This provides authors of Go types with control over how their types are serialized as JSON. Alternatively, users can implement functions that match MarshalFuncV1, MarshalFuncV2, UnmarshalFuncV1, or UnmarshalFuncV2 to specify the JSON representation for arbitrary types. This provides callers of JSON functionality with control over how any arbitrary type is serialized as JSON. A Go struct is naturally represented as a JSON object, where each Go struct field corresponds with a JSON object member. When marshaling, all Go struct fields are recursively encoded in depth-first order as JSON object members except those that are ignored or omitted. When unmarshaling, JSON object members are recursively decoded into the corresponding Go struct fields. Object members that do not match any struct fields, also known as “unknown members”, are ignored by default or rejected if RejectUnknownMembers is specified. The representation of each struct field can be customized in the "json" struct field tag, where the tag is a comma separated list of options. As a special case, if the entire tag is `json:"-"`, then the field is ignored with regard to its JSON representation. The first option is the JSON object name override for the Go struct field. If the name is not specified, then the Go struct field name is used as the JSON object name. JSON names containing commas or quotes, or names identical to "" or "-", can be specified using a single-quoted string literal, where the syntax is identical to the Go grammar for a double-quoted string literal, but instead uses single quotes as the delimiters. By default, unmarshaling uses case-sensitive matching to identify the Go struct field associated with a JSON object name. 
After the name, the following tag options are supported:

omitzero: When marshaling, the "omitzero" option specifies that the struct field should be omitted if the field value is zero, as determined by the "IsZero() bool" method if present, otherwise based on whether the field is the zero Go value. This option has no effect when unmarshaling.

omitempty: When marshaling, the "omitempty" option specifies that the struct field should be omitted if the field value would have been encoded as a JSON null, empty string, empty object, or empty array. This option has no effect when unmarshaling.

string: The "string" option specifies that StringifyNumbers be set when marshaling or unmarshaling a struct field value. This causes numeric types to be encoded as a JSON number within a JSON string, and to be decoded from either a JSON number or a JSON string containing a JSON number. This extra level of encoding is often necessary since many JSON parsers cannot precisely represent 64-bit integers.

nocase: When unmarshaling, the "nocase" option specifies that if the JSON object name does not exactly match the JSON name for any of the struct fields, then it attempts to match the struct field using a case-insensitive match that also ignores dashes and underscores. If multiple fields match, the first declared field in breadth-first order takes precedence. This takes precedence even if MatchCaseInsensitiveNames is set to false. This cannot be specified together with the "strictcase" option.

strictcase: When unmarshaling, the "strictcase" option specifies that the JSON object name must exactly match the JSON name for the struct field. This takes precedence even if MatchCaseInsensitiveNames is set to true. This cannot be specified together with the "nocase" option.

inline: The "inline" option specifies that the JSON representable content of this field type is to be promoted as if it were specified in the parent struct. It is the JSON equivalent of Go struct embedding. A Go embedded field is implicitly inlined unless an explicit JSON name is specified. The inlined field must be a Go struct (that does not implement any JSON methods), jsontext.Value, map[string]T, or an unnamed pointer to such types. When marshaling, inlined fields from a pointer type are omitted if the pointer is nil. Inlined fields of type jsontext.Value and map[string]T are called “inlined fallbacks” as they can represent all possible JSON object members not directly handled by the parent struct. Only one inlined fallback field may be specified in a struct, while many non-fallback fields may be specified. This option must not be specified with any other option (including the JSON name).

unknown: The "unknown" option is a specialized variant of the inlined fallback to indicate that this Go struct field contains any number of unknown JSON object members. The field type must be a jsontext.Value, map[string]T, or an unnamed pointer to such types. If DiscardUnknownMembers is specified when marshaling, the contents of this field are ignored. If RejectUnknownMembers is specified when unmarshaling, any unknown object members are rejected regardless of whether an inlined fallback with the "unknown" option exists. This option must not be specified with any other option (including the JSON name).

format: The "format" option specifies a format flag used to specialize the formatting of the field value.
The option is a key-value pair specified as "format:value", where the value must be either a literal consisting of letters and numbers (e.g., "format:RFC3339") or a single-quoted string literal (e.g., "format:'2006-01-02'"). The interpretation of the format flag is determined by the struct field type.

The "omitzero" and "omitempty" options are mostly semantically identical. The former is defined in terms of the Go type system, while the latter is defined in terms of the JSON type system. Consequently, they behave differently in some circumstances. For example, only a nil slice or map is omitted under "omitzero", while an empty slice or map is omitted under "omitempty" regardless of nilness. The "omitzero" option is useful for types with a well-defined zero value (e.g., net/netip.Addr) or types that have an IsZero method (e.g., time.Time).

Every Go struct corresponds to a list of JSON representable fields, which is constructed by performing a breadth-first search over all struct fields (excluding unexported or ignored fields), where the search recursively descends into inlined structs. The set of non-inlined fields in a struct must have unique JSON names. If multiple fields all have the same JSON name, then the one at the shallowest depth takes precedence and the other fields at deeper depths are excluded from the list of JSON representable fields. If multiple fields at the shallowest depth have the same JSON name, but exactly one is explicitly tagged with a JSON name, then that field takes precedence and all others are excluded from the list. This is analogous to Go visibility rules for struct field selection with embedded struct types. Marshaling or unmarshaling a non-empty struct without any JSON representable fields results in a SemanticError. Unexported fields must not have any `json` tags except for `json:"-"`.

Unmarshal matches JSON object names with Go struct fields using a case-sensitive match, but can be configured to use a case-insensitive match with the "nocase" option. This permits unmarshaling from inputs that use naming conventions such as camelCase, snake_case, or kebab-case. By default, JSON object names for Go struct fields are derived from the Go field name, but may be specified in the `json` tag. Due to JSON's heritage in JavaScript, the most common naming convention used for JSON object names is camelCase. The "format" tag option can be used to alter the formatting of certain types.

JSON objects can be inlined within a parent object similar to how Go structs can be embedded within a parent struct. The inlining rules are similar to those of Go embedding, but operate upon the JSON namespace.

Go struct fields can be omitted from the output depending on either the input Go value or the output JSON encoding of the value. The "omitzero" option omits a field if it is the zero Go value or implements an "IsZero() bool" method that reports true. The "omitempty" option omits a field if it encodes as an empty JSON value, which we define as a JSON null or an empty JSON string, object, or array. In many cases, the behavior of "omitzero" and "omitempty" is equivalent. If both provide the desired effect, then using "omitzero" is preferred.

The exact order of a JSON object can be preserved through the use of a specialized type that implements MarshalerV2 and UnmarshalerV2.

Some Go types have a custom JSON representation where the implementation is delegated to some external package. Consequently, the "json" package will not know how to use that external implementation.
For example, the google.golang.org/protobuf/encoding/protojson package implements JSON for all google.golang.org/protobuf/proto.Message types. WithMarshalers and WithUnmarshalers can be used to configure "json" and "protojson" to cooperate. When implementing HTTP endpoints, it is common to be operating with an io.Reader and an io.Writer. The MarshalWrite and UnmarshalRead functions assist in operating on such input/output types. UnmarshalRead reads the entirety of the io.Reader to ensure that io.EOF is encountered without any unexpected bytes after the top-level JSON value. If a type implements encoding.TextMarshaler and/or encoding.TextUnmarshaler, then the MarshalText and UnmarshalText methods are used to encode/decode the value to/from a JSON string. Due to version skew, the set of JSON object members known at compile time may differ from the set of members encountered at execution time. As such, it may be useful to have finer-grained handling of unknown members. This package supports preserving, rejecting, or discarding such members.
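A short sketch tying together several of the struct tag options described above, using the Marshal and Unmarshal functions from this package; the import path and the field names are illustrative assumptions:

    import (
        "fmt"
        "log"
        "time"

        // Assumed import path for the v2 json experiment; adjust as needed.
        "github.com/go-json-experiment/json"
    )

    type Profile struct {
        Name     string         `json:"name"`
        Birthday time.Time      `json:"birthday,omitzero,format:RFC3339"` // omitted when zero; RFC 3339 formatting
        Count    int64          `json:"count,string"`                     // JSON number inside a JSON string
        Tags     []string       `json:"tags,omitempty"`                   // omitted when it encodes as an empty array
        Unknown  map[string]any `json:",unknown"`                         // inlined fallback for unknown members
    }

    func main() {
        b, err := json.Marshal(Profile{Name: "gopher", Count: 42})
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(b)) // {"name":"gopher","count":"42"}

        var p Profile
        if err := json.Unmarshal(b, &p); err != nil {
            log.Fatal(err)
        }
    }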
Package gocql implements a fast and robust Cassandra driver for the Go programming language.

Pass a list of initial node IP addresses to NewCluster to create a new cluster configuration. A port can be specified as part of each address. It is recommended to use the value set in the Cassandra config for broadcast_address or listen_address, an IP address rather than a domain name. This is because events from Cassandra will use the configured IP address, which is used to index connected hosts. If the domain name specified resolves to more than one IP address, then the driver may connect multiple times to the same host, and will not mark the node as down or up based on events. Then you can customize more options (see ClusterConfig).

The driver tries to automatically detect the protocol version to use if not set, but you might want to set the protocol version explicitly, as it's not defined which version will be used in certain situations (for example, during an upgrade of the cluster, when some of the nodes support a different set of protocol versions than other nodes). The driver advertises the module name and version in the STARTUP message, so servers are able to detect the version. If you use a replace directive in go.mod, the driver will send information about the replacement module instead.

When ready, create a session from the configuration. Don't forget to Close the session once you are done with it.

The CQL protocol uses a SASL-based authentication mechanism and so consists of an exchange of server challenges and client responses. The details of the exchanged messages depend on the authenticator used. To use authentication, set ClusterConfig.Authenticator or ClusterConfig.AuthProvider. PasswordAuthenticator is provided for username/password authentication.

It is possible to secure traffic between the client and server with TLS. To use TLS, set the ClusterConfig.SslOpts field. SslOptions embeds *tls.Config so you can set that directly. There are also helpers to load keys/certificates from files. Warning: due to historical reasons, SslOptions is insecure by default, so you need to set EnableHostVerification to true if no Config is set. Most users should set SslOptions.Config to a *tls.Config. SslOptions and Config.InsecureSkipVerify interact as follows:

To route queries to the local DC first, use DCAwareRoundRobinPolicy. For example, if the datacenter you want to primarily connect to is called dc1 (as configured in the database): The driver can route queries to nodes that hold data replicas based on the partition key (preferring the local DC). Note that TokenAwareHostPolicy can take options such as gocql.ShuffleReplicas and gocql.NonLocalReplicasFallback. We recommend running with a token-aware host policy in production for maximum performance. The driver can only use token-aware routing for queries where all partition key columns are query parameters; for example, pass the partition key as a bind parameter rather than embedding its value in the query string. The DCAwareRoundRobinPolicy can be replaced with RackAwareRoundRobinPolicy, which takes two parameters, datacenter and rack. Instead of dividing hosts into two tiers (local datacenter and remote datacenters), it divides hosts into three (the local rack, the rest of the local datacenter, and everything else). RackAwareRoundRobinPolicy can be combined with TokenAwareHostPolicy in the same way as DCAwareRoundRobinPolicy.
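For example, a typical cluster configuration with password authentication, a token-aware and DC-aware host selection policy, and session creation looks roughly like this (keyspace, credentials, and addresses are placeholders):

    cluster := gocql.NewCluster("192.168.1.1:9042", "192.168.1.2:9042")
    cluster.Keyspace = "example"
    cluster.Consistency = gocql.Quorum
    cluster.ProtoVersion = 4
    cluster.Authenticator = gocql.PasswordAuthenticator{
        Username: "user",
        Password: "password",
    }
    // Prefer replicas in the local datacenter (dc1), routed by partition key.
    cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(
        gocql.DCAwareRoundRobinPolicy("dc1"),
    )

    session, err := cluster.CreateSession()
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()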
Create queries with Session.Query. Query values must not be reused between different executions and must not be modified after starting execution of the query. To execute a query without reading results, use Query.Exec. A single row can be read by calling Query.Scan. Multiple rows can be read using Iter.Scanner. See Example for a complete example.

The driver automatically prepares DML queries (SELECT/INSERT/UPDATE/DELETE/BATCH statements) and maintains a cache of prepared statements. The CQL protocol does not support preparing other query types. When using CQL protocol >= 4, it is possible to use gocql.UnsetValue as the bound value of a column. This will cause the database to ignore writing the column. The main advantage is the ability to keep the same prepared statement even when you don't want to update some fields, where before you needed to make another prepared statement.

Session is safe to use from multiple goroutines, so to execute multiple concurrent queries, just execute them from several worker goroutines. Gocql provides a synchronous-looking API (as recommended for Go APIs), while the queries are executed asynchronously at the protocol level.

Null values are unmarshalled as the zero value of the type. If you need to distinguish, for example, between a text column being null and an empty string, you can unmarshal into a *string variable instead of a string. See Example_nulls for a full example.

The driver reuses the backing memory of slices when unmarshalling. This is an optimization so that a buffer does not need to be allocated for every processed row. However, you need to be careful when storing the slices in other memory structures. When you want to save the data for later use, pass a new slice every time. A common pattern is to declare the slice variable within the scanner loop.

The driver supports paging of results with automatic prefetch; see ClusterConfig.PageSize, Session.SetPrefetch, Query.PageSize, and Query.Prefetch. It is also possible to control the paging manually with Query.PageState (this disables automatic prefetch). Manual paging is useful if you want to store the page state externally, for example in a URL, to allow users to browse pages of a result. You might want to sign/encrypt the paging state when exposing it externally since it contains data from primary keys.

Paging state is specific to the CQL protocol version and the exact query used. It is meant as opaque state that should not be modified. If you send paging state from a different query or protocol version, then the behaviour is not defined (you might get unexpected results or an error from the server). For example, do not send paging state returned by a node using protocol version 3 to a node using protocol version 4. Also, when using protocol version 4, paging state between Cassandra 2.2 and 3.0 is incompatible (https://issues.apache.org/jira/browse/CASSANDRA-10880). The driver does not check whether the paging state is from the same protocol version/statement. You might want to validate this yourself, as it could be a problem if you store paging state externally. For example, if you store paging state in a URL, the URLs might become broken when you upgrade your cluster.

Call Query.PageState(nil) to fetch just the first page of the query results. Pass the page state returned by Iter.PageState to Query.PageState of a subsequent query to get the next page. If the length of the slice returned by Iter.PageState is zero, there are no more pages available (or an error occurred). Using too low a value for PageSize will negatively affect performance; a value below 100 is probably too low. While Cassandra currently returns exactly PageSize items in a page (except for the last page), the protocol authors explicitly reserved the right to return a smaller or larger number of items in a page for performance reasons, so don't rely on a page having the exact count of items. See Example_paging for an example of manual paging.
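A brief sketch of the basic query patterns described above (Query.Exec, Query.Scan, and Iter.Scanner with a fresh slice per row), assuming an open session and a hypothetical users table:

    id := gocql.TimeUUID()

    // Execute a query without reading results.
    if err := session.Query(`INSERT INTO users (id, name) VALUES (?, ?)`,
        id, "alice").Exec(); err != nil {
        log.Fatal(err)
    }

    // Read a single row.
    var name string
    if err := session.Query(`SELECT name FROM users WHERE id = ? LIMIT 1`,
        id).Scan(&name); err != nil {
        log.Fatal(err)
    }

    // Read multiple rows; declare slices inside the loop so the driver
    // does not reuse their backing memory across iterations.
    scanner := session.Query(`SELECT id, emails FROM users`).Iter().Scanner()
    for scanner.Next() {
        var (
            rowID  gocql.UUID
            emails []string
        )
        if err := scanner.Scan(&rowID, &emails); err != nil {
            log.Fatal(err)
        }
        fmt.Println(rowID, emails)
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }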
There are certain situations when you don't know the list of columns in advance, mainly when the query is supplied by the user. Iter.Columns, Iter.RowData, Iter.MapScan and Iter.SliceMap can be used to handle this case. See Example_dynamicColumns.

The CQL protocol supports sending batches of DML statements (INSERT/UPDATE/DELETE) and so does gocql. Use Session.NewBatch to create a new batch and then fill in the details of the individual queries. Then execute the batch with Session.ExecuteBatch (a short sketch appears at the end of this block). Logged batches ensure atomicity: either all or none of the operations in the batch will succeed, but they incur overhead to ensure this property. Unlogged batches don't have the overhead of logged batches, but don't guarantee atomicity. Updates of counters are handled specially by Cassandra, so batches of counter updates have to use the CounterBatch type. A counter batch can only contain statements to update counters.

For unlogged batches it is recommended to send only single-partition batches (i.e. all statements in the batch should involve only a single partition). A multi-partition batch needs to be split by the coordinator node and re-sent to the correct nodes. With single-partition batches you can send the batch directly to the node for the partition without incurring the additional network hop.

It is also possible to pass an entire BEGIN BATCH .. APPLY BATCH statement to Query.Exec. There are differences in how those are executed: a BEGIN BATCH statement passed to Query.Exec is prepared as a whole in a single statement, while Session.ExecuteBatch prepares the individual statements in the batch. If you have variable-length batches using the same statement, using Session.ExecuteBatch is more efficient. See Example_batch for an example.

Query.ScanCAS or Query.MapScanCAS can be used to execute a single-statement lightweight transaction (an INSERT/UPDATE .. IF statement) and read its result. See the example for Query.MapScanCAS. Multiple-statement lightweight transactions can be executed as a logged batch that contains at least one conditional statement. All the conditions must return true for the batch to be applied. You can use Session.ExecuteBatchCAS and Session.MapExecuteBatchCAS when executing the batch to learn about the result of the LWT. See the example for Session.MapExecuteBatchCAS.

Queries can be marked as idempotent. Marking a query as idempotent tells the driver that the query can be executed multiple times without affecting its result. Non-idempotent queries are not eligible for retries or speculative execution. Idempotent queries are retried in case of errors based on the configured RetryPolicy. If the query is an LWT and the configured RetryPolicy additionally implements the LWTRetryPolicy interface, then the policy will be cast to LWTRetryPolicy and used this way.

Queries can be retried even before they fail by setting a SpeculativeExecutionPolicy. The policy can cause the driver to retry on a different node if the query is taking longer than a specified delay, even before the driver receives an error or timeout from the server. When a query is speculatively executed, the original execution is still executing. The two parallel executions of the query race to return a result; the first result received will be returned.
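Returning to batches, a small sketch of an unlogged, single-partition batch built with Session.NewBatch and executed with Session.ExecuteBatch (table and column names are placeholders):

    pk := "order-42"

    b := session.NewBatch(gocql.UnloggedBatch)
    // Both statements share the same partition key, so the coordinator
    // does not need to split the batch across nodes.
    b.Query(`INSERT INTO events (pk, seq, payload) VALUES (?, ?, ?)`, pk, 1, "created")
    b.Query(`INSERT INTO events (pk, seq, payload) VALUES (?, ?, ?)`, pk, 2, "updated")
    if err := session.ExecuteBatch(b); err != nil {
        log.Fatal(err)
    }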
UDTs can be marshaled to and unmarshaled from a map[string]interface{}, a Go struct, or a type implementing the UDTUnmarshaler, UDTMarshaler, Unmarshaler or Marshaler interfaces. For structs, the cql tag can be used to specify the CQL field name to be mapped to a struct field (a short sketch appears at the end of this section). See Example_userDefinedTypesMap, Example_userDefinedTypesStruct, ExampleUDTMarshaler, ExampleUDTUnmarshaler. It is possible to provide observer implementations that could be used to gather metrics.

The CQL protocol also supports tracing of queries. When enabled, the database will write information about internal events that happened during execution of the query. You can use Query.Trace to request tracing and receive the session ID that the database used to store the trace information in the system_traces.sessions and system_traces.events tables. NewTraceWriter returns an implementation of Tracer that writes the events to a writer. Gathering trace information might be essential for debugging and optimizing queries, but writing traces has overhead, so this feature should not be used on production systems with very high load unless you know what you are doing.

Example_batch demonstrates how to execute a batch of statements. Example_dynamicColumns demonstrates how to handle a dynamic column list. Example_marshalerUnmarshaler demonstrates how to implement a Marshaler and Unmarshaler. Example_nulls demonstrates how to distinguish between null and zero value when needed. Null values are unmarshalled as the zero value of the type. If you need to distinguish, for example, between a text column being null and an empty string, you can unmarshal into a *string field. Example_paging demonstrates how to manually fetch pages and use page state. See also the package documentation about paging. Example_set demonstrates how to use sets. Example_userDefinedTypesMap demonstrates how to work with user-defined types as maps. See also Example_userDefinedTypesStruct and the examples for UDTMarshaler and UDTUnmarshaler if you want to map to structs. Example_userDefinedTypesStruct demonstrates how to work with user-defined types as structs. See also the examples for UDTMarshaler and UDTUnmarshaler if you need more control/better performance.
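As a closing sketch of the struct mapping for UDTs mentioned above, a user-defined address type (hypothetical schema) can be scanned into a struct whose fields carry cql tags:

    // CREATE TYPE address (street text, city text, zip text)
    // CREATE TABLE users (id uuid PRIMARY KEY, address frozen<address>)
    type Address struct {
        Street string `cql:"street"`
        City   string `cql:"city"`
        Zip    string `cql:"zip"`
    }

    var addr Address
    if err := session.Query(`SELECT address FROM users WHERE id = ?`,
        id).Scan(&addr); err != nil {
        log.Fatal(err)
    }
    fmt.Println(addr.City)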