Package nodb is a high-performance embedded NoSQL database. nodb supports various data structures such as kv, list, hash and zset, like redis. Other features include binlog replication and data with a limited time-to-live. First create a nodb instance before use: cfg is a Config instance which contains configuration for nodb, such as DataDir (the root directory where nodb stores its data). After you create a nodb instance, you can select a DB to store your data: a DB must be selected by an index; nodb supports only 16 databases, so the index range is [0-15]. KV is the most basic nodb type, like any other key-value database. List is simply a list of values, sorted by insertion order. You can push or pop values at the list head (left) or tail (right). Hash is a map between fields and values. ZSet is a sorted collection of values. Every member of a zset is associated with a score, an int64 value used to sort members from smallest to greatest score. Members are unique, but scores may repeat. nodb supports binlog, so you can sync the binlog to another server for replication. To enable binlog support, set UseBinLog to true in the config.
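A minimal usage sketch, assuming the import paths and Config fields from the nodb README (github.com/lunny/nodb); treat the exact names as illustrative rather than authoritative:

    package main

    import (
        "github.com/lunny/nodb"
        "github.com/lunny/nodb/config"
    )

    func main() {
        cfg := new(config.Config)
        cfg.DataDir = "./data" // root directory for nodb's data files

        l, err := nodb.Open(cfg)
        if err != nil {
            panic(err)
        }

        // Select database 0 (valid indexes are 0-15).
        db, err := l.Select(0)
        if err != nil {
            panic(err)
        }

        // Basic KV usage.
        if err := db.Set([]byte("key"), []byte("value")); err != nil {
            panic(err)
        }
        value, _ := db.Get([]byte("key"))
        _ = value
    }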
Package blake256 implements BLAKE-256 and BLAKE-224 hash functions (SHA-3 candidate).
Package fastrand implements a cryptographically secure pseudorandom number generator. The generator is seeded using the system's default entropy source, and thereafter produces random values via repeated hashing. As a result, fastrand can generate randomness much faster than crypto/rand, and generation cannot fail beyond a potential panic at init. The method used in this package is similar to the Fortuna algorithm, which is used in FreeBSD for /dev/urandom. This package uses techniques that are known to be secure; however, the exact implementation has not been heavily reviewed by cryptographers.
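For callers, the package behaves like a drop-in, never-failing source of randomness. A small sketch, assuming the NebulousLabs fastrand API (Bytes, Intn); the import path is an assumption:

    package main

    import (
        "fmt"

        "gitlab.com/NebulousLabs/fastrand" // assumed import path
    )

    func main() {
        token := fastrand.Bytes(16) // 16 random bytes, no error to handle
        n := fastrand.Intn(100)     // uniform int in [0, 100)
        fmt.Printf("%x %d\n", token, n)
    }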
Package redigomock is a mock for the redigo library (redis client). Redigomock basically registers commands with their expected results in an internal global variable. When a command is executed via the Conn interface, the mock looks up this global variable to retrieve the corresponding result. To start a mocked connection just do the following: Now you can inject it wherever your system needs a redigo.Conn, because it satisfies all interface requirements. Before running your tests, beyond mocking the connection, you also need to register the expected results. For that you can generate commands with the expected results. As the Expect method from Command receives anything (interface{}), another method was created to easily map the result to your structure. For that, use ExpectMap: You should also test the error cases, and you can do it in the same way as a normal result. Sometimes you will want to register a command regardless of the arguments, and you can do it with the method GenericCommand (mainly with HMSET). All commands are registered in a global variable, so they will be there until all your test cases end. So, as good practice in test writing, you should clear the mock state at the beginning of each test case. Let's see a full test example. Imagine a Person structure and a function that picks up this person in Redis using the redigo library (file person.go): Now we need to test it, so let's create the corresponding test with redigomock (file person_test.go): When you use redis as a persistent list, you might want to call the same redis command multiple times. For example: To test it, you can chain redis responses. Let's write a test case: In the first iteration of the loop redigomock would return "www.some.url.com", then "www.another.url.com" and finally redis.ErrNil. Sometimes providing expected arguments to redigomock at compile time can be too constraining. Let's imagine you use redis hash sets to store some data, along with the timestamp of the last data update. Let's expand our Person struct: And add a function updating personal data (the phone number, for example). Please notice that the update timestamp can't be determined at compile time: Unit test: As you can see, at the position of the current timestamp redigomock is told to match the AnyInt struct created by the NewAnyInt() method. The AnyInt struct will match any integer passed to redigomock from the tested method. Please see the fuzzyMatch.go file for more details. The interface of Conn which matches redigo.Conn is safe for concurrent use, but the mock-only methods and fields, like Command and Errors, should not be accessed concurrently with such calls.
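A condensed sketch of the flow described above, using the redigomock API named in this doc (NewConn, Command, Expect, ExpectMap, Clear); the keys and values are illustrative:

    package person_test

    import (
        "fmt"
        "testing"

        "github.com/rafaeljusto/redigomock"
    )

    func TestRetrievePerson(t *testing.T) {
        conn := redigomock.NewConn()
        defer conn.Clear() // reset the global command registry between tests

        // Register a command with an expected result...
        conn.Command("GET", "person:1:name").Expect("Mr. Johson")

        // ...a structured result for HGETALL via ExpectMap...
        conn.Command("HGETALL", "person:1").ExpectMap(map[string]string{
            "name": "Mr. Johson",
            "age":  "42",
        })

        // ...and an error case.
        conn.Command("GET", "person:2:name").ExpectError(fmt.Errorf("simulated failure"))

        // Inject conn wherever a redigo.Conn is expected and run the code under test.
    }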
Package argon2 provides a convenience wrapper around Go's argon2 package. Currently only Argon2id is supported. Argon2 was the winner of the Password Hashing Competition; it makes it easier to securely derive strong keys from weak inputs (i.e. user passwords). The package provides password generation, constant-time comparison and parameter upgrading for argon2-derived keys.
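The wrapper's own function names aren't shown here, so the sketch below uses the underlying golang.org/x/crypto/argon2 package directly to illustrate Argon2id derivation and constant-time comparison; the parameters are illustrative:

    package main

    import (
        "crypto/rand"
        "crypto/subtle"
        "fmt"

        "golang.org/x/crypto/argon2"
    )

    func main() {
        password := []byte("correct horse battery staple")

        salt := make([]byte, 16)
        if _, err := rand.Read(salt); err != nil {
            panic(err)
        }

        // Argon2id with illustrative parameters: 1 pass, 64 MiB, 4 threads, 32-byte key.
        key := argon2.IDKey(password, salt, 1, 64*1024, 4, 32)

        // Constant-time comparison against a re-derived candidate.
        candidate := argon2.IDKey(password, salt, 1, 64*1024, 4, 32)
        fmt.Println(subtle.ConstantTimeCompare(key, candidate) == 1)
    }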
Package jump implements the "jump consistent hash" algorithm; a reference C++ implementation is given in [1]. Jump consistent hash works by computing when its output changes as the number of buckets increases. Let ch(key, num_buckets) be the consistent hash for the key when there are num_buckets buckets. Clearly, for any key, k, ch(k, 1) is 0, since there is only the one bucket. In order for the consistent hash function to be balanced, ch(k, 2) will have to stay at 0 for half the keys, k, while it will have to jump to 1 for the other half. In general, ch(k, n+1) has to stay the same as ch(k, n) for n/(n+1) of the keys, and jump to n for the other 1/(n+1) of the keys. Here are examples of the consistent hash values for three keys, k1, k2, and k3, as num_buckets goes up: A linear time algorithm can be defined by using the formula for the probability of ch(key, j) jumping when j increases. It essentially walks across a row of this table. Given a key and number of buckets, the algorithm considers each successive bucket, j, from 1 to num_buckets-1, and uses ch(key, j) to compute ch(key, j+1). At each bucket, j, it decides whether to keep ch(k, j+1) the same as ch(k, j), or to jump its value to j. In order to jump for the right fraction of keys, it uses a pseudorandom number generator with the key as its seed. To jump for 1/(j+1) of keys, it generates a uniform random number between 0.0 and 1.0, and jumps if the value is less than 1/(j+1). At the end of the loop, it has computed ch(k, num_buckets), which is the desired answer. In code (a sketch appears after this description): We can convert this to a logarithmic time algorithm by exploiting the fact that ch(key, j+1) is usually unchanged as j increases, only jumping occasionally. The algorithm will only compute the destinations of jumps, the j's for which ch(key, j+1) ≠ ch(key, j). Also notice that for these j's, ch(key, j+1) = j. To develop the algorithm, we will treat ch(key, j) as a random variable, so that we can use the notation for random variables to analyze the fractions of keys for which various propositions are true. That will lead us to a closed form expression for a pseudorandom variable whose value gives the destination of the next jump. Suppose that the algorithm is tracking the bucket numbers of the jumps for a particular key, k. And suppose that b was the destination of the last jump, that is, ch(k, b) ≠ ch(k, b+1), and ch(k, b+1) = b. Now, we want to find the next jump, the smallest j such that ch(k, j+1) ≠ ch(k, b+1), or equivalently, the largest j such that ch(k, j) = ch(k, b+1). We will make a pseudorandom variable whose value is that j. To get a probabilistic constraint on j, note that for any bucket number, i, we have j ≥ i if and only if the consistent hash hasn't changed by i, that is, if and only if ch(k, i) = ch(k, b+1). Hence, the distribution of j must satisfy P(j ≥ i) = P( ch(k, i) = ch(k, b+1) ). Fortunately, it is easy to compute that probability. Notice that since P( ch(k, 10) = ch(k, 11) ) is 10/11, and P( ch(k, 11) = ch(k, 12) ) is 11/12, then P( ch(k, 10) = ch(k, 12) ) is 10/11 * 11/12 = 10/12. In general, if n ≥ m, P( ch(k, n) = ch(k, m) ) = m / n. Thus for any i > b, P(j ≥ i) = P( ch(k, i) = ch(k, b+1) ) = (b+1) / i. Now, we generate a pseudorandom variable, r, (depending on k and j) that is uniformly distributed between 0 and 1. Since we want P(j ≥ i) = (b+1) / i, we have j ≥ i iff r ≤ (b+1) / i. Solving the inequality for i yields j ≥ i iff i ≤ (b+1) / r. Since i is a lower bound on j, j will equal the largest i for which j ≥ i, thus the largest i satisfying i ≤ (b+1) / r. Thus, by the definition of the floor function, j = floor((b+1) / r).
Using this formula, jump consistent hash finds ch(key, num_buckets) by choosing successive jump destinations until it finds a position at or past num_buckets. It then knows that the previous jump destination is the answer. To turn this into the actual code of figure 1, we need to implement random. We want it to be fast, and also to have well distributed successive values. We use a 64-bit linear congruential generator; the particular multiplier we use produces random numbers that are especially well distributed in higher dimensions (i.e., when successive random values are used to form tuples). We use the key as the seed. (For keys that don't fit into 64 bits, a 64-bit hash of the key should be used.) The congruential generator updates the seed on each iteration, and the code derives a double from the current seed. Tests show that this generator has good speed and distribution. It is worth noting that unlike the algorithm of Karger et al., jump consistent hash does not require the key to be hashed if it is already an integer. This is because jump consistent hash has an embedded pseudorandom number generator that essentially rehashes the key on every iteration. The hash is not especially good (i.e., linear congruential), but since it is applied repeatedly, additional hashing of the input key is not necessary. [1] http://arxiv.org/pdf/1406.2294v1.pdf
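Both algorithms sketched above, rendered in Go. The first is the linear-time loop ("In code" above); note it uses math/rand rather than the paper's LCG, so its outputs differ from Jump's, though it is itself a valid jump consistent hash. The second is the logarithmic-time reference implementation from the paper [1], with the 64-bit LCG and the floor((b+1)/r) step described in this section:

    package main

    import (
        "fmt"
        "math/rand"
    )

    // chLinear is the O(num_buckets) version: walk j upward, jumping to j
    // with probability 1/(j+1).
    func chLinear(key uint64, numBuckets int32) int32 {
        rnd := rand.New(rand.NewSource(int64(key)))
        var b int32 // tracks ch(key, j+1); ch(k, 1) = 0
        for j := int32(1); j < numBuckets; j++ {
            if rnd.Float64() < 1.0/float64(j+1) {
                b = j
            }
        }
        return b
    }

    // Jump is the O(ln num_buckets) version: leap directly from one jump
    // destination to the next via j = floor((b+1)/r).
    func Jump(key uint64, numBuckets int32) int32 {
        var b int64 = -1 // destination of the previous jump
        var j int64      // candidate destination of the next jump
        for j < int64(numBuckets) {
            b = j
            key = key*2862933555777941757 + 1 // 64-bit LCG step
            // r is derived from the top bits of the seed; this computes floor((b+1)/r).
            j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
        }
        return int32(b)
    }

    func main() {
        for _, n := range []int32{1, 10, 100} {
            fmt.Println(n, chLinear(12345, n), Jump(12345, n))
        }
    }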
hllpp implements the HyperLogLog++ cardinality estimator as specified in the HyperLogLog++ paper http://goo.gl/Z5Sqgu. hllpp uses a built-in non-streaming implementation of murmur3 to hash data as you add it to the estimator.
Package doublejump provides a revamped version of Google's jump consistent hash.
Package merkletree is an implementation of a Merkle tree (https://en.wikipedia.org/wiki/Merkle_tree). It provides methods to create a tree and to generate and verify proofs. The hashing algorithm for the tree is selectable between BLAKE2b and Keccak256, or you can supply your own. Creating a Merkle tree requires a list of values that are each byte arrays. Once a tree has been created, proofs can be generated using the tree's GenerateProof() function. The package includes a function to verify a generated proof given only the data to prove, the proof and the root hash of the relevant Merkle tree. This allows for efficient verification of proofs without requiring the entire Merkle tree to be stored or recreated. The tree pads its values to the next highest power of 2; values not supplied are treated as 0. This can be seen graphically by generating a DOT representation of the graph with DOT().
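A short sketch of the create/prove/verify flow (the function names New, GenerateProof, VerifyProof and Root follow this doc and the package README; treat the exact signatures and import path as assumptions):

    package main

    import (
        "fmt"

        merkletree "github.com/wealdtech/go-merkletree" // assumed import path
    )

    func main() {
        data := [][]byte{[]byte("Foo"), []byte("Bar"), []byte("Baz")}

        tree, err := merkletree.New(data)
        if err != nil {
            panic(err)
        }

        // Prove membership of one value, then verify with only the data,
        // the proof and the root hash.
        proof, err := tree.GenerateProof(data[0])
        if err != nil {
            panic(err)
        }
        verified, err := merkletree.VerifyProof(data[0], proof, tree.Root())
        if err != nil {
            panic(err)
        }
        fmt.Println(verified)
    }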
Package minhash implements the bottom-k sketch for streaming set similarity. This package works best when provided with a strong 64-bit hash function, such as CityHash, Spooky, Murmur3, or SipHash.
Package tview implements rich widgets for terminal based user interfaces. The widgets provided with this package are useful for data exploration and data entry. The package implements the following widgets: The package also provides Application which is used to poll the event queue and draw widgets on screen. The following is a very basic example showing a box with the title "Hello, world!": First, we create a box primitive with a border and a title. Then we create an application, set the box as its root primitive, and run the event loop. The application exits when the application's Stop() function is called or when Ctrl-C is pressed. If we have a primitive which consumes key presses, we call the application's SetFocus() function to redirect all key presses to that primitive. Most primitives then offer ways to install handlers that allow you to react to any actions performed on them. You will find more demos in the "demos" subdirectory. It also contains a presentation (written using tview) which gives an overview of the different widgets and how they can be used. Throughout this package, colors are specified using the tcell.Color type. Functions such as tcell.GetColor(), tcell.NewHexColor(), and tcell.NewRGBColor() can be used to create colors from W3C color names or RGB values. Almost all strings which are displayed can contain color tags. Color tags are W3C color names or six hexadecimal digits following a hash tag, wrapped in square brackets. Examples: A color tag changes the color of the characters following that color tag. This applies to almost everything from box titles, list text, form item labels, to table cells. In a TextView, this functionality has to be switched on explicitly. See the TextView documentation for more information. Color tags may contain not just the foreground (text) color but also the background color and additional flags. In fact, the full definition of a color tag is as follows: Each of the three fields can be left blank and trailing fields can be omitted. (Empty square brackets "[]", however, are not considered color tags.) Colors that are not specified will be left unchanged. A field with just a dash ("-") means "reset to default". You can specify the following flags (some flags may not be supported by your terminal): Examples: In the rare event that you want to display a string such as "[red]" or "[#00ff1a]" without applying its effect, you need to put an opening square bracket before the closing square bracket. Note that the text inside the brackets will be matched less strictly than region or color tags. I.e. any character that may be used in color or region tags will be recognized. Examples: You can use the Escape() function to insert brackets automatically where needed. When primitives are instantiated, they are initialized with colors taken from the global Styles variable. You may change this variable to adapt the look and feel of the primitives to your preferred style. This package supports unicode characters including wide characters. Many functions in this package are not thread-safe. For many applications, this may not be an issue: If your code makes changes in response to key events, it will execute in the main goroutine and thus will not cause any race conditions. If you access your primitives from other goroutines, however, you will need to synchronize execution.
The easiest way to do this is to call Application.QueueUpdate() or Application.QueueUpdateDraw() (see the function documentation for details): One exception to this is the io.Writer interface implemented by TextView. You can safely write to a TextView from any goroutine. See the TextView documentation for details. You can also call Application.Draw() from any goroutine without having to wrap it in QueueUpdate(). And, as mentioned above, key event callbacks are executed in the main goroutine and thus should not use QueueUpdate() as that may lead to deadlocks. All widgets listed above contain the Box type. All of Box's functions are therefore available for all widgets, too. All widgets also implement the Primitive interface. The tview package is based on https://github.com/derailed/tcell. It uses types and constants from that package (e.g. colors and keyboard values). This package does not process mouse input (yet).
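The "Hello, world!" example described above looks like this in code (the import path is the upstream rivo/tview one; this fork's path may differ):

    package main

    import "github.com/rivo/tview"

    func main() {
        // A box primitive with a border and a title.
        box := tview.NewBox().SetBorder(true).SetTitle("Hello, world!")
        // Run the event loop; Ctrl-C or Application.Stop() exits.
        if err := tview.NewApplication().SetRoot(box, true).Run(); err != nil {
            panic(err)
        }
    }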
Package imohash implements a fast, constant-time hash for files. It is based atop murmurhash3 and uses file size and sample data to construct the hash. For more information, including important caveats on usage, consult https://github.com/kalafut/imohash.
Package hamt provides a reference implementation of the IPLD HAMT used in the Filecoin blockchain. It includes some optional flexibility such that it may be used for other purposes outside of Filecoin. HAMT is a "hash array mapped trie" https://en.wikipedia.org/wiki/Hash_array_mapped_trie. This implementation extends the standard form by including buckets for the key/value pairs at storage leaves and CHAMP mutation semantics https://michael.steindorfer.name/publications/oopsla15.pdf. The CHAMP invariant and mutation rules provide us with the ability to maintain canonical forms given any set of keys and their values, regardless of insertion order and intermediate data insertion and deletion. Therefore, for any given set of keys and their values, a HAMT using the same parameters and CHAMP semantics should always produce the same root node and hence the same content identifier (CID). The HAMT algorithm hashes incoming keys and uses incrementing subsections of that hash digest at each level of its tree structure to determine the placement of either the entry or a link to a child node of the tree. A `bitWidth` determines the number of bits of the hash to use for index calculation at each level of the tree, such that the root node takes the first `bitWidth` bits of the hash to calculate an index and, as we move lower in the tree, we move along the hash by `depth x bitWidth` bits. In this way, a sufficiently randomizing hash function will generate a hash that provides a new index at each level of the data structure. An index comprising `bitWidth` bits will generate index values of `[ 0, 2^bitWidth )`. So a `bitWidth` of 8 will generate indexes of 0 to 255 inclusive. Each node in the tree can therefore hold up to `2^bitWidth` elements of data, which we store in an array. In this HAMT and the IPLD HashMap, we store entries in buckets. For a `Set(key, value)` mutation where the index generated at the root node for the hash of the key denotes an array index that does not yet contain an entry, we create a new bucket and insert the key/value pair entry. In this way, a single node can theoretically hold up to `2^bitWidth x bucketSize` entries, where `bucketSize` is the maximum number of elements a bucket is allowed to contain ("collisions"). In practice, indexes do not distribute with perfect randomness, so this maximum is theoretical. Entries stored in the node's buckets are stored in key-sorted order. This HAMT implementation: • Fixes the `bucketSize` to 3. • Defaults the `bitWidth` to 8; within Filecoin, however, 5 is used. • Defaults the hash algorithm to the 64-bit variant of Murmur3-x64. The algorithm used here is identical to that of the IPLD HashMap algorithm specified at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md. The specific parameters used by Filecoin and the DAG-CBOR block layout differ from the specification and are defined at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md#Appendix-Filecoin-hamt-variant.
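As an illustration of the indexing scheme just described (not the package's actual code), the sketch below extracts the `bitWidth`-bit slice of a hash digest used at a given depth:

    // indexAt returns the bitWidth bits of digest starting at bit depth*bitWidth,
    // i.e. the array index used at that level of the tree. It assumes the digest
    // is long enough for the requested depth.
    func indexAt(digest []byte, depth, bitWidth int) int {
        idx := 0
        start := depth * bitWidth // bit offset into the digest
        for i := 0; i < bitWidth; i++ {
            bit := start + i
            idx <<= 1
            if digest[bit/8]&(1<<uint(7-bit%8)) != 0 {
                idx |= 1
            }
        }
        return idx
    }

For example, with a bitWidth of 8 the root consumes digest bits 0-7 (an index in [0, 256)), the next level bits 8-15, and so on.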
Package mph is a Go implementation of the compress, hash and displace (CHD) minimal perfect hash algorithm. See http://cmph.sourceforge.net/papers/esa09.pdf for details. To create and serialize a hash table: To read from the hash table: MMAP is also indirectly supported, by deserializing from a byte slice and slicing the keys and values. See https://github.com/alecthomas/mph for source.
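A sketch of building and querying a table, following the project README (Builder/Add/Build/Get; treat the exact signatures as assumptions):

    package main

    import (
        "fmt"

        "github.com/alecthomas/mph"
    )

    func main() {
        b := mph.Builder()
        b.Add([]byte("hello"), []byte("world"))
        h, err := b.Build()
        if err != nil {
            panic(err)
        }
        fmt.Printf("%s\n", h.Get([]byte("hello"))) // "world"
    }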
Package htpasswd is a utility package to manipulate htpasswd files. It supports bcrypt and sha hashes.
Package rollinghash implements rolling versions of some hashes
Package htpasswd groups provides an authorisation mechanism using Apache-style group files. An Apache group file looks like this:

    users: user1 user2 user3
    admins: user1

Basic usage of this package:

    userGroups, groupLoadErr := htgroup.NewGroups("./my-group-file", nil)
    ok := userGroups.IsUserInGroup(username, "admins")

Package htpasswd provides HTTP Basic Authentication using Apache-style htpasswd files for the user and password data. It supports most common hashing systems used over the decades and can be easily extended by the programmer to support others. (See the sha.go source file as a guide.) You will want to use something like... ...to use in your handler code. You should read about that nil, as well as Reread() too.
2fa is a two-factor authentication agent. Usage: “2fa -add name” adds a new key to the 2fa keychain with the given name. It prints a prompt to standard error and reads a two-factor key from standard input. Two-factor keys are short case-insensitive strings of letters A-Z and digits 2-7. By default the new key generates time-based (TOTP) authentication codes; the -hotp flag makes the new key generate counter-based (HOTP) codes instead. By default the new key generates 6-digit codes; the -7 and -8 flags select 7- and 8-digit codes instead. “2fa -list” lists the names of all the keys in the keychain. “2fa name” prints a two-factor authentication code from the key with the given name. If “-clip” is specified, 2fa also copies the code to the system clipboard. With no arguments, 2fa prints two-factor authentication codes from all known time-based keys. The default time-based authentication codes are derived from a hash of the key and the current time, so it is important that the system clock have at least one-minute accuracy. The keychain is stored unencrypted in the text file $HOME/.2fa. During GitHub 2FA setup, at the “Scan this barcode with your app” step, click the “enter this text code instead” link. A window pops up showing “your two-factor secret,” a short string of letters and digits. Add it to 2fa under the name github, typing the secret at the prompt: Then whenever GitHub prompts for a 2FA code, run 2fa to obtain one: Or to type less:
Package identicon is an open source avatar generator inspired by GitHub avatars. IdentIcon uses a deterministic algorithm that generates an image (using Golang's stdlib image encoders) based on a text (generally usernames, emails or just random strings), by hashing it and iterating over the bytes of the digest to pick whether to draw a point, pick a color or choose where to go next. IdentIcon's Generator enables the creation of customized figures (NxN size, points density, custom color palette) as well as multiple exporting formats, in case developers want to generate their own images.
Package btcutil provides bitcoin-specific convenience functions and types. A Block defines a bitcoin block that provides easier and more efficient manipulation of raw wire protocol blocks. It also memoizes hashes for the block and its transactions on their first access so subsequent accesses don't have to repeat the relatively expensive hashing operations. A Tx defines a bitcoin transaction that provides more efficient manipulation of raw wire protocol transactions. It memoizes the hash for the transaction on its first access so subsequent accesses don't have to repeat the relatively expensive hashing operations. The Address interface provides an abstraction for a Bitcoin address. While the most common type is a pay-to-pubkey-hash, Bitcoin already supports others and may well support more in the future. This package currently provides implementations for the pay-to-pubkey, pay-to-pubkey-hash, and pay-to-script-hash address types. To decode/encode an address:
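For example, decoding a mainnet address string into an Address implementation and re-encoding it (the address literal is illustrative):

    package main

    import (
        "fmt"

        "github.com/btcsuite/btcd/chaincfg"
        "github.com/btcsuite/btcutil"
    )

    func main() {
        addr, err := btcutil.DecodeAddress("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa", &chaincfg.MainNetParams)
        if err != nil {
            panic(err)
        }
        fmt.Println(addr.EncodeAddress())
    }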
Package merkletree provides tools for calculating the Merkle root of a dataset, for creating a proof that a piece of data is in a Merkle tree of a given root, and for verifying proofs that a piece of data is in a Merkle tree of a given root. The tree is implemented according to the specification for Merkle trees provided in RFC 6962. Package merkletree also supports building roots and proofs from cached subroots of the Merkle tree. For example, a large file could be cached by building the Merkle root for each 4MB sector and remembering the Merkle roots of each sector. Using a cached tree, the Merkle root of the whole file can be computed by passing the cached tree each of the roots of the 4MB sectors. Building proofs using these cached roots is also supported. A proof must be built within the target sector using a normal Tree, requiring the whole sector to be hashed. The results of that proof can then be passed into the Prove() function of a cached tree, which will create the full proof without needing to hash the entire file. Caching also makes it inexpensive to update the Merkle root of the file after changing or deleting segments of the larger file. Examples can be found in the README for the package.
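A minimal root calculation, assuming the NebulousLabs-style API (New taking a hash.Hash, Push, Root); the import path is an assumption:

    package main

    import (
        "crypto/sha256"
        "fmt"

        "gitlab.com/NebulousLabs/merkletree" // assumed import path
    )

    func main() {
        tree := merkletree.New(sha256.New())
        tree.Push([]byte("leaf 1"))
        tree.Push([]byte("leaf 2"))
        tree.Push([]byte("leaf 3"))
        fmt.Printf("root: %x\n", tree.Root())
    }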
Package hamt provides immutable collection types of maps (associative arrays) and sets implemented as Hash-Array Mapped Tries (HAMTs). All operations on the collections, such as insert and delete, are immutable and create new collections, keeping the originals unmodified.
Package seahash implements SeaHash, a non-cryptographic hash function created by http://ticki.github.io. See https://ticki.github.io/blog/seahash-explained.
Package whirlpool implements the ISO/IEC 10118-3:2004 whirlpool cryptographic hash. Whirlpool is defined in http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
Package tview implements rich widgets for terminal based user interfaces. The widgets provided with this package are useful for data exploration and data entry. The package implements the following widgets: The package also provides Application which is used to poll the event queue and draw widgets on screen. The following is a very basic example showing a box with the title "Hello, world!": First, we create a box primitive with a border and a title. Then we create an application, set the box as its root primitive, and run the event loop. The application exits when the application's Stop() function is called or when Ctrl-C is pressed. If we have a primitive which consumes key presses, we call the application's SetFocus() function to redirect all key presses to that primitive. Most primitives then offer ways to install handlers that allow you to react to any actions performed on them. You will find more demos in the "demos" subdirectory. It also contains a presentation (written using tview) which gives an overview of the different widgets and how they can be used. Throughout this package, colors are specified using the tcell.Color type. Functions such as tcell.GetColor(), tcell.NewHexColor(), and tcell.NewRGBColor() can be used to create colors from W3C color names or RGB values. Almost all strings which are displayed can contain color tags. Color tags are W3C color names or six hexadecimal digits following a hash tag, wrapped in square brackets. Examples: A color tag changes the color of the characters following that color tag. This applies to almost everything from box titles, list text, form item labels, to table cells. In a TextView, this functionality has to be switched on explicitly. See the TextView documentation for more information. Color tags may contain not just the foreground (text) color but also the background color and additional flags. In fact, the full definition of a color tag is as follows: Each of the three fields can be left blank and trailing fields can be omitted. (Empty square brackets "[]", however, are not considered color tags.) Colors that are not specified will be left unchanged. A field with just a dash ("-") means "reset to default". You can specify the following flags (some flags may not be supported by your terminal): Examples: In the rare event that you want to display a string such as "[red]" or "[#00ff1a]" without applying its effect, you need to put an opening square bracket before the closing square bracket. Note that the text inside the brackets will be matched less strictly than region or color tags. I.e. any character that may be used in color or region tags will be recognized. Examples: You can use the Escape() function to insert brackets automatically where needed. When primitives are instantiated, they are initialized with colors taken from the global Styles variable. You may change this variable to adapt the look and feel of the primitives to your preferred style. This package supports unicode characters including wide characters. Many functions in this package are not thread-safe. For many applications, this may not be an issue: If your code makes changes in response to key events, it will execute in the main goroutine and thus will not cause any race conditions. If you access your primitives from other goroutines, however, you will need to synchronize execution.
The easiest way to do this is to call Application.QueueUpdate() or Application.QueueUpdateDraw() (see the function documentation for details): One exception to this is the io.Writer interface implemented by TextView. You can safely write to a TextView from any goroutine. See the TextView documentation for details. You can also call Application.Draw() from any goroutine without having to wrap it in QueueUpdate(). And, as mentioned above, key event callbacks are executed in the main goroutine and thus should not use QueueUpdate() as that may lead to deadlocks. All widgets listed above contain the Box type. All of Box's functions are therefore available for all widgets, too. All widgets also implement the Primitive interface. There is also the Focusable interface which is used to override functions in subclassing types. The tview package is based on https://github.com/diamondburned/tcell. It uses types and constants from that package (e.g. colors and keyboard values). This package does not process mouse input (yet).
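A sketch of the safe cross-goroutine update path described above, via QueueUpdateDraw (the import path is the upstream rivo/tview one; this fork's path may differ):

    package main

    import (
        "time"

        "github.com/rivo/tview"
    )

    func main() {
        app := tview.NewApplication()
        textView := tview.NewTextView().SetText("waiting...")

        // Primitives must not be touched directly from other goroutines;
        // queue the mutation so it runs in the main goroutine, then redraws.
        go func() {
            time.Sleep(2 * time.Second)
            app.QueueUpdateDraw(func() {
                textView.SetText("updated from another goroutine")
            })
        }()

        if err := app.SetRoot(textView, true).Run(); err != nil {
            panic(err)
        }
    }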
Package gopacket provides packet decoding for the Go language. gopacket contains many sub-packages with additional functionality you may find useful, including: Also, if you're looking to dive right into code, see the examples subdirectory for numerous simple binaries built using gopacket libraries. gopacket takes in packet data as a []byte and decodes it into a packet with a non-zero number of "layers". Each layer corresponds to a protocol within the bytes. Once a packet has been decoded, the layers of the packet can be requested from the packet. Packets can be decoded from a number of starting points. Many of our base types implement Decoder, which allows us to decode packets for which we don't have full data. Most of the time, you won't just have a []byte of packet data lying around. Instead, you'll want to read packets in from somewhere (file, interface, etc) and process them. To do that, you'll want to build a PacketSource. First, you'll need to construct an object that implements the PacketDataSource interface. There are implementations of this interface bundled with gopacket in the gopacket/pcap and gopacket/pfring subpackages... see their documentation for more information on their usage. Once you have a PacketDataSource, you can pass it into NewPacketSource, along with a Decoder of your choice, to create a PacketSource. Once you have a PacketSource, you can read packets from it in multiple ways. See the docs for PacketSource for more details. The easiest method is the Packets function, which returns a channel, then asynchronously writes new packets into that channel, closing the channel if the packetSource hits an end-of-file. You can change the decoding options of the packetSource by setting fields in packetSource.DecodeOptions... see the following sections for more details. gopacket optionally decodes packet data lazily, meaning it only decodes a packet layer when it needs to handle a function call. Lazily-decoded packets are not concurrency-safe. Since layers have not all been decoded, each call to Layer() or Layers() has the potential to mutate the packet in order to decode the next layer. If a packet is used in multiple goroutines concurrently, don't use gopacket.Lazy. Then gopacket will decode the packet fully, and all future function calls won't mutate the object. By default, gopacket will copy the slice passed to NewPacket and store the copy within the packet, so future mutations to the bytes underlying the slice don't affect the packet and its layers. If you can guarantee that the underlying slice bytes won't be changed, you can use NoCopy to tell gopacket.NewPacket, and it'll use the passed-in slice itself. The fastest method of decoding is to use both Lazy and NoCopy, but note from the many caveats above that for some implementations either or both may be dangerous. During decoding, certain layers are stored in the packet as well-known layer types. For example, IPv4 and IPv6 are both considered NetworkLayer layers, while TCP and UDP are both TransportLayer layers. We support 4 layers, corresponding to the 4 layers of the TCP/IP layering scheme (roughly analogous to layers 2, 3, 4, and 7 of the OSI model). To access these, you can use the packet.LinkLayer, packet.NetworkLayer, packet.TransportLayer, and packet.ApplicationLayer functions. Each of these functions returns a corresponding interface (gopacket.{Link,Network,Transport,Application}Layer).
The first three provide methods for getting src/dst addresses for that particular layer, while the final layer provides a Payload function to get payload data. This is helpful, for example, to get payloads for all packets regardless of their underlying data type: A particularly useful layer is ErrorLayer, which is set whenever there's an error parsing part of the packet. Note that we don't return an error from NewPacket because we may have decoded a number of layers successfully before running into our erroneous layer. You may still be able to get your Ethernet and IPv4 layers correctly, even if your TCP layer is malformed. gopacket has two useful objects, Flow and Endpoint, for communicating in a protocol independent manner the fact that a packet is coming from A and going to B. The general layer types LinkLayer, NetworkLayer, and TransportLayer all provide methods for extracting their flow information, without worrying about the type of the underlying Layer. A Flow is a simple object made up of a set of two Endpoints, one source and one destination. It details the sender and receiver of the Layer of the Packet. An Endpoint is a hashable representation of a source or destination. For example, for LayerTypeIPv4, an Endpoint contains the IP address bytes for a v4 IP packet. A Flow can be broken into Endpoints, and Endpoints can be combined into Flows: Both Endpoint and Flow objects can be used as map keys, and the equality operator can compare them, so you can easily group together all packets based on endpoint criteria: For load-balancing purposes, both Flow and Endpoint have FastHash() functions, which provide quick, non-cryptographic hashes of their contents. Of particular importance is the fact that Flow FastHash() is symmetric: A->B will have the same hash as B->A. An example usage could be: This allows us to split up a packet stream while still making sure that each stream sees all packets for a flow (and its bidirectional opposite). If your network has some strange encapsulation, you can implement your own decoder. In this example, we handle Ethernet packets which are encapsulated in a 4-byte header. See the docs for Decoder and PacketBuilder for more details on how coding decoders works, or look at RegisterLayerType and RegisterEndpointType to see how to add layer/endpoint types to gopacket. TLDR: DecodingLayerParser takes about 10% of the time as NewPacket to decode packet data, but only for known packet stacks. Basic decoding using gopacket.NewPacket or PacketSource.Packets is somewhat slow due to its need to allocate a new packet and every respective layer. It's very versatile and can handle all known layer types, but sometimes you really only care about a specific set of layers regardless, so that versatility is wasted. DecodingLayerParser avoids memory allocation altogether by decoding packet layers directly into preallocated objects, which you can then reference to get the packet's information. A quick example: The important thing to note here is that the parser is modifying the passed in layers (eth, ip4, ip6, tcp) instead of allocating new ones, thus greatly speeding up the decoding process. It's even branching based on layer type... it'll handle an (eth, ip4, tcp) or (eth, ip6, tcp) stack. However, it won't handle any other type... since no other decoders were passed in, an (eth, ip4, udp) stack will stop decoding after ip4, and only pass back [LayerTypeEthernet, LayerTypeIPv4] through the 'decoded' slice (along with an error saying it can't decode a UDP packet).
Unfortunately, not all layers can be used by DecodingLayerParser... only those implementing the DecodingLayer interface are usable. Also, it's possible to create DecodingLayers that are not themselves Layers... see layers.IPv6ExtensionSkipper for an example of this. As well as offering the ability to decode packet data, gopacket will allow you to create packets from scratch, as well. A number of gopacket layers implement the SerializableLayer interface; these layers can be serialized to a []byte in the following manner: SerializeTo PREPENDS the given layer onto the SerializeBuffer, and they treat the current buffer's Bytes() slice as the payload of the serializing layer. Therefore, you can serialize an entire packet by serializing a set of layers in reverse order (Payload, then TCP, then IP, then Ethernet, for example). The SerializeBuffer's SerializeLayers function is a helper that does exactly that. To generate an (empty and useless, because no fields are set) Ethernet(IPv4(TCP(Payload))) packet, for example, you can run: If you use gopacket, you'll almost definitely want to make sure gopacket/layers is imported, since when imported it sets all the LayerType variables and fills in a lot of interesting variables/maps (DecodersByLayerName, etc). Therefore, it's recommended that even if you don't use any layers functions directly, you still import with:
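A compact sketch of the full-packet decoding path described above (gopacket.NewPacket plus typed layer access); packetData is a placeholder for bytes read from a PacketDataSource:

    package main

    import (
        "fmt"

        "github.com/google/gopacket"
        "github.com/google/gopacket/layers" // registers the LayerType variables
    )

    func main() {
        var packetData []byte // e.g. read from a PacketSource

        // Decode raw bytes starting from the Ethernet layer.
        packet := gopacket.NewPacket(packetData, layers.LayerTypeEthernet, gopacket.Default)
        if tcpLayer := packet.Layer(layers.LayerTypeTCP); tcpLayer != nil {
            tcp := tcpLayer.(*layers.TCP)
            fmt.Printf("TCP %d -> %d\n", tcp.SrcPort, tcp.DstPort)
        }
        if app := packet.ApplicationLayer(); app != nil {
            fmt.Printf("payload: %d bytes\n", len(app.Payload()))
        }
    }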
Package gopacket provides packet decoding for the Go language. gopacket contains many sub-packages with additional functionality you may find useful, including: Also, if you're looking to dive right into code, see the examples subdirectory for numerous simple binaries built using gopacket libraries. Minimum Go version required is 1.5. gopacket takes in packet data as a []byte and decodes it into a packet with a non-zero number of "layers". Each layer corresponds to a protocol within the bytes. Once a packet has been decoded, the layers of the packet can be requested from the packet. Packets can be decoded from a number of starting points. Many of our base types implement Decoder, which allows us to decode packets for which we don't have full data. Most of the time, you won't just have a []byte of packet data lying around. Instead, you'll want to read packets in from somewhere (file, interface, etc) and process them. To do that, you'll want to build a PacketSource. First, you'll need to construct an object that implements the PacketDataSource interface. There are implementations of this interface bundled with gopacket in the gopacket/pcap and gopacket/pfring subpackages... see their documentation for more information on their usage. Once you have a PacketDataSource, you can pass it into NewPacketSource, along with a Decoder of your choice, to create a PacketSource. Once you have a PacketSource, you can read packets from it in multiple ways. See the docs for PacketSource for more details. The easiest method is the Packets function, which returns a channel, then asynchronously writes new packets into that channel, closing the channel if the packetSource hits an end-of-file. You can change the decoding options of the packetSource by setting fields in packetSource.DecodeOptions... see the following sections for more details. gopacket optionally decodes packet data lazily, meaning it only decodes a packet layer when it needs to handle a function call. Lazily-decoded packets are not concurrency-safe. Since layers have not all been decoded, each call to Layer() or Layers() has the potential to mutate the packet in order to decode the next layer. If a packet is used in multiple goroutines concurrently, don't use gopacket.Lazy. Then gopacket will decode the packet fully, and all future function calls won't mutate the object. By default, gopacket will copy the slice passed to NewPacket and store the copy within the packet, so future mutations to the bytes underlying the slice don't affect the packet and its layers. If you can guarantee that the underlying slice bytes won't be changed, you can use NoCopy to tell gopacket.NewPacket, and it'll use the passed-in slice itself. The fastest method of decoding is to use both Lazy and NoCopy, but note from the many caveats above that for some implementations either or both may be dangerous. During decoding, certain layers are stored in the packet as well-known layer types. For example, IPv4 and IPv6 are both considered NetworkLayer layers, while TCP and UDP are both TransportLayer layers. We support 4 layers, corresponding to the 4 layers of the TCP/IP layering scheme (roughly analogous to layers 2, 3, 4, and 7 of the OSI model). To access these, you can use the packet.LinkLayer, packet.NetworkLayer, packet.TransportLayer, and packet.ApplicationLayer functions. Each of these functions returns a corresponding interface (gopacket.{Link,Network,Transport,Application}Layer).
The first three provide methods for getting src/dst addresses for that particular layer, while the final layer provides a Payload function to get payload data. This is helpful, for example, to get payloads for all packets regardless of their underlying data type: A particularly useful layer is ErrorLayer, which is set whenever there's an error parsing part of the packet. Note that we don't return an error from NewPacket because we may have decoded a number of layers successfully before running into our erroneous layer. You may still be able to get your Ethernet and IPv4 layers correctly, even if your TCP layer is malformed. gopacket has two useful objects, Flow and Endpoint, for communicating in a protocol independent manner the fact that a packet is coming from A and going to B. The general layer types LinkLayer, NetworkLayer, and TransportLayer all provide methods for extracting their flow information, without worrying about the type of the underlying Layer. A Flow is a simple object made up of a set of two Endpoints, one source and one destination. It details the sender and receiver of the Layer of the Packet. An Endpoint is a hashable representation of a source or destination. For example, for LayerTypeIPv4, an Endpoint contains the IP address bytes for a v4 IP packet. A Flow can be broken into Endpoints, and Endpoints can be combined into Flows: Both Endpoint and Flow objects can be used as map keys, and the equality operator can compare them, so you can easily group together all packets based on endpoint criteria: For load-balancing purposes, both Flow and Endpoint have FastHash() functions, which provide quick, non-cryptographic hashes of their contents. Of particular importance is the fact that Flow FastHash() is symmetric: A->B will have the same hash as B->A. An example usage could be: This allows us to split up a packet stream while still making sure that each stream sees all packets for a flow (and its bidirectional opposite). If your network has some strange encapsulation, you can implement your own decoder. In this example, we handle Ethernet packets which are encapsulated in a 4-byte header. See the docs for Decoder and PacketBuilder for more details on how coding decoders works, or look at RegisterLayerType and RegisterEndpointType to see how to add layer/endpoint types to gopacket. TLDR: DecodingLayerParser takes about 10% of the time as NewPacket to decode packet data, but only for known packet stacks. Basic decoding using gopacket.NewPacket or PacketSource.Packets is somewhat slow due to its need to allocate a new packet and every respective layer. It's very versatile and can handle all known layer types, but sometimes you really only care about a specific set of layers regardless, so that versatility is wasted. DecodingLayerParser avoids memory allocation altogether by decoding packet layers directly into preallocated objects, which you can then reference to get the packet's information. A quick example: The important thing to note here is that the parser is modifying the passed in layers (eth, ip4, ip6, tcp) instead of allocating new ones, thus greatly speeding up the decoding process. It's even branching based on layer type... it'll handle an (eth, ip4, tcp) or (eth, ip6, tcp) stack. However, it won't handle any other type... since no other decoders were passed in, an (eth, ip4, udp) stack will stop decoding after ip4, and only pass back [LayerTypeEthernet, LayerTypeIPv4] through the 'decoded' slice (along with an error saying it can't decode a UDP packet). 
Unfortunately, not all layers can be used by DecodingLayerParser... only those implementing the DecodingLayer interface are usable. Also, it's possible to create DecodingLayers that are not themselves Layers... see layers.IPv6ExtensionSkipper for an example of this. As well as offering the ability to decode packet data, gopacket will allow you to create packets from scratch, as well. A number of gopacket layers implement the SerializableLayer interface; these layers can be serialized to a []byte in the following manner: SerializeTo PREPENDS the given layer onto the SerializeBuffer, and they treat the current buffer's Bytes() slice as the payload of the serializing layer. Therefore, you can serialize an entire packet by serializing a set of layers in reverse order (Payload, then TCP, then IP, then Ethernet, for example). The SerializeBuffer's SerializeLayers function is a helper that does exactly that. To generate an (empty and useless, because no fields are set) Ethernet(IPv4(TCP(Payload))) packet, for example, you can run: If you use gopacket, you'll almost definitely want to make sure gopacket/layers is imported, since when imported it sets all the LayerType variables and fills in a lot of interesting variables/maps (DecodersByLayerName, etc). Therefore, it's recommended that even if you don't use any layers functions directly, you still import with:
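The DecodingLayerParser pattern described above, sketched with preallocated layers (packetData is a placeholder):

    package main

    import (
        "fmt"

        "github.com/google/gopacket"
        "github.com/google/gopacket/layers"
    )

    func main() {
        // Preallocated layers: DecodeLayers fills these in place,
        // avoiding per-packet allocation.
        var (
            eth     layers.Ethernet
            ip4     layers.IPv4
            ip6     layers.IPv6
            tcp     layers.TCP
            decoded []gopacket.LayerType
        )
        parser := gopacket.NewDecodingLayerParser(
            layers.LayerTypeEthernet, &eth, &ip4, &ip6, &tcp)

        var packetData []byte // e.g. read from a PacketDataSource
        if err := parser.DecodeLayers(packetData, &decoded); err != nil {
            fmt.Println("stopped decoding:", err) // e.g. an unsupported layer
        }
        for _, layerType := range decoded {
            fmt.Println("decoded:", layerType)
        }
    }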
Package ltcutil provides litecoin-specific convenience functions and types. A Block defines a litecoin block that provides easier and more efficient manipulation of raw wire protocol blocks. It also memoizes hashes for the block and its transactions on their first access so subsequent accesses don't have to repeat the relatively expensive hashing operations. A Tx defines a litecoin transaction that provides more efficient manipulation of raw wire protocol transactions. It memoizes the hash for the transaction on its first access so subsequent accesses don't have to repeat the relatively expensive hashing operations. The Address interface provides an abstraction for a Litecoin address. While the most common type is a pay-to-pubkey-hash, Litecoin already supports others and may well support more in the future. This package currently provides implementations for the pay-to-pubkey, pay-to-pubkey-hash, and pay-to-script-hash address types. To decode/encode an address:
Package passlib provides a simple password hashing and verification interface abstracting multiple password hashing schemes. After initialisation, most people need concern themselves only with the functions Hash and Verify, which use the default context and sensible defaults. You should initialise the library before using it with the following line. See func UseDefaults for details.
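A sketch of the typical flow (UseDefaults, Hash and Verify are named in this doc; the defaults constant and import path are assumptions taken from the passlib docs):

    package main

    import (
        "fmt"

        "gopkg.in/hlandau/passlib.v1" // assumed import path
    )

    func main() {
        // Initialise before first use; see func UseDefaults for the policy dates.
        passlib.UseDefaults(passlib.Defaults20180601)

        hash, err := passlib.Hash("my password")
        if err != nil {
            panic(err)
        }

        // Verify returns a non-empty newHash when the stored hash should be
        // upgraded to current parameters.
        newHash, err := passlib.Verify("my password", hash)
        if err != nil {
            panic(err)
        }
        fmt.Println(newHash != "")
    }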
Package gosync is inspired by zsync and rsync. It aims to take the fundamentals and create a very flexible library that can be adapted to work in many ways. We rely heavily on built-in Go abstractions like io.Reader, hash.Hash and our own interfaces - this makes the code easier to change, and to test. In particular, no part of the core library should know anything about the transport or layout of the reference data. If you want to do rsync and do http/https range requests, that's just as good as zsync client-server over an SSH tunnel. The goal is also to allow support for multiple concurrent connections, so that you can make the best use of your line in the face of the bandwidth-latency product (or other concerns that require concurrency to solve). The following optimizations are possible: * Generate hashes with multiple threads (both during reference generation and local file interrogation) * Multiple ranged requests (can even be used to get the hashes) This is exceedingly similar to the module Example, but uses the http blocksource and a local http server
go-crypto is a customized/convenience cryptography package for supporting Tendermint. It wraps select functionality of equivalent functions in the Go standard library, for easy usage with our libraries. Keys: all key generation functions return an instance of the PrivKey interface, which implements a common set of methods; from it we can, for example, retrieve the corresponding public key if needed. We also provide hashing wrappers around algorithms: Sha256, Ripemd160.
Package blobloom implements blocked Bloom filters. Blocked Bloom filters are an approximate set data structure: if a key has been added to a filter, a lookup of that key returns true, but if the key has not been added, there is a non-zero probability that the lookup still returns true (a false positive). False negatives are impossible: if the lookup for a key returns false, that key has not been added. In this package, keys are represented exclusively as hashes. Client code is responsible for supplying a 64-bit hash value. Compared to standard Bloom filters, blocked Bloom filters use the CPU cache more efficiently. A blocked Bloom filter is an array of ordinary Bloom filters of fixed size BlockBits (the blocks). The lower half of the hash selects the block to use. To achieve the same false positive rate (FPR) as a standard Bloom filter, a blocked Bloom filter requires more memory. For an FPR of at most 2e-6 (two in a million), it uses ~20% more memory. At 1e-10, the space required is double that of a standard Bloom filter. For more details, see the 2010 paper by Putze, Sanders and Singler, https://algo2.iti.kit.edu/documents/cacheefficientbloomfilters-jea.pdf.
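A sketch with the package's Add/Has calls; the 64-bit hash here is stdlib FNV-1a for self-containment (a stronger hash such as xxhash is preferable in practice), and the Config field names follow the blobloom README:

    package main

    import (
        "fmt"
        "hash/fnv"

        "github.com/greatroar/blobloom"
    )

    // hash64 produces the 64-bit hash value the filter expects callers to supply.
    func hash64(key string) uint64 {
        h := fnv.New64a()
        h.Write([]byte(key))
        return h.Sum64()
    }

    func main() {
        f := blobloom.NewOptimized(blobloom.Config{
            Capacity: 100000, // expected number of distinct keys
            FPRate:   1e-4,   // target false positive rate
        })
        f.Add(hash64("example key"))
        fmt.Println(f.Has(hash64("example key"))) // true
        fmt.Println(f.Has(hash64("absent key")))  // false with high probability
    }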
Package fastsha256 implements the SHA224 and SHA256 hash algorithms as defined in FIPS 180-4.
Package passwordreset implements creation and verification of secure tokens useful for implementing a "reset forgotten password" feature in web applications. This package generates and verifies signed one-time tokens that can be embedded in a link sent to users when they initiate the password reset procedure. When a user changes their password, or when the expiry time passes, the token becomes invalid. Secure token format: where expiration time is the number of seconds since Unix epoch UTC indicating when this token must expire (4 bytes, big-endian, uint32), login is a byte string of arbitrary length (at least 1 byte, not null-terminated), and signature is 32 bytes of HMAC-SHA256(expiration_time || login, k), where k = HMAC-SHA256(expiration_time || login, userkey), where userkey = HMAC-SHA256(password value, secret key), where password value is any piece of information derived from the user's password, which will change once the user changes their password (for example, a hash of the password), and secret key is an application-specific secret key. The password value is used to make tokens one-time: once a user changes their password, the token they used to do a reset becomes invalid. Usage example: Your application must have a strong secret key for password reset purposes. This key will be used to generate and verify password reset tokens. (If you already have a secret key, for example, for the authcookie package, it's better not to reuse it; just use a different one.) Create a function that will query your users database and return some password-related value for the given login. A password-related value means some value that will change once the user changes their password, for example: a password hash, a random salt used to generate it, or the time of password creation. This value, mixed with the app-specific secret key, will be used as a key for the password reset token, and thus will be kept secret. When a user initiates a password reset (by entering their login, and maybe answering a secret question), generate a reset token: Send a link with this token to the user by email, for example: https://www.example.com/reset?token=Talo3mRjaGVzdITUAGOXYZwCMq7EtHfYH4ILcBgKaoWXDHTJOIlBUfcr Once the user clicks this link, read the token from it, then verify it by passing it to the VerifyToken function along with the getPasswordHash function and the app-specific secret key: If verification succeeds, allow the password to be changed for the returned login.
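A sketch of the generate/verify round trip, assuming the NewToken/VerifyToken signatures from the package README; getPasswordHash is the hypothetical lookup described above:

    package main

    import (
        "errors"
        "fmt"
        "time"

        "github.com/dchest/passwordreset" // assumed import path
    )

    var secret = []byte("application-specific secret key")

    // getPasswordHash returns a password-related value for the login;
    // here it is a stub standing in for a database lookup.
    func getPasswordHash(login string) ([]byte, error) {
        if login != "alice" {
            return nil, errors.New("no such user")
        }
        return []byte("alice's current password hash"), nil
    }

    func main() {
        // Generate a token valid for 12 hours and email it inside a reset link.
        pwdval, _ := getPasswordHash("alice")
        token := passwordreset.NewToken("alice", 12*time.Hour, pwdval, secret)

        // Later: verify the token read back from the clicked link.
        login, err := passwordreset.VerifyToken(token, getPasswordHash, secret)
        if err != nil {
            panic(err) // invalid or expired; refuse the password change
        }
        fmt.Println("allow password change for", login)
    }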
Package dcrutil provides decred-specific convenience functions and types. A Block defines a decred block that provides easier and more efficient manipulation of raw wire protocol blocks. It also memoizes hashes for the block and its transactions on their first access so subsequent accesses don't have to repeat the relatively expensive hashing operations. A Tx defines a decred transaction that provides more efficient manipulation of raw wire protocol transactions. It memoizes the hash for the transaction on its first access so subsequent accesses don't have to repeat the relatively expensive hashing operations. The Address interface provides an abstraction for a Decred address. While the most common type is a pay-to-pubkey-hash, Decred already supports others and may well support more in the future. This package currently provides implementations for the pay-to-pubkey, pay-to-pubkey-hash, and pay-to-script-hash address types. To decode/encode an address:
Package mafsa implements Minimal Acyclic Finite State Automata (MA-FSA) in a space-optimized way as described by Daciuk, Mihov, Watson, and Watson in their paper, "Incremental Construction of Minimal Acyclic Finite-State Automata" (2000). It also implements Minimal Perfect Hashing (MPH) as described by Lucchesi and Kowaltowski in their paper, "Applications of Finite Automata Representing Large Vocabularies" (1992). Unscientifically speaking, this package lets you store large numbers of strings (with Unicode) in memory so that membership queries, prefix lookups, and fuzzy searches are fast. And because minimal perfect hashing is included, you can associate each entry in the tree with more data used by your application. See the README or the end of this documentation for a brief tutorial. MA-FSA structures are a specific type of Deterministic Acyclic Finite State Automaton (DAFSA) which fold equivalent state transitions into each other starting from the suffix of each entry. Typical construction algorithms involve building out the entire tree first, then minimizing the completed tree. However, the method described in the paper above allows the tree to be minimized after every word insertion, provided the insertions are performed in lexicographical order, which drastically reduces memory usage compared to regular prefix trees ("tries"). The goal of this package is to provide a simple, useful, and correct implementation of MA-FSA. Though more complex algorithms exist for removal of items and unordered insertion, these features may be outside the scope of this package. Membership queries are on the order of O(n), where n is the length of the input string, so effectively O(1) with respect to the number of stored entries. It is advisable to keep n small since long entries without much in common, especially at the beginning or end of the string, will quickly overrun the optimizations that are available. In those cases, n-gram implementations might be preferable, though these will use more CPU. This package provides two kinds of MA-FSA implementations. One, the BuildTree, facilitates the construction of an optimized tree and allows ordered insertions. The other, MinTree, is effectively read-only but uses significantly less memory and is ideal for production environments where only reads will be occurring. Usually your build process will be separate from your production application, which will make heavy use of reading the structure. To use this package, create a BuildTree and insert your items in lexicographical order: The tree is now compressed to a minimum number of nodes and is ready to be saved. In your production application, then, you can read the file into a MinTree directly: The mt variable is a *MinTree which has the same data as the original BuildTree, but without all the extra "scaffolding" that was required for adding new elements. The package provides some basic read mechanisms.
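A sketch of the build-then-read flow, based on the package README; the method names used below (New, Insert, Finish, Save, Load, Contains) should be checked against the current package docs:

    package main

    import (
        "fmt"
        "log"

        "github.com/smartystreets/mafsa"
    )

    func main() {
        bt := mafsa.New()
        for _, w := range []string{"cities", "city", "pities", "pity"} { // lexicographical order
            if err := bt.Insert(w); err != nil {
                log.Fatal(err)
            }
        }
        bt.Finish() // minimize the tree; it is now ready to save

        if err := bt.Save("words.mafsa"); err != nil {
            log.Fatal(err)
        }

        mt, err := mafsa.Load("words.mafsa") // read-only MinTree for production use
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(mt.Contains("cities")) // true
    }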
Package bchutil provides bitcoin cash-specific convenience functions and types. A Block defines a bitcoin cash block that provides easier and more efficient manipulation of raw wire protocol blocks. It also memoizes hashes for the block and its transactions on their first access so subsequent accesses don't have to repeat the relatively expensive hashing operations. A Tx defines a bitcoin cash transaction that provides more efficient manipulation of raw wire protocol transactions. It memoizes the hash for the transaction on its first access so subsequent accesses don't have to repeat the relatively expensive hashing operations. The Address interface provides an abstraction for a Bitcoin Cash address. While the most common type is a pay-to-pubkey-hash, Bitcoin Cash already supports others and may well support more in the future. This package currently provides implementations for the pay-to-pubkey, pay-to-pubkey-hash, and pay-to-script-hash address types. To decode/encode an address:
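A hedged sketch mirroring the btcutil-style API this package derives from; the import paths and the exact DecodeAddress signature below are assumptions to verify against the bchutil docs:

    package main

    import (
        "fmt"

        "github.com/gcash/bchd/chaincfg"
        "github.com/gcash/bchutil"
    )

    func main() {
        // "qexampleonly" is a placeholder, not a real Bitcoin Cash address.
        addr, err := bchutil.DecodeAddress("qexampleonly", &chaincfg.MainNetParams)
        if err != nil {
            fmt.Println("decode failed:", err)
            return
        }
        fmt.Println(addr.EncodeAddress()) // canonical string form of the address
    }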
Package dht implements a distributed hash table that satisfies the ipfs routing interface. This DHT is modeled after Kademlia with S/Kademlia modifications.
Package btcutil provides bitcoin-specific convenience functions and types. A Block defines a bitcoin block that provides easier and more efficient manipulation of raw wire protocol blocks. It also memoizes hashes for the block and its transactions on their first access so subsequent accesses don't have to repeat the relatively expensive hashing operations. A Tx defines a bitcoin transaction that provides more efficient manipulation of raw wire protocol transactions. It memoizes the hash for the transaction on its first access so subsequent accesses don't have to repeat the relatively expensive hashing operations. The Address interface provides an abstraction for a Bitcoin address. While the most common type is a pay-to-pubkey-hash, Bitcoin already supports others and may well support more in the future. This package currently provides implementations for the pay-to-pubkey, pay-to-pubkey-hash, and pay-to-script-hash address types. To decode/encode an address:
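For example, a decode/encode round trip might look like the following; the import paths reflect the classic btcsuite layout (newer releases move btcutil under the btcd module), so verify them against your version:

    package main

    import (
        "fmt"
        "log"

        "github.com/btcsuite/btcd/chaincfg"
        "github.com/btcsuite/btcutil"
    )

    func main() {
        // Decode a mainnet pay-to-pubkey-hash address (the genesis coinbase address).
        addr, err := btcutil.DecodeAddress("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa", &chaincfg.MainNetParams)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(addr.EncodeAddress())                   // back to the string form
        fmt.Println(addr.IsForNet(&chaincfg.MainNetParams)) // true
    }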
Package amt provides a reference implementation of the IPLD AMT (Array Mapped Trie) used in the Filecoin blockchain. The AMT algorithm is similar to a HAMT https://en.wikipedia.org/wiki/Hash_array_mapped_trie but instead presents an array-like interface where the indexes themselves form the mapping to nodes in the trie structure. An AMT is suitable for storing sparse array data as a minimum amount of intermediate nodes are required to address a small number of entries even when their indexes span a large distance. AMT is also a suitable means of storing non-sparse array data as required, with a small amount of storage and algorithmic overhead required to handle mapping that assumes that some elements within any range of data may not be present. The AMT algorithm produces a tree-like graph, with a single root node addressing a collection of child nodes which connect downward toward leaf nodes which store the actual entries. No terminal entries are stored in intermediate elements of the tree, unlike in a HAMT. We can divide up the AMT tree structure into "levels" or "heights", where a height of zero contains the terminal elements, and the maximum height of the tree contains the single root node. Intermediate nodes are used to span across the range of indexes. Any AMT instance uses a fixed "width" that is consistent across the tree's nodes. An AMT's "bitWidth" dictates the width, or maximum branching factor (arity), of the AMT's nodes by determining how many bits of the original index are used to determine the index at any given level. A bitWidth of 3 (the default for this implementation) can generate indexes in the range of 0 to (2^3)-1=7, i.e. a "width" of 8. In practice, this means that an AMT with a bitWidth of 3 has a branching factor of _between 1 and 8_ for any node in the structure. Considering the minimal case: a minimal AMT contains a single node which serves as both the root and the leaf node and can hold zero or more elements (an empty AMT is possible, although a special-case, and consists of a zero-length root). This minimal AMT can store array indexes from 0 to width-1 (0 to 7 for the default bitWidth of 3) without requiring the addition of additional nodes. Attempts to add indexes beyond width-1 will result in additional nodes being added to form a tree structure that can address the new elements. The minimal AMT node is said to have a height of 0. Every node in an AMT has a height that indicates its distance from the leaf nodes. All leaf nodes have a height of 0. The height of the root node dictates the overall height of the entire AMT. In the case of the minimal AMT, this is 0. Elements are stored in a compacted form within nodes; they are "position-mapped" by a bitmap field that is stored with the node. The bitmap is a simple byte array, where each bit represents an element of the data that can be stored in the node. With a width of 8, the bitmap is a single byte and up to 8 elements can be stored in the node. The data array of a node _only stores elements that are present in that node_, so the array is commonly shorter than the maximum width. An empty AMT is a special-case where the single node can have zero elements, therefore a zero-length data array and a bitmap of `0x00`. In all other cases, the data array must have between 1 and width elements. Determining the position of an index within the data array requires counting the number of set bits within the bitmap up to the element we are concerned with.
If the bitmap has bits 2, 4 and 6 set, we can see that only 3 of the bits are set so our data array should hold 3 elements. To address index 4, we know that the first element will be index 2 and therefore the second will hold index 4. This format allows us to store only the elements that are set in the node. Overflowing the single-node AMT by adding an index beyond width-1 requires an increase in height in order to address all elements. If an element in the range of width to (width*2)-1 is added, a single additional height is required, which will result in a new root node that is used to address two consecutive leaf nodes. Because we have an arity of up to width at any node, the addition of indexes in the range of 0 to (width^2)-1 will still require only the addition of a single additional height above the leaf nodes, i.e. height 1. From the width of an AMT we can derive the maximum range of indexes that can be contained by an AMT at any given `height` with the formula width^(height+1)-1. e.g. an AMT with a width of 8 and a height of 2 can address indexes 0 to 8^(2+1)-1=511. Incrementing the height multiplies the range of indexes that can be contained within the structure by the width (8x for the default bitWidth of 3). Nodes above height 0 (non-leaf nodes) do not contain terminal elements; instead, their data array contains links to child nodes. The index compaction using the bitmap is the same as for leaf nodes, so each non-leaf node only stores as many links as it has child nodes. Because additional height is required to address larger indexes, even a single-element AMT will require more than one node where the index is greater than the width of the AMT. For a width of 8, indexes 8 to 63 require a height of 1, indexes 64 to 511 require a height of 2, indexes 512 to 4095 require a height of 3, etc. Retrieving elements from the AMT requires extracting only the portion of the requested index that is required at each height to determine the position in the data array to navigate into. When traversing through the tree, we only need to select from indexes 0 to width-1. To do this, we take log2(width) bits from the index to form a number that is between 0 and width-1. e.g. for a width of 8, we only need 3 bits to form a number between 0 and 7, so we only consume 3 bits per level of the AMT as we traverse. A simple method to calculate this at any height in the AMT (assuming bitWidth of 3, i.e. a width of 8) is: 1. Calculate the maximum number of leaf nodes (not entries) that may be present in a sub-tree rooted at the current height. width^height provides this number. e.g. at height 0, only 1 node can be present, but at height 3, we may have a tree of up to 512 leaf nodes (storing up to 8^(3+1)=4096 entries). 2. Divide the index by this number to find the index for this height. e.g. an index of 3 at height 0 will be 3/1=3, or an index of 20 at height 1 will be 20/8=2. 3. If we are at height 0, the element we want is at the data index, position-mapped via the bitmap. 4. If we are above height 0, we need to navigate to the child element at the index we calculated, position-mapped via the bitmap. When traversing to the child, we discard the upper portion of the index that we no longer need. This can be achieved by a mod operation against the number-of-nodes value. e.g. an index of 20 at height 1 requires navigation to the element at position 2; when moving to that element (which is height 0), we truncate the index with 20%8=4, and at height 0 this index will be the index in our data array (position-mapped via the bitmap).
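A self-contained sketch of this arithmetic for the default bitWidth of 3 (width 8); the helper names below (pow, slotAt, dataPos) are illustrative, not this package's API:

    package main

    import (
        "fmt"
        "math/bits"
    )

    const width = 8 // bitWidth 3

    // pow returns width^height, the number of entries addressed per slot at this height.
    func pow(height uint) uint64 {
        n := uint64(1)
        for i := uint(0); i < height; i++ {
            n *= width
        }
        return n
    }

    // slotAt extracts the 0..width-1 slot for index at the given height, and the
    // remainder of the index to carry into the child node.
    func slotAt(index uint64, height uint) (slot, rest uint64) {
        n := pow(height)
        return index / n, index % n
    }

    // dataPos position-maps a slot through the node's bitmap: the element for
    // slot i sits after however many lower bits are set.
    func dataPos(bitmap uint8, slot uint64) (int, bool) {
        if bitmap&(1<<slot) == 0 {
            return 0, false // slot empty: index not present in this node
        }
        return bits.OnesCount8(bitmap & (1<<slot - 1)), true
    }

    func main() {
        // Index 20 in a height-1 AMT: slot 20/8=2 at height 1, then 20%8=4 at height 0.
        slot, rest := slotAt(20, 1)
        fmt.Println(slot, rest) // 2 4

        // Bitmap with bits 2, 4 and 6 set: index 4 maps to data array position 1.
        pos, ok := dataPos(0b01010100, 4)
        fmt.Println(pos, ok) // 1 true
    }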
In this way, each sub-tree root consumes a small slice, log2(width) bits long, of the original index. Adding new elements to an AMT may require up to 3 steps: 1. Increasing the height to accommodate a new index if the current height is not sufficient to address the new index. Increasing the height requires turning the current root node into an intermediate node and adding a new root which links to the old (repeated until the required height is reached). 2. Adding any missing intermediate and leaf nodes that are required to address the new index. Depending on the density of existing indexes, this may require the addition of up to height-1 new nodes to connect the root to the required leaf. Sparse indexes will mean large gaps in the tree that will need filling to address new, equally sparse, indexes. 3. Setting the element at the leaf node in the appropriate position in the data array and setting the appropriate bit in the bitmap. Removing elements requires a reversal of this process. Any empty node (other than the case of a completely empty AMT) must be removed and its parent should have its child link removed. This removal may recurse up the tree to remove many unnecessary intermediate nodes. The root node may also be removed if the current height is no longer necessary to contain the range of indexes still in the AMT. This can be easily determined if _only_ the first bit of the root's bitmap is set, meaning only the left-most child is present, which will become the new root node (repeated until the new root has more than the first bit set, or a height of 0, the single-node case). See https://github.com/ipld/specs/blob/master/data-structures/hashmap.md for a description of a HAMT algorithm. And https://github.com/ipld/specs/blob/master/data-structures/vector.md for a description of a similar algorithm to an AMT that doesn't support internal node compression and therefore doesn't support sparse arrays. Unlike a HAMT, the AMT algorithm doesn't benefit from the randomness introduced by a hash algorithm. Therefore, when an AMT is used in cases where user input can influence indexes, larger-than-necessary tree structures may present risks, in addition to the challenge imposed by having a strict upper limit on the indexes addressable by the AMT. A width of 8, using 64-bit integers for indexing, allows for a tree height of up to 64/log2(8)=21 (i.e. a width of 8 has a bitWidth of 3, dividing the 64 bits of the uint into 21 separate per-height indexes). Careful placement of indexes could create extremely sub-optimal forms with large heights connecting leaf nodes that are sparsely packed. The overhead of the large number of intermediate nodes required to connect leaf nodes in AMTs that contain high indexes can be abused to create perverse forms that contain large numbers of nodes to store a minimal number of elements. Minimal numbers of nodes will be created where indexes are all in the lower range. The optimal case for an AMT is contiguous index values starting from zero. As larger indexes are introduced that span beyond the current maximum, more nodes are required to address the new entries _and_ the existing lower-index entries. Consider a case where a width=8 AMT is only addressing indexes less than 8 and requiring a single height. The introduction of a single index within 8 of the maximum 64-bit unsigned integer range will require the new root to have a height of 21 and enough connecting nodes between it and both the existing elements and the new upper index.
This pattern of behavior may be acceptable if there is a significant density of entries under a particular maximum index. There is a direct relationship between the sparseness of index values and the number of nodes required to address the entries. This should be the key consideration when determining whether an AMT is a suitable data structure for a given application.
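The height arithmetic above is small enough to sketch directly; requiredHeight below is an illustrative helper, not this package's API, and mirrors the width^(height+1)-1 bound:

    package main

    import "fmt"

    const width = 8 // bitWidth 3

    // requiredHeight returns the minimum AMT height able to address index,
    // capped at 21, the maximum for a 64-bit index with width 8.
    func requiredHeight(index uint64) uint {
        height := uint(0)
        max := uint64(width) // width^(height+1) entries addressable at this height
        for index >= max && height < 21 {
            height++
            max *= width
        }
        return height
    }

    func main() {
        fmt.Println(requiredHeight(7))     // 0: fits in the minimal single-node AMT
        fmt.Println(requiredHeight(8))     // 1
        fmt.Println(requiredHeight(511))   // 2
        fmt.Println(requiredHeight(512))   // 3
        fmt.Println(requiredHeight(1<<63)) // 21: near the top of the uint64 range
    }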
Package maglev implements Maglev consistent hashing.
Package blake2 provides a Go wrapper around an optimized, public domain implementation of BLAKE2. The cryptographic hash function BLAKE2 is an improved version of the SHA-3 finalist BLAKE. Like BLAKE or SHA-3, BLAKE2 offers the highest security, yet is as fast as MD5 on 64-bit platforms and requires at least 33% less RAM than SHA-2 or SHA-3 on low-end systems.
Package muhash provides an implementation of a Multiplicative Hash, a cryptographic data structure that allows you to have a rolling hash function that you can add and remove elements from, without the need to re-serialize and re-hash the whole data set.
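A toy sketch of the multiplicative-hash idea (not this package's API): each element hashes to a group element, the set hash is their running product modulo a prime, and removal multiplies by the modular inverse, so the whole set never needs re-hashing:

    package main

    import (
        "crypto/sha256"
        "fmt"
        "math/big"
    )

    // A small prime modulus for illustration; real MuHash uses a much larger group.
    var p = big.NewInt(1000003)

    func elem(data []byte) *big.Int {
        sum := sha256.Sum256(data)
        e := new(big.Int).SetBytes(sum[:])
        e.Mod(e, p)
        if e.Sign() == 0 {
            e.SetInt64(1) // avoid the non-invertible zero element
        }
        return e
    }

    type setHash struct{ acc *big.Int }

    func newSetHash() *setHash { return &setHash{acc: big.NewInt(1)} }

    func (s *setHash) add(data []byte) {
        s.acc.Mul(s.acc, elem(data)).Mod(s.acc, p) // multiply in the element
    }

    func (s *setHash) remove(data []byte) {
        inv := new(big.Int).ModInverse(elem(data), p)
        s.acc.Mul(s.acc, inv).Mod(s.acc, p) // divide the element back out
    }

    func main() {
        a, b := newSetHash(), newSetHash()
        a.add([]byte("x"))
        a.add([]byte("y"))
        a.remove([]byte("y"))
        b.add([]byte("x"))
        fmt.Println(a.acc.Cmp(b.acc) == 0) // true: same set, same hash
    }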