Tree Structure

gitlab.com/tozd/go/errors

Package errors provides errors with a recorded stack trace and optional structured details. The traditional error handling idiom in Go is roughly akin to which when applied recursively up the call stack results in error reports without a stack trace or context. The errors package provides error handling primitives to annotate errors along the failure path in a way that does not destroy the original error. When interacting with code which returns errors without a stack trace, you can upgrade that error to one with a stack trace using errors.WithStack. For example: errors.WithStack records the stack trace at the point where it was called, so use it as close to where the error originated as you can get so that the recorded stack trace is more precise. The example above uses errors.E for the returned error type instead of the standard error type. This is not required, but it tells Go that you expect that the function returns only errors with a stack trace and Go type system then helps you find any cases where this is not so. errors.WithStack does not record the stack trace if it is already present in the error so it is safe to call it if you are unsure if the error contains a stack trace. Errors with a stack trace implement the following interface, returning program counters of function invocations: You can use standard runtime.CallersFrames to obtain stack trace frame information (e.g., function name, source code file and line). You can also use errors.StackFormatter to format the stack trace. Although the stackTracer interface is not exported by this package, it is considered a part of its stable public interface. Sometimes an error occurs in a low-level function and the error messages returned from it are too low-level, too. You can use errors.Wrap to construct a new higher-level error while recording the original error as a cause. In the example above we returned a new error with a new message, hiding the low-level details. The returned error implements the following interface: which enables access to the underlying low-level error. You can also use errors.Cause to obtain the cause. Although the causer interface is not exported by this package, it is considered a part of its stable public interface. Sometimes you do not want to hide the error message but just add to it. You can use errors.WithMessage, which adds a prefix to the existing message, or errors.Errorf, which gives you more control over the new message. Example new messages could then be, respectively: Errors returned by this package implement the detailer interface: which enables access to a map with optional additional details about the error. Returned map can be modified in-place. You can also use errors.Details and errors.AllDetails to access details: You can also use errors.WithDetails as an alternative to errors.WithStack if you also want to add details while recording the stack trace: Errors which implement the following standard unwrapper interfaces: form a tree of errors where a wrapping error points its parent, wrapped, error(s). Errors returned from this package implement this interface to return the original error or errors, when they exist. This enables us to have constant base errors which we annotate with a stack trace before we return them: Or with details: We can use errors.Is to determine which error has been returned: Works across the tree, too: To access details, use: You can join multiple errors into one error by calling errors.Join. Join also records the stack trace at the point it was called. All errors with a stack trace returned from this package implement fmt.Formatter interface and can be formatted by the fmt package. They also support marshaling to JSON. Same formatting and JSON marshaling for errors coming outside of this package can be done by wrapping them into errors.Formatter.

v0.8.1 • last year

github.com/webability-go/xcore/v2

Package xcore is a set of basic objects for programation (XCache for caches, XDataset for data sets, XLanguage for languages and XTemplate for templates). For GO, the actual existing code includes: - XCache: Application Memory Caches for any purpose, with time control and quantity control of object in the cache and also check changes against original source. It is a thread safe cache. - XDataset: Basic nested data structures for any purpose (template injection, configuration files, database records, etc). - XLanguage: language dependent text tables for internationalization of code. The sources can be text or XML file definitions. - XTemplate: template system with meta language to create complex documents (compatible with any text language, HTML, CSS, JS, PDF, XML, etc), heavily used on CMS systems and others. It is already used on sites that serve more than 60 million pages a month (500 pages per second on pike hour) and can be used on multithreading environment safely. XCache is a library to cache all the data you want into current application memory for a very fast access to the data. The access to the data support multithreading and concurrency. For the same reason, this type of cache is not persistent (if you exit the application) and cannot grow too much (as memory is the limit). However, you can control a timeout of each cache piece, and eventually a comparison function against a source (file, database, etc) to invalid the cache. 1. Declare a new XCache with NewXCache() function: 2. Fill in the cache: Once you have declared the cache, you can fill it with anything you want. The main cache object is an interface{} so you can put here anything you need, from simple variables to complex structures. You need to use the Set function: Note the ID is always a string, so convert a database key to string if needed. 3. To use the cache, just ask for your entry with Get function: 4. To maintain the cache: You may need Del function, to delete a specific entry (maybe because you deleted the record in database). You may also need Clean function to deletes a percentage of the cache, or Flush to deletes it all. The Verify function is used to check cache entries against their sources through the Validator function. Be very careful, if the cache is big or the Validator function is complex (maybe ask for a remote server information), the verification may be VERY slow and huge CPU use. The Count function gives some stats about the cache. 5. How to use Verify Function: This function is recommended when the source is local and fast to check (for instance a language file or a template file). When the source is distant (other cluster database, any rpc source on another network, integration of many parts, etc), it is more recommended to create a function that will delete the cache when needed (on demand cache change). The validator function is a func(id, time.Time) bool function. The first parameter is the ID entry in the cache, the second parameter the time of the entry was created. The validator function returns true is the cache is still valid, or false if it needs to be invalidated. The XCache is thread safe. The cache can be limited in quantity of entries and timeout for data. The cache is automanaged (for invalid expired data) and can be cleaned partially or totally manually. The XLanguage table of text entries can be loaded from XML file, XML string or normal text file or string. It is used to keep a table of id=value set of entries in any languages you need, so it is easy to switch between XLanguage instance based on the required language needed. Obviously, any XLanguage you load in any language should have the same id entries translated, for the same use. The XLanguage object is thread safe 1. loading: You can load any file or XML string directly into the object. 1.1 The XML Format is: NAMEOFTABLE is the name of your table entry, for example "loginform", "user_report", etc. LG is the ISO-3369 2 letters language ID, for example "es" for spanish, "en" for english, "fr" for french, etc. ENTRYNAME is the ID of the entry, for example "greating", "yourname", "submitbutton". ENTRYVALUE is the text for your entry, for example "Hello", "You are:", "Save" if your table is in english. STATUSVALUE is the status of the entry- You may put any value to control your translation over time and processes. 1.2 The flat text format is: ENTRYNAME is the ID of the entry, for example "greating", "yourname", "submitbutton". ENTRYVALUE is the text for your entry, for example "Hello", "You are:", "Save" if your table is in english. There is no name of table or language in this format (you "know" what you are loading). The advantage to use XML format is to have more control over your language, and eventyally add attributes into your entries, for instance you may add attributes translated="yes/no", verified="yes/no", and any other data that your system could insert. The XLanguage will ignore those attributes loading the table. 2. creation: To create a new XLanguage empty structure: There are 4 functions to create the language from a file or string, flat text or XML text: Then you can use the set of basic access functions: SetName/SetLanguage functions are used to set the table name and language of the object (generally to build an object from scratch). GetName/GetLanguage functions are used to get the table name and language of the object (generally when you load it from some source). Set/Get/Del functions are used to add or modify a new entry, read an entry, or deletes an entry in the object. SetStatus/GetStatus functions are used to add or get a status for the entry in the object. To create am XML file from the objet, you can use the GetXML() function 1. Overview: The XDataSet is a set of interfaces and basic classes ready-to-use to build a standard set of data optionally nested and hierarchical, that can be used for any purpose: - Keep complex data in memory. - Create JSON structures. - Inject data into templates. - Interchange database data (records set and record). You can store into it generic supported data, as well as any complex interface structures: - Int - Float - String - Time - Bool - []Int - []Float - []Time - []Bool - XDataSetDef (anything extended with this interface) - []String - Anything else ( interface{} ) - XDataSetCollectionDef (anything extended with this interface) The generic supported data comes with a set of functions to get/set those data directly into the XDataset. Example: Note that all references to XDataset and XDatasetCollection are pointers, always (to be able to modify the values of them). 2. XDatasetDef interface: It is the interface to describe a simple set of data mapped as "name": value, where value can be of any type. The interface implements a good amount of basic methods to get the value on various format such as GetString("name"), GetInt("name"), etc (see below). If the value is another type as asked, the method should contert it if possible. For instance "key":123 required through GetString("key") should return "123". The XDataset type is a simple map[string]interface{} with all the implemented methods and should be enough to use for almost all required cases. However, you can build any complex structure that extends the interface and implements all the required functions to stay compatible with the XDatasetDef. 3. XDatasetCollectionDef Interface: This is the interface used to extend any type of data as a Collection, i-e an array of XDatasetDef. This is a slice of any XDatasetDef compatible data. The interface implements some methods to work on array structure such as Push, Pop, Shift, Unshift and some methods to search data into the array. The XDatasetCollection type is a simple []DatasetDef with all the implemented methods and should be enough to use for almost all required cases. 1. Overview: The XDataSetTS is a DatasetDef structure, thread safe. It is build on the XDataset with the same properties, but is thread safe to protect Read/Write accesses from different thread. Example: You may also build a XDatasetTS to encapsulate a XDatasetDef that is not thread safe, to use it safely Note that all references to XDatasetTS are pointers, always (to be able to modify the values of them). The DatasetTS meet the XDatasetDef interface 1. Overview: This is a class to compile and keep a Template that can be injected with an XDataSet structure of data, with a metalanguage to inject the data. The metalanguage is extremely simple and is made to be useful and **really** separate programation from template code (not like other many generic template systems that just mix code and data). A template is a set of HTML/XML (or any other language) string with a meta language to inject variables and build a final string. The XCore XTemplate system is based on the injection of parameters, language translation strings and data fields directly into the HTML (Or any other language you need) template. The HTML itself (or any other language) is a text code not directly used by the template system, but used to dress the data you want to represent in your preferred language. The variables to inject must be into a XDataSet structure or into a structure extended from XDataSetDef interface. The injection of data is based on a XDataSet structure of values that can be nested into another XDataSet and XDataSetConnection and so on. The template compiler recognize nested arrays to automatically make loops on the information. Templates are made to store reusable HTML code, and overall easily changeable by people that do not know how to write programs. A template can be as simple as a single character (no variables to inject) to a very complex nested, conditional and loops sub-templates. Yes. this is a template, but a very simple one without need to inject any data. Let's go more complex: Having an array of data, we want to paint it beautifull: We can create a template to inject this data into it: 2. Create and use XTemplateData: In sight to create and use templates, you have all those possible options to use: Creates the XTemplate from a string or a file or any other source: Clone the XTemplate: 3. Metalanguage Reference: 3.1 Comments: %-- and --% You may use comments into your template. The comments will be discarded immediately at the compilation of the template and do not interfere with the rest of your code. Example: 3.2 Nested Templates: [[...]] and [[]] You can define new nested templates into your main template A nested template is defined by: The templteid is any combination of lowers letters only (a-z), numbers (0-9), and 3 special chars: . (point) - (dash) and _ (underline). The template is closed with [[]]. There is no limits into nesting templates. Any nested template will inheritate all the father elements and can use father elements too. To call a sub-template, you need to use &&templateid&& syntax (described below in this document). Example: You may use more than one id into the same template to avoid repetition of the same code. The different id's are separated with a pipe | Important note: A template will be visible only on the same level of its declaration. For example, if you put a subtemplate "b" into a subtemplate "a", it will not be visible by &&b&& from the top level, but only into the subtemplate "a". 3.3 Simple Elements: ##...## and {{...}} There are 2 types of simple elements. Language elements and Data injector elements (also called field elements). We "logically" define the 2 type of elements. The separation is only for human logic and template filling, however the language information can perfectly fit into the data to inject (and not use ## entries). 3.3.1 Languages elements: ##entry## All the languages elements should have the format: ##entry##. A language entry is generally anything written into your code or page that does not come from a database, and should adapt to the language of the client visiting your site. Using the languages elements may depend on the internationalization of your page. If your page is going to be in a single language forever, you really dont need to use languages entries. The language elements generally carry titles, menu options, tables headers etc. The language entries are set into the "#" entry of the main template XDataset to inject, and is a XLanguage table. Example: With data to inject: 3.3.2 Field elements: {{fieldname}} Fields values should have the format: {{fieldname}}. Your fields source can be a database or any other preferred repository data source. Example: You can access an element with its path into the data set to inject separating each field level with a > (greater than). This will take the name of the second hobby in the dataset defined upper. (collections are 0 indexed). The 1 denotes the second record of the hobbies XDatasetCollection. If the field is not found, it will be replaced with an empty string. Tecnically your field names can be any string in the dataset. However do not use { } or > into the names of your fields or the XTemplate may not use them correctly. We recommend to use lowercase names with numbers and ._- Accents and UTF8 symbols are also welcome. 3.3.3 Scope: When you use an id to point a value, the template will first search into the available ids of the local level. If no id is found, the it will search into the upper levers if any, and so on. Example: At the level of 'data2', using {{appname}} will get back 'DomCore'. At the level of 'key1', using {{appname}} will get back 'Nested App'. At the level of 'key2', using {{appname}} will get back 'DomCore'. At the level of root, 'data1' or 'detail', using {{appname}} will get back an empty string. 3.3.4 Path access: id>id>id>id At any level into the data array, you can access any entry into the subset array. For instance, taking the previous array of data to inject, let's suppose we are into a nested meta elements at the 'data1' level. You may want to access directly the 'Juan' entry. The path will be: The José's status value from the root will be: 3.4 Meta Elements They consist into an injection of a XDataset, called the "data to inject", into the template. The meta language is directly applied on the structure of the data array. The data to inject is a nested set of variables and values with the structure you want (there is no specific construction rules). You can inject nearly anything into a template meta elements. Example of a data array to inject: You can access directly any data into the array with its relative path (relative to the level you are when the metaelements are applied, see below). There are 4 structured meta elements in the XTemplate templates to use the data to inject: Reference, Loops, Condition and Debug. The structure of the meta elements in the template must follow the structure of the data to inject. 3.4.1 References to another template: &&order&& 3.4.1.1 When order is a single id (characters a-z0-9.-_), it will make a call to a sub template with the same set of data and replace the &&...&& with the result. The level in the data set is not changed. Example based on previous array of Fred's data: 3.4.1.2 When order contains 2 parameters separated by a semicolumn :, then second parameter is used to change the level of the data of array, with the subset with this id. The level in the data set is changed to this sub set. Example based on previous array of Fred's data: 3.4.1.3 When order contains 3 parameters separated by a semicolumn :, the second and third parameters are used to search the name of the new template based on the data fields to inject. This is an indirect access to the template. The name of the subtemplate is build with parameter3 as prefix and the content of parameter2 value. The third parameter must be empty. 3.4.2 Loops: @@order@@ 3.4.2.1 Overview This meta element will loop over each itterance of the set of data and concatenate each created template in the same order. You need to declare a sub template for this element. You may aso declare derivated sub templates for the different possible cases of the loop: For instance, If your main subtemplate for your look is called "hobby", you may need a different template for the first element, last element, Nth element, Element with a value "no" in the sport field, etc. The supported postfixes are: When the array to iterate is empty: - .none (for example "There is no hobby") When the array contains elements, it will search in order, the following template and use the first found: - templateid.key.[value] value is the key of the vector line. If the collection has a named key (string) or is a direct array (0, 1, 2...) - templateid.first if it is the first element of the array set (new from v1.01.11) - templateid.last if it is the first element of the array set (new from v1.01.11) - templateid.even if the line number is even - templateid in all other cases (odd is contained here if even is defined) Since v2.1.7, you can also use the pseudo field {{.counter}} into the loop subtemplate, to get the number of the counter of the loop, it is 1-based (first loop is 1, not 0) 3.4.2.2 When order is a single id (characters a-z0-9.-_), it will make a call to the sub template id with the same subset of data with the same id and replace the @@...@@ for each itterance of the data with the result. Example based on previous array of Fred's data: 3.4.2.3 When order contains 2 parameters separated by a semicolumn :, then first parameter is used to change the level of the data of array, with the subset with this id, and the second one for the template to use. Example based on previous array of Fred's data: 3.4.3 Conditional: ??order?? Makes a call to a subtemplate only if the field exists and have a value. This is very userfull to call a sub template for instance when an image or a video is set. When the condition is not met, it will search for the [id].none template. The conditional element does not change the level in the data set. 3.4.3.1 When order is a single id (characters a-z0-9.-_), it will make a call to the sub template id with the same field in the data and replace the ??...?? with the corresponding template Example based on previous array of Fred's data: 3.4.3.2 When order contains 2 parameters separated by a semicolumn :, then second parameter is used to change the level of the data of array, with the subset with this id. Example based on previous array of Fred's data: If the asked field is a catalog, true/false, numbered, you may also use .[value] subtemplates 3.5 Debug Tools: !!order!! There are two keywords to dump the content of the data set. This is very useful when you dont know the code that calls the template, don't remember some values, or for debug facilities. 3.5.1 !!dump!! Will show the totality of the data set, with ids and values. 3.5.1 !!list!! Will show only the tree of parameters, values are not shown.

v2.2.2 • last year

github.com/chenjianmei111/go-amt-ipld/v3

Package amt provides a reference implementation of the IPLD AMT (Array Mapped Trie) used in the Filecoin blockchain. The AMT algorithm is similar to a HAMT https://en.wikipedia.org/wiki/Hash_array_mapped_trie but instead presents an array-like interface where the indexes themselves form the mapping to nodes in the trie structure. An AMT is suitable for storing sparse array data as a minimum amount of intermediate nodes are required to address a small number of entries even when their indexes span a large distance. AMT is also a suitable means of storing non-sparse array data as required, with a small amount of storage and algorithmic overhead required to handle mapping that assumes that some elements within any range of data may not be present. The AMT algorithm produces a tree-like graph, with a single root node addressing a collection of child nodes which connect downward toward leaf nodes which store the actual entries. No terminal entries are stored in intermediate elements of the tree, unlike in a HAMT. We can divide up the AMT tree structure into "levels" or "heights", where a height of zero contains the terminal elements, and the maximum height of the tree contains the single root node. Intermediate nodes are used to span across the range of indexes. Any AMT instance uses a fixed "width" that is consistent across the tree's nodes. An AMT's "bitWidth" dictates the width, or maximum-brancing factor (arity) of the AMT's nodes by determining how many bits of the original index are used to determine the index at any given level. A bitWidth of 3 (the default for this implementation) can generate indexes in the range of 0 to (3^2)-1=7, i.e. a "width" of 8. In practice, this means that an AMT with a bitWidth of 3 has a branching factor of _between 1 and 8_ for any node in the structure. Considering the minimal case: a minimal AMT contains a single node which serves as both the root and the leaf node and can hold zero or more elements (an empty AMT is possible, although a special-case, and consists of a zero-length root). This minimal AMT can store array indexes from 0 to width-1 (8 for the default bitWidth of 3) without requiring the addition of additional nodes. Attempts to add additional indexes beyond width-1 will result in additional nodes being added and a tree structure in order to address the new elements. The minimal AMT node is said to have a height of 0. Every node in an AMT has a height that indicates its distance from the leaf nodes. All leaf nodes have a height of 0. The height of the root node dictates the overall height of the entire AMT. In the case of the minimal AMT, this is 0. Elements are stored in a compacted form within nodes, they are "position-mapped" by a bitmap field that is stored with the node. The bitmap is a simple byte array, where each bit represents an element of the data that can be stored in the node. With a width of 8, the bitmap is a single byte and up to 8 elements can be stored in the node. The data array of a node _only stores elements that are present in that node_, so the array is commonly shorter than the maximum width. An empty AMT is a special-case where the single node can have zero elements, therefore a zero-length data array and a bitmap of `0x00`. In all other cases, the data array must have between 1 and width elements. Determining the position of an index within the data array requires counting the number of set bits within the bitmap up to the element we are concerned with. If the bitmap has bits 2, 4 and 6 set, we can see that only 3 of the bits are set so our data array should hold 3 elements. To address index 4, we know that the first element will be index 2 and therefore the second will hold index 4. This format allows us to store only the elements that are set in the node. Overflow beyond the single node AMT by adding an index beyond width-1 requires an increase in height in order to address all elements. If an element in the range of width to (width*2)-1 is added, a single additional height is required which will result in a new root node which is used to address two consecutive leaf nodes. Because we have an arity of up to width at any node, the addition of indexes in the range of 0 to (width^2)-1 will still require only the addition of a single additional height above the leaf nodes, i.e. height 1. From the width of an AMT we can derive the maximum range of indexes that can be contained by an AMT at any given `height` with the formula width^(height+1)-1. e.g. an AMT with a width of 8 and a height of 2 can address indexes 0 to 8^(2+1)-1=511. Incrementing the height doubles the range of indexes that can be contained within that structure. Nodes above height 0 (non-leaf nodes) do not contain terminal elements, but instead, their data array contains links to child nodes. The index compaction using the bitmap is the same as for leaf nodes, so each non-leaf node only stores as many links as it has child nodes. Because additional height is required to address larger indexes, even a single-element AMT will require more than one node where the index is greater than the width of the AMT. For a width of 8, indexes 8 to 63 require a height of 1, indexes 64 to 511 require a height of 2, indexes 512 to 4095 require a height of 3, etc. Retrieving elements from the AMT requires extracting only the portion of the requested index that is required at each height to determine the position in the data array to navigate into. When traversing through the tree, we only need to select from indexes 0 to width-1. To do this, we take log2(width) bits from the index to form a number that is between 0 and width-1. e.g. for a width of 8, we only need 3 bits to form a number between 0 and 7, so we only consume 3 bits per level of the AMT as we traverse. A simple method to calculate this at any height in the AMT (assuming bitWidth of 3, i.e. a width of 8) is: 1. Calculate the maximum number of nodes (not entries) that may be present in an sub-tree rooted at the current height. width^height provides this number. e.g. at height 0, only 1 node can be present, but at height 3, we may have a tree of up to 512 nodes (storing up to 8^(3+1)=4096 entries). 2. Divide the index by this number to find the index for this height. e.g. an index of 3 at height 0 will be 3/1=3, or an index of 20 at height 1 will be 20/8=2. 3. If we are at height 0, the element we want is at the data index, position-mapped via the bitmap. 4. If we are above height 0, we need to navigate to the child element at the index we calculated, position-mapped via the bitmap. When traversing to the child, we discard the upper portion of the index that we no longer need. This can be achieved by a mod operation against the number-of-nodes value. e.g. an index of 20 at height 1 requires navigation to the element at position 2, when moving to that element (which is height 0), we truncate the index with 20%8=4, at height 0 this index will be the index in our data array (position-mapped via the bitmap). In this way, each sub-tree root consumes a small slice, log2(width) bits long, of the original index. Adding new elements to an AMT may require up to 3 steps: 1. Increasing the height to accommodate a new index if the current height is not sufficient to address the new index. Increasing the height requires turning the current root node into an intermediate and adding a new root which links to the old (repeated until the required height is reached). 2. Adding any missing intermediate and leaf nodes that are required to address the new index. Depending on the density of existing indexes, this may require the addition of up to height-1 new nodes to connect the root to the required leaf. Sparse indexes will mean large gaps in the tree that will need filling to address new, equally sparse, indexes. 3. Setting the element at the leaf node in the appropriate position in the data array and setting the appropriate bit in the bitmap. Removing elements requires a reversal of this process. Any empty node (other than the case of a completely empty AMT) must be removed and its parent should have its child link removed. This removal may recurse up the tree to remove many unnecessary intermediate nodes. The root node may also be removed if the current height is no longer necessary to contain the range of indexes still in the AMT. This can be easily determined if _only_ the first bit of the root's bitmap is set, meaning only the left-most is present, which will become the new root node (repeated until the new root has more than the first bit set or height of 0, the single-node case). See https://github.com/ipld/specs/blob/master/data-structures/hashmap.md for a description of a HAMT algorithm. And https://github.com/ipld/specs/blob/master/data-structures/vector.md for a description of a similar algorithm to an AMT that doesn't support internal node compression and therefore doesn't support sparse arrays. Unlike a HAMT, the AMT algorithm doesn't benefit from randomness introduced by a hash algorithm. Therefore an AMT used in cases where user-input can influence indexes, larger-than-necessary tree structures may present risks as well as the challenge imposed by having a strict upper-limit on the indexes addressable by the AMT. A width of 8, using 64-bit integers for indexing, allows for a tree height of up to 64/log2(8)=21 (i.e. a width of 8 has a bitWidth of 3, dividing the 64 bits of the uint into 21 separate per-height indexes). Careful placement of indexes could create extremely sub-optimal forms with large heights connecting leaf nodes that are sparsely packed. The overhead of the large number of intermediate nodes required to connect leaf nodes in AMTs that contain high indexes can be abused to create perverse forms that contain large numbers of nodes to store a minimal number of elements. Minimal nodes will be created where indexes are all in the lower-range. The optimal case for an AMT is contiguous index values starting from zero. As larger indexes are introduced that span beyond the current maximum, more nodes are required to address the new nodes _and_ the existing lower index nodes. Consider a case where a width=8 AMT is only addressing indexes less than 8 and requiring a single height. The introduction of a single index within 8 of the maximum 64-bit unsigned integer range will require the new root to have a height of 21 and have enough connecting nodes between it and both the existing elements and the new upper index. This pattern of behavior may be acceptable if there is significant density of entries under a particular maximum index. There is a direct relationship between the sparseness of index values and the number of nodes required to address the entries. This should be the key consideration when determining whether an AMT is a suitable data-structure for a given application.

v3.0.1 • 4 years ago

github.com/chenjianmei111/go-hamt-ipld/v3

Package hamt provides a reference implementation of the IPLD HAMT used in the Filecoin blockchain. It includes some optional flexibility such that it may be used for other purposes outside of Filecoin. HAMT is a "hash array mapped trie" https://en.wikipedia.org/wiki/Hash_array_mapped_trie. This implementation extends the standard form by including buckets for the key/value pairs at storage leaves and CHAMP mutation semantics https://michael.steindorfer.name/publications/oopsla15.pdf. The CHAMP invariant and mutation rules provide us with the ability to maintain canonical forms given any set of keys and their values, regardless of insertion order and intermediate data insertion and deletion. Therefore, for any given set of keys and their values, a HAMT using the same parameters and CHAMP semantics, the root node should always produce the same content identifier (CID). The HAMT algorithm hashes incoming keys and uses incrementing subsections of that hash digest at each level of its tree structure to determine the placement of either the entry or a link to a child node of the tree. A `bitWidth` determines the number of bits of the hash to use for index calculation at each level of the tree such that the root node takes the first `bitWidth` bits of the hash to calculate an index and as we move lower in the tree, we move along the hash by `depth x bitWidth` bits. In this way, a sufficiently randomizing hash function will generate a hash that provides a new index at each level of the data structure. An index comprising `bitWidth` bits will generate index values of `[ 0, 2^bitWidth )`. So a `bitWidth` of 8 will generate indexes of 0 to 255 inclusive. Each node in the tree can therefore hold up to `2^bitWidth` elements of data, which we store in an array. In the this HAMT and the IPLD HashMap we store entries in buckets. A `Set(key, value)` mutation where the index generated at the root node for the hash of key denotes an array index that does not yet contain an entry, we create a new bucket and insert the key / value pair entry. In this way, a single node can theoretically hold up to `2^bitWidth x bucketSize` entries, where `bucketSize` is the maximum number of elements a bucket is allowed to contain ("collisions"). In practice, indexes do not distribute with perfect randomness so this maximum is theoretical. Entries stored in the node's buckets are stored in key-sorted order. This HAMT implementation: • Fixes the `bucketSize` to 3. • Defaults the `bitWidth` to 8, however within Filecoin it uses 5 • Defaults the hash algorithm to the 64-bit variant of Murmur3-x64 The algorithm used here is identical to that of the IPLD HashMap algorithm specified at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md. The specific parameters used by Filecoin and the DAG-CBOR block layout differ from the specification and are defined at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md#Appendix-Filecoin-hamt-variant.

v3.0.2 • 4 years ago

github.com/openacid/slimarray

Package slimarray uses polynomial to compress and store an array of uint32. A uint32 costs only 5 bits in a sorted array of a million number in range [0, 1000*1000]. We use a polynomial y = a + bx + cx² to describe the overall trend of the numbers. And for every number i we add a residual to fit the gap between y(i) and nums[i]. E.g. If there are 4 numbers: 0, 15, 33, 50 The polynomial and residuals are: In this case the residuals require 3 bits for each of them. To retrieve the numbers, we evaluate y(i) and add the residual to it: Another space efficient data structure to store uint32 array is trie or prefix tree or radix tree. It is possible to use bitmap-based btree like structure to reduce space(very likely in such case it provides higher compression rate). But it requires the array to be sorted. SlimArray does not have such restriction. It is more adaptive with data layout. To achieve high compression rate, it only requires the data has a overall trend, e.g., roughly sorted, as seen in the above 4 integers examples. Additionally, it also accept duplicated element in the array, which a bitmap based or tree-like data structure does not allow. SlimArray splits the entire array into segments(Seg), each of which has 1024 numbers. And then it splits every segment into several spans. Every span has its own polynomial. A span has 16*k numbers. A segment has at most 64 spans. A SlimArray is a compacted data structure. The original data structures are defined as follow(assumes original user data is `nums []uint32`): A span stores 16*k int32 in it, where k ∈ [1, 64). `Seg.SpansBitmap` describes the layout of Span-s in a Seg. The i-th "1" indicates where the last 16 numbers are in the i-th Span. e.g.: In the above example: `Seg.Rank` caches the total count of "1" in all preceding Seg.SpansBitmap. This accelerate locating a Span in the packed field SlimArray.Polynomials . `Span.width` is the count of numbers stored in this span. It does not need to be stored because it can be calculated by counting the "0" between two "1" in `Seg.SpansBitmap`. `Span.Polynomial` stores 3 coefficients of the polynomial describing the overall trend of this span. I.e. the `[a, b, c]` in `y = a + bx + cx²` `Span.Config.Offset` adjust the offset to locate a residual. In a span we want to have that: But if the preceding span has smaller residual width, the "offset" could be negative, e.g.: span[0] has residual of width 0 and 16 residuals, span[1] has residual of width 4. Then the "offset" of span[1] is `-16*4` in order to satisfy: `(-16*4) + i * 4` is the correct residual position, for i in [16, 32). `Span.Config.ResidualWidth` specifies the number of bits to store every residual in this span, it must be a power of 2: `2^k`. `Span.Residuals` is an array of residuals of length `Span.width`. Every elt in it is a `ResidualWidth`-bits integers. SlimArray compact `Seg` into a dense format: `SlimArray.Residuals` simply packs the residuals of every nums[i] together.

v0.1.3 • 4 years ago

github.com/webability-go/xcore

**Please use xcore/v2** **The version 1 is obsolete.** Package xcore is a set of basic objects for programation (XCache for caches, XDataset for data sets, XLanguage for languages and XTemplate for templates). For GO, the actual existing code includes: - XCache: Application Memory Caches for any purpose, with time control and quantity control of object in the cache and also check changes against original source. It is a thread safe cache. - XDataset: Basic nested data structures for any purpose (template injection, configuration files, database records, etc). - XLanguage: language dependent text tables for internationalization of code. The sources can be text or XML file definitions. - XTemplate: template system with meta language to create complex documents (compatible with any text language, HTML, CSS, JS, PDF, XML, etc), heavily used on CMS systems and others. It is already used on sites that serve more than 60 million pages a month (500 pages per second on pike hour) and can be used on multithreading environment safely. XCache is a library to cache all the data you want into current application memory for a very fast access to the data. The access to the data support multithreading and concurrency. For the same reason, this type of cache is not persistent (if you exit the application) and cannot grow too much (as memory is the limit). However, you can control a timeout of each cache piece, and eventually a comparison function against a source (file, database, etc) to invalid the cache. 1. Declare a new XCache with NewXCache() function: 2. Fill in the cache: Once you have declared the cache, you can fill it with anything you want. The main cache object is an interface{} so you can put here anything you need, from simple variables to complex structures. You need to use the Set function: Note the ID is always a string, so convert a database key to string if needed. 3. To use the cache, just ask for your entry with Get function: 4. To maintain the cache: You may need Del function, to delete a specific entry (maybe because you deleted the record in database). You may also need Clean function to deletes a percentage of the cache, or Flush to deletes it all. The Verify function is used to check cache entries against their sources through the Validator function. Be very careful, if the cache is big or the Validator function is complex (maybe ask for a remote server information), the verification may be VERY slow and huge CPU use. The Count function gives some stats about the cache. 5. How to use Verify Function: This function is recommended when the source is local and fast to check (for instance a language file or a template file). When the source is distant (other cluster database, any rpc source on another network, integration of many parts, etc), it is more recommended to create a function that will delete the cache when needed (on demand cache change). The validator function is a func(id, time.Time) bool function. The first parameter is the ID entry in the cache, the second parameter the time of the entry was created. The validator function returns true is the cache is still valid, or false if it needs to be invalidated. The XCache is thread safe. The cache can be limited in quantity of entries and timeout for data. The cache is automanaged (for invalid expired data) and can be cleaned partially or totally manually. The XLanguage table of text entries can be loaded from XML file, XML string or normal text file or string. It is used to keep a table of id=value set of entries in any languages you need, so it is easy to switch between XLanguage instance based on the required language needed. Obviously, any XLanguage you load in any language should have the same id entries translated, for the same use. 1. loading: You can load any file or XML string directly into the object. 1.1 The XML Format is: NAMEOFTABLE is the name of your table entry, for example "loginform", "user_report", etc. LG is the ISO-3369 2 letters language ID, for example "es" for spanish, "en" for english, "fr" for french, etc. ENTRYNAME is the ID of the entry, for example "greating", "yourname", "submitbutton". ENTRYVALUE is the text for your entry, for example "Hello", "You are:", "Save" if your table is in english. 1.2 The flat text format is: ENTRYNAME is the ID of the entry, for example "greating", "yourname", "submitbutton". ENTRYVALUE is the text for your entry, for example "Hello", "You are:", "Save" if your table is in english. There is no name of table or language in this format (you "know" what you are loading). The advantage to use XML format is to have more control over your language, and eventyally add attributes into your entries, for instance you may add attributes translated="yes/no", verified="yes/no", and any other data that your system could insert. The XLanguage will ignore those attributes loading the table. 2. creation: To create a new XLanguage empty structure: There are 4 functions to create the language from a file or string, flat text or XML text: Then you can use the set of basic access functions: SetName/SetLanguage functions are used to set the table name and language of the object (generally to build an object from scratch). GetName/GetLanguage functions are used to get the table name and language of the object (generally when you load it from some source). Set/Get/Del functions are used to add or modify a new entry, read an entry, or deletes an entry in the object. 1. Overview: The XDataSet is a set of interfaces and basic classes ready-to-use to build a standard set of data optionally nested and hierarchical, that can be used for any purpose: - Keep complex data in memory. - Create JSON structures. - Inject data into templates. - Interchange database data (records set and record). You can store into it generic supported data, as well as any complex interface structures: - Int - Float - String - Time - Bool - []Int - []Float - []Time - []Bool - XDataSetDef (anything extended with this interface) - []String - Anything else ( interface{} ) - XDataSetCollectionDef (anything extended with this interface) The generic supported data comes with a set of functions to get/set those data directly into the XDataset. Example: Note that all references to XDataset and XDatasetCollection are pointers, always (to be able to modify the values of them). 2. XDatasetDef interface: It is the interface to describe a simple set of data mapped as "name": value, where value can be of any type. The interface implements a good amount of basic methods to get the value on various format such as GetString("name"), GetInt("name"), etc (see below). If the value is another type as asked, the method should contert it if possible. For instance "key":123 required through GetString("key") should return "123". The XDataset type is a simple map[string]interface{} with all the implemented methods and should be enough to use for almost all required cases. However, you can build any complex structure that extends the interface and implements all the required functions to stay compatible with the XDatasetDef. 3. XDatasetCollectionDef Interface: This is the interface used to extend any type of data as a Collection, i-e an array of XDatasetDef. This is a slice of any XDatasetDef compatible data. The interface implements some methods to work on array structure such as Push, Pop, Shift, Unshift and some methods to search data into the array. The XDatasetCollection type is a simple []DatasetDef with all the implemented methods and should be enough to use for almost all required cases. 1. Overview: This is a class to compile and keep a Template that can be injected with an XDataSet structure of data, with a metalanguage to inject the data. The metalanguage is extremely simple and is made to be useful and **really** separate programation from template code (not like other many generic template systems that just mix code and data). A template is a set of HTML/XML (or any other language) string with a meta language to inject variables and build a final string. The XCore XTemplate system is based on the injection of parameters, language translation strings and data fields directly into the HTML (Or any other language you need) template. The HTML itself (or any other language) is a text code not directly used by the template system, but used to dress the data you want to represent in your preferred language. The variables to inject must be into a XDataSet structure or into a structure extended from XDataSetDef interface. The injection of data is based on a XDataSet structure of values that can be nested into another XDataSet and XDataSetConnection and so on. The template compiler recognize nested arrays to automatically make loops on the information. Templates are made to store reusable HTML code, and overall easily changeable by people that do not know how to write programs. A template can be as simple as a single character (no variables to inject) to a very complex nested, conditional and loops sub-templates. Yes. this is a template, but a very simple one without need to inject any data. Let's go more complex: Having an array of data, we want to paint it beautifull: We can create a template to inject this data into it: 2. Create and use XTemplateData: In sight to create and use templates, you have all those possible options to use: Creates the XTemplate from a string or a file or any other source: 3. Metalanguage Reference: 3.1 Comments: %-- and --% You may use comments into your template. The comments will be discarded immediately at the compilation of the template and do not interfere with the rest of your code. Example: 3.2 Nested Templates: [[...]] and [[]] You can define new nested templates into your main template A nested template is defined by: The templteid is any combination of lowers letters only (a-z), numbers (0-9), and 3 special chars: . (point) - (dash) and _ (underline). The template is closed with [[]]. There is no limits into nesting templates. Any nested template will inheritate all the father elements and can use father elements too. To call a sub-template, you need to use &&templateid&& syntax (described below in this document). Example: You may use more than one id into the same template to avoid repetition of the same code. The different id's are separated with a pipe | Important note: A template will be visible only on the same level of its declaration. For example, if you put a subtemplate "b" into a subtemplate "a", it will not be visible by &&b&& from the top level, but only into the subtemplate "a". 3.3 Simple Elements: ##...## and {{...}} There are 2 types of simple elements. Language elements and Data injector elements (also called field elements). We "logically" define the 2 type of elements. The separation is only for human logic and template filling, however the language information can perfectly fit into the data to inject (and not use ## entries). 3.3.1 Languages elements: ##entry## All the languages elements should have the format: ##entry##. A language entry is generally anything written into your code or page that does not come from a database, and should adapt to the language of the client visiting your site. Using the languages elements may depend on the internationalization of your page. If your page is going to be in a single language forever, you really dont need to use languages entries. The language elements generally carry titles, menu options, tables headers etc. The language entries are set into the "#" entry of the main template XDataset to inject, and is a XLanguage table. Example: With data to inject: 3.3.2 Field elements: {{fieldname}} Fields values should have the format: {{fieldname}}. Your fields source can be a database or any other preferred repository data source. Example: You can access an element with its path into the data set to inject separating each field level with a > (greater than). This will take the name of the second hobby in the dataset defined upper. (collections are 0 indexed). The 1 denotes the second record of the hobbies XDatasetCollection. If the field is not found, it will be replaced with an empty string. Tecnically your field names can be any string in the dataset. However do not use { } or > into the names of your fields or the XTemplate may not use them correctly. We recommend to use lowercase names with numbers and ._- Accents and UTF8 symbols are also welcome. 3.3.3 Scope: When you use an id to point a value, the template will first search into the available ids of the local level. If no id is found, the it will search into the upper levers if any, and so on. Example: At the level of 'data2', using {{appname}} will get back 'DomCore'. At the level of 'key1', using {{appname}} will get back 'Nested App'. At the level of 'key2', using {{appname}} will get back 'DomCore'. At the level of root, 'data1' or 'detail', using {{appname}} will get back an empty string. 3.3.4 Path access: id>id>id>id At any level into the data array, you can access any entry into the subset array. For instance, taking the previous array of data to inject, let's suppose we are into a nested meta elements at the 'data1' level. You may want to access directly the 'Juan' entry. The path will be: The José's status value from the root will be: 3.4 Meta Elements They consist into an injection of a XDataset, called the "data to inject", into the template. The meta language is directly applied on the structure of the data array. The data to inject is a nested set of variables and values with the structure you want (there is no specific construction rules). You can inject nearly anything into a template meta elements. Example of a data array to inject: You can access directly any data into the array with its relative path (relative to the level you are when the metaelements are applied, see below). There are 4 structured meta elements in the XTemplate templates to use the data to inject: Reference, Loops, Condition and Debug. The structure of the meta elements in the template must follow the structure of the data to inject. 3.4.1 References to another template: &&order&& 3.4.1.1 When order is a single id (characters a-z0-9.-_), it will make a call to a sub template with the same set of data and replace the &&...&& with the result. The level in the data set is not changed. Example based on previous array of Fred's data: 3.4.1.2 When order contains 2 parameters separated by a semicolumn :, then second parameter is used to change the level of the data of array, with the subset with this id. The level in the data set is changed to this sub set. Example based on previous array of Fred's data: 3.4.1.3 When order contains 3 parameters separated by a semicolumn :, the second and third parameters are used to search the name of the new template based on the data fields to inject. This is an indirect access to the template. The name of the subtemplate is build with parameter3 as prefix and the content of parameter2 value. The third parameter must be empty. 3.4.2 Loops: @@order@@ 3.4.2.1 Overview This meta element will loop over each itterance of the set of data and concatenate each created template in the same order. You need to declare a sub template for this element. You may aso declare derivated sub templates for the different possible cases of the loop: For instance, If your main subtemplate for your look is called "hobby", you may need a different template for the first element, last element, Nth element, Element with a value "no" in the sport field, etc. The supported postfixes are: When the array to iterate is empty: - .none (for example "There is no hobby") When the array contains elements, it will search in order, the following template and use the first found: - templateid.key.[value] value is the key of the vector line. If the collection has a named key (string) or is a direct array (0, 1, 2...) - templateid.first if it is the first element of the array set (new from v1.01.11) - templateid.last if it is the first element of the array set (new from v1.01.11) - templateid.even if the line number is even - templateid in all other cases (odd is contained here if even is defined) 3.4.2.2 When order is a single id (characters a-z0-9.-_), it will make a call to the sub template id with the same subset of data with the same id and replace the @@...@@ for each itterance of the data with the result. Example based on previous array of Fred's data: 3.4.2.3 When order contains 2 parameters separated by a semicolumn :, then first parameter is used to change the level of the data of array, with the subset with this id, and the second one for the template to use. Example based on previous array of Fred's data: 3.4.3 Conditional: ??order?? Makes a call to a subtemplate only if the field exists and have a value. This is very userfull to call a sub template for instance when an image or a video is set. When the condition is not met, it will search for the [id].none template. The conditional element does not change the level in the data set. 3.4.3.1 When order is a single id (characters a-z0-9.-_), it will make a call to the sub template id with the same field in the data and replace the ??...?? with the corresponding template Example based on previous array of Fred's data: 3.4.3.2 When order contains 2 parameters separated by a semicolumn :, then second parameter is used to change the level of the data of array, with the subset with this id. Example based on previous array of Fred's data: If the asked field is a catalog, true/false, numbered, you may also use .[value] subtemplates 3.5 Debug Tools: !!order!! There are two keywords to dump the content of the data set. This is very useful when you dont know the code that calls the template, don't remember some values, or for debug facilities. 3.5.1 !!dump!! Will show the totality of the data set, with ids and values. 3.5.1 !!list!! Will show only the tree of parameters, values are not shown.

v1.1.1 • last year

github.com/aws/go-kafka-event-source/streams

GKES (Go Kafka Event Source) attempts to fill the gaps ub the Go/Kafka library ecosystem. It supplies Exactly Once Semantics (EOS), local state stores and incremental consumer rebalancing to Go Kafka consumers, making it a viable alternative to a traditional Kafka Streams application written in Java. GKES is Go/Kafka library tailored towards the development of Event Sourcing applications, by providing a high-throughput, low-latency Kafka client framework. Using Kafka transactions, it provides for EOS, data integrity and high availability. If you wish to use GKES as straight Kafka consumer, it will fit the bill as well. Though there are plenty of libraries for that, and researching which best fits your use case is time well spent. GKES is not an all-in-one, do-everything black box. Some elements, in particular the StateStore, have been left without comprehensive implementations. A useful and performant local state store rarely has a flat data structure. If your state store does, there are some convenient implementations provided. However, to achieve optimum performance, you will not only need to write a StateStore implementation, but will also need to understand what the proper data structures are for your use case (trees, heaps, maps, disk-based LSM trees or combinations thereof). You can use the provided github.com/aws/go-kafka-event-source/streams/stores.SimpleStore as a starting point. GKES purposefully does not provide a pre-canned way for exposing StateStore data, other than a producing to another Kafka topic. There are as many ways to vend data as there are web applications. Rather than putting effort into inventing yet another one, GKES provides the mechanisms to query StateStores via Interjections. This mechanism can be plugged into whatever request/response mechanism that suits your use-case (gRPC, RESTful HTTP service...any number of web frameworks already in the Go ecosystem). [TODO: provide a simple http example] For this familiar with thw Kafka Streams API, GKES provides for stream `Punctuators“, but we call them `Interjections` (because it sounds cool). Interjections allow you to insert actions into your EventSource at specicifed interval per partition assigned via streams.EventSource.ScheduleInterjection, or at any time via streams.EventSource.Interject. This is useful for bookeeping activities, aggregated metric production or even error handling. Interjections have full access to the StateStore associated with an EventSource and can interact with output topics like any other EventProcessor. One issue that Kafka conumer applications have long suffered from are latency spikes during a consumer rebalance. The cooperative sticky rebalancing introduced by Kafka and implemented by kgo helps resolve this issue. However, once StateStore are thrown into the mix, things get a bit more complicated because initializing the StateStore on a host invloves consuming a compacted TopicPartion from start to end. GKES solves this with the IncrementalRebalancer and takes it one step further. The IncrementalRebalancer rebalances consumer partitions in a controlled fashion, minimizing latency spikes and limiting the blast of a bad deployment. GKES provides conventions for asynchronously processing events on the same Kafka partition while still maintaining data/stream integrity. The AsyncBatcher and AsyncJobScheduler allow you to split a TopicPartition into sub-streams by key, ensuring all events for a partitcular key are processed in order, allowing for parallel processing on a given TopicPartition. For more details, see Async Processing Examples A Kafka transaction is a powerful tool which allows for Exactly Once Semantics (EOS) by linking a consumer offset commit to one or more records that are being produced by your application (a StateStore record for example). The history of Kafka EOS is a long and complicated one with varied degrees of performance and efficiency. Early iterations required one producer transaction per consumer partition, which was very ineffiecient as Topic with 1000 partitions would also require 1000 clients in order to provide EOS. This has since been addressed, but depending on client implementations, there is a high risk of running into "producer fenced" errors as well as reduced throughput. In a traditional Java Kafka Streams application, transactions are committed according to the auto-commit frequency, which defaults to 100ms. This means that your application will only produce readable records every 100ms per partition. The effect of this is that no matter what you do, your tail latency will be at least 100ms and downstream consumers will receive records in bursts rather than a steady stream. For many use cases, this is unaceptable. GKES solves this issue by using a configurable transactional producer pool and a type of "Nagle's algorithm". Uncommitted offsets are added to the transaction pool in sequence. Once a producer has reach its record limit, or enough time has elapsed (10ms by default), the head transaction will wait for any incomplete events to finsh, then flush and commit. While this transaction is committing, GKES continues to process events and optimistically begins a new transaction and produces records on the next producer in pool. Since trasnaction produce in sequence, there is no danger of commit offset overlap or duplicate message processing in the case of a failure. To ensure EOS, your EventSource must use either the IncrementalRebalancer, or kgos cooperative sticky implementation. Though if you're using a StateStore, IncrementalRebalancer should be used to avoid lengthy periods of inactivity during application deployments. Rather than create yet another Kafka driver, GKES is built on top of kgo. This Kafka client was chosen as it (in our testing) has superior throughput and latency profiles compared to other client libraries currently available to Go developers. One other key adavantage is that it provides a migration path to cooperative consumer rebalancing, required for our EOS implementation. Other Go Kafka libraries provide cooperative rebalancing, but do not allow you to migrate froma non-cooperative rebalancing strategy (range, sticky etc.). This is a major roadblock for existing deployemtns as the only migration paths are an entirely new consumer group, or to bring your application completely down and re-deploy with a new rebalance strategy. These migration plans, to put it mildly, are big challenge for zero-downtime/live applications. The kgo package now makes this migration possible with zero downtime. Kgo also has the proper hooks need to implement the IncrementalGroupRebalancer, which is necessary for safe deployments when using a local state store. Kudos to kgo!

v1.0.6 • 2 years ago

github.com/IlyaLab/CloudForest

Package CloudForest implements ensembles of decision trees for machine learning in pure Go (golang to search engines). It allows for a number of related algorithms for classification, regression, feature selection and structure analysis on heterogeneous numerical/categorical data with missing values. These include: Breiman and Cutler's Random Forest for Classification and Regression Adaptive Boosting (AdaBoost) Classification Gradiant Boosting Tree Regression Entropy and Cost driven classification L1 regression Feature selection with artificial contrasts Proximity and model structure analysis Roughly balanced bagging for unbalanced classification The API hasn't stabilized yet and may change rapidly. Tests and benchmarks have been performed only on embargoed data sets and can not yet be released. Library Documentation is in code and can be viewed with godoc or live at: http://godoc.org/github.com/IlyaLab/CloudForest Documentation of command line utilities and file formats can be found in README.md, which can be viewed fromated on github: http://github.com/IlyaLab/CloudForest Pull requests and bug reports are welcome. CloudForest was created by Ryan Bressler and is being developed in the Shumelivich Lab at the Institute for Systems Biology for use on genomic/biomedical data with partial support from The Cancer Genome Atlas and the Inova Translational Medicine Institute. CloudForest is intended to provide fast, comprehensible building blocks that can be used to implement ensembles of decision trees. CloudForest is written in Go to allow a data scientist to develop and scale new models and analysis quickly instead of having to modify complex legacy code. Data structures and file formats are chosen with use in multi threaded and cluster environments in mind. Go's support for function types is used to provide a interface to run code as data is percolated through a tree. This method is flexible enough that it can extend the tree being analyzed. Growing a decision tree using Breiman and Cutler's method can be done in an anonymous function/closure passed to a tree's root node's Recurse method: This allows a researcher to include whatever additional analysis they need (importance scores, proximity etc) in tree growth. The same Recurse method can also be used to analyze existing forests to tabulate scores or extract structure. Utilities like leafcount and errorrate use this method to tabulate data about the tree in collection objects. Decision tree's are grown with the goal of reducing "Impurity" which is usually defined as Gini Impurity for categorical targets or mean squared error for numerical targets. CloudForest grows trees against the Target interface which allows for alternative definitions of impurity. CloudForest includes several alternative targets: Additional targets can be stacked on top of these target to add boosting functionality: Repeatedly splitting the data and searching for the best split at each node of a decision tree are the most computationally intensive parts of decision tree learning and CloudForest includes optimized code to perform these tasks. Go's slices are used extensively in CloudForest to make it simple to interact with optimized code. Many previous implementations of Random Forest have avoided reallocation by reordering data in place and keeping track of start and end indexes. In go, slices pointing at the same underlying arrays make this sort of optimization transparent. For example a function like: can return left and right slices that point to the same underlying array as the original slice of cases but these slices should not have their values changed. Functions used while searching for the best split also accepts pointers to reusable slices and structs to maximize speed by keeping memory allocations to a minimum. BestSplitAllocs contains pointers to these items and its use can be seen in functions like: For categorical predictors, BestSplit will also attempt to intelligently choose between 4 different implementations depending on user input and the number of categories. These include exhaustive, random, and iterative searches for the best combination of categories implemented with bitwise operations against int and big.Int. See BestCatSplit, BestCatSplitIter, BestCatSplitBig and BestCatSplitIterBig. All numerical predictors are handled by BestNumSplit which relies on go's sorting package. Training a Random forest is an inherently parallel process and CloudForest is designed to allow parallel implementations that can tackle large problems while keeping memory usage low by writing and using data structures directly to/from disk. Trees can be grown in separate go routines. The growforest utility provides an example of this that uses go routines and channels to grow trees in parallel and write trees to disk as the are finished by the "worker" go routines. The few summary statistics like mean impurity decrease per feature (importance) can be calculated using thread safe data structures like RunningMean. Trees can also be grown on separate machines. The .sf stochastic forest format allows several small forests to be combined by concatenation and the ForestReader and ForestWriter structs allow these forests to be accessed tree by tree (or even node by node) from disk. For data sets that are too big to fit in memory on a single machine Tree.Grow and FeatureMatrix.BestSplitter can be reimplemented to load candidate features from disk, distributed database etc. By default cloud forest uses a fast heuristic for missing values. When proposing a split on a feature with missing data the missing cases are removed and the impurity value is corrected to use three way impurity which reduces the bias towards features with lots of missing data: Missing values in the target variable are left out of impurity calculations. This provided generally good results at a fraction of the computational costs of imputing data. Optionally, feature.ImputeMissing or featurematrixImputeMissing can be called before forest growth to impute missing values to the feature mean/mode which Brieman [2] suggests as a fast method for imputing values. This forest could also be analyzed for proximity (using leafcount or tree.GetLeaves) to do the more accurate proximity weighted imputation Brieman describes. Experimental support is provided for 3 way splitting which splits missing cases onto a third branch. [2] This has so far yielded mixed results in testing. At some point in the future support may be added for local imputing of missing values during tree growth as described in [3] [1] http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1 [2] https://code.google.com/p/rf-ace/ [3] http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.aoas/1223908043&page=record In CloudForest data is stored using the FeatureMatrix struct which contains Features. The Feature struct implements storage and methods for both categorical and numerical data and calculations of impurity etc and the search for the best split. The Target interface abstracts the methods of Feature that are needed for a feature to be predictable. This allows for the implementation of alternative types of regression and classification. Trees are built from Nodes and Splitters and stored within a Forest. Tree has a Grow implements Brieman and Cutler's method (see extract above) for growing a tree. A GrowForest method is also provided that implements the rest of the method including sampling cases but it may be faster to grow the forest to disk as in the growforest utility. Prediction and Voting is done using Tree.Vote and CatBallotBox and NumBallotBox which implement the VoteTallyer interface.

v0.0.0-20150921161629-7f0049714db2 • 9 years ago

github.com/ilyalab/cloudforest

Package CloudForest implements ensembles of decision trees for machine learning in pure Go (golang to search engines). It allows for a number of related algorithms for classification, regression, feature selection and structure analysis on heterogeneous numerical/categorical data with missing values. These include: Breiman and Cutler's Random Forest for Classification and Regression Adaptive Boosting (AdaBoost) Classification Gradiant Boosting Tree Regression Entropy and Cost driven classification L1 regression Feature selection with artificial contrasts Proximity and model structure analysis Roughly balanced bagging for unbalanced classification The API hasn't stabilized yet and may change rapidly. Tests and benchmarks have been performed only on embargoed data sets and can not yet be released. Library Documentation is in code and can be viewed with godoc or live at: http://godoc.org/github.com/IlyaLab/CloudForest Documentation of command line utilities and file formats can be found in README.md, which can be viewed fromated on github: http://github.com/IlyaLab/CloudForest Pull requests and bug reports are welcome. CloudForest was created by Ryan Bressler and is being developed in the Shumelivich Lab at the Institute for Systems Biology for use on genomic/biomedical data with partial support from The Cancer Genome Atlas and the Inova Translational Medicine Institute. CloudForest is intended to provide fast, comprehensible building blocks that can be used to implement ensembles of decision trees. CloudForest is written in Go to allow a data scientist to develop and scale new models and analysis quickly instead of having to modify complex legacy code. Data structures and file formats are chosen with use in multi threaded and cluster environments in mind. Go's support for function types is used to provide a interface to run code as data is percolated through a tree. This method is flexible enough that it can extend the tree being analyzed. Growing a decision tree using Breiman and Cutler's method can be done in an anonymous function/closure passed to a tree's root node's Recurse method: This allows a researcher to include whatever additional analysis they need (importance scores, proximity etc) in tree growth. The same Recurse method can also be used to analyze existing forests to tabulate scores or extract structure. Utilities like leafcount and errorrate use this method to tabulate data about the tree in collection objects. Decision tree's are grown with the goal of reducing "Impurity" which is usually defined as Gini Impurity for categorical targets or mean squared error for numerical targets. CloudForest grows trees against the Target interface which allows for alternative definitions of impurity. CloudForest includes several alternative targets: Additional targets can be stacked on top of these target to add boosting functionality: Repeatedly splitting the data and searching for the best split at each node of a decision tree are the most computationally intensive parts of decision tree learning and CloudForest includes optimized code to perform these tasks. Go's slices are used extensively in CloudForest to make it simple to interact with optimized code. Many previous implementations of Random Forest have avoided reallocation by reordering data in place and keeping track of start and end indexes. In go, slices pointing at the same underlying arrays make this sort of optimization transparent. For example a function like: can return left and right slices that point to the same underlying array as the original slice of cases but these slices should not have their values changed. Functions used while searching for the best split also accepts pointers to reusable slices and structs to maximize speed by keeping memory allocations to a minimum. BestSplitAllocs contains pointers to these items and its use can be seen in functions like: For categorical predictors, BestSplit will also attempt to intelligently choose between 4 different implementations depending on user input and the number of categories. These include exhaustive, random, and iterative searches for the best combination of categories implemented with bitwise operations against int and big.Int. See BestCatSplit, BestCatSplitIter, BestCatSplitBig and BestCatSplitIterBig. All numerical predictors are handled by BestNumSplit which relies on go's sorting package. Training a Random forest is an inherently parallel process and CloudForest is designed to allow parallel implementations that can tackle large problems while keeping memory usage low by writing and using data structures directly to/from disk. Trees can be grown in separate go routines. The growforest utility provides an example of this that uses go routines and channels to grow trees in parallel and write trees to disk as the are finished by the "worker" go routines. The few summary statistics like mean impurity decrease per feature (importance) can be calculated using thread safe data structures like RunningMean. Trees can also be grown on separate machines. The .sf stochastic forest format allows several small forests to be combined by concatenation and the ForestReader and ForestWriter structs allow these forests to be accessed tree by tree (or even node by node) from disk. For data sets that are too big to fit in memory on a single machine Tree.Grow and FeatureMatrix.BestSplitter can be reimplemented to load candidate features from disk, distributed database etc. By default cloud forest uses a fast heuristic for missing values. When proposing a split on a feature with missing data the missing cases are removed and the impurity value is corrected to use three way impurity which reduces the bias towards features with lots of missing data: Missing values in the target variable are left out of impurity calculations. This provided generally good results at a fraction of the computational costs of imputing data. Optionally, feature.ImputeMissing or featurematrixImputeMissing can be called before forest growth to impute missing values to the feature mean/mode which Brieman [2] suggests as a fast method for imputing values. This forest could also be analyzed for proximity (using leafcount or tree.GetLeaves) to do the more accurate proximity weighted imputation Brieman describes. Experimental support is provided for 3 way splitting which splits missing cases onto a third branch. [2] This has so far yielded mixed results in testing. At some point in the future support may be added for local imputing of missing values during tree growth as described in [3] [1] http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1 [2] https://code.google.com/p/rf-ace/ [3] http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.aoas/1223908043&page=record In CloudForest data is stored using the FeatureMatrix struct which contains Features. The Feature struct implements storage and methods for both categorical and numerical data and calculations of impurity etc and the search for the best split. The Target interface abstracts the methods of Feature that are needed for a feature to be predictable. This allows for the implementation of alternative types of regression and classification. Trees are built from Nodes and Splitters and stored within a Forest. Tree has a Grow implements Brieman and Cutler's method (see extract above) for growing a tree. A GrowForest method is also provided that implements the rest of the method including sampling cases but it may be faster to grow the forest to disk as in the growforest utility. Prediction and Voting is done using Tree.Vote and CatBallotBox and NumBallotBox which implement the VoteTallyer interface.

v0.0.0-20150921161629-7f0049714db2 • 9 years ago

github.com/ryanbressler/cloudforest

Package CloudForest implements ensembles of decision trees for machine learning in pure Go (golang to search engines). It allows for a number of related algorithms for classification, regression, feature selection and structure analysis on heterogeneous numerical/categorical data with missing values. These include: Breiman and Cutler's Random Forest for Classification and Regression Adaptive Boosting (AdaBoost) Classification Gradiant Boosting Tree Regression Entropy and Cost driven classification L1 regression Feature selection with artificial contrasts Proximity and model structure analysis Roughly balanced bagging for unbalanced classification The API hasn't stabilized yet and may change rapidly. Tests and benchmarks have been performed only on embargoed data sets and can not yet be released. Library Documentation is in code and can be viewed with godoc or live at: http://godoc.org/github.com/ryanbressler/CloudForest Documentation of command line utilities and file formats can be found in README.md, which can be viewed fromated on github: http://github.com/ryanbressler/CloudForest Pull requests and bug reports are welcome. CloudForest was created by Ryan Bressler and is being developed in the Shumelivich Lab at the Institute for Systems Biology for use on genomic/biomedical data with partial support from The Cancer Genome Atlas and the Inova Translational Medicine Institute. CloudForest is intended to provide fast, comprehensible building blocks that can be used to implement ensembles of decision trees. CloudForest is written in Go to allow a data scientist to develop and scale new models and analysis quickly instead of having to modify complex legacy code. Data structures and file formats are chosen with use in multi threaded and cluster environments in mind. Go's support for function types is used to provide a interface to run code as data is percolated through a tree. This method is flexible enough that it can extend the tree being analyzed. Growing a decision tree using Breiman and Cutler's method can be done in an anonymous function/closure passed to a tree's root node's Recurse method: This allows a researcher to include whatever additional analysis they need (importance scores, proximity etc) in tree growth. The same Recurse method can also be used to analyze existing forests to tabulate scores or extract structure. Utilities like leafcount and errorrate use this method to tabulate data about the tree in collection objects. Decision tree's are grown with the goal of reducing "Impurity" which is usually defined as Gini Impurity for categorical targets or mean squared error for numerical targets. CloudForest grows trees against the Target interface which allows for alternative definitions of impurity. CloudForest includes several alternative targets: Additional targets can be stacked on top of these target to add boosting functionality: Repeatedly splitting the data and searching for the best split at each node of a decision tree are the most computationally intensive parts of decision tree learning and CloudForest includes optimized code to perform these tasks. Go's slices are used extensively in CloudForest to make it simple to interact with optimized code. Many previous implementations of Random Forest have avoided reallocation by reordering data in place and keeping track of start and end indexes. In go, slices pointing at the same underlying arrays make this sort of optimization transparent. For example a function like: can return left and right slices that point to the same underlying array as the original slice of cases but these slices should not have their values changed. Functions used while searching for the best split also accepts pointers to reusable slices and structs to maximize speed by keeping memory allocations to a minimum. BestSplitAllocs contains pointers to these items and its use can be seen in functions like: For categorical predictors, BestSplit will also attempt to intelligently choose between 4 different implementations depending on user input and the number of categories. These include exhaustive, random, and iterative searches for the best combination of categories implemented with bitwise operations against int and big.Int. See BestCatSplit, BestCatSplitIter, BestCatSplitBig and BestCatSplitIterBig. All numerical predictors are handled by BestNumSplit which relies on go's sorting package. Training a Random forest is an inherently parallel process and CloudForest is designed to allow parallel implementations that can tackle large problems while keeping memory usage low by writing and using data structures directly to/from disk. Trees can be grown in separate go routines. The growforest utility provides an example of this that uses go routines and channels to grow trees in parallel and write trees to disk as the are finished by the "worker" go routines. The few summary statistics like mean impurity decrease per feature (importance) can be calculated using thread safe data structures like RunningMean. Trees can also be grown on separate machines. The .sf stochastic forest format allows several small forests to be combined by concatenation and the ForestReader and ForestWriter structs allow these forests to be accessed tree by tree (or even node by node) from disk. For data sets that are too big to fit in memory on a single machine Tree.Grow and FeatureMatrix.BestSplitter can be reimplemented to load candidate features from disk, distributed database etc. By default cloud forest uses a fast heuristic for missing values. When proposing a split on a feature with missing data the missing cases are removed and the impurity value is corrected to use three way impurity which reduces the bias towards features with lots of missing data: Missing values in the target variable are left out of impurity calculations. This provided generally good results at a fraction of the computational costs of imputing data. Optionally, feature.ImputeMissing or featurematrixImputeMissing can be called before forest growth to impute missing values to the feature mean/mode which Brieman [2] suggests as a fast method for imputing values. This forest could also be analyzed for proximity (using leafcount or tree.GetLeaves) to do the more accurate proximity weighted imputation Brieman describes. Experimental support is provided for 3 way splitting which splits missing cases onto a third branch. [2] This has so far yielded mixed results in testing. At some point in the future support may be added for local imputing of missing values during tree growth as described in [3] [1] http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1 [2] https://code.google.com/p/rf-ace/ [3] http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.aoas/1223908043&page=record In CloudForest data is stored using the FeatureMatrix struct which contains Features. The Feature struct implements storage and methods for both categorical and numerical data and calculations of impurity etc and the search for the best split. The Target interface abstracts the methods of Feature that are needed for a feature to be predictable. This allows for the implementation of alternative types of regression and classification. Trees are built from Nodes and Splitters and stored within a Forest. Tree has a Grow implements Brieman and Cutler's method (see extract above) for growing a tree. A GrowForest method is also provided that implements the rest of the method including sampling cases but it may be faster to grow the forest to disk as in the growforest utility. Prediction and Voting is done using Tree.Vote and CatBallotBox and NumBallotBox which implement the VoteTallyer interface.

v0.0.0-20220205065429-8f151e494fd2 • 3 years ago

github.com/ugurcancaykara/data-structures-and-algorithms/data-structures/tree/MerkleTree

github.com/callidon/joseki

github.com/hoducha/hercules

gitlab.com/tozd/go/errors

github.com/AlexWan0/go-watrix

github.com/cespare/go-trie

github.com/poxiaoyun/govmomi

github.com/zhengkes/govmomi

github.com/suraj-testing2/soda_trees

github.com/spakin/disjoint

github.com/prataprc/gostore

github.com/ancientlore/go-avltree/v2

github.com/webability-go/xcore/v2

github.com/chenjianmei111/go-amt-ipld/v3

github.com/chenjianmei111/go-hamt-ipld/v3

github.com/openacid/slimarray

github.com/xrfang/fxml

github.com/gholt/valuelocmap

github.com/ksin751119/merkletree

github.com/qri-io/deepdiff

github.com/deepin-community/golang-github-derekparker-trie

github.com/zhangbo2008/merkletree

github.com/sgtest/appdash

github.com/i5heu/ouroboros-db

github.com/webability-go/xcore

github.com/robloxapi/rbxbin

gitee.com/pengdacn/suture/v4

github.com/DuncanScu/golux

github.hscsec.cn/vmware/govmomi

github.com/gobwas/hashring

github.com/felpssc/binary-tree-data-structure

gitea.com/lunny/gop

goki.dev/ki

github.com/fufunoyu/govmomi_v0.20.0

github.com/alessandro-c/merkle

github.com/aws/go-kafka-event-source/streams

github.com/IlyaLab/CloudForest

github.com/ilyalab/cloudforest

htdvisser.dev/exp/tree

github.com/flant/govmomi

go.xrfang.cn/fxml

github.com/iwdgo/htmlutils

github.com/twmb/go-mmheap

github.com/massn/ki

github.com/AskAlexSharov/bytebtree

github.com/ryanbressler/cloudforest

github.com/neelance/tree

github.com/cpcallen/flatpack

gopkg.in/thejerf/suture.v4

devt.de/krotik/rufs