Package joseki is a pure Go library for working with RDF, a powerful framework for representing informations as graphs. For more informations about RDF itself, please see https://www.w3.org/TR/rdf11-concepts Joseki provides the following features to work with RDF : * Structures to represent and manipulate the RDF model (URIs, Literals, Blank Nodes, Triples, etc). * RDF Graphs to store data, with several implentations provided. * A Low level API to query data stored in graphs. * A High level API to query data using the SPARQL 1.1 query language. * Query processing using modern techniques such as join ordering or optimized query execution plans. * Load RDF data stored in files in various formats (N-Triples, Turtle, etc) into any graph. * Serialize a RDF Graph into various formats. This package aims to work with RDF graphs, which are composed of RDF Triple {Subject Object Predicate}. Using joseki, you can represent an RDF Triple as followed : You can also store your RDF Triples in a RDF Graph, using various type of graphs. Here, we use a Tree Graph to store our triple : You can also query any triple from a RDF Graph, using a low level API or a SPARQL query. For more informations about specific features, see the documentation of each subpackage. Author : Thomas Minier
Package hercules contains the functions which are needed to gather various statistics from a Git repository. The analysis is expressed in a form of the tree: there are nodes - "pipeline items" - which require some other nodes to be executed prior to selves and in turn provide the data for dependent nodes. There are several service items which do not produce any useful statistics but rather provide the requirements for other items. The top-level items include: - BurndownAnalysis - line burndown statistics for project, files and developers. - CouplesAnalysis - coupling statistics for files and developers. - ShotnessAnalysis - structural hotness and couples, by any Babelfish UAST XPath (functions by default). The typical API usage is to initialize the Pipeline class: Then add the required analysis: This call will add all the needed intermediate pipeline items. Then link and execute the analysis tree: Finally extract the result: The actual usage example is cmd/hercules/root.go - the command line tool's code. You can provide additional options via `facts` on initialization. For example, to provide your own logger, enable people-tracking, and set a custom tick size: Hercules depends heavily on https://github.com/src-d/go-git and leverages the diff algorithm through https://github.com/sergi/go-diff. Besides, BurndownAnalysis involves File and RBTree. These are low level data structures which enable incremental blaming. File carries an instance of RBTree and the current line burndown state. RBTree implements the red-black balanced binary tree and is based on https://github.com/yasushi-saito/rbtree. Coupling stats are supposed to be further processed rather than observed directly. labours.py uses Swivel embeddings and visualises them in Tensorflow Projector. Shotness analysis as well as other UAST-featured items relies on [Babelfish](https://doc.bblf.sh) and requires the server to be running.
Package errors provides errors with a recorded stack trace and optional structured details. The traditional error handling idiom in Go is roughly akin to which when applied recursively up the call stack results in error reports without a stack trace or context. The errors package provides error handling primitives to annotate errors along the failure path in a way that does not destroy the original error. When interacting with code which returns errors without a stack trace, you can upgrade that error to one with a stack trace using errors.WithStack. For example: errors.WithStack records the stack trace at the point where it was called, so use it as close to where the error originated as you can get so that the recorded stack trace is more precise. The example above uses errors.E for the returned error type instead of the standard error type. This is not required, but it tells Go that you expect that the function returns only errors with a stack trace and Go type system then helps you find any cases where this is not so. errors.WithStack does not record the stack trace if it is already present in the error so it is safe to call it if you are unsure if the error contains a stack trace. Errors with a stack trace implement the following interface, returning program counters of function invocations: You can use standard runtime.CallersFrames to obtain stack trace frame information (e.g., function name, source code file and line). You can also use errors.StackFormatter to format the stack trace. Although the stackTracer interface is not exported by this package, it is considered a part of its stable public interface. Sometimes an error occurs in a low-level function and the error messages returned from it are too low-level, too. You can use errors.Wrap to construct a new higher-level error while recording the original error as a cause. In the example above we returned a new error with a new message, hiding the low-level details. The returned error implements the following interface: which enables access to the underlying low-level error. You can also use errors.Cause to obtain the cause. Although the causer interface is not exported by this package, it is considered a part of its stable public interface. Sometimes you do not want to hide the error message but just add to it. You can use errors.WithMessage, which adds a prefix to the existing message, or errors.Errorf, which gives you more control over the new message. Example new messages could then be, respectively: Errors returned by this package implement the detailer interface: which enables access to a map with optional additional details about the error. Returned map can be modified in-place. You can also use errors.Details and errors.AllDetails to access details: You can also use errors.WithDetails as an alternative to errors.WithStack if you also want to add details while recording the stack trace: Errors which implement the following standard unwrapper interfaces: form a tree of errors where a wrapping error points its parent, wrapped, error(s). Errors returned from this package implement this interface to return the original error or errors, when they exist. This enables us to have constant base errors which we annotate with a stack trace before we return them: Or with details: We can use errors.Is to determine which error has been returned: Works across the tree, too: To access details, use: You can join multiple errors into one error by calling errors.Join. Join also records the stack trace at the point it was called. All errors with a stack trace returned from this package implement fmt.Formatter interface and can be formatted by the fmt package. They also support marshaling to JSON. Same formatting and JSON marshaling for errors coming outside of this package can be done by wrapping them into errors.Formatter.
The trie package implements a trie (prefix tree) data structure over byte slices.
This package is the root package of the govmomi library. The library is structured as follows: The minimal usable functionality is available through the vim25 package. It contains subpackages that contain generated types, managed objects, and all available methods. The vim25 package is entirely independent of the other packages in the govmomi tree -- it has no dependencies on its peers. The vim25 package itself contains a client structure that is passed around throughout the entire library. It abstracts a session and its immutable state. See the vim25 package for more information. The session package contains an abstraction for the session manager that allows a user to login and logout. It also provides access to the current session (i.e. to determine if the user is in fact logged in) The object package contains wrappers for a selection of managed objects. The constructors of these objects all take a *vim25.Client, which they pass along to derived objects, if applicable. The govc package contains the govc CLI. The code in this tree is not intended to be used as a library. Any functionality that govc contains that _could_ be used as a library function but isn't, _should_ live in a root level package. Other packages, such as "event", "guest", or "license", provide wrappers for the respective subsystems. They are typically not needed in normal workflows so are kept outside the object package.
This package is the root package of the govmomi library. The library is structured as follows: The minimal usable functionality is available through the vim25 package. It contains subpackages that contain generated types, managed objects, and all available methods. The vim25 package is entirely independent of the other packages in the govmomi tree -- it has no dependencies on its peers. The vim25 package itself contains a client structure that is passed around throughout the entire library. It abstracts a session and its immutable state. See the vim25 package for more information. The session package contains an abstraction for the session manager that allows a user to login and logout. It also provides access to the current session (i.e. to determine if the user is in fact logged in) The object package contains wrappers for a selection of managed objects. The constructors of these objects all take a *vim25.Client, which they pass along to derived objects, if applicable. The govc package contains the govc CLI. The code in this tree is not intended to be used as a library. Any functionality that govc contains that _could_ be used as a library function but isn't, _should_ live in a root level package. Other packages, such as "event", "guest", or "license", provide wrappers for the respective subsystems. They are typically not needed in normal workflows so are kept outside the object package.
Package disjoint implements a disjoint-set data structure. A disjoint-set—also called union-find—data structure keeps track of nonoverlapping partitions of a collection of data elements. Initially, each data element belongs to its own, singleton, set. The following operations can then be performed on these sets: • Union merges two sets into a single set containing the union of their elements. • Find returns an arbitrary element from a set. The critical feature is that Find returns the same element when given any element in a set. The implication is that two elements A and B belong to the same set if and only if A.Find() == B.Find(). Both Union and Find take as arguments elements of sets, not the sets themselves. Because sets are mutually disjoint, an element uniquely identifies a set. Ergo, there is no need to pass sets to those functions. Disjoint sets are more limited in functionality than conventional sets. They support only set union, not set intersection, set difference, or any other set operation. They don't allow an element to reside in more than one set. They don't even provide a way to enumerate the elements in a given set. What makes them useful, though, is that they're extremely fast, especially for large sets; both Union and Find run in amortized near-constant time. See http://en.wikipedia.org/wiki/Disjoint-set_data_structure for more information. Disjoint sets are often used in graph algorithms, for example to find a minimal spanning tree for a graph or to determine if adding a given edge to a graph would create a cycle. Draw a maze. This is my favorite use of disjoint-set forests. The algorithm works by repeatedly knocking down walls between two sets of rooms that are not part of the same union and merging those rooms into a single union. A union represents connected components—rooms in the maze that are mutually reachable. A single union implies that every room is reachable from every other room. This is a fairly long example, but note that only half of it relates to generating the maze. The other half renders the maze using Unicode box-drawing characters.
Package avltree implements a height-balanced binary tree with array-like indexing capability. An AVL tree (Adel'son-Vel'skii & Landis) is a binary search tree in which the heights of the left and right subtrees of the root differ by at most one and in which the left and right subtrees are again AVL trees. With each node of an AVL tree is associated a balance factor that is Left High, Equal, or Right High according, respectively, as the left subtree has height greater than, equal to, or less than that of the right subtree. The AVL tree is, in practice, balanced quite well. It can (at the worst case) become skewed to the left or right, but never so much that it becomes inefficient. The balancing is done as items are added or deleted. This version is enhanced to allow "indexing" of values in the tree; however, the indexes are not stable as the tree could be resorted as items are added or removed. It is safe to iterate or search a tree from multiple threads provided that no threads are modifying the tree. See also: Robert L. Kruse, Data Structures and Program Design, 2nd Ed., Prentice-Hall
Package xcore is a set of basic objects for programation (XCache for caches, XDataset for data sets, XLanguage for languages and XTemplate for templates). For GO, the actual existing code includes: - XCache: Application Memory Caches for any purpose, with time control and quantity control of object in the cache and also check changes against original source. It is a thread safe cache. - XDataset: Basic nested data structures for any purpose (template injection, configuration files, database records, etc). - XLanguage: language dependent text tables for internationalization of code. The sources can be text or XML file definitions. - XTemplate: template system with meta language to create complex documents (compatible with any text language, HTML, CSS, JS, PDF, XML, etc), heavily used on CMS systems and others. It is already used on sites that serve more than 60 million pages a month (500 pages per second on pike hour) and can be used on multithreading environment safely. XCache is a library to cache all the data you want into current application memory for a very fast access to the data. The access to the data support multithreading and concurrency. For the same reason, this type of cache is not persistent (if you exit the application) and cannot grow too much (as memory is the limit). However, you can control a timeout of each cache piece, and eventually a comparison function against a source (file, database, etc) to invalid the cache. 1. Declare a new XCache with NewXCache() function: 2. Fill in the cache: Once you have declared the cache, you can fill it with anything you want. The main cache object is an interface{} so you can put here anything you need, from simple variables to complex structures. You need to use the Set function: Note the ID is always a string, so convert a database key to string if needed. 3. To use the cache, just ask for your entry with Get function: 4. To maintain the cache: You may need Del function, to delete a specific entry (maybe because you deleted the record in database). You may also need Clean function to deletes a percentage of the cache, or Flush to deletes it all. The Verify function is used to check cache entries against their sources through the Validator function. Be very careful, if the cache is big or the Validator function is complex (maybe ask for a remote server information), the verification may be VERY slow and huge CPU use. The Count function gives some stats about the cache. 5. How to use Verify Function: This function is recommended when the source is local and fast to check (for instance a language file or a template file). When the source is distant (other cluster database, any rpc source on another network, integration of many parts, etc), it is more recommended to create a function that will delete the cache when needed (on demand cache change). The validator function is a func(id, time.Time) bool function. The first parameter is the ID entry in the cache, the second parameter the time of the entry was created. The validator function returns true is the cache is still valid, or false if it needs to be invalidated. The XCache is thread safe. The cache can be limited in quantity of entries and timeout for data. The cache is automanaged (for invalid expired data) and can be cleaned partially or totally manually. The XLanguage table of text entries can be loaded from XML file, XML string or normal text file or string. It is used to keep a table of id=value set of entries in any languages you need, so it is easy to switch between XLanguage instance based on the required language needed. Obviously, any XLanguage you load in any language should have the same id entries translated, for the same use. The XLanguage object is thread safe 1. loading: You can load any file or XML string directly into the object. 1.1 The XML Format is: NAMEOFTABLE is the name of your table entry, for example "loginform", "user_report", etc. LG is the ISO-3369 2 letters language ID, for example "es" for spanish, "en" for english, "fr" for french, etc. ENTRYNAME is the ID of the entry, for example "greating", "yourname", "submitbutton". ENTRYVALUE is the text for your entry, for example "Hello", "You are:", "Save" if your table is in english. STATUSVALUE is the status of the entry- You may put any value to control your translation over time and processes. 1.2 The flat text format is: ENTRYNAME is the ID of the entry, for example "greating", "yourname", "submitbutton". ENTRYVALUE is the text for your entry, for example "Hello", "You are:", "Save" if your table is in english. There is no name of table or language in this format (you "know" what you are loading). The advantage to use XML format is to have more control over your language, and eventyally add attributes into your entries, for instance you may add attributes translated="yes/no", verified="yes/no", and any other data that your system could insert. The XLanguage will ignore those attributes loading the table. 2. creation: To create a new XLanguage empty structure: There are 4 functions to create the language from a file or string, flat text or XML text: Then you can use the set of basic access functions: SetName/SetLanguage functions are used to set the table name and language of the object (generally to build an object from scratch). GetName/GetLanguage functions are used to get the table name and language of the object (generally when you load it from some source). Set/Get/Del functions are used to add or modify a new entry, read an entry, or deletes an entry in the object. SetStatus/GetStatus functions are used to add or get a status for the entry in the object. To create am XML file from the objet, you can use the GetXML() function 1. Overview: The XDataSet is a set of interfaces and basic classes ready-to-use to build a standard set of data optionally nested and hierarchical, that can be used for any purpose: - Keep complex data in memory. - Create JSON structures. - Inject data into templates. - Interchange database data (records set and record). You can store into it generic supported data, as well as any complex interface structures: - Int - Float - String - Time - Bool - []Int - []Float - []Time - []Bool - XDataSetDef (anything extended with this interface) - []String - Anything else ( interface{} ) - XDataSetCollectionDef (anything extended with this interface) The generic supported data comes with a set of functions to get/set those data directly into the XDataset. Example: Note that all references to XDataset and XDatasetCollection are pointers, always (to be able to modify the values of them). 2. XDatasetDef interface: It is the interface to describe a simple set of data mapped as "name": value, where value can be of any type. The interface implements a good amount of basic methods to get the value on various format such as GetString("name"), GetInt("name"), etc (see below). If the value is another type as asked, the method should contert it if possible. For instance "key":123 required through GetString("key") should return "123". The XDataset type is a simple map[string]interface{} with all the implemented methods and should be enough to use for almost all required cases. However, you can build any complex structure that extends the interface and implements all the required functions to stay compatible with the XDatasetDef. 3. XDatasetCollectionDef Interface: This is the interface used to extend any type of data as a Collection, i-e an array of XDatasetDef. This is a slice of any XDatasetDef compatible data. The interface implements some methods to work on array structure such as Push, Pop, Shift, Unshift and some methods to search data into the array. The XDatasetCollection type is a simple []DatasetDef with all the implemented methods and should be enough to use for almost all required cases. 1. Overview: The XDataSetTS is a DatasetDef structure, thread safe. It is build on the XDataset with the same properties, but is thread safe to protect Read/Write accesses from different thread. Example: You may also build a XDatasetTS to encapsulate a XDatasetDef that is not thread safe, to use it safely Note that all references to XDatasetTS are pointers, always (to be able to modify the values of them). The DatasetTS meet the XDatasetDef interface 1. Overview: This is a class to compile and keep a Template that can be injected with an XDataSet structure of data, with a metalanguage to inject the data. The metalanguage is extremely simple and is made to be useful and **really** separate programation from template code (not like other many generic template systems that just mix code and data). A template is a set of HTML/XML (or any other language) string with a meta language to inject variables and build a final string. The XCore XTemplate system is based on the injection of parameters, language translation strings and data fields directly into the HTML (Or any other language you need) template. The HTML itself (or any other language) is a text code not directly used by the template system, but used to dress the data you want to represent in your preferred language. The variables to inject must be into a XDataSet structure or into a structure extended from XDataSetDef interface. The injection of data is based on a XDataSet structure of values that can be nested into another XDataSet and XDataSetConnection and so on. The template compiler recognize nested arrays to automatically make loops on the information. Templates are made to store reusable HTML code, and overall easily changeable by people that do not know how to write programs. A template can be as simple as a single character (no variables to inject) to a very complex nested, conditional and loops sub-templates. Yes. this is a template, but a very simple one without need to inject any data. Let's go more complex: Having an array of data, we want to paint it beautifull: We can create a template to inject this data into it: 2. Create and use XTemplateData: In sight to create and use templates, you have all those possible options to use: Creates the XTemplate from a string or a file or any other source: Clone the XTemplate: 3. Metalanguage Reference: 3.1 Comments: %-- and --% You may use comments into your template. The comments will be discarded immediately at the compilation of the template and do not interfere with the rest of your code. Example: 3.2 Nested Templates: [[...]] and [[]] You can define new nested templates into your main template A nested template is defined by: The templteid is any combination of lowers letters only (a-z), numbers (0-9), and 3 special chars: . (point) - (dash) and _ (underline). The template is closed with [[]]. There is no limits into nesting templates. Any nested template will inheritate all the father elements and can use father elements too. To call a sub-template, you need to use &&templateid&& syntax (described below in this document). Example: You may use more than one id into the same template to avoid repetition of the same code. The different id's are separated with a pipe | Important note: A template will be visible only on the same level of its declaration. For example, if you put a subtemplate "b" into a subtemplate "a", it will not be visible by &&b&& from the top level, but only into the subtemplate "a". 3.3 Simple Elements: ##...## and {{...}} There are 2 types of simple elements. Language elements and Data injector elements (also called field elements). We "logically" define the 2 type of elements. The separation is only for human logic and template filling, however the language information can perfectly fit into the data to inject (and not use ## entries). 3.3.1 Languages elements: ##entry## All the languages elements should have the format: ##entry##. A language entry is generally anything written into your code or page that does not come from a database, and should adapt to the language of the client visiting your site. Using the languages elements may depend on the internationalization of your page. If your page is going to be in a single language forever, you really dont need to use languages entries. The language elements generally carry titles, menu options, tables headers etc. The language entries are set into the "#" entry of the main template XDataset to inject, and is a XLanguage table. Example: With data to inject: 3.3.2 Field elements: {{fieldname}} Fields values should have the format: {{fieldname}}. Your fields source can be a database or any other preferred repository data source. Example: You can access an element with its path into the data set to inject separating each field level with a > (greater than). This will take the name of the second hobby in the dataset defined upper. (collections are 0 indexed). The 1 denotes the second record of the hobbies XDatasetCollection. If the field is not found, it will be replaced with an empty string. Tecnically your field names can be any string in the dataset. However do not use { } or > into the names of your fields or the XTemplate may not use them correctly. We recommend to use lowercase names with numbers and ._- Accents and UTF8 symbols are also welcome. 3.3.3 Scope: When you use an id to point a value, the template will first search into the available ids of the local level. If no id is found, the it will search into the upper levers if any, and so on. Example: At the level of 'data2', using {{appname}} will get back 'DomCore'. At the level of 'key1', using {{appname}} will get back 'Nested App'. At the level of 'key2', using {{appname}} will get back 'DomCore'. At the level of root, 'data1' or 'detail', using {{appname}} will get back an empty string. 3.3.4 Path access: id>id>id>id At any level into the data array, you can access any entry into the subset array. For instance, taking the previous array of data to inject, let's suppose we are into a nested meta elements at the 'data1' level. You may want to access directly the 'Juan' entry. The path will be: The José's status value from the root will be: 3.4 Meta Elements They consist into an injection of a XDataset, called the "data to inject", into the template. The meta language is directly applied on the structure of the data array. The data to inject is a nested set of variables and values with the structure you want (there is no specific construction rules). You can inject nearly anything into a template meta elements. Example of a data array to inject: You can access directly any data into the array with its relative path (relative to the level you are when the metaelements are applied, see below). There are 4 structured meta elements in the XTemplate templates to use the data to inject: Reference, Loops, Condition and Debug. The structure of the meta elements in the template must follow the structure of the data to inject. 3.4.1 References to another template: &&order&& 3.4.1.1 When order is a single id (characters a-z0-9.-_), it will make a call to a sub template with the same set of data and replace the &&...&& with the result. The level in the data set is not changed. Example based on previous array of Fred's data: 3.4.1.2 When order contains 2 parameters separated by a semicolumn :, then second parameter is used to change the level of the data of array, with the subset with this id. The level in the data set is changed to this sub set. Example based on previous array of Fred's data: 3.4.1.3 When order contains 3 parameters separated by a semicolumn :, the second and third parameters are used to search the name of the new template based on the data fields to inject. This is an indirect access to the template. The name of the subtemplate is build with parameter3 as prefix and the content of parameter2 value. The third parameter must be empty. 3.4.2 Loops: @@order@@ 3.4.2.1 Overview This meta element will loop over each itterance of the set of data and concatenate each created template in the same order. You need to declare a sub template for this element. You may aso declare derivated sub templates for the different possible cases of the loop: For instance, If your main subtemplate for your look is called "hobby", you may need a different template for the first element, last element, Nth element, Element with a value "no" in the sport field, etc. The supported postfixes are: When the array to iterate is empty: - .none (for example "There is no hobby") When the array contains elements, it will search in order, the following template and use the first found: - templateid.key.[value] value is the key of the vector line. If the collection has a named key (string) or is a direct array (0, 1, 2...) - templateid.first if it is the first element of the array set (new from v1.01.11) - templateid.last if it is the first element of the array set (new from v1.01.11) - templateid.even if the line number is even - templateid in all other cases (odd is contained here if even is defined) Since v2.1.7, you can also use the pseudo field {{.counter}} into the loop subtemplate, to get the number of the counter of the loop, it is 1-based (first loop is 1, not 0) 3.4.2.2 When order is a single id (characters a-z0-9.-_), it will make a call to the sub template id with the same subset of data with the same id and replace the @@...@@ for each itterance of the data with the result. Example based on previous array of Fred's data: 3.4.2.3 When order contains 2 parameters separated by a semicolumn :, then first parameter is used to change the level of the data of array, with the subset with this id, and the second one for the template to use. Example based on previous array of Fred's data: 3.4.3 Conditional: ??order?? Makes a call to a subtemplate only if the field exists and have a value. This is very userfull to call a sub template for instance when an image or a video is set. When the condition is not met, it will search for the [id].none template. The conditional element does not change the level in the data set. 3.4.3.1 When order is a single id (characters a-z0-9.-_), it will make a call to the sub template id with the same field in the data and replace the ??...?? with the corresponding template Example based on previous array of Fred's data: 3.4.3.2 When order contains 2 parameters separated by a semicolumn :, then second parameter is used to change the level of the data of array, with the subset with this id. Example based on previous array of Fred's data: If the asked field is a catalog, true/false, numbered, you may also use .[value] subtemplates 3.5 Debug Tools: !!order!! There are two keywords to dump the content of the data set. This is very useful when you dont know the code that calls the template, don't remember some values, or for debug facilities. 3.5.1 !!dump!! Will show the totality of the data set, with ids and values. 3.5.1 !!list!! Will show only the tree of parameters, values are not shown.
Package amt provides a reference implementation of the IPLD AMT (Array Mapped Trie) used in the Filecoin blockchain. The AMT algorithm is similar to a HAMT https://en.wikipedia.org/wiki/Hash_array_mapped_trie but instead presents an array-like interface where the indexes themselves form the mapping to nodes in the trie structure. An AMT is suitable for storing sparse array data as a minimum amount of intermediate nodes are required to address a small number of entries even when their indexes span a large distance. AMT is also a suitable means of storing non-sparse array data as required, with a small amount of storage and algorithmic overhead required to handle mapping that assumes that some elements within any range of data may not be present. The AMT algorithm produces a tree-like graph, with a single root node addressing a collection of child nodes which connect downward toward leaf nodes which store the actual entries. No terminal entries are stored in intermediate elements of the tree, unlike in a HAMT. We can divide up the AMT tree structure into "levels" or "heights", where a height of zero contains the terminal elements, and the maximum height of the tree contains the single root node. Intermediate nodes are used to span across the range of indexes. Any AMT instance uses a fixed "width" that is consistent across the tree's nodes. An AMT's "bitWidth" dictates the width, or maximum-brancing factor (arity) of the AMT's nodes by determining how many bits of the original index are used to determine the index at any given level. A bitWidth of 3 (the default for this implementation) can generate indexes in the range of 0 to (3^2)-1=7, i.e. a "width" of 8. In practice, this means that an AMT with a bitWidth of 3 has a branching factor of _between 1 and 8_ for any node in the structure. Considering the minimal case: a minimal AMT contains a single node which serves as both the root and the leaf node and can hold zero or more elements (an empty AMT is possible, although a special-case, and consists of a zero-length root). This minimal AMT can store array indexes from 0 to width-1 (8 for the default bitWidth of 3) without requiring the addition of additional nodes. Attempts to add additional indexes beyond width-1 will result in additional nodes being added and a tree structure in order to address the new elements. The minimal AMT node is said to have a height of 0. Every node in an AMT has a height that indicates its distance from the leaf nodes. All leaf nodes have a height of 0. The height of the root node dictates the overall height of the entire AMT. In the case of the minimal AMT, this is 0. Elements are stored in a compacted form within nodes, they are "position-mapped" by a bitmap field that is stored with the node. The bitmap is a simple byte array, where each bit represents an element of the data that can be stored in the node. With a width of 8, the bitmap is a single byte and up to 8 elements can be stored in the node. The data array of a node _only stores elements that are present in that node_, so the array is commonly shorter than the maximum width. An empty AMT is a special-case where the single node can have zero elements, therefore a zero-length data array and a bitmap of `0x00`. In all other cases, the data array must have between 1 and width elements. Determining the position of an index within the data array requires counting the number of set bits within the bitmap up to the element we are concerned with. If the bitmap has bits 2, 4 and 6 set, we can see that only 3 of the bits are set so our data array should hold 3 elements. To address index 4, we know that the first element will be index 2 and therefore the second will hold index 4. This format allows us to store only the elements that are set in the node. Overflow beyond the single node AMT by adding an index beyond width-1 requires an increase in height in order to address all elements. If an element in the range of width to (width*2)-1 is added, a single additional height is required which will result in a new root node which is used to address two consecutive leaf nodes. Because we have an arity of up to width at any node, the addition of indexes in the range of 0 to (width^2)-1 will still require only the addition of a single additional height above the leaf nodes, i.e. height 1. From the width of an AMT we can derive the maximum range of indexes that can be contained by an AMT at any given `height` with the formula width^(height+1)-1. e.g. an AMT with a width of 8 and a height of 2 can address indexes 0 to 8^(2+1)-1=511. Incrementing the height doubles the range of indexes that can be contained within that structure. Nodes above height 0 (non-leaf nodes) do not contain terminal elements, but instead, their data array contains links to child nodes. The index compaction using the bitmap is the same as for leaf nodes, so each non-leaf node only stores as many links as it has child nodes. Because additional height is required to address larger indexes, even a single-element AMT will require more than one node where the index is greater than the width of the AMT. For a width of 8, indexes 8 to 63 require a height of 1, indexes 64 to 511 require a height of 2, indexes 512 to 4095 require a height of 3, etc. Retrieving elements from the AMT requires extracting only the portion of the requested index that is required at each height to determine the position in the data array to navigate into. When traversing through the tree, we only need to select from indexes 0 to width-1. To do this, we take log2(width) bits from the index to form a number that is between 0 and width-1. e.g. for a width of 8, we only need 3 bits to form a number between 0 and 7, so we only consume 3 bits per level of the AMT as we traverse. A simple method to calculate this at any height in the AMT (assuming bitWidth of 3, i.e. a width of 8) is: 1. Calculate the maximum number of nodes (not entries) that may be present in an sub-tree rooted at the current height. width^height provides this number. e.g. at height 0, only 1 node can be present, but at height 3, we may have a tree of up to 512 nodes (storing up to 8^(3+1)=4096 entries). 2. Divide the index by this number to find the index for this height. e.g. an index of 3 at height 0 will be 3/1=3, or an index of 20 at height 1 will be 20/8=2. 3. If we are at height 0, the element we want is at the data index, position-mapped via the bitmap. 4. If we are above height 0, we need to navigate to the child element at the index we calculated, position-mapped via the bitmap. When traversing to the child, we discard the upper portion of the index that we no longer need. This can be achieved by a mod operation against the number-of-nodes value. e.g. an index of 20 at height 1 requires navigation to the element at position 2, when moving to that element (which is height 0), we truncate the index with 20%8=4, at height 0 this index will be the index in our data array (position-mapped via the bitmap). In this way, each sub-tree root consumes a small slice, log2(width) bits long, of the original index. Adding new elements to an AMT may require up to 3 steps: 1. Increasing the height to accommodate a new index if the current height is not sufficient to address the new index. Increasing the height requires turning the current root node into an intermediate and adding a new root which links to the old (repeated until the required height is reached). 2. Adding any missing intermediate and leaf nodes that are required to address the new index. Depending on the density of existing indexes, this may require the addition of up to height-1 new nodes to connect the root to the required leaf. Sparse indexes will mean large gaps in the tree that will need filling to address new, equally sparse, indexes. 3. Setting the element at the leaf node in the appropriate position in the data array and setting the appropriate bit in the bitmap. Removing elements requires a reversal of this process. Any empty node (other than the case of a completely empty AMT) must be removed and its parent should have its child link removed. This removal may recurse up the tree to remove many unnecessary intermediate nodes. The root node may also be removed if the current height is no longer necessary to contain the range of indexes still in the AMT. This can be easily determined if _only_ the first bit of the root's bitmap is set, meaning only the left-most is present, which will become the new root node (repeated until the new root has more than the first bit set or height of 0, the single-node case). See https://github.com/ipld/specs/blob/master/data-structures/hashmap.md for a description of a HAMT algorithm. And https://github.com/ipld/specs/blob/master/data-structures/vector.md for a description of a similar algorithm to an AMT that doesn't support internal node compression and therefore doesn't support sparse arrays. Unlike a HAMT, the AMT algorithm doesn't benefit from randomness introduced by a hash algorithm. Therefore an AMT used in cases where user-input can influence indexes, larger-than-necessary tree structures may present risks as well as the challenge imposed by having a strict upper-limit on the indexes addressable by the AMT. A width of 8, using 64-bit integers for indexing, allows for a tree height of up to 64/log2(8)=21 (i.e. a width of 8 has a bitWidth of 3, dividing the 64 bits of the uint into 21 separate per-height indexes). Careful placement of indexes could create extremely sub-optimal forms with large heights connecting leaf nodes that are sparsely packed. The overhead of the large number of intermediate nodes required to connect leaf nodes in AMTs that contain high indexes can be abused to create perverse forms that contain large numbers of nodes to store a minimal number of elements. Minimal nodes will be created where indexes are all in the lower-range. The optimal case for an AMT is contiguous index values starting from zero. As larger indexes are introduced that span beyond the current maximum, more nodes are required to address the new nodes _and_ the existing lower index nodes. Consider a case where a width=8 AMT is only addressing indexes less than 8 and requiring a single height. The introduction of a single index within 8 of the maximum 64-bit unsigned integer range will require the new root to have a height of 21 and have enough connecting nodes between it and both the existing elements and the new upper index. This pattern of behavior may be acceptable if there is significant density of entries under a particular maximum index. There is a direct relationship between the sparseness of index values and the number of nodes required to address the entries. This should be the key consideration when determining whether an AMT is a suitable data-structure for a given application.
Package hamt provides a reference implementation of the IPLD HAMT used in the Filecoin blockchain. It includes some optional flexibility such that it may be used for other purposes outside of Filecoin. HAMT is a "hash array mapped trie" https://en.wikipedia.org/wiki/Hash_array_mapped_trie. This implementation extends the standard form by including buckets for the key/value pairs at storage leaves and CHAMP mutation semantics https://michael.steindorfer.name/publications/oopsla15.pdf. The CHAMP invariant and mutation rules provide us with the ability to maintain canonical forms given any set of keys and their values, regardless of insertion order and intermediate data insertion and deletion. Therefore, for any given set of keys and their values, a HAMT using the same parameters and CHAMP semantics, the root node should always produce the same content identifier (CID). The HAMT algorithm hashes incoming keys and uses incrementing subsections of that hash digest at each level of its tree structure to determine the placement of either the entry or a link to a child node of the tree. A `bitWidth` determines the number of bits of the hash to use for index calculation at each level of the tree such that the root node takes the first `bitWidth` bits of the hash to calculate an index and as we move lower in the tree, we move along the hash by `depth x bitWidth` bits. In this way, a sufficiently randomizing hash function will generate a hash that provides a new index at each level of the data structure. An index comprising `bitWidth` bits will generate index values of `[ 0, 2^bitWidth )`. So a `bitWidth` of 8 will generate indexes of 0 to 255 inclusive. Each node in the tree can therefore hold up to `2^bitWidth` elements of data, which we store in an array. In the this HAMT and the IPLD HashMap we store entries in buckets. A `Set(key, value)` mutation where the index generated at the root node for the hash of key denotes an array index that does not yet contain an entry, we create a new bucket and insert the key / value pair entry. In this way, a single node can theoretically hold up to `2^bitWidth x bucketSize` entries, where `bucketSize` is the maximum number of elements a bucket is allowed to contain ("collisions"). In practice, indexes do not distribute with perfect randomness so this maximum is theoretical. Entries stored in the node's buckets are stored in key-sorted order. This HAMT implementation: • Fixes the `bucketSize` to 3. • Defaults the `bitWidth` to 8, however within Filecoin it uses 5 • Defaults the hash algorithm to the 64-bit variant of Murmur3-x64 The algorithm used here is identical to that of the IPLD HashMap algorithm specified at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md. The specific parameters used by Filecoin and the DAG-CBOR block layout differ from the specification and are defined at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md#Appendix-Filecoin-hamt-variant.
Package slimarray uses polynomial to compress and store an array of uint32. A uint32 costs only 5 bits in a sorted array of a million number in range [0, 1000*1000]. We use a polynomial y = a + bx + cx² to describe the overall trend of the numbers. And for every number i we add a residual to fit the gap between y(i) and nums[i]. E.g. If there are 4 numbers: 0, 15, 33, 50 The polynomial and residuals are: In this case the residuals require 3 bits for each of them. To retrieve the numbers, we evaluate y(i) and add the residual to it: Another space efficient data structure to store uint32 array is trie or prefix tree or radix tree. It is possible to use bitmap-based btree like structure to reduce space(very likely in such case it provides higher compression rate). But it requires the array to be sorted. SlimArray does not have such restriction. It is more adaptive with data layout. To achieve high compression rate, it only requires the data has a overall trend, e.g., roughly sorted, as seen in the above 4 integers examples. Additionally, it also accept duplicated element in the array, which a bitmap based or tree-like data structure does not allow. SlimArray splits the entire array into segments(Seg), each of which has 1024 numbers. And then it splits every segment into several spans. Every span has its own polynomial. A span has 16*k numbers. A segment has at most 64 spans. A SlimArray is a compacted data structure. The original data structures are defined as follow(assumes original user data is `nums []uint32`): A span stores 16*k int32 in it, where k ∈ [1, 64). `Seg.SpansBitmap` describes the layout of Span-s in a Seg. The i-th "1" indicates where the last 16 numbers are in the i-th Span. e.g.: In the above example: `Seg.Rank` caches the total count of "1" in all preceding Seg.SpansBitmap. This accelerate locating a Span in the packed field SlimArray.Polynomials . `Span.width` is the count of numbers stored in this span. It does not need to be stored because it can be calculated by counting the "0" between two "1" in `Seg.SpansBitmap`. `Span.Polynomial` stores 3 coefficients of the polynomial describing the overall trend of this span. I.e. the `[a, b, c]` in `y = a + bx + cx²` `Span.Config.Offset` adjust the offset to locate a residual. In a span we want to have that: But if the preceding span has smaller residual width, the "offset" could be negative, e.g.: span[0] has residual of width 0 and 16 residuals, span[1] has residual of width 4. Then the "offset" of span[1] is `-16*4` in order to satisfy: `(-16*4) + i * 4` is the correct residual position, for i in [16, 32). `Span.Config.ResidualWidth` specifies the number of bits to store every residual in this span, it must be a power of 2: `2^k`. `Span.Residuals` is an array of residuals of length `Span.width`. Every elt in it is a `ResidualWidth`-bits integers. SlimArray compact `Seg` into a dense format: `SlimArray.Residuals` simply packs the residuals of every nums[i] together.
fxml - FreeStyle XML Parser This package provides a simple parser which reads a XML document and output a tree structure, which does not need a pre-defined “struct”, hence the name “FreeStyle”.
Package locmap provides a concurrency-safe data structure that maps keys to value locations. A key is 128 bits and is specified using two uint64s (keyA, keyB). A value location is specified using a blockID, offset, and length triplet. Each mapping is assigned a timestamp and the greatest timestamp wins. The timestamp usually has some number of the lowest bits in use for state information such as active and inactive entries. For example, the lowest bit might be used as 0 = active, 1 = deletion marker so that deletion events are retained for some time period before being completely removed with Discard. Exactly how many bits are used and what they're used for is outside the scope of the mapping itself. This implementation essentially uses a tree structure of slices of key to location assignments. When a slice fills up, an additional slice is created and half the data is moved to the new slice and the tree structure grows. If a slice empties, it is merged with its pair in the tree structure and the tree shrinks. The tree is balanced by high bits of the key, and locations are distributed in the slices by the low bits. There is also a modified form of the data structure called GroupLocMap that expands the primary key of the map to two 128 bit keys and offers a GetGroup method which retrieves all matching items for the first key.
Package merkletree implements a Merkle Tree capable of storing arbitrary content. A Merkle Tree is a hash tree that provides an efficient way to verify the contents of a set data are present and untampered with. At its core, a Merkle Tree is a list of items representing the data that should be verified. Each of these items is inserted into a leaf node and a tree of hashes is constructed bottom up using a hash of the nodes left and right children's hashes. This means that the root node will effictively be a hash of all other nodes (hashes) in the tree. This property allows the tree to be reproduced and thus verified by on the hash of the root node of the tree. The benefit of the tree structure is verifying any single content entry in the tree will require only nlog2(n) steps in the worst case. Creating a new merkletree requires that the type that the tree will be constructed from implements the Content interface. A slice of the Content items should be created and then passed to the NewTree method. t represents the Merkle Tree and can be verified and manipulated with the API methods described below.
Package deepdiff is a structured data differ that aims for near-linear time complexity. It's intended to calculate differences & apply patches to structured data ranging from 0-500MBish of encoded JSON Diffing structured data carries additional complexity when compared to the standard unix diff utility, which operates on lines of text. By using the structure of data itself, deepdiff is able to provide a rich description of changes that maps onto the structure of the data itself. deepdiff ignores semantically irrelevant changes like whitespace, and can isolate changes like column changes to tabular data to only the relevant switches Most algorithms in this space have quadratic time complexity, which from our testing makes them very slow on 3MB JSON documents and unable to complete on 5MB or more. deepdiff currently hovers around the 0.9Sec/MB range on 4 core processors Instead of operating on JSON directly, deepdiff operates on document trees consisting of the go types created by unmarshaling from JSON, which aretwo complex types: and five scalar types: by operating on native go types deepdiff can compare documents encoded in different formats, for example decoded CSV or CBOR. deepdiff is based off an algorithm designed for diffing XML documents outlined in: Detecting Changes in XML Documents by Grégory Cobéna & Amélie Marian https://ieeexplore.ieee.org/document/994696 it's been adapted to fit purposes of diffing for Qri: https://github.com/qri-io/qri the guiding use case for this work deepdiff also includes a tool for applying patches, see documentation for details
!! Currently the database is in a very early stage of development and should not be used in production environments. !! A embedded database built around the concept of event trees, emphasizing data deduplication and data integrity checks. By structuring data into event trees, OuroborosDB ensures efficient and intuitive data management. Key features include: There are in the moment two main components in the database
**Please use xcore/v2** **The version 1 is obsolete.** Package xcore is a set of basic objects for programation (XCache for caches, XDataset for data sets, XLanguage for languages and XTemplate for templates). For GO, the actual existing code includes: - XCache: Application Memory Caches for any purpose, with time control and quantity control of object in the cache and also check changes against original source. It is a thread safe cache. - XDataset: Basic nested data structures for any purpose (template injection, configuration files, database records, etc). - XLanguage: language dependent text tables for internationalization of code. The sources can be text or XML file definitions. - XTemplate: template system with meta language to create complex documents (compatible with any text language, HTML, CSS, JS, PDF, XML, etc), heavily used on CMS systems and others. It is already used on sites that serve more than 60 million pages a month (500 pages per second on pike hour) and can be used on multithreading environment safely. XCache is a library to cache all the data you want into current application memory for a very fast access to the data. The access to the data support multithreading and concurrency. For the same reason, this type of cache is not persistent (if you exit the application) and cannot grow too much (as memory is the limit). However, you can control a timeout of each cache piece, and eventually a comparison function against a source (file, database, etc) to invalid the cache. 1. Declare a new XCache with NewXCache() function: 2. Fill in the cache: Once you have declared the cache, you can fill it with anything you want. The main cache object is an interface{} so you can put here anything you need, from simple variables to complex structures. You need to use the Set function: Note the ID is always a string, so convert a database key to string if needed. 3. To use the cache, just ask for your entry with Get function: 4. To maintain the cache: You may need Del function, to delete a specific entry (maybe because you deleted the record in database). You may also need Clean function to deletes a percentage of the cache, or Flush to deletes it all. The Verify function is used to check cache entries against their sources through the Validator function. Be very careful, if the cache is big or the Validator function is complex (maybe ask for a remote server information), the verification may be VERY slow and huge CPU use. The Count function gives some stats about the cache. 5. How to use Verify Function: This function is recommended when the source is local and fast to check (for instance a language file or a template file). When the source is distant (other cluster database, any rpc source on another network, integration of many parts, etc), it is more recommended to create a function that will delete the cache when needed (on demand cache change). The validator function is a func(id, time.Time) bool function. The first parameter is the ID entry in the cache, the second parameter the time of the entry was created. The validator function returns true is the cache is still valid, or false if it needs to be invalidated. The XCache is thread safe. The cache can be limited in quantity of entries and timeout for data. The cache is automanaged (for invalid expired data) and can be cleaned partially or totally manually. The XLanguage table of text entries can be loaded from XML file, XML string or normal text file or string. It is used to keep a table of id=value set of entries in any languages you need, so it is easy to switch between XLanguage instance based on the required language needed. Obviously, any XLanguage you load in any language should have the same id entries translated, for the same use. 1. loading: You can load any file or XML string directly into the object. 1.1 The XML Format is: NAMEOFTABLE is the name of your table entry, for example "loginform", "user_report", etc. LG is the ISO-3369 2 letters language ID, for example "es" for spanish, "en" for english, "fr" for french, etc. ENTRYNAME is the ID of the entry, for example "greating", "yourname", "submitbutton". ENTRYVALUE is the text for your entry, for example "Hello", "You are:", "Save" if your table is in english. 1.2 The flat text format is: ENTRYNAME is the ID of the entry, for example "greating", "yourname", "submitbutton". ENTRYVALUE is the text for your entry, for example "Hello", "You are:", "Save" if your table is in english. There is no name of table or language in this format (you "know" what you are loading). The advantage to use XML format is to have more control over your language, and eventyally add attributes into your entries, for instance you may add attributes translated="yes/no", verified="yes/no", and any other data that your system could insert. The XLanguage will ignore those attributes loading the table. 2. creation: To create a new XLanguage empty structure: There are 4 functions to create the language from a file or string, flat text or XML text: Then you can use the set of basic access functions: SetName/SetLanguage functions are used to set the table name and language of the object (generally to build an object from scratch). GetName/GetLanguage functions are used to get the table name and language of the object (generally when you load it from some source). Set/Get/Del functions are used to add or modify a new entry, read an entry, or deletes an entry in the object. 1. Overview: The XDataSet is a set of interfaces and basic classes ready-to-use to build a standard set of data optionally nested and hierarchical, that can be used for any purpose: - Keep complex data in memory. - Create JSON structures. - Inject data into templates. - Interchange database data (records set and record). You can store into it generic supported data, as well as any complex interface structures: - Int - Float - String - Time - Bool - []Int - []Float - []Time - []Bool - XDataSetDef (anything extended with this interface) - []String - Anything else ( interface{} ) - XDataSetCollectionDef (anything extended with this interface) The generic supported data comes with a set of functions to get/set those data directly into the XDataset. Example: Note that all references to XDataset and XDatasetCollection are pointers, always (to be able to modify the values of them). 2. XDatasetDef interface: It is the interface to describe a simple set of data mapped as "name": value, where value can be of any type. The interface implements a good amount of basic methods to get the value on various format such as GetString("name"), GetInt("name"), etc (see below). If the value is another type as asked, the method should contert it if possible. For instance "key":123 required through GetString("key") should return "123". The XDataset type is a simple map[string]interface{} with all the implemented methods and should be enough to use for almost all required cases. However, you can build any complex structure that extends the interface and implements all the required functions to stay compatible with the XDatasetDef. 3. XDatasetCollectionDef Interface: This is the interface used to extend any type of data as a Collection, i-e an array of XDatasetDef. This is a slice of any XDatasetDef compatible data. The interface implements some methods to work on array structure such as Push, Pop, Shift, Unshift and some methods to search data into the array. The XDatasetCollection type is a simple []DatasetDef with all the implemented methods and should be enough to use for almost all required cases. 1. Overview: This is a class to compile and keep a Template that can be injected with an XDataSet structure of data, with a metalanguage to inject the data. The metalanguage is extremely simple and is made to be useful and **really** separate programation from template code (not like other many generic template systems that just mix code and data). A template is a set of HTML/XML (or any other language) string with a meta language to inject variables and build a final string. The XCore XTemplate system is based on the injection of parameters, language translation strings and data fields directly into the HTML (Or any other language you need) template. The HTML itself (or any other language) is a text code not directly used by the template system, but used to dress the data you want to represent in your preferred language. The variables to inject must be into a XDataSet structure or into a structure extended from XDataSetDef interface. The injection of data is based on a XDataSet structure of values that can be nested into another XDataSet and XDataSetConnection and so on. The template compiler recognize nested arrays to automatically make loops on the information. Templates are made to store reusable HTML code, and overall easily changeable by people that do not know how to write programs. A template can be as simple as a single character (no variables to inject) to a very complex nested, conditional and loops sub-templates. Yes. this is a template, but a very simple one without need to inject any data. Let's go more complex: Having an array of data, we want to paint it beautifull: We can create a template to inject this data into it: 2. Create and use XTemplateData: In sight to create and use templates, you have all those possible options to use: Creates the XTemplate from a string or a file or any other source: 3. Metalanguage Reference: 3.1 Comments: %-- and --% You may use comments into your template. The comments will be discarded immediately at the compilation of the template and do not interfere with the rest of your code. Example: 3.2 Nested Templates: [[...]] and [[]] You can define new nested templates into your main template A nested template is defined by: The templteid is any combination of lowers letters only (a-z), numbers (0-9), and 3 special chars: . (point) - (dash) and _ (underline). The template is closed with [[]]. There is no limits into nesting templates. Any nested template will inheritate all the father elements and can use father elements too. To call a sub-template, you need to use &&templateid&& syntax (described below in this document). Example: You may use more than one id into the same template to avoid repetition of the same code. The different id's are separated with a pipe | Important note: A template will be visible only on the same level of its declaration. For example, if you put a subtemplate "b" into a subtemplate "a", it will not be visible by &&b&& from the top level, but only into the subtemplate "a". 3.3 Simple Elements: ##...## and {{...}} There are 2 types of simple elements. Language elements and Data injector elements (also called field elements). We "logically" define the 2 type of elements. The separation is only for human logic and template filling, however the language information can perfectly fit into the data to inject (and not use ## entries). 3.3.1 Languages elements: ##entry## All the languages elements should have the format: ##entry##. A language entry is generally anything written into your code or page that does not come from a database, and should adapt to the language of the client visiting your site. Using the languages elements may depend on the internationalization of your page. If your page is going to be in a single language forever, you really dont need to use languages entries. The language elements generally carry titles, menu options, tables headers etc. The language entries are set into the "#" entry of the main template XDataset to inject, and is a XLanguage table. Example: With data to inject: 3.3.2 Field elements: {{fieldname}} Fields values should have the format: {{fieldname}}. Your fields source can be a database or any other preferred repository data source. Example: You can access an element with its path into the data set to inject separating each field level with a > (greater than). This will take the name of the second hobby in the dataset defined upper. (collections are 0 indexed). The 1 denotes the second record of the hobbies XDatasetCollection. If the field is not found, it will be replaced with an empty string. Tecnically your field names can be any string in the dataset. However do not use { } or > into the names of your fields or the XTemplate may not use them correctly. We recommend to use lowercase names with numbers and ._- Accents and UTF8 symbols are also welcome. 3.3.3 Scope: When you use an id to point a value, the template will first search into the available ids of the local level. If no id is found, the it will search into the upper levers if any, and so on. Example: At the level of 'data2', using {{appname}} will get back 'DomCore'. At the level of 'key1', using {{appname}} will get back 'Nested App'. At the level of 'key2', using {{appname}} will get back 'DomCore'. At the level of root, 'data1' or 'detail', using {{appname}} will get back an empty string. 3.3.4 Path access: id>id>id>id At any level into the data array, you can access any entry into the subset array. For instance, taking the previous array of data to inject, let's suppose we are into a nested meta elements at the 'data1' level. You may want to access directly the 'Juan' entry. The path will be: The José's status value from the root will be: 3.4 Meta Elements They consist into an injection of a XDataset, called the "data to inject", into the template. The meta language is directly applied on the structure of the data array. The data to inject is a nested set of variables and values with the structure you want (there is no specific construction rules). You can inject nearly anything into a template meta elements. Example of a data array to inject: You can access directly any data into the array with its relative path (relative to the level you are when the metaelements are applied, see below). There are 4 structured meta elements in the XTemplate templates to use the data to inject: Reference, Loops, Condition and Debug. The structure of the meta elements in the template must follow the structure of the data to inject. 3.4.1 References to another template: &&order&& 3.4.1.1 When order is a single id (characters a-z0-9.-_), it will make a call to a sub template with the same set of data and replace the &&...&& with the result. The level in the data set is not changed. Example based on previous array of Fred's data: 3.4.1.2 When order contains 2 parameters separated by a semicolumn :, then second parameter is used to change the level of the data of array, with the subset with this id. The level in the data set is changed to this sub set. Example based on previous array of Fred's data: 3.4.1.3 When order contains 3 parameters separated by a semicolumn :, the second and third parameters are used to search the name of the new template based on the data fields to inject. This is an indirect access to the template. The name of the subtemplate is build with parameter3 as prefix and the content of parameter2 value. The third parameter must be empty. 3.4.2 Loops: @@order@@ 3.4.2.1 Overview This meta element will loop over each itterance of the set of data and concatenate each created template in the same order. You need to declare a sub template for this element. You may aso declare derivated sub templates for the different possible cases of the loop: For instance, If your main subtemplate for your look is called "hobby", you may need a different template for the first element, last element, Nth element, Element with a value "no" in the sport field, etc. The supported postfixes are: When the array to iterate is empty: - .none (for example "There is no hobby") When the array contains elements, it will search in order, the following template and use the first found: - templateid.key.[value] value is the key of the vector line. If the collection has a named key (string) or is a direct array (0, 1, 2...) - templateid.first if it is the first element of the array set (new from v1.01.11) - templateid.last if it is the first element of the array set (new from v1.01.11) - templateid.even if the line number is even - templateid in all other cases (odd is contained here if even is defined) 3.4.2.2 When order is a single id (characters a-z0-9.-_), it will make a call to the sub template id with the same subset of data with the same id and replace the @@...@@ for each itterance of the data with the result. Example based on previous array of Fred's data: 3.4.2.3 When order contains 2 parameters separated by a semicolumn :, then first parameter is used to change the level of the data of array, with the subset with this id, and the second one for the template to use. Example based on previous array of Fred's data: 3.4.3 Conditional: ??order?? Makes a call to a subtemplate only if the field exists and have a value. This is very userfull to call a sub template for instance when an image or a video is set. When the condition is not met, it will search for the [id].none template. The conditional element does not change the level in the data set. 3.4.3.1 When order is a single id (characters a-z0-9.-_), it will make a call to the sub template id with the same field in the data and replace the ??...?? with the corresponding template Example based on previous array of Fred's data: 3.4.3.2 When order contains 2 parameters separated by a semicolumn :, then second parameter is used to change the level of the data of array, with the subset with this id. Example based on previous array of Fred's data: If the asked field is a catalog, true/false, numbered, you may also use .[value] subtemplates 3.5 Debug Tools: !!order!! There are two keywords to dump the content of the data set. This is very useful when you dont know the code that calls the template, don't remember some values, or for debug facilities. 3.5.1 !!dump!! Will show the totality of the data set, with ids and values. 3.5.1 !!list!! Will show only the tree of parameters, values are not shown.
The rbxfile package handles the decoding, encoding, and manipulation of Roblox instance data structures. This package can be used to manipulate Roblox instance trees outside of the Roblox client. Such data structures begin with a Root struct. A Root contains a list of child Instances, which in turn contain more child Instances, and so on, forming a tree of Instances. These Instances can be accessed and manipulated using an API similar to that of Roblox. Each Instance also has a set of "properties". Each property has a specific value of a certain type. Every available type implements the Value interface, and is prefixed with "Value". Root structures can be decoded from and encoded to various formats, including Roblox's native file formats. The two sub-packages "bin" and "xml" provide formats for Roblox's binary and XML formats. Root structures can also be encoded and decoded with the "json" package. Besides decoding from a format, root structures can also be created manually. The best way to do this is through the "declare" sub-package, which provides an easy way to generate root structures.
Package suture provides Erlang-like supervisor trees. This implements Erlang-esque supervisor trees, as adapted for Go. This is an industrial-strength, tested library deployed into hostile environments, not just a proof of concept or a toy. Supervisor Tree -> SuTree -> suture -> holds your code together when it's trying to fall apart. Why use Suture? Suture has 100% test coverage, and is golint clean. This doesn't prove it free of bugs, but it shows I care. A blog post describing the design decisions is available at http://www.jerf.org/iri/post/2930 . To idiomatically use Suture, create a Supervisor which is your top level "application" supervisor. This will often occur in your program's "main" function. Create "Service"s, which implement the Service interface. .Add() them to your Supervisor. Supervisors are also services, so you can create a tree structure here, depending on the exact combination of restarts you want to create. As a special case, when adding Supervisors to Supervisors, the "sub" supervisor will have the "super" supervisor's Log function copied. This allows you to set one log function on the "top" supervisor, and have it propagate down to all the sub-supervisors. This also allows libraries or modules to provide Supervisors without having to commit their users to a particular logging method. Finally, as what is probably the last line of your main() function, call .Serve() on your top level supervisor. This will start all the services you've defined. See the Example for an example, using a simple service that serves out incrementing integers.
Package golux provides a lightweight HTTP router designed for serverless applications. golux allows you to define routes and handlers for incoming HTTP requests in AWS Lambda functions, using a simple and efficient routing mechanism based on a tree structure.
Package hashring implements consistent hashing hashring data structure. In general, consistent hashing is all about mapping of object from a very big set of values (e.g. request id) to object from a quite small set (e.g. server address). The word "consistent" means that it can produce consistent mapping on different machines or processes without additional state exchange and communication. For more theory about the subject please see this great document: https://theory.stanford.edu/~tim/s16/l/l1.pdf There are two goals for this hashring implementation: 1) To be efficient in highly concurrent applications by blocking read operations for the least possible time. 2) To correctly handle very rare but yet possible hash collisions, which may break all your eventually consistent application. To reach the first goal hashring uses immutable AVL tree internally, making read operations (getting item for object) blocked only for a tiny amount of time needed to swap the ring's tree root after some write operation (insertion or deletion). The second goal is reached by using ring of size 2^64-1 points, which dramatically reduces the probability of hash collisions (the greater the number of items on the ring, the higher the probability of collisions) and implementation that covers collisions.
Package gop is a project manangement tool for building your golang applications out of global GOPATH. In fact gop will keep both global GOPATH and every project GOPATH. But that means your project will not go-getable. Of course, GOP itself is go-getable. GOP copy all denpendencies from global GOPATH to your project's src/vendor directory and all application's sources are also in src directory. A normal process using gop is below: 1. GOPATH compatible 2. Multiple build targets support 3. Put your projects out of global GOPATH Please ensure you have installed the go command, GOP will invoke it on building or testing Every project should have a GOPATH directory structure and put a gop.yml int the root directory. This is an example project's directory tree. Gop will recognize a gop project which has gop.yml. The file is also a project configuration file. Below is an example. If you didn't define any target, the default target is src/main and the target name is the project name. 1. init Create the default directory structure tree. 2. ensure Automatically copy dependencies from $GOPATH to local project directory. -g will let you automatically call go get <package> when the package is missing on GOPATH. -u will always go get <package> on all the dependencies and copy them to vendor. 3. status List all dependencies of this project and show the status. 4. add Add one or more packages to this project. 5. update Update one or more packages to this project. All missing dependent packages will also be added. -f will update exists dependent packages. 6. rm Remove one or more packages from this project. 7. build Run go build on the src directory. If you want to execute ensure before build, you can use `-e` flag. 8. run Run go run on the src directory. -w will monitor the go source code changes and automatically build and run again. `-e` will automatically execute `ensure` before every time build. 9. test Run go test on the src directory. If you want to execute ensure before build, you can use `-e` flag. 10. release Run go release on the src directory.
Package Ki provides the base element of GoKi Trees: Ki = Tree in Japanese, and "Key" in English -- powerful tree structures supporting scenegraphs, programs, parsing, etc. The Node struct that implements the Ki interface, which can be used as an embedded type (or a struct field) in other structs to provide core tree functionality, including: Parent / Child Tree structure -- each Node can ONLY have one parent. Node struct's can also have Node fields -- these are functionally like fixed auto-named children. Paths for locating Nodes within the hierarchy -- key for many use-cases, including ability to convert pointers to/from strings for IO and robust deep copy and move functions. The path separator is / for children and . for fields. Apply a function across nodes up or down a tree (natural "me first", breadth-first, depth-first) -- very flexible for tree walking. Generalized I/O -- can Save and Load the Tree as JSON, XML, etc -- including pointers which are saved using paths and automatically cached-out after loading -- enums also bidirectionally convertable to strings using enum type registry in kit package. Robust deep copy, clone, move of nodes, with automatic pointer updating. Signal sending and receiving between Nodes (simlar to Qt Signals / Slots) -- setup connections once and then emit signals to all receivers when relevant event happens. Robust state updating -- wrap updates in UpdateStart / End, and signals are blocked until the final end, at the highest affected level in the tree, at which point a single update signal is sent -- automatically gives the minimal update. Properties (as a string-keyed map) with property inheritance, including type-level properties via kit type registry.
Package merkle A merkle tree is a kind of binary tree where each node is labelled with a hash. Starting from the very bottom, leaves will be paired and hashed together to make their parent inner-node, recursively up to the root a.k.a merkle root. Merkle trees, are commonly used in distributed systems to efficiently compare large data set ensuring validity of such data. Some examples that leverage merkle trees are the : - Git version control - AWS's QLDB - Apache's Cassandra - blockchains There are different flavours of implementations, this package doesn't attempt to build an abstraction for all the possible ones out there, it rather implements a fairly efficient specific one that can be used to experiment with the data structure and concepts. With that said, if you're looking to use this package to validate merkle proofs for existing blockchains you should look elsewhere as their implementation may be different. For example, Bitcoin's merkle, duplicates eventual odd nodes to re-balance the tree and this implementation doesn't, thus producing a different merkle root and proof.
GKES (Go Kafka Event Source) attempts to fill the gaps ub the Go/Kafka library ecosystem. It supplies Exactly Once Semantics (EOS), local state stores and incremental consumer rebalancing to Go Kafka consumers, making it a viable alternative to a traditional Kafka Streams application written in Java. GKES is Go/Kafka library tailored towards the development of Event Sourcing applications, by providing a high-throughput, low-latency Kafka client framework. Using Kafka transactions, it provides for EOS, data integrity and high availability. If you wish to use GKES as straight Kafka consumer, it will fit the bill as well. Though there are plenty of libraries for that, and researching which best fits your use case is time well spent. GKES is not an all-in-one, do-everything black box. Some elements, in particular the StateStore, have been left without comprehensive implementations. A useful and performant local state store rarely has a flat data structure. If your state store does, there are some convenient implementations provided. However, to achieve optimum performance, you will not only need to write a StateStore implementation, but will also need to understand what the proper data structures are for your use case (trees, heaps, maps, disk-based LSM trees or combinations thereof). You can use the provided github.com/aws/go-kafka-event-source/streams/stores.SimpleStore as a starting point. GKES purposefully does not provide a pre-canned way for exposing StateStore data, other than a producing to another Kafka topic. There are as many ways to vend data as there are web applications. Rather than putting effort into inventing yet another one, GKES provides the mechanisms to query StateStores via Interjections. This mechanism can be plugged into whatever request/response mechanism that suits your use-case (gRPC, RESTful HTTP service...any number of web frameworks already in the Go ecosystem). [TODO: provide a simple http example] For this familiar with thw Kafka Streams API, GKES provides for stream `Punctuators“, but we call them `Interjections` (because it sounds cool). Interjections allow you to insert actions into your EventSource at specicifed interval per partition assigned via streams.EventSource.ScheduleInterjection, or at any time via streams.EventSource.Interject. This is useful for bookeeping activities, aggregated metric production or even error handling. Interjections have full access to the StateStore associated with an EventSource and can interact with output topics like any other EventProcessor. One issue that Kafka conumer applications have long suffered from are latency spikes during a consumer rebalance. The cooperative sticky rebalancing introduced by Kafka and implemented by kgo helps resolve this issue. However, once StateStore are thrown into the mix, things get a bit more complicated because initializing the StateStore on a host invloves consuming a compacted TopicPartion from start to end. GKES solves this with the IncrementalRebalancer and takes it one step further. The IncrementalRebalancer rebalances consumer partitions in a controlled fashion, minimizing latency spikes and limiting the blast of a bad deployment. GKES provides conventions for asynchronously processing events on the same Kafka partition while still maintaining data/stream integrity. The AsyncBatcher and AsyncJobScheduler allow you to split a TopicPartition into sub-streams by key, ensuring all events for a partitcular key are processed in order, allowing for parallel processing on a given TopicPartition. For more details, see Async Processing Examples A Kafka transaction is a powerful tool which allows for Exactly Once Semantics (EOS) by linking a consumer offset commit to one or more records that are being produced by your application (a StateStore record for example). The history of Kafka EOS is a long and complicated one with varied degrees of performance and efficiency. Early iterations required one producer transaction per consumer partition, which was very ineffiecient as Topic with 1000 partitions would also require 1000 clients in order to provide EOS. This has since been addressed, but depending on client implementations, there is a high risk of running into "producer fenced" errors as well as reduced throughput. In a traditional Java Kafka Streams application, transactions are committed according to the auto-commit frequency, which defaults to 100ms. This means that your application will only produce readable records every 100ms per partition. The effect of this is that no matter what you do, your tail latency will be at least 100ms and downstream consumers will receive records in bursts rather than a steady stream. For many use cases, this is unaceptable. GKES solves this issue by using a configurable transactional producer pool and a type of "Nagle's algorithm". Uncommitted offsets are added to the transaction pool in sequence. Once a producer has reach its record limit, or enough time has elapsed (10ms by default), the head transaction will wait for any incomplete events to finsh, then flush and commit. While this transaction is committing, GKES continues to process events and optimistically begins a new transaction and produces records on the next producer in pool. Since trasnaction produce in sequence, there is no danger of commit offset overlap or duplicate message processing in the case of a failure. To ensure EOS, your EventSource must use either the IncrementalRebalancer, or kgos cooperative sticky implementation. Though if you're using a StateStore, IncrementalRebalancer should be used to avoid lengthy periods of inactivity during application deployments. Rather than create yet another Kafka driver, GKES is built on top of kgo. This Kafka client was chosen as it (in our testing) has superior throughput and latency profiles compared to other client libraries currently available to Go developers. One other key adavantage is that it provides a migration path to cooperative consumer rebalancing, required for our EOS implementation. Other Go Kafka libraries provide cooperative rebalancing, but do not allow you to migrate froma non-cooperative rebalancing strategy (range, sticky etc.). This is a major roadblock for existing deployemtns as the only migration paths are an entirely new consumer group, or to bring your application completely down and re-deploy with a new rebalance strategy. These migration plans, to put it mildly, are big challenge for zero-downtime/live applications. The kgo package now makes this migration possible with zero downtime. Kgo also has the proper hooks need to implement the IncrementalGroupRebalancer, which is necessary for safe deployments when using a local state store. Kudos to kgo!
Package CloudForest implements ensembles of decision trees for machine learning in pure Go (golang to search engines). It allows for a number of related algorithms for classification, regression, feature selection and structure analysis on heterogeneous numerical/categorical data with missing values. These include: Breiman and Cutler's Random Forest for Classification and Regression Adaptive Boosting (AdaBoost) Classification Gradiant Boosting Tree Regression Entropy and Cost driven classification L1 regression Feature selection with artificial contrasts Proximity and model structure analysis Roughly balanced bagging for unbalanced classification The API hasn't stabilized yet and may change rapidly. Tests and benchmarks have been performed only on embargoed data sets and can not yet be released. Library Documentation is in code and can be viewed with godoc or live at: http://godoc.org/github.com/IlyaLab/CloudForest Documentation of command line utilities and file formats can be found in README.md, which can be viewed fromated on github: http://github.com/IlyaLab/CloudForest Pull requests and bug reports are welcome. CloudForest was created by Ryan Bressler and is being developed in the Shumelivich Lab at the Institute for Systems Biology for use on genomic/biomedical data with partial support from The Cancer Genome Atlas and the Inova Translational Medicine Institute. CloudForest is intended to provide fast, comprehensible building blocks that can be used to implement ensembles of decision trees. CloudForest is written in Go to allow a data scientist to develop and scale new models and analysis quickly instead of having to modify complex legacy code. Data structures and file formats are chosen with use in multi threaded and cluster environments in mind. Go's support for function types is used to provide a interface to run code as data is percolated through a tree. This method is flexible enough that it can extend the tree being analyzed. Growing a decision tree using Breiman and Cutler's method can be done in an anonymous function/closure passed to a tree's root node's Recurse method: This allows a researcher to include whatever additional analysis they need (importance scores, proximity etc) in tree growth. The same Recurse method can also be used to analyze existing forests to tabulate scores or extract structure. Utilities like leafcount and errorrate use this method to tabulate data about the tree in collection objects. Decision tree's are grown with the goal of reducing "Impurity" which is usually defined as Gini Impurity for categorical targets or mean squared error for numerical targets. CloudForest grows trees against the Target interface which allows for alternative definitions of impurity. CloudForest includes several alternative targets: Additional targets can be stacked on top of these target to add boosting functionality: Repeatedly splitting the data and searching for the best split at each node of a decision tree are the most computationally intensive parts of decision tree learning and CloudForest includes optimized code to perform these tasks. Go's slices are used extensively in CloudForest to make it simple to interact with optimized code. Many previous implementations of Random Forest have avoided reallocation by reordering data in place and keeping track of start and end indexes. In go, slices pointing at the same underlying arrays make this sort of optimization transparent. For example a function like: can return left and right slices that point to the same underlying array as the original slice of cases but these slices should not have their values changed. Functions used while searching for the best split also accepts pointers to reusable slices and structs to maximize speed by keeping memory allocations to a minimum. BestSplitAllocs contains pointers to these items and its use can be seen in functions like: For categorical predictors, BestSplit will also attempt to intelligently choose between 4 different implementations depending on user input and the number of categories. These include exhaustive, random, and iterative searches for the best combination of categories implemented with bitwise operations against int and big.Int. See BestCatSplit, BestCatSplitIter, BestCatSplitBig and BestCatSplitIterBig. All numerical predictors are handled by BestNumSplit which relies on go's sorting package. Training a Random forest is an inherently parallel process and CloudForest is designed to allow parallel implementations that can tackle large problems while keeping memory usage low by writing and using data structures directly to/from disk. Trees can be grown in separate go routines. The growforest utility provides an example of this that uses go routines and channels to grow trees in parallel and write trees to disk as the are finished by the "worker" go routines. The few summary statistics like mean impurity decrease per feature (importance) can be calculated using thread safe data structures like RunningMean. Trees can also be grown on separate machines. The .sf stochastic forest format allows several small forests to be combined by concatenation and the ForestReader and ForestWriter structs allow these forests to be accessed tree by tree (or even node by node) from disk. For data sets that are too big to fit in memory on a single machine Tree.Grow and FeatureMatrix.BestSplitter can be reimplemented to load candidate features from disk, distributed database etc. By default cloud forest uses a fast heuristic for missing values. When proposing a split on a feature with missing data the missing cases are removed and the impurity value is corrected to use three way impurity which reduces the bias towards features with lots of missing data: Missing values in the target variable are left out of impurity calculations. This provided generally good results at a fraction of the computational costs of imputing data. Optionally, feature.ImputeMissing or featurematrixImputeMissing can be called before forest growth to impute missing values to the feature mean/mode which Brieman [2] suggests as a fast method for imputing values. This forest could also be analyzed for proximity (using leafcount or tree.GetLeaves) to do the more accurate proximity weighted imputation Brieman describes. Experimental support is provided for 3 way splitting which splits missing cases onto a third branch. [2] This has so far yielded mixed results in testing. At some point in the future support may be added for local imputing of missing values during tree growth as described in [3] [1] http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1 [2] https://code.google.com/p/rf-ace/ [3] http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.aoas/1223908043&page=record In CloudForest data is stored using the FeatureMatrix struct which contains Features. The Feature struct implements storage and methods for both categorical and numerical data and calculations of impurity etc and the search for the best split. The Target interface abstracts the methods of Feature that are needed for a feature to be predictable. This allows for the implementation of alternative types of regression and classification. Trees are built from Nodes and Splitters and stored within a Forest. Tree has a Grow implements Brieman and Cutler's method (see extract above) for growing a tree. A GrowForest method is also provided that implements the rest of the method including sampling cases but it may be faster to grow the forest to disk as in the growforest utility. Prediction and Voting is done using Tree.Vote and CatBallotBox and NumBallotBox which implement the VoteTallyer interface.
Package CloudForest implements ensembles of decision trees for machine learning in pure Go (golang to search engines). It allows for a number of related algorithms for classification, regression, feature selection and structure analysis on heterogeneous numerical/categorical data with missing values. These include: Breiman and Cutler's Random Forest for Classification and Regression Adaptive Boosting (AdaBoost) Classification Gradiant Boosting Tree Regression Entropy and Cost driven classification L1 regression Feature selection with artificial contrasts Proximity and model structure analysis Roughly balanced bagging for unbalanced classification The API hasn't stabilized yet and may change rapidly. Tests and benchmarks have been performed only on embargoed data sets and can not yet be released. Library Documentation is in code and can be viewed with godoc or live at: http://godoc.org/github.com/IlyaLab/CloudForest Documentation of command line utilities and file formats can be found in README.md, which can be viewed fromated on github: http://github.com/IlyaLab/CloudForest Pull requests and bug reports are welcome. CloudForest was created by Ryan Bressler and is being developed in the Shumelivich Lab at the Institute for Systems Biology for use on genomic/biomedical data with partial support from The Cancer Genome Atlas and the Inova Translational Medicine Institute. CloudForest is intended to provide fast, comprehensible building blocks that can be used to implement ensembles of decision trees. CloudForest is written in Go to allow a data scientist to develop and scale new models and analysis quickly instead of having to modify complex legacy code. Data structures and file formats are chosen with use in multi threaded and cluster environments in mind. Go's support for function types is used to provide a interface to run code as data is percolated through a tree. This method is flexible enough that it can extend the tree being analyzed. Growing a decision tree using Breiman and Cutler's method can be done in an anonymous function/closure passed to a tree's root node's Recurse method: This allows a researcher to include whatever additional analysis they need (importance scores, proximity etc) in tree growth. The same Recurse method can also be used to analyze existing forests to tabulate scores or extract structure. Utilities like leafcount and errorrate use this method to tabulate data about the tree in collection objects. Decision tree's are grown with the goal of reducing "Impurity" which is usually defined as Gini Impurity for categorical targets or mean squared error for numerical targets. CloudForest grows trees against the Target interface which allows for alternative definitions of impurity. CloudForest includes several alternative targets: Additional targets can be stacked on top of these target to add boosting functionality: Repeatedly splitting the data and searching for the best split at each node of a decision tree are the most computationally intensive parts of decision tree learning and CloudForest includes optimized code to perform these tasks. Go's slices are used extensively in CloudForest to make it simple to interact with optimized code. Many previous implementations of Random Forest have avoided reallocation by reordering data in place and keeping track of start and end indexes. In go, slices pointing at the same underlying arrays make this sort of optimization transparent. For example a function like: can return left and right slices that point to the same underlying array as the original slice of cases but these slices should not have their values changed. Functions used while searching for the best split also accepts pointers to reusable slices and structs to maximize speed by keeping memory allocations to a minimum. BestSplitAllocs contains pointers to these items and its use can be seen in functions like: For categorical predictors, BestSplit will also attempt to intelligently choose between 4 different implementations depending on user input and the number of categories. These include exhaustive, random, and iterative searches for the best combination of categories implemented with bitwise operations against int and big.Int. See BestCatSplit, BestCatSplitIter, BestCatSplitBig and BestCatSplitIterBig. All numerical predictors are handled by BestNumSplit which relies on go's sorting package. Training a Random forest is an inherently parallel process and CloudForest is designed to allow parallel implementations that can tackle large problems while keeping memory usage low by writing and using data structures directly to/from disk. Trees can be grown in separate go routines. The growforest utility provides an example of this that uses go routines and channels to grow trees in parallel and write trees to disk as the are finished by the "worker" go routines. The few summary statistics like mean impurity decrease per feature (importance) can be calculated using thread safe data structures like RunningMean. Trees can also be grown on separate machines. The .sf stochastic forest format allows several small forests to be combined by concatenation and the ForestReader and ForestWriter structs allow these forests to be accessed tree by tree (or even node by node) from disk. For data sets that are too big to fit in memory on a single machine Tree.Grow and FeatureMatrix.BestSplitter can be reimplemented to load candidate features from disk, distributed database etc. By default cloud forest uses a fast heuristic for missing values. When proposing a split on a feature with missing data the missing cases are removed and the impurity value is corrected to use three way impurity which reduces the bias towards features with lots of missing data: Missing values in the target variable are left out of impurity calculations. This provided generally good results at a fraction of the computational costs of imputing data. Optionally, feature.ImputeMissing or featurematrixImputeMissing can be called before forest growth to impute missing values to the feature mean/mode which Brieman [2] suggests as a fast method for imputing values. This forest could also be analyzed for proximity (using leafcount or tree.GetLeaves) to do the more accurate proximity weighted imputation Brieman describes. Experimental support is provided for 3 way splitting which splits missing cases onto a third branch. [2] This has so far yielded mixed results in testing. At some point in the future support may be added for local imputing of missing values during tree growth as described in [3] [1] http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1 [2] https://code.google.com/p/rf-ace/ [3] http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.aoas/1223908043&page=record In CloudForest data is stored using the FeatureMatrix struct which contains Features. The Feature struct implements storage and methods for both categorical and numerical data and calculations of impurity etc and the search for the best split. The Target interface abstracts the methods of Feature that are needed for a feature to be predictable. This allows for the implementation of alternative types of regression and classification. Trees are built from Nodes and Splitters and stored within a Forest. Tree has a Grow implements Brieman and Cutler's method (see extract above) for growing a tree. A GrowForest method is also provided that implements the rest of the method including sampling cases but it may be faster to grow the forest to disk as in the growforest utility. Prediction and Voting is done using Tree.Vote and CatBallotBox and NumBallotBox which implement the VoteTallyer interface.
Package tree implements a tree structure.
This package is the root package of the govmomi library. The library is structured as follows: The minimal usable functionality is available through the vim25 package. It contains subpackages that contain generated types, managed objects, and all available methods. The vim25 package is entirely independent of the other packages in the govmomi tree -- it has no dependencies on its peers. The vim25 package itself contains a client structure that is passed around throughout the entire library. It abstracts a session and its immutable state. See the vim25 package for more information. The session package contains an abstraction for the session manager that allows a user to login and logout. It also provides access to the current session (i.e. to determine if the user is in fact logged in) The object package contains wrappers for a selection of managed objects. The constructors of these objects all take a *vim25.Client, which they pass along to derived objects, if applicable. The govc package contains the govc CLI. The code in this tree is not intended to be used as a library. Any functionality that govc contains that _could_ be used as a library function but isn't, _should_ live in a root level package. Other packages, such as "event", "guest", or "license", provide wrappers for the respective subsystems. They are typically not needed in normal workflows so are kept outside the object package.
fxml - FreeStyle XML Parser This package provides a simple parser which reads a XML document and output a tree structure, which does not need a pre-defined “struct”, hence the name “FreeStyle”.
Package parsing provides basic search and comparison of HTML documents. To limit storage of references, it uses the net/html package and its Node type to structure HTML. Search a tag in a Node with options Three ways to print a node tree Good to know
Package mmheap provides a drop-in min-max heap for any type that implements heap.Interface. In a min-max heap, the minimum element is at the root at index 0, and the maximum element is in one of the two root children. Thus, a min-max heap provides an efficient way to access either the minimum or maximum elements in a set in O(1) time. Due to the increased number of comparisons to implement a min-heap, its runtime is slower than a general heap (in this implementation, just under 2x). Because of this, unless you need to continually access both the minimum and maximum elements, it may be beneficial to use a different data structure. Even if you do need to continually access the minimum or maximum, a different data structure may be better. This package exists for anybody looking, but I recommend benchmarking this against github.com/google/btree. Generally, a btree is a good and fast data structure for quickly accessing min and max elements. However, a key benefit of heap or mmheap is the ability to change or modify elements and fix their position in the tree. For more information about a min-max heap, read the following paper: Since this package is identical to the stdlib's container/heap documentation, it is elided here.
Package Ki provides the base element of GoKi Trees: Ki = Tree in Japanese, and "Key" in English -- powerful tree structures supporting scenegraphs, programs, parsing, etc. The Node struct that implements the Ki interface, which can be used as an embedded type (or a struct field) in other structs to provide core tree functionality, including: Parent / Child Tree structure -- each Node can ONLY have one parent. Node struct's can also have Node fields -- these are functionally like fixed auto-named children. Paths for locating Nodes within the hierarchy -- key for many use-cases, including ability to convert pointers to/from strings for IO and robust deep copy and move functions. The path separator is / for children and . for fields. Apply a function across nodes up or down a tree (natural "me first", breadth-first, depth-first) -- very flexible for tree walking. Generalized I/O -- can Save and Load the Tree as JSON, XML, etc -- including pointers which are saved using paths and automatically cached-out after loading -- enums also bidirectionally convertable to strings using enum type registry in kit package. Robust deep copy, clone, move of nodes, with automatic pointer updating. Signal sending and receiving between Nodes (simlar to Qt Signals / Slots) -- setup connections once and then emit signals to all receivers when relevant event happens. Robust state updating -- wrap updates in UpdateStart / End, and signals are blocked until the final end, at the highest affected level in the tree, at which point a single update signal is sent -- automatically gives the minimal update. Properties (as a string-keyed map) with property inheritance, including type-level properties via kit type registry.
Package btree implements in-memory B-Trees of arbitrary degree. btree implements an in-memory B-Tree for use as an ordered data structure. It is not meant for persistent storage solutions. It has a flatter structure than an equivalent red-black or other binary tree, which in some cases yields better memory usage and/or performance. See some discussion on the matter here: Note, though, that this project is in no way related to the C++ B-Tree implementation written about there. Within this tree, each node contains a slice of items and a (possibly nil) slice of children. For basic numeric values or raw structs, this can cause efficiency differences when compared to equivalent C++ template code that stores values in arrays within the node: These issues don't tend to matter, though, when working with strings or other heap-allocated structures, since C++-equivalent structures also must store pointers and also distribute their values across the heap. This implementation is designed to be a drop-in replacement to gollrb.LLRB trees, (http://github.com/petar/gollrb), an excellent and probably the most widely used ordered tree implementation in the Go ecosystem currently. Its functions, therefore, exactly mirror those of llrb.LLRB where possible. Unlike gollrb, though, we currently don't support storing multiple equivalent values.
Package CloudForest implements ensembles of decision trees for machine learning in pure Go (golang to search engines). It allows for a number of related algorithms for classification, regression, feature selection and structure analysis on heterogeneous numerical/categorical data with missing values. These include: Breiman and Cutler's Random Forest for Classification and Regression Adaptive Boosting (AdaBoost) Classification Gradiant Boosting Tree Regression Entropy and Cost driven classification L1 regression Feature selection with artificial contrasts Proximity and model structure analysis Roughly balanced bagging for unbalanced classification The API hasn't stabilized yet and may change rapidly. Tests and benchmarks have been performed only on embargoed data sets and can not yet be released. Library Documentation is in code and can be viewed with godoc or live at: http://godoc.org/github.com/ryanbressler/CloudForest Documentation of command line utilities and file formats can be found in README.md, which can be viewed fromated on github: http://github.com/ryanbressler/CloudForest Pull requests and bug reports are welcome. CloudForest was created by Ryan Bressler and is being developed in the Shumelivich Lab at the Institute for Systems Biology for use on genomic/biomedical data with partial support from The Cancer Genome Atlas and the Inova Translational Medicine Institute. CloudForest is intended to provide fast, comprehensible building blocks that can be used to implement ensembles of decision trees. CloudForest is written in Go to allow a data scientist to develop and scale new models and analysis quickly instead of having to modify complex legacy code. Data structures and file formats are chosen with use in multi threaded and cluster environments in mind. Go's support for function types is used to provide a interface to run code as data is percolated through a tree. This method is flexible enough that it can extend the tree being analyzed. Growing a decision tree using Breiman and Cutler's method can be done in an anonymous function/closure passed to a tree's root node's Recurse method: This allows a researcher to include whatever additional analysis they need (importance scores, proximity etc) in tree growth. The same Recurse method can also be used to analyze existing forests to tabulate scores or extract structure. Utilities like leafcount and errorrate use this method to tabulate data about the tree in collection objects. Decision tree's are grown with the goal of reducing "Impurity" which is usually defined as Gini Impurity for categorical targets or mean squared error for numerical targets. CloudForest grows trees against the Target interface which allows for alternative definitions of impurity. CloudForest includes several alternative targets: Additional targets can be stacked on top of these target to add boosting functionality: Repeatedly splitting the data and searching for the best split at each node of a decision tree are the most computationally intensive parts of decision tree learning and CloudForest includes optimized code to perform these tasks. Go's slices are used extensively in CloudForest to make it simple to interact with optimized code. Many previous implementations of Random Forest have avoided reallocation by reordering data in place and keeping track of start and end indexes. In go, slices pointing at the same underlying arrays make this sort of optimization transparent. For example a function like: can return left and right slices that point to the same underlying array as the original slice of cases but these slices should not have their values changed. Functions used while searching for the best split also accepts pointers to reusable slices and structs to maximize speed by keeping memory allocations to a minimum. BestSplitAllocs contains pointers to these items and its use can be seen in functions like: For categorical predictors, BestSplit will also attempt to intelligently choose between 4 different implementations depending on user input and the number of categories. These include exhaustive, random, and iterative searches for the best combination of categories implemented with bitwise operations against int and big.Int. See BestCatSplit, BestCatSplitIter, BestCatSplitBig and BestCatSplitIterBig. All numerical predictors are handled by BestNumSplit which relies on go's sorting package. Training a Random forest is an inherently parallel process and CloudForest is designed to allow parallel implementations that can tackle large problems while keeping memory usage low by writing and using data structures directly to/from disk. Trees can be grown in separate go routines. The growforest utility provides an example of this that uses go routines and channels to grow trees in parallel and write trees to disk as the are finished by the "worker" go routines. The few summary statistics like mean impurity decrease per feature (importance) can be calculated using thread safe data structures like RunningMean. Trees can also be grown on separate machines. The .sf stochastic forest format allows several small forests to be combined by concatenation and the ForestReader and ForestWriter structs allow these forests to be accessed tree by tree (or even node by node) from disk. For data sets that are too big to fit in memory on a single machine Tree.Grow and FeatureMatrix.BestSplitter can be reimplemented to load candidate features from disk, distributed database etc. By default cloud forest uses a fast heuristic for missing values. When proposing a split on a feature with missing data the missing cases are removed and the impurity value is corrected to use three way impurity which reduces the bias towards features with lots of missing data: Missing values in the target variable are left out of impurity calculations. This provided generally good results at a fraction of the computational costs of imputing data. Optionally, feature.ImputeMissing or featurematrixImputeMissing can be called before forest growth to impute missing values to the feature mean/mode which Brieman [2] suggests as a fast method for imputing values. This forest could also be analyzed for proximity (using leafcount or tree.GetLeaves) to do the more accurate proximity weighted imputation Brieman describes. Experimental support is provided for 3 way splitting which splits missing cases onto a third branch. [2] This has so far yielded mixed results in testing. At some point in the future support may be added for local imputing of missing values during tree growth as described in [3] [1] http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1 [2] https://code.google.com/p/rf-ace/ [3] http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.aoas/1223908043&page=record In CloudForest data is stored using the FeatureMatrix struct which contains Features. The Feature struct implements storage and methods for both categorical and numerical data and calculations of impurity etc and the search for the best split. The Target interface abstracts the methods of Feature that are needed for a feature to be predictable. This allows for the implementation of alternative types of regression and classification. Trees are built from Nodes and Splitters and stored within a Forest. Tree has a Grow implements Brieman and Cutler's method (see extract above) for growing a tree. A GrowForest method is also provided that implements the rest of the method including sampling cases but it may be faster to grow the forest to disk as in the growforest utility. Prediction and Voting is done using Tree.Vote and CatBallotBox and NumBallotBox which implement the VoteTallyer interface.
Package tree provides a data structure of a mutable tree with constant leaves and version counters.
Package flatpack implements a mechanism to convert arbitrary Go data, possibly including unexported fields and shared and/or cyclic substructure, into and back from a purely tree-structured datastructure without unexported fields that can be serialised using encoding.json or the like. There are two parts to this mechanism: the Flatpack type, the serialization-friendly container which stores the converted data (and whose Pack and Unpack methods do the necessary conversion), and an package-internal type registry used to locate the correct type when deserializing and unpacking Flatpacks. This type registry must be pre-populated with all the (named) types that might be encountered; it will be convenient to do so by calling RegisterType and/or RegisterTypeOf from an init() func in each package whose types will be serialized.
Package suture provides Erlang-like supervisor trees. This implements Erlang-esque supervisor trees, as adapted for Go. This is an industrial-strength, tested library deployed into hostile environments, not just a proof of concept or a toy. If you are reading this, you are reading the documentation for the v3 version, which is not the latest. If you want the latest v4, be sure to be using github.com/thejerf/suture/v4. This rewrites the API to be in terms of contexts. Supervisor Tree -> SuTree -> suture -> holds your code together when it's trying to fall apart. Why use Suture? Suture has 100% test coverage, and is golint clean. This doesn't prove it free of bugs, but it shows I care. A blog post describing the design decisions is available at http://www.jerf.org/iri/post/2930 . To idiomatically use Suture, create a Supervisor which is your top level "application" supervisor. This will often occur in your program's "main" function. Create "Service"s, which implement the Service interface. .Add() them to your Supervisor. Supervisors are also services, so you can create a tree structure here, depending on the exact combination of restarts you want to create. As a special case, when adding Supervisors to Supervisors, the "sub" supervisor will have the "super" supervisor's Log function copied. This allows you to set one log function on the "top" supervisor, and have it propagate down to all the sub-supervisors. This also allows libraries or modules to provide Supervisors without having to commit their users to a particular logging method. Finally, as what is probably the last line of your main() function, call .Serve() on your top level supervisor. This will start all the services you've defined. See the Example for an example, using a simple service that serves out incrementing integers.
Package rufs contains the main API to Rufs. Rufs is organized as a collection of branches. Each branch represents a physical file system structure which can be queried and updated by an authorized client. On the client side one or several branches are organized into a tree. The single branches can overlay each other. For example: Branch A /foo/A /foo/B /bar/C Branch B /foo/C /test/D Tree 1 /myspace => Branch A, Branch B Accessing tree with: /myspace/foo/A gets file /foo/A from Branch A while /myspace/foo/C gets file /foo/C from Branch B Write operations go only to branches which are mapped as writing branches and who accept them (i.e. are not set to readonly on the side of the branch).