This package is the root package of the govmomi library. The library is structured as follows: The minimal usable functionality is available through the vim25 package. It contains subpackages that contain generated types, managed objects, and all available methods. The vim25 package is entirely independent of the other packages in the govmomi tree -- it has no dependencies on its peers. The vim25 package itself contains a client structure that is passed around throughout the entire library. It abstracts a session and its immutable state. See the vim25 package for more information. The session package contains an abstraction for the session manager that allows a user to log in and log out. It also provides access to the current session (i.e. to determine if the user is in fact logged in). The object package contains wrappers for a selection of managed objects. The constructors of these objects all take a *vim25.Client, which they pass along to derived objects, if applicable. The govc package contains the govc CLI. The code in this tree is not intended to be used as a library. Any functionality that govc contains that _could_ be used as a library function but isn't, _should_ live in a root level package. Other packages, such as "event", "guest", or "license", provide wrappers for the respective subsystems. They are typically not needed in normal workflows so are kept outside the object package.
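As a hedged sketch of how these layers fit together (the endpoint URL and the environment variable name are assumptions for illustration, and error handling is abbreviated), the root package's NewClient logs in through the session manager and exposes the underlying *vim25.Client used by the object wrappers:

	package main

	import (
		"context"
		"fmt"
		"os"

		"github.com/vmware/govmomi"
		"github.com/vmware/govmomi/vim25/soap"
	)

	func main() {
		ctx := context.Background()

		// Endpoint such as "https://user:pass@vcenter.example.com/sdk"; the
		// environment variable name is an arbitrary choice for this sketch.
		u, err := soap.ParseURL(os.Getenv("GOVMOMI_URL"))
		if err != nil {
			panic(err)
		}

		// NewClient logs in via the session manager and wraps a *vim25.Client.
		c, err := govmomi.NewClient(ctx, u, true)
		if err != nil {
			panic(err)
		}
		defer c.Logout(ctx)

		fmt.Println("connected:", c.Client.ServiceContent.About.FullName)
	}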
Package btree implements in-memory B-Trees of arbitrary degree. btree implements an in-memory B-Tree for use as an ordered data structure. It is not meant for persistent storage solutions. It has a flatter structure than an equivalent red-black or other binary tree, which in some cases yields better memory usage and/or performance. See some discussion on the matter here: Note, though, that this project is in no way related to the C++ B-Tree implementation written about there. Within this tree, each node contains a slice of items and a (possibly nil) slice of children. For basic numeric values or raw structs, this can cause efficiency differences when compared to equivalent C++ template code that stores values in arrays within the node: These issues don't tend to matter, though, when working with strings or other heap-allocated structures, since C++-equivalent structures also must store pointers and also distribute their values across the heap. This implementation is designed to be a drop-in replacement for gollrb.LLRB trees (http://github.com/petar/gollrb), an excellent and probably the most widely used ordered tree implementation in the Go ecosystem currently. Its functions, therefore, exactly mirror those of llrb.LLRB where possible. Unlike gollrb, though, we currently don't support storing multiple equivalent values. There are two implementations; those suffixed with 'G' are generics, usable for any type, and require a passed-in "less" function to define their ordering. Those without this suffix are specific to the 'Item' interface, and use its 'Less' function for ordering.
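A minimal sketch of the Item-based API (the 'G'-suffixed generic variants follow the same shape but take a "less" function); the degree and values are arbitrary:

	package main

	import (
		"fmt"

		"github.com/google/btree"
	)

	func main() {
		tr := btree.New(2) // a B-Tree of degree 2
		for i := 0; i < 10; i++ {
			tr.ReplaceOrInsert(btree.Int(i))
		}
		fmt.Println(tr.Len())              // 10
		fmt.Println(tr.Get(btree.Int(3)))  // 3
		fmt.Println(tr.Has(btree.Int(42))) // false
	}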
Package sops manages JSON, YAML and BINARY documents to be encrypted or decrypted. This package should not be used directly. Instead, Sops users should install the command line client via `go get -u go.mozilla.org/sops/v3/cmd/sops`, or use the decryption helper provided at `go.mozilla.org/sops/v3/decrypt`. We do not guarantee API stability for any package other than `go.mozilla.org/sops/v3/decrypt`. A Sops document is a Tree composed of a data branch with arbitrary key/value pairs and a metadata branch with encryption and integrity information. In JSON and YAML formats, the structure of the cleartext tree is preserved, keys are stored in cleartext and only values are encrypted. Keeping the values in cleartext provides better readability when storing Sops documents in version control, and allows for merging competing changes on documents. This is a major difference between Sops and other encryption tools that store documents as encrypted blobs. In BINARY format, the cleartext data is treated as a single blob and the encrypted document is in JSON format with a single `data` key and a single encrypted value. Sops allows operators to encrypt their documents with multiple master keys. Each of the master keys defined in the document is able to decrypt it, allowing users to share documents amongst themselves without sharing keys, or to use a PGP key as a backup for KMS. In practice, this is achieved by generating a data key for each document that is used to encrypt all values, and encrypting the data key with each master key defined. Being able to decrypt the data key gives access to the document. The integrity of each document is guaranteed by calculating a Message Authentication Code (MAC) that is stored encrypted by the data key. When decrypting a document, the MAC should be recalculated and compared with the MAC stored in the document to verify that no fraudulent changes have been applied. The MAC covers keys and values as well as their ordering.
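A minimal sketch using the supported decryption helper; the file name is hypothetical, and the second argument names the input format:

	package main

	import (
		"fmt"

		"go.mozilla.org/sops/v3/decrypt"
	)

	func main() {
		// Decrypt a Sops-encrypted YAML document and print the cleartext.
		cleartext, err := decrypt.File("secrets.enc.yaml", "yaml")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(cleartext))
	}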
Package toml is a TOML parser and manipulation library. This version supports the specification as described in https://github.com/toml-lang/toml/blob/master/versions/en/toml-v0.5.0.md Go-toml can marshal and unmarshal TOML documents from and to data structures. Go-toml can operate on a TOML document as a tree. Use one of the Load* functions to parse TOML data and obtain a Tree instance, then one of its methods to manipulate the tree. The package github.com/pelletier/go-toml/query implements a system similar to JSONPath to quickly retrieve elements of a TOML document using a single expression. See the package documentation for more information.

Package civil implements types for civil time, a time-zone-independent representation of time that follows the rules of the proleptic Gregorian calendar with exactly 24-hour days, 60-minute hours, and 60-second minutes. Because they lack location information, these types do not represent unique moments or intervals of time. Use time.Time for that purpose.
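A short sketch of the civil types just described, assuming the cloud.google.com/go/civil import path (not stated above); the chosen date is arbitrary:

	package main

	import (
		"fmt"
		"time"

		"cloud.google.com/go/civil"
	)

	func main() {
		// DateOf strips the clock and location from a time.Time, leaving only
		// the calendar date.
		d := civil.DateOf(time.Date(2021, time.March, 14, 15, 9, 26, 0, time.UTC))
		fmt.Println(d)            // 2021-03-14
		fmt.Println(d.AddDays(1)) // 2021-03-15
	}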
Package appdash provides a Go app performance tracing suite. Appdash allows you to trace the end-to-end performance of hierarchically structured applications. You can, for example, measure the time and see the detailed information of each HTTP request and SQL query made by an entire distributed web application. The cmd/appdash tool launches a web front-end which displays a web UI for viewing collected app traces. It is effectively a remote collector to which your application can connect and send events. Timing and application-specific metadata information can be viewed in a nice timeline view for each span (e.g. an HTTP request) and its children. The web front-end can also be embedded in your own Go HTTP server by utilizing the traceapp sub-package, which is effectively what cmd/appdash serves internally. Sub-packages for HTTP and SQL event tracing are provided for use with appdash, which allows it to function equivalently to Google's Dapper and Twitter's Zipkin performance tracing suites. The most high-level structure is a Trace, which represents the performance of an application from start to finish (in an HTTP application, for example, the loading of a web page). A Trace is a tree structure that is made up of several spans, which are just IDs (in an HTTP application, these IDs are passed through the stack via a few special headers). Each span ID has a set of Events that directly correspond to it inside a Collector. These events can be any combination of message, log, time-span, or time-stamped events (the cmd/appdash web UI displays these events as appropriate). Inside your application, a Recorder is used to send events to a Collector, which can be a remote HTTP(S) collector, a local in-memory or persistent collector, etc. Additionally, you can implement the Collector interface yourself and store events however you like.
Implementation of an R-Way Trie data structure. A Trie has a root Node which is the base of the tree. Each subsequent Node has a letter and children, which are nodes that have letter values associated with them.
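An illustrative sketch of the idea (not this package's actual API): each node stores a letter and a set of children keyed by the next letter:

	package main

	import "fmt"

	type node struct {
		letter   rune
		children map[rune]*node
		terminal bool // true if a key ends at this node
	}

	func newNode(letter rune) *node {
		return &node{letter: letter, children: map[rune]*node{}}
	}

	type trie struct{ root *node }

	func (t *trie) insert(key string) {
		n := t.root
		for _, r := range key {
			child, ok := n.children[r]
			if !ok {
				child = newNode(r)
				n.children[r] = child
			}
			n = child
		}
		n.terminal = true
	}

	func (t *trie) contains(key string) bool {
		n := t.root
		for _, r := range key {
			child, ok := n.children[r]
			if !ok {
				return false
			}
			n = child
		}
		return n.terminal
	}

	func main() {
		t := &trie{root: newNode(0)}
		t.insert("tree")
		fmt.Println(t.contains("tree"), t.contains("tr")) // true false
	}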
Package gographviz provides parsing for the DOT grammar into an abstract syntax tree representing a graph, analysis of the abstract syntax tree into a more usable structure, and writing back of this structure into the DOT format.
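A minimal round-trip sketch, assuming the commonly documented ParseString/NewGraph/Analyse entry points; error handling is abbreviated:

	package main

	import (
		"fmt"

		"github.com/awalterschulze/gographviz"
	)

	func main() {
		// Parse DOT into an abstract syntax tree.
		graphAst, err := gographviz.ParseString(`digraph G { a -> b; }`)
		if err != nil {
			panic(err)
		}

		// Analyse the AST into the more usable Graph structure.
		graph := gographviz.NewGraph()
		if err := gographviz.Analyse(graphAst, graph); err != nil {
			panic(err)
		}

		// Mutate the structure, then write it back out as DOT.
		graph.AddNode("G", "c", nil)
		graph.AddEdge("b", "c", true, nil)
		fmt.Println(graph.String())
	}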
Package suture provides Erlang-like supervisor trees. This implements Erlang-esque supervisor trees, as adapted for Go. This is an industrial-strength, tested library deployed into hostile environments, not just a proof of concept or a toy. Supervisor Tree -> SuTree -> suture -> holds your code together when it's trying to fall apart. Why use Suture? Suture has 100% test coverage, and is golint clean. This doesn't prove it free of bugs, but it shows I care. A blog post describing the design decisions is available at http://www.jerf.org/iri/post/2930 . To idiomatically use Suture, create a Supervisor which is your top level "application" supervisor. This will often occur in your program's "main" function. Create "Service"s, which implement the Service interface. .Add() them to your Supervisor. Supervisors are also services, so you can create a tree structure here, depending on the exact combination of restarts you want to create. As a special case, when adding Supervisors to Supervisors, the "sub" supervisor will have the "super" supervisor's Log function copied. This allows you to set one log function on the "top" supervisor, and have it propagate down to all the sub-supervisors. This also allows libraries or modules to provide Supervisors without having to commit their users to a particular logging method. Finally, as what is probably the last line of your main() function, call .Serve() on your top level supervisor. This will start all the services you've defined. See the Example for an example, using a simple service that serves out incrementing integers.
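A minimal sketch against the pre-context Service interface described here (Serve/Stop); the Incrementor service is the hypothetical incrementing-integer example mentioned above:

	package main

	import (
		"fmt"

		"github.com/thejerf/suture"
	)

	// Incrementor serves out incrementing integers until it is stopped.
	type Incrementor struct {
		next chan int
		stop chan struct{}
	}

	func (i *Incrementor) Serve() {
		current := 0
		for {
			select {
			case i.next <- current:
				current++
			case <-i.stop:
				return
			}
		}
	}

	func (i *Incrementor) Stop() {
		close(i.stop)
	}

	func main() {
		supervisor := suture.NewSimple("root")
		service := &Incrementor{next: make(chan int), stop: make(chan struct{})}
		supervisor.Add(service)

		supervisor.ServeBackground()
		defer supervisor.Stop()

		fmt.Println(<-service.next, <-service.next) // 0 1
	}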
Package suture provides Erlang-like supervisor trees. This implements Erlang-esque supervisor trees, as adapted for Go. This is an industrial-strength, tested library deployed into hostile environments, not just a proof of concept or a toy. If you are reading this, you are reading the documentation for the v3 version, which is not the latest. If you want the latest v4, be sure to be using github.com/thejerf/suture/v4. This rewrites the API to be in terms of contexts. Supervisor Tree -> SuTree -> suture -> holds your code together when it's trying to fall apart. Why use Suture? Suture has 100% test coverage, and is golint clean. This doesn't prove it free of bugs, but it shows I care. A blog post describing the design decisions is available at http://www.jerf.org/iri/post/2930 . To idiomatically use Suture, create a Supervisor which is your top level "application" supervisor. This will often occur in your program's "main" function. Create "Service"s, which implement the Service interface. .Add() them to your Supervisor. Supervisors are also services, so you can create a tree structure here, depending on the exact combination of restarts you want to create. As a special case, when adding Supervisors to Supervisors, the "sub" supervisor will have the "super" supervisor's Log function copied. This allows you to set one log function on the "top" supervisor, and have it propagate down to all the sub-supervisors. This also allows libraries or modules to provide Supervisors without having to commit their users to a particular logging method. Finally, as what is probably the last line of your main() function, call .Serve() on your top level supervisor. This will start all the services you've defined. See the Example for an example, using a simple service that serves out incrementing integers.
Package merkletree implements a Merkle Tree capable of storing arbitrary content. A Merkle Tree is a hash tree that provides an efficient way to verify that the contents of a data set are present and untampered with. At its core, a Merkle Tree is a list of items representing the data that should be verified. Each of these items is inserted into a leaf node and a tree of hashes is constructed bottom up, hashing each node's left and right children's hashes. This means that the root node will effectively be a hash of all other nodes (hashes) in the tree. This property allows the tree to be reproduced and thus verified using only the hash of the root node of the tree. The benefit of the tree structure is that verifying any single content entry in the tree requires only log2(n) steps in the worst case. Creating a new merkletree requires that the type that the tree will be constructed from implements the Content interface. A slice of the Content items should be created and then passed to the NewTree method. t represents the Merkle Tree and can be verified and manipulated with the API methods described below.
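A minimal sketch of the Content-based flow described above; TestContent and its SHA-256 hashing are illustrative choices:

	package main

	import (
		"crypto/sha256"
		"fmt"

		"github.com/cbergoon/merkletree"
	)

	// TestContent implements merkletree.Content by hashing a string with SHA-256.
	type TestContent struct {
		s string
	}

	func (t TestContent) CalculateHash() ([]byte, error) {
		h := sha256.New()
		if _, err := h.Write([]byte(t.s)); err != nil {
			return nil, err
		}
		return h.Sum(nil), nil
	}

	func (t TestContent) Equals(other merkletree.Content) (bool, error) {
		o, ok := other.(TestContent)
		if !ok {
			return false, fmt.Errorf("unexpected content type")
		}
		return t.s == o.s, nil
	}

	func main() {
		list := []merkletree.Content{
			TestContent{s: "Hello"},
			TestContent{s: "Merkle"},
			TestContent{s: "Tree"},
		}

		t, err := merkletree.NewTree(list)
		if err != nil {
			panic(err)
		}

		fmt.Printf("root: %x\n", t.MerkleRoot())

		ok, err := t.VerifyTree()
		fmt.Println("tree verified:", ok, err)

		ok, err = t.VerifyContent(list[0])
		fmt.Println("content verified:", ok, err)
	}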
Package skipper provides an HTTP routing library with flexible configuration as well as a runtime update of the routing rules. Skipper works as an HTTP reverse proxy that is responsible for mapping incoming requests to multiple HTTP backend services, based on routes that are selected by the request attributes. At the same time, both the requests and the responses can be augmented by a filter chain that is specifically defined for each route. Optionally, it can provide a circuit breaker mechanism individually for each backend host. Skipper can load and update the route definitions from multiple data sources without being restarted. It provides a default executable command with a few built-in filters; however, its primary use case is to be extended with custom filters, predicates or data sources. For further information read 'Extending Skipper'. Skipper took the core design and inspiration from Vulcand: https://github.com/mailgun/vulcand. Skipper is 'go get' compatible. If needed, create a 'go workspace' first: Get the Skipper packages: Create a file with a route: Optionally, verify the syntax of the file: Start Skipper and make an HTTP request: The core of Skipper's request processing is implemented by a reverse proxy in the 'proxy' package. The proxy receives the incoming request and forwards it to the routing engine in order to receive the most specific matching route. When a route matches, the request is forwarded to all filters defined by it. The filters can modify the request or execute any kind of program logic. Once the request has been processed by all the filters, it is forwarded to the backend endpoint of the route. The response from the backend goes once again through all the filters in reverse order. Finally, it is mapped as the response of the original incoming request. Besides the default proxying mechanism, it is possible to define routes without a real network backend endpoint. One of these cases is called a 'shunt' backend, in which case one of the filters needs to handle the request providing its own response (e.g. the 'static' filter). Actually, filters themselves can instruct the request flow to shunt by calling the Serve(*http.Response) method of the filter context. Another case of a route without a network backend is the 'loopback'. A loopback route can be used to match a request, modified by filters, against the lookup tree with different conditions and then execute a different route. One example scenario can be to use a single route as an entry point to execute some calculation to get an A/B testing decision and then matching the updated request metadata for the actual destination route. This way the calculation can be executed for only those requests that don't contain information about a previously calculated decision. For further details, see the 'proxy' and 'filters' package documentation. Finding a request's route happens by matching the request attributes to the conditions in the route's definitions. Such definitions may have the following conditions:

- method
- path (optionally with wildcards)
- path regular expressions
- host regular expressions
- headers
- header regular expressions

It is also possible to create custom predicates with any other matching criteria. The relation between the conditions in a route definition is 'and', meaning that a request must fulfill each condition to match a route. For further details, see the 'routing' package documentation. Filters are applied in order of definition to the request and in reverse order to the response.
They are used to modify request and response attributes, such as headers, or execute background tasks, like logging. Some filters may handle the requests without proxying them to service backends. Filters, depending on their implementation, may accept/require parameters that are set specifically for the route. For further details, see the 'filters' package documentation. Each route has one of the following backends: HTTP endpoint, shunt, loopback or dynamic. Backend endpoints can be any HTTP service. They are specified by their network address, including the protocol scheme, the domain name or the IP address, and optionally the port number: e.g. "https://www.example.org:4242". (The path and query are sent from the original request, or set by filters.) A shunt route means that Skipper handles the request alone and doesn't make requests to a backend service. In this case, it is the responsibility of one of the filters to generate the response. A loopback route executes the routing mechanism on the current state of the request from the start, including the route lookup. This way it serves as a form of an internal redirect. A dynamic route means that the final target will be defined in a filter. One of the filters in the chain must set the target backend URL explicitly. Route definitions consist of the following:

- request matching conditions (predicates)
- filter chain (optional)
- backend

The eskip package implements the in-memory and text representations of route definitions, including a parser. (Note to contributors: in order to stay compatible with 'go get', the generated part of the parser is stored in the repository. When changing the grammar, 'go generate' needs to be executed explicitly to update the parser.) For further details, see the 'eskip' package documentation. Skipper has filter implementations of basic auth and OAuth2. It can be integrated with tokeninfo based OAuth2 providers. For details, see: https://godoc.org/github.com/zalando/skipper/filters/auth. Skipper's route definitions are loaded from one or more data sources. It can receive incremental updates from those data sources at runtime. It provides the following data clients:

- Kubernetes: Skipper can be used as part of a Kubernetes Ingress Controller implementation together with https://github.com/zalando-incubator/kube-ingress-aws-controller . In this scenario, Skipper uses the Kubernetes API's Ingress extensions as a source for routing. For a complete deployment example, see more details in: https://github.com/zalando-incubator/kubernetes-on-aws/ .
- Innkeeper: the Innkeeper service implements a storage for large sets of Skipper routes, with an HTTP+JSON API, OAuth2 authentication and role management. See the 'innkeeper' package and https://github.com/zalando/innkeeper.
- etcd: Skipper can load routes and receive updates from etcd clusters (https://github.com/coreos/etcd). See the 'etcd' package.
- static file: package eskipfile implements a simple data client, which can load route definitions from a static file in eskip format. Currently, it loads the routes on startup. It doesn't support runtime updates.

Skipper can use additional data sources, provided by extensions. Sources must implement the DataClient interface in the routing package. Skipper provides circuit breakers, configured either globally, based on backend hosts or based on individual routes. It supports two types of circuit breaker behavior: open on N consecutive failures, or open on N failures out of M requests.
For details, see: https://godoc.org/github.com/zalando/skipper/circuit. Skipper can be started with the default executable command 'skipper', or as a library built into an application. The easiest way to start Skipper as a library is to execute the 'Run' function of the current (root) package. Each option accepted by the 'Run' function is wired in the default executable as well, as a command line flag. E.g. EtcdUrls becomes -etcd-urls as a comma separated list. For command line help, enter: An additional utility, eskip, can be used to verify, print, update and delete routes from/to files or etcd (Innkeeper on the roadmap). See the cmd/eskip command package, and/or enter in the command line: Skipper doesn't use dynamically loaded plugins; however, it can be used as a library, and it can be extended with custom predicates, filters and/or custom data sources. To create a custom predicate, one needs to implement the PredicateSpec interface in the routing package. Instances of the PredicateSpec are used internally by the routing package to create the actual Predicate objects as referenced in eskip routes, with concrete arguments. Example, randompredicate.go: In the above example, a custom predicate is created, that can be referenced in eskip definitions with the name 'Random': To create a custom filter, we need to implement the Spec interface of the filters package. 'Spec' is the specification of a filter, and it is used to create concrete filter instances, while the raw route definitions are processed. Example, hellofilter.go: The above example creates a filter specification, and in the routes where it is included, the filter instances will set the 'X-Hello' header for each and every response. The name of the filter is 'hello', and in a route definition it is referenced as: The easiest way to create a custom Skipper variant is to implement the required filters (as in the example above) by importing the Skipper package, and starting it with the 'Run' command. Example, hello.go: A file containing the routes, routes.eskip: Start the custom router: The 'Run' function in the root Skipper package starts its own listener, but it doesn't provide the best composability. The proxy package, however, provides a standard http.Handler, so it is possible to use it in a more complex solution as a building block for routing. Skipper provides detailed logging of failures, and access logs in Apache log format. Skipper also collects detailed performance metrics, and exposes them on a separate listener endpoint for pulling snapshots. For details, see the 'logging' and 'metrics' packages documentation. The router's performance depends on the environment and on the used filters. Under ideal circumstances, and without filters, the biggest time factor is the route lookup. Skipper is able to scale to thousands of routes with logarithmic performance degradation. However, this comes at the cost of increased memory consumption, due to storing the whole lookup tree in a single structure. Benchmarks for the tree lookup can be run by: In case more aggressive scale is needed, it is possible to set up Skipper in a cascade model, with multiple Skipper instances for specific route segments.
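A hedged sketch combining a custom filter (setting an 'X-Hello' response header, in the spirit of the hellofilter example referenced above) with starting Skipper as a library via 'Run'; the address, routes file name and exact Options fields are assumptions:

	package main

	import (
		"log"

		"github.com/zalando/skipper"
		"github.com/zalando/skipper/filters"
	)

	type helloSpec struct{}
	type helloFilter struct{ who string }

	// Name is how the filter is referenced in eskip route definitions.
	func (s *helloSpec) Name() string { return "hello" }

	// CreateFilter builds a filter instance from the route's arguments.
	func (s *helloSpec) CreateFilter(config []interface{}) (filters.Filter, error) {
		if len(config) == 0 {
			return nil, filters.ErrInvalidFilterParameters
		}
		who, ok := config[0].(string)
		if !ok {
			return nil, filters.ErrInvalidFilterParameters
		}
		return &helloFilter{who: who}, nil
	}

	func (f *helloFilter) Request(ctx filters.FilterContext) {}

	func (f *helloFilter) Response(ctx filters.FilterContext) {
		ctx.Response().Header.Set("X-Hello", "Hello, "+f.who+"!")
	}

	func main() {
		log.Fatal(skipper.Run(skipper.Options{
			Address:       ":9090",
			RoutesFile:    "routes.eskip",
			CustomFilters: []filters.Spec{&helloSpec{}},
		}))
	}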
Package kdtree implements a k-d tree data structure.
Package hamt provides a reference implementation of the IPLD HAMT used in the Filecoin blockchain. It includes some optional flexibility such that it may be used for other purposes outside of Filecoin. HAMT is a "hash array mapped trie" https://en.wikipedia.org/wiki/Hash_array_mapped_trie. This implementation extends the standard form by including buckets for the key/value pairs at storage leaves and CHAMP mutation semantics https://michael.steindorfer.name/publications/oopsla15.pdf. The CHAMP invariant and mutation rules provide us with the ability to maintain canonical forms given any set of keys and their values, regardless of insertion order and intermediate data insertion and deletion. Therefore, for any given set of keys and their values, a HAMT using the same parameters and CHAMP semantics will always produce the same root node and therefore the same content identifier (CID). The HAMT algorithm hashes incoming keys and uses incrementing subsections of that hash digest at each level of its tree structure to determine the placement of either the entry or a link to a child node of the tree. A `bitWidth` determines the number of bits of the hash to use for index calculation at each level of the tree, such that the root node takes the first `bitWidth` bits of the hash to calculate an index and, as we move lower in the tree, we move along the hash by `depth x bitWidth` bits. In this way, a sufficiently randomizing hash function will generate a hash that provides a new index at each level of the data structure. An index comprising `bitWidth` bits will generate index values of `[ 0, 2^bitWidth )`. So a `bitWidth` of 8 will generate indexes of 0 to 255 inclusive. Each node in the tree can therefore hold up to `2^bitWidth` elements of data, which we store in an array. In this HAMT and the IPLD HashMap, we store entries in buckets. For a `Set(key, value)` mutation where the index generated at the root node for the hash of the key denotes an array index that does not yet contain an entry, we create a new bucket and insert the key/value pair entry. In this way, a single node can theoretically hold up to `2^bitWidth x bucketSize` entries, where `bucketSize` is the maximum number of elements a bucket is allowed to contain ("collisions"). In practice, indexes do not distribute with perfect randomness, so this maximum is theoretical. Entries stored in the node's buckets are stored in key-sorted order. This HAMT implementation:

• Fixes the `bucketSize` to 3.
• Defaults the `bitWidth` to 8; however, within Filecoin it uses 5.
• Defaults the hash algorithm to the 64-bit variant of Murmur3-x64.

The algorithm used here is identical to that of the IPLD HashMap algorithm specified at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md. The specific parameters used by Filecoin and the DAG-CBOR block layout differ from the specification and are defined at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md#Appendix-Filecoin-hamt-variant.
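An illustrative calculation (not this package's API) of how successive bitWidth-sized slices of the hash digest select an index at each level of the tree, as described above:

	package main

	import "fmt"

	// indexAt returns the index used at the given depth (root = depth 0) for a
	// hash digest, reading bitWidth bits per level, most significant bits first.
	func indexAt(digest []byte, depth, bitWidth int) int {
		index := 0
		start := depth * bitWidth
		for i := 0; i < bitWidth; i++ {
			bitPos := start + i
			byteVal := digest[bitPos/8]
			bit := (byteVal >> (7 - uint(bitPos%8))) & 1
			index = index<<1 | int(bit)
		}
		return index
	}

	func main() {
		digest := []byte{0xD2, 0x96, 0x3C} // 11010010 10010110 00111100
		// With bitWidth=8, the root consumes the whole first byte.
		fmt.Println(indexAt(digest, 0, 8)) // 210
		// With bitWidth=5 (Filecoin), each level consumes 5 bits: 11010, 01010, ...
		fmt.Println(indexAt(digest, 0, 5), indexAt(digest, 1, 5)) // 26 10
	}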
Package hamt provides a reference implementation of the IPLD HAMT used in the Filecoin blockchain. It includes some optional flexibility such that it may be used for other purposes outside of Filecoin. HAMT is a "hash array mapped trie" https://en.wikipedia.org/wiki/Hash_array_mapped_trie. This implementation extends the standard form by including buckets for the key/value pairs at storage leaves and CHAMP mutation semantics https://michael.steindorfer.name/publications/oopsla15.pdf. The CHAMP invariant and mutation rules provide us with the ability to maintain canonical forms given any set of keys and their values, regardless of insertion order and intermediate data insertion and deletion. Therefore, for any given set of keys and their values, a HAMT using the same parameters and CHAMP semantics will always produce the same root node and therefore the same content identifier (CID). The HAMT algorithm hashes incoming keys and uses incrementing subsections of that hash digest at each level of its tree structure to determine the placement of either the entry or a link to a child node of the tree. A `bitWidth` determines the number of bits of the hash to use for index calculation at each level of the tree, such that the root node takes the first `bitWidth` bits of the hash to calculate an index and, as we move lower in the tree, we move along the hash by `depth x bitWidth` bits. In this way, a sufficiently randomizing hash function will generate a hash that provides a new index at each level of the data structure. An index comprising `bitWidth` bits will generate index values of `[ 0, 2^bitWidth )`. So a `bitWidth` of 8 will generate indexes of 0 to 255 inclusive. Each node in the tree can therefore hold up to `2^bitWidth` elements of data, which we store in an array. In this HAMT and the IPLD HashMap, we store entries in buckets. For a `Set(key, value)` mutation where the index generated at the root node for the hash of the key denotes an array index that does not yet contain an entry, we create a new bucket and insert the key/value pair entry. In this way, a single node can theoretically hold up to `2^bitWidth x bucketSize` entries, where `bucketSize` is the maximum number of elements a bucket is allowed to contain ("collisions"). In practice, indexes do not distribute with perfect randomness, so this maximum is theoretical. Entries stored in the node's buckets are stored in key-sorted order. This HAMT implementation:

• Fixes the `bucketSize` to 3.
• Defaults the `bitWidth` to 8; however, within Filecoin it uses 5.
• Defaults the hash algorithm to the 64-bit variant of Murmur3-x64.

The algorithm used here is identical to that of the IPLD HashMap algorithm specified at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md. The specific parameters used by Filecoin and the DAG-CBOR block layout differ from the specification and are defined at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md#Appendix-Filecoin-hamt-variant.
Package encoding provides a few of the encoding structures that are missing from the Go x/text/encoding tree.
Package gcnotifier provides a way to receive notifications after every garbage collection (GC) cycle. This can be useful in long-running programs to instruct your code to free additional memory resources that it may be using. A common use case for this is when you have custom data structures (e.g. buffers, caches, rings, trees, pools, ...): instead of setting a maximum size for your data structure you can leave it unbounded and then drop all (or some) of the allocated-but-unused slots after every GC run (e.g. sync.Pool drops all allocated-but-unused objects in the pool during GC). To minimize the load on the GC, the code that runs after receiving the notification should try to avoid allocations as much as possible, or at the very least make sure that the amount of new memory allocated is significantly smaller than the amount of memory that has been "freed" in response to the notification. GCNotifier guarantees to send a notification after every GC cycle completes. Note that the Go runtime does not guarantee that the GC will run: specifically, there is no guarantee that a GC will run before the program terminates. Example implements a simple time-based buffering io.Writer: data sent over dataCh is buffered for up to 100ms, then flushed out in a single call to out.Write and the buffer is reused. If GC runs, the buffer is flushed and then discarded so that it can be collected during the next GC run. The example is necessarily simplistic; a real implementation would be more refined (e.g. on GC flush or resize the buffer based on a threshold, perform asynchronous flushes, properly signal completions and propagate errors, adaptively preallocate the buffer based on the previous capacity, etc.)
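A minimal sketch of the notification loop; the cleanup work is hypothetical, and a GC cycle is forced so the sketch terminates promptly:

	package main

	import (
		"fmt"
		"runtime"

		"github.com/CAFxX/gcnotifier"
	)

	func main() {
		gcn := gcnotifier.New()
		defer gcn.Close()

		runtime.GC() // force a GC cycle so the sketch completes
		<-gcn.AfterGC()

		// At this point a GC cycle has finished: drop allocated-but-unused
		// buffers, shrink caches, etc.
		fmt.Println("GC cycle completed")
	}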
Package redblacktree provides a pure Golang implementation of a red-black tree as described by Thomas H. Cormen et al. in their seminal algorithms book (3rd ed.). This data structure is not multi-goroutine safe.
Package mafsa implements Minimal Acyclic Finite State Automata (MA-FSA) in a space-optimized way as described by Daciuk, Mihov, Watson, and Watson in their paper, "Incremental Construction of Minimal Acyclic Finite-State Automata" (2000). It also implements Minimal Perfect Hashing (MPH) as described by Lucchesi and Kowaltowski in their paper, "Applications of Finite Automata Representing Large Vocabularies" (1992). Unscientifically speaking, this package lets you store large amounts of strings (with Unicode) in memory so that membership queries, prefix lookups, and fuzzy searches are fast. And because minimal perfect hashing is included, you can associate each entry in the tree with more data used by your application. See the README or the end of this documentation for a brief tutorial. MA-FSA structures are a specific type of Deterministic Acyclic Finite State Automaton (DAFSA) which fold equivalent state transitions into each other starting from the suffix of each entry. Typical construction algorithms involve building out the entire tree first, then minimizing the completed tree. However, the method described in the paper above allows the tree to be minimized after every word insertion, provided the insertions are performed in lexicographical order, which drastically reduces memory usage compared to regular prefix trees ("tries"). The goal of this package is to provide a simple, useful, and correct implementation of MA-FSA. Though more complex algorithms exist for removal of items and unordered insertion, these features may be outside the scope of this package. Membership queries are on the order of O(n), where n is the length of the input string, so effectively O(1) with respect to the number of entries stored. It is advisable to keep n small since long entries without much in common, especially in the beginning or end of the string, will quickly overrun the optimizations that are available. In those cases, n-gram implementations might be preferable, though these will use more CPU. This package provides two kinds of MA-FSA implementations. One, the BuildTree, facilitates the construction of an optimized tree and allows ordered insertions. The other, MinTree, is effectively read-only but uses significantly less memory and is ideal for production environments where only reads will be occurring. Usually your build process will be separate from your production application, which will make heavy use of reading the structure. To use this package, create a BuildTree and insert your items in lexicographical order: The tree is now compressed to a minimum number of nodes and is ready to be saved. In your production application, then, you can read the file into a MinTree directly: The mt variable is a *MinTree which has the same data as the original BuildTree, but without all the extra "scaffolding" that was required for adding new elements. The package provides some basic read mechanisms.
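A hedged sketch of the build/read split described above; the file name is hypothetical, and the method names follow my recollection of the project README rather than a verified API:

	package main

	import (
		"fmt"

		"github.com/smartystreets/mafsa"
	)

	func main() {
		// Build phase: insert in lexicographical order, then minimize and save.
		bt := mafsa.New()
		for _, w := range []string{"cities", "city", "pities", "pity"} {
			bt.Insert(w) // entries must be inserted in lexicographical order
		}
		bt.Finish()
		bt.Save("words.mafsa")

		// Read phase (typically in a separate production process).
		mt, err := mafsa.Load("words.mafsa")
		if err != nil {
			panic(err)
		}
		fmt.Println(mt.Contains("city"), mt.Contains("cit")) // true false
	}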
Package srcdom provides utilities to manipulate Go's AST (Abstract Syntax Tree). Using srcdom, you can easily access/extract information about types, functions and variables from the AST.
Package btree implements in-memory B-Trees of arbitrary degree. btree implements an in-memory B-Tree for use as an ordered data structure. It is not meant for persistent storage solutions. It has a flatter structure than an equivalent red-black or other binary tree, which in some cases yields better memory usage and/or performance. See some discussion on the matter here: Note, though, that this project is in no way related to the C++ B-Tree implementation written about there. Within this tree, each node contains a slice of items and a (possibly nil) slice of children. For basic numeric values or raw structs, this can cause efficiency differences when compared to equivalent C++ template code that stores values in arrays within the node: These issues don't tend to matter, though, when working with strings or other heap-allocated structures, since C++-equivalent structures also must store pointers and also distribute their values across the heap. This implementation is designed to be a drop-in replacement to gollrb.LLRB trees, (http://github.com/petar/gollrb), an excellent and probably the most widely used ordered tree implementation in the Go ecosystem currently. Its functions, therefore, exactly mirror those of llrb.LLRB where possible. Unlike gollrb, though, we currently don't support storing multiple equivalent values.
Package ki provides the top-level repository for GoKi Trees: Ki = Tree in Japanese, and "Key" in English -- powerful tree structures supporting scenegraphs, programs, parsing, etc. The sub-packages contain all the relevant code:

* ki: the main Ki interface and the Node implementation thereof.
* kit: a type registry that ki uses in various ways and that provides useful type-level properties used in the GoGi GUI. It is also a powerful 'kit' for dealing with Go's reflect system.
* ints, floats, dirs, bitflag, atomctr, indent: basic Go infrastructure that one could argue should have been in the standard library, but isn't.
Package amt provides a reference implementation of the IPLD AMT (Array Mapped Trie) used in the Filecoin blockchain. The AMT algorithm is similar to a HAMT https://en.wikipedia.org/wiki/Hash_array_mapped_trie but instead presents an array-like interface where the indexes themselves form the mapping to nodes in the trie structure. An AMT is suitable for storing sparse array data, as a minimal number of intermediate nodes is required to address a small number of entries even when their indexes span a large distance. An AMT is also a suitable means of storing non-sparse array data as required, with a small amount of storage and algorithmic overhead required to handle a mapping that assumes some elements within any range of data may not be present. The AMT algorithm produces a tree-like graph, with a single root node addressing a collection of child nodes which connect downward toward leaf nodes which store the actual entries. No terminal entries are stored in intermediate elements of the tree, unlike in a HAMT. We can divide up the AMT tree structure into "levels" or "heights", where a height of zero contains the terminal elements, and the maximum height of the tree contains the single root node. Intermediate nodes are used to span across the range of indexes. Any AMT instance uses a fixed "width" that is consistent across the tree's nodes. An AMT's "bitWidth" dictates the width, or maximum branching factor (arity), of the AMT's nodes by determining how many bits of the original index are used to determine the index at any given level. A bitWidth of 3 (the default for this implementation) can generate indexes in the range of 0 to (2^3)-1=7, i.e. a "width" of 8. In practice, this means that an AMT with a bitWidth of 3 has a branching factor of _between 1 and 8_ for any node in the structure. Considering the minimal case: a minimal AMT contains a single node which serves as both the root and the leaf node and can hold zero or more elements (an empty AMT is possible, although a special-case, and consists of a zero-length root). This minimal AMT can store array indexes from 0 to width-1 (8 for the default bitWidth of 3) without requiring the addition of additional nodes. Attempts to add indexes beyond width-1 will result in additional nodes being added to form a tree structure that can address the new elements. The minimal AMT node is said to have a height of 0. Every node in an AMT has a height that indicates its distance from the leaf nodes. All leaf nodes have a height of 0. The height of the root node dictates the overall height of the entire AMT. In the case of the minimal AMT, this is 0. Elements are stored in a compacted form within nodes; they are "position-mapped" by a bitmap field that is stored with the node. The bitmap is a simple byte array, where each bit represents an element of the data that can be stored in the node. With a width of 8, the bitmap is a single byte and up to 8 elements can be stored in the node. The data array of a node _only stores elements that are present in that node_, so the array is commonly shorter than the maximum width. An empty AMT is a special-case where the single node can have zero elements, therefore a zero-length data array and a bitmap of `0x00`. In all other cases, the data array must have between 1 and width elements. Determining the position of an index within the data array requires counting the number of set bits within the bitmap up to the element we are concerned with.
If the bitmap has bits 2, 4 and 6 set, we can see that only 3 of the bits are set, so our data array should hold 3 elements. To address index 4, we know that the first element will hold index 2 and therefore the second will hold index 4. This format allows us to store only the elements that are set in the node. Overflowing beyond the single-node AMT by adding an index beyond width-1 requires an increase in height in order to address all elements. If an element in the range of width to (width*2)-1 is added, a single additional height is required, which will result in a new root node that is used to address two consecutive leaf nodes. Because we have an arity of up to width at any node, the addition of indexes in the range of 0 to (width^2)-1 will still require only the addition of a single additional height above the leaf nodes, i.e. height 1. From the width of an AMT we can derive the maximum range of indexes that can be contained by an AMT at any given `height` with the formula width^(height+1)-1. e.g. an AMT with a width of 8 and a height of 2 can address indexes 0 to 8^(2+1)-1=511. Incrementing the height multiplies the range of indexes that can be contained within that structure by the width. Nodes above height 0 (non-leaf nodes) do not contain terminal elements; instead, their data array contains links to child nodes. The index compaction using the bitmap is the same as for leaf nodes, so each non-leaf node only stores as many links as it has child nodes. Because additional height is required to address larger indexes, even a single-element AMT will require more than one node where the index is greater than the width of the AMT. For a width of 8, indexes 8 to 63 require a height of 1, indexes 64 to 511 require a height of 2, indexes 512 to 4095 require a height of 3, etc. Retrieving elements from the AMT requires extracting only the portion of the requested index that is required at each height to determine the position in the data array to navigate into. When traversing through the tree, we only need to select from indexes 0 to width-1. To do this, we take log2(width) bits from the index to form a number that is between 0 and width-1. e.g. for a width of 8, we only need 3 bits to form a number between 0 and 7, so we only consume 3 bits per level of the AMT as we traverse. A simple method to calculate this at any height in the AMT (assuming a bitWidth of 3, i.e. a width of 8) is:

1. Calculate the maximum number of nodes (not entries) that may be present in a sub-tree rooted at the current height. width^height provides this number. e.g. at height 0, only 1 node can be present, but at height 3, we may have a tree of up to 512 nodes (storing up to 8^(3+1)=4096 entries).

2. Divide the index by this number to find the index for this height. e.g. an index of 3 at height 0 will be 3/1=3, or an index of 20 at height 1 will be 20/8=2.

3. If we are at height 0, the element we want is at the data index, position-mapped via the bitmap.

4. If we are above height 0, we need to navigate to the child element at the index we calculated, position-mapped via the bitmap. When traversing to the child, we discard the upper portion of the index that we no longer need. This can be achieved by a mod operation against the number-of-nodes value. e.g. an index of 20 at height 1 requires navigation to the element at position 2; when moving to that element (which is at height 0), we truncate the index with 20%8=4, and at height 0 this index is the index in our data array (position-mapped via the bitmap).
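An illustrative walk-through (not the package API) of the divide-and-mod index calculation in the steps above, for bitWidth=3, i.e. width=8:

	package main

	import "fmt"

	const width = 8 // 2^bitWidth with bitWidth=3

	// pow returns width^height, the maximum number of nodes in a sub-tree
	// rooted at the given height.
	func pow(height uint64) uint64 {
		n := uint64(1)
		for ; height > 0; height-- {
			n *= width
		}
		return n
	}

	// path returns the per-height data-array indexes (before bitmap compaction)
	// used to navigate from a root at the given height down to index i.
	func path(i, height uint64) []uint64 {
		var out []uint64
		for h := int64(height); h >= 0; h-- {
			nodes := pow(uint64(h))
			out = append(out, i/nodes) // index at this height
			i = i % nodes              // discard the consumed upper portion
		}
		return out
	}

	func main() {
		// Index 20 in a height-1 AMT: position 2 at height 1, then 20%8=4 at height 0.
		fmt.Println(path(20, 1)) // [2 4]
		// Index 3 in a height-0 AMT: directly position 3.
		fmt.Println(path(3, 0)) // [3]
	}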
In this way, each sub-tree root consumes a small slice, log2(width) bits long, of the original index. Adding new elements to an AMT may require up to 3 steps:

1. Increasing the height to accommodate a new index if the current height is not sufficient to address the new index. Increasing the height requires turning the current root node into an intermediate node and adding a new root which links to the old (repeated until the required height is reached).

2. Adding any missing intermediate and leaf nodes that are required to address the new index. Depending on the density of existing indexes, this may require the addition of up to height-1 new nodes to connect the root to the required leaf. Sparse indexes will mean large gaps in the tree that will need filling to address new, equally sparse, indexes.

3. Setting the element at the leaf node in the appropriate position in the data array and setting the appropriate bit in the bitmap.

Removing elements requires a reversal of this process. Any empty node (other than the case of a completely empty AMT) must be removed and its parent must have its child link removed. This removal may recurse up the tree to remove many unnecessary intermediate nodes. The root node may also be removed if the current height is no longer necessary to contain the range of indexes still in the AMT. This can be easily determined when _only_ the first bit of the root's bitmap is set, meaning only the left-most child is present; that child becomes the new root node (repeated until the new root has more than the first bit set or has a height of 0, the single-node case). See https://github.com/ipld/specs/blob/master/data-structures/hashmap.md for a description of a HAMT algorithm. And see https://github.com/ipld/specs/blob/master/data-structures/vector.md for a description of an algorithm similar to an AMT that doesn't support internal node compression and therefore doesn't support sparse arrays. Unlike a HAMT, the AMT algorithm doesn't benefit from the randomness introduced by a hash algorithm. Therefore, when an AMT is used in cases where user input can influence indexes, larger-than-necessary tree structures may present risks, as well as the challenge imposed by having a strict upper limit on the indexes addressable by the AMT. A width of 8, using 64-bit integers for indexing, allows for a tree height of up to 64/log2(8)=21 (i.e. a width of 8 has a bitWidth of 3, dividing the 64 bits of the uint into 21 separate per-height indexes). Careful placement of indexes could create extremely sub-optimal forms with large heights connecting leaf nodes that are sparsely packed. The overhead of the large number of intermediate nodes required to connect leaf nodes in AMTs that contain high indexes can be abused to create perverse forms that contain large numbers of nodes to store a minimal number of elements. Minimal nodes will be created where indexes are all in the lower range. The optimal case for an AMT is contiguous index values starting from zero. As larger indexes are introduced that span beyond the current maximum, more nodes are required to address the new nodes _and_ the existing lower-index nodes. Consider a case where a width=8 AMT is only addressing indexes less than 8 and therefore requires only a single height. The introduction of a single index within 8 of the maximum 64-bit unsigned integer range will require the new root to have a height of 21, with enough connecting nodes between it and both the existing elements and the new upper index.
This pattern of behavior may be acceptable if there is significant density of entries under a particular maximum index. There is a direct relationship between the sparseness of index values and the number of nodes required to address the entries. This should be the key consideration when determining whether an AMT is a suitable data-structure for a given application.
Package amt provides a reference implementation of the IPLD AMT (Array Mapped Trie) used in the Filecoin blockchain. The AMT algorithm is similar to a HAMT https://en.wikipedia.org/wiki/Hash_array_mapped_trie but instead presents an array-like interface where the indexes themselves form the mapping to nodes in the trie structure. An AMT is suitable for storing sparse array data, as a minimal number of intermediate nodes is required to address a small number of entries even when their indexes span a large distance. An AMT is also a suitable means of storing non-sparse array data as required, with a small amount of storage and algorithmic overhead required to handle a mapping that assumes some elements within any range of data may not be present. The AMT algorithm produces a tree-like graph, with a single root node addressing a collection of child nodes which connect downward toward leaf nodes which store the actual entries. No terminal entries are stored in intermediate elements of the tree, unlike in a HAMT. We can divide up the AMT tree structure into "levels" or "heights", where a height of zero contains the terminal elements, and the maximum height of the tree contains the single root node. Intermediate nodes are used to span across the range of indexes. Any AMT instance uses a fixed "width" that is consistent across the tree's nodes. An AMT's "bitWidth" dictates the width, or maximum branching factor (arity), of the AMT's nodes by determining how many bits of the original index are used to determine the index at any given level. A bitWidth of 3 (the default for this implementation) can generate indexes in the range of 0 to (2^3)-1=7, i.e. a "width" of 8. In practice, this means that an AMT with a bitWidth of 3 has a branching factor of _between 1 and 8_ for any node in the structure. Considering the minimal case: a minimal AMT contains a single node which serves as both the root and the leaf node and can hold zero or more elements (an empty AMT is possible, although a special-case, and consists of a zero-length root). This minimal AMT can store array indexes from 0 to width-1 (8 for the default bitWidth of 3) without requiring the addition of additional nodes. Attempts to add indexes beyond width-1 will result in additional nodes being added to form a tree structure that can address the new elements. The minimal AMT node is said to have a height of 0. Every node in an AMT has a height that indicates its distance from the leaf nodes. All leaf nodes have a height of 0. The height of the root node dictates the overall height of the entire AMT. In the case of the minimal AMT, this is 0. Elements are stored in a compacted form within nodes; they are "position-mapped" by a bitmap field that is stored with the node. The bitmap is a simple byte array, where each bit represents an element of the data that can be stored in the node. With a width of 8, the bitmap is a single byte and up to 8 elements can be stored in the node. The data array of a node _only stores elements that are present in that node_, so the array is commonly shorter than the maximum width. An empty AMT is a special-case where the single node can have zero elements, therefore a zero-length data array and a bitmap of `0x00`. In all other cases, the data array must have between 1 and width elements. Determining the position of an index within the data array requires counting the number of set bits within the bitmap up to the element we are concerned with.
If the bitmap has bits 2, 4 and 6 set, we can see that only 3 of the bits are set, so our data array should hold 3 elements. To address index 4, we know that the first element will hold index 2 and therefore the second will hold index 4. This format allows us to store only the elements that are set in the node. Overflowing beyond the single-node AMT by adding an index beyond width-1 requires an increase in height in order to address all elements. If an element in the range of width to (width*2)-1 is added, a single additional height is required, which will result in a new root node that is used to address two consecutive leaf nodes. Because we have an arity of up to width at any node, the addition of indexes in the range of 0 to (width^2)-1 will still require only the addition of a single additional height above the leaf nodes, i.e. height 1. From the width of an AMT we can derive the maximum range of indexes that can be contained by an AMT at any given `height` with the formula width^(height+1)-1. e.g. an AMT with a width of 8 and a height of 2 can address indexes 0 to 8^(2+1)-1=511. Incrementing the height multiplies the range of indexes that can be contained within that structure by the width. Nodes above height 0 (non-leaf nodes) do not contain terminal elements; instead, their data array contains links to child nodes. The index compaction using the bitmap is the same as for leaf nodes, so each non-leaf node only stores as many links as it has child nodes. Because additional height is required to address larger indexes, even a single-element AMT will require more than one node where the index is greater than the width of the AMT. For a width of 8, indexes 8 to 63 require a height of 1, indexes 64 to 511 require a height of 2, indexes 512 to 4095 require a height of 3, etc. Retrieving elements from the AMT requires extracting only the portion of the requested index that is required at each height to determine the position in the data array to navigate into. When traversing through the tree, we only need to select from indexes 0 to width-1. To do this, we take log2(width) bits from the index to form a number that is between 0 and width-1. e.g. for a width of 8, we only need 3 bits to form a number between 0 and 7, so we only consume 3 bits per level of the AMT as we traverse. A simple method to calculate this at any height in the AMT (assuming a bitWidth of 3, i.e. a width of 8) is:

1. Calculate the maximum number of nodes (not entries) that may be present in a sub-tree rooted at the current height. width^height provides this number. e.g. at height 0, only 1 node can be present, but at height 3, we may have a tree of up to 512 nodes (storing up to 8^(3+1)=4096 entries).

2. Divide the index by this number to find the index for this height. e.g. an index of 3 at height 0 will be 3/1=3, or an index of 20 at height 1 will be 20/8=2.

3. If we are at height 0, the element we want is at the data index, position-mapped via the bitmap.

4. If we are above height 0, we need to navigate to the child element at the index we calculated, position-mapped via the bitmap. When traversing to the child, we discard the upper portion of the index that we no longer need. This can be achieved by a mod operation against the number-of-nodes value. e.g. an index of 20 at height 1 requires navigation to the element at position 2; when moving to that element (which is at height 0), we truncate the index with 20%8=4, and at height 0 this index is the index in our data array (position-mapped via the bitmap).
In this way, each sub-tree root consumes a small slice, log2(width) bits long, of the original index. Adding new elements to an AMT may require up to 3 steps:

1. Increasing the height to accommodate a new index if the current height is not sufficient to address the new index. Increasing the height requires turning the current root node into an intermediate node and adding a new root which links to the old (repeated until the required height is reached).

2. Adding any missing intermediate and leaf nodes that are required to address the new index. Depending on the density of existing indexes, this may require the addition of up to height-1 new nodes to connect the root to the required leaf. Sparse indexes will mean large gaps in the tree that will need filling to address new, equally sparse, indexes.

3. Setting the element at the leaf node in the appropriate position in the data array and setting the appropriate bit in the bitmap.

Removing elements requires a reversal of this process. Any empty node (other than the case of a completely empty AMT) must be removed and its parent must have its child link removed. This removal may recurse up the tree to remove many unnecessary intermediate nodes. The root node may also be removed if the current height is no longer necessary to contain the range of indexes still in the AMT. This can be easily determined when _only_ the first bit of the root's bitmap is set, meaning only the left-most child is present; that child becomes the new root node (repeated until the new root has more than the first bit set or has a height of 0, the single-node case). See https://github.com/ipld/specs/blob/master/data-structures/hashmap.md for a description of a HAMT algorithm. And see https://github.com/ipld/specs/blob/master/data-structures/vector.md for a description of an algorithm similar to an AMT that doesn't support internal node compression and therefore doesn't support sparse arrays. Unlike a HAMT, the AMT algorithm doesn't benefit from the randomness introduced by a hash algorithm. Therefore, when an AMT is used in cases where user input can influence indexes, larger-than-necessary tree structures may present risks, as well as the challenge imposed by having a strict upper limit on the indexes addressable by the AMT. A width of 8, using 64-bit integers for indexing, allows for a tree height of up to 64/log2(8)=21 (i.e. a width of 8 has a bitWidth of 3, dividing the 64 bits of the uint into 21 separate per-height indexes). Careful placement of indexes could create extremely sub-optimal forms with large heights connecting leaf nodes that are sparsely packed. The overhead of the large number of intermediate nodes required to connect leaf nodes in AMTs that contain high indexes can be abused to create perverse forms that contain large numbers of nodes to store a minimal number of elements. Minimal nodes will be created where indexes are all in the lower range. The optimal case for an AMT is contiguous index values starting from zero. As larger indexes are introduced that span beyond the current maximum, more nodes are required to address the new nodes _and_ the existing lower-index nodes. Consider a case where a width=8 AMT is only addressing indexes less than 8 and therefore requires only a single height. The introduction of a single index within 8 of the maximum 64-bit unsigned integer range will require the new root to have a height of 21, with enough connecting nodes between it and both the existing elements and the new upper index.
This pattern of behavior may be acceptable if there is significant density of entries under a particular maximum index. There is a direct relationship between the sparseness of index values and the number of nodes required to address the entries. This should be the key consideration when determining whether an AMT is a suitable data-structure for a given application.
Package ogdl is used to process OGDL, the Ordered Graph Data Language. OGDL is a textual format to write trees or graphs of text, where indentation and spaces define the structure. Here is an example: The language is simple, both in its textual representation and in its number of productions (the specification rules), allowing for compact implementations. OGDL character streams are normally formed by Unicode characters, and encoded as UTF-8 strings, but any encoding that is ASCII transparent is compatible with the specification. See the full spec at http://ogdl.org. To install this package just do: If we have a text file 'config.ogdl' containing: then, will print If the timeout parameter is not present, the default value (60) will be assigned to 'to'. The default value is optional, but be aware that Int64() will return 0 if the parameter doesn't exist. The configuration file can be written more concisely: The package includes a template processor. It takes an arbitrary input stream with some variables in it, and produces an output stream with the variables resolved out of a Graph object which acts as context. For example (given the previous config file): string(b) is then: Some rules are followed:
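As a purely hypothetical illustration of the indentation-defined structure described above (not the package's own example, which is omitted here), a config.ogdl file might look like:

    eth0
        ip      192.168.1.10
        gateway 192.168.1.1
        timeout 20

Here 'eth0' is a node with three children, each pairing a key with a value; reading 'eth0.timeout' as an integer would yield 20, and removing that line would make a lookup fall back to whatever default the caller supplies.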
File Structures 2 This is a follow up to my crufty http://github.com/timtadh/file-structures work. That system has some endemic problems: 1. It uses the read/write interface to files. This means that it needs to do block management and cache management. In theory this can be very fast but it is also very challenging in Go. 2. Because it uses read/write it has to do buffer management. Largely, the system punts on this problem and allows Go to handle the buffer management through the normal memory management system. This doesn't work especially well for the use case of file-structures. File Structures 2 is an experiment to bring Memory Mapped IO to the world of Go. The hypotheses are: 1. The operating system is good at page management generally. While we know more about how to manage the structure of B+Trees, VarChar stores, and Linear Hash tables than the OS does, there is no indication that you can achieve better performance from Go. Therefore, I hypothesize that leaving it to the OS will lead to a smaller working set and a faster data structure in general. 2. You can make Memory Mapping performant in Go. There are many challenges here. The biggest of which is that there are no dynamically sized array types in Go. The size of the array is part of the type; you have to use slices. This creates complications when hooking up structures which contain slices to mmap allocated blocks of memory. I hypothesize that this repository can achieve good (enough) performance here. The major components of this project: 1. fmap - a memory mapped file interface. Part C, part Go. Uses cgo. 2. bptree - a B+ Tree with duplicate key support (fixed size keys, variable length values) written on top of fmap. 3. slice - used by fmap and bptree to completely violate memory and type safety of Go. 4. errors - just a simple error package which maintains a stack trace with every error.
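A small, self-contained sketch of the underlying technique (Unix-only, and not the fmap/slice API itself): memory-map a file with syscall.Mmap and reinterpret part of the mapping as a struct via unsafe, which is exactly the kind of type-safety violation described above.

    package main

    import (
        "fmt"
        "os"
        "syscall"
        "unsafe"
    )

    // header is a hypothetical fixed-size record at the start of the file.
    type header struct {
        Magic uint32
        Count uint32
    }

    func main() {
        f, err := os.OpenFile("data.bin", os.O_RDWR, 0644)
        if err != nil {
            panic(err)
        }
        defer f.Close()

        // Map the first 4096 bytes; the file must be at least that large.
        buf, err := syscall.Mmap(int(f.Fd()), 0, 4096,
            syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
        if err != nil {
            panic(err)
        }
        defer syscall.Munmap(buf)

        // View the mapped bytes as a header without copying them.
        h := (*header)(unsafe.Pointer(&buf[0]))
        fmt.Println(h.Magic, h.Count)
    }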
Package merkletree implements a Merkle Tree capable of storing arbitrary content. A Merkle Tree is a hash tree that provides an efficient way to verify that the contents of a data set are present and untampered with. At its core, a Merkle Tree is a list of items representing the data that should be verified. Each of these items is inserted into a leaf node and a tree of hashes is constructed bottom up using a hash of each node's left and right children's hashes. This means that the root node will effectively be a hash of all other nodes (hashes) in the tree. This property allows the tree to be reproduced and thus verified based only on the hash of the root node of the tree. The benefit of the tree structure is that verifying any single content entry in the tree requires only log2(n) steps in the worst case. Creating a new merkletree requires that the type that the tree will be constructed from implements the Content interface. A slice of the Content items should be created and then passed to the NewTree method. t represents the Merkle Tree and can be verified and manipulated with the API methods described below.
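A minimal sketch of a type satisfying the Content interface described above, assuming its methods are CalculateHash() ([]byte, error) and Equals(Content) (bool, error) and that the import path is github.com/cbergoon/merkletree; adjust both to the actual package if they differ:

    package example

    import (
        "crypto/sha256"
        "errors"

        "github.com/cbergoon/merkletree"
    )

    // StringContent wraps a string so it can be stored in the tree.
    type StringContent struct {
        S string
    }

    func (c StringContent) CalculateHash() ([]byte, error) {
        h := sha256.New()
        if _, err := h.Write([]byte(c.S)); err != nil {
            return nil, err
        }
        return h.Sum(nil), nil
    }

    func (c StringContent) Equals(other merkletree.Content) (bool, error) {
        o, ok := other.(StringContent)
        if !ok {
            return false, errors.New("not a StringContent")
        }
        return c.S == o.S, nil
    }

    // Build the tree from a slice of Content and verify it:
    //   list := []merkletree.Content{StringContent{"hello"}, StringContent{"world"}}
    //   t, err := merkletree.NewTree(list)
    //   ok, err := t.VerifyTree()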
Package hercules contains the functions which are needed to gather various statistics from a Git repository. The analysis is expressed in the form of a tree: there are nodes - "pipeline items" - which require some other nodes to be executed before them and in turn provide the data for dependent nodes. There are several service items which do not produce any useful statistics but rather provide the requirements for other items. The top-level items include: - BurndownAnalysis - line burndown statistics for project, files and developers. - CouplesAnalysis - coupling statistics for files and developers. - ShotnessAnalysis - structural hotness and couples, by any Babelfish UAST XPath (functions by default). The typical API usage is to initialize the Pipeline class: Then add the required analysis: This call will add all the needed intermediate pipeline items. Then link and execute the analysis tree: Finally extract the result: The actual usage example is cmd/hercules/root.go - the command line tool's code. You can provide additional options via `facts` on initialization. For example, to provide your own logger, enable people-tracking, and set a custom tick size: Hercules depends heavily on https://github.com/src-d/go-git and leverages the diff algorithm through https://github.com/sergi/go-diff. Besides, BurndownAnalysis involves File and RBTree. These are low level data structures which enable incremental blaming. File carries an instance of RBTree and the current line burndown state. RBTree implements the red-black balanced binary tree and is based on https://github.com/yasushi-saito/rbtree. Coupling stats are supposed to be further processed rather than observed directly. labours.py uses Swivel embeddings and visualises them in Tensorflow Projector. Shotness analysis, as well as other UAST-featured items, relies on [Babelfish](https://doc.bblf.sh) and requires the server to be running.
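A rough sketch of that flow, assuming the Pipeline API (NewPipeline, DeployItem, Commits, Initialize, Run) roughly as it is used in cmd/hercules/root.go; the import paths and exact signatures vary between hercules versions, so treat this as orientation rather than copy-paste code:

    package main

    import (
        "log"

        git "gopkg.in/src-d/go-git.v4"
        hercules "gopkg.in/src-d/hercules.v10"
    )

    func main() {
        repository, err := git.PlainOpen(".")
        if err != nil {
            log.Fatal(err)
        }
        pipeline := hercules.NewPipeline(repository)

        // Deploying a leaf item pulls in every intermediate item it depends on.
        burndown := pipeline.DeployItem(&hercules.BurndownAnalysis{}).(hercules.LeafPipelineItem)

        // Link the analysis tree, walk the commits, and collect the results.
        if err := pipeline.Initialize(map[string]interface{}{}); err != nil {
            log.Fatal(err)
        }
        commits, err := pipeline.Commits(false)
        if err != nil {
            log.Fatal(err)
        }
        results, err := pipeline.Run(commits)
        if err != nil {
            log.Fatal(err)
        }
        _ = results[burndown] // the burndown analysis result
    }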
Package xmlwriter provides a fast, non-cached, forward-only way to generate XML data. The API is based heavily on libxml's xmlwriter API [1], which is itself based on C#'s XmlWriter [2]. It offers some advantages over Go's default encoding/xml package and some tradeoffs. You can have complete control of the generated documents and it uses very little memory. There are two styles for interacting with the writer: structured and heap-friendly. If you want a visual representation of the hierarchy of some of your writes in your code and you don't care about a few instances of memory escaping to the heap (and most of the time you won't), you can use the structured API. If you are writing a code generator or your interactions with the API are minimal, you should use the direct API. xmlwriter.Writer{} takes any io.Writer, along with a variable list of options. xmlwriter options are based on Dave Cheney's functional options pattern (https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis): Provided options are: Using the structured API, you might express a small tree of elements like this. These nodes will escape to the heap, but judicious use of this nesting can make certain structures a lot more readable by representing the desired XML hierarchy in the code that produces it: The code can be made even less dense by importing xmlwriter with a prefix: `import xw "github.com/shabbyrobe/xmlwriter"` The same output is possible with the heap-friendly API. This has a lot more stutter and it's harder to tell the hierarchical relationship just by looking at the code, but there are no heap escapes this way: Use whichever API reads best in your code, but favour the latter style in all code generators and performance hotspots. xmlwriter.Writer extends bufio.Writer! Don't forget to flush, otherwise you'll lose data. There are two ways to flush: The EndAllFlush form is just a convenience; it calls EndAll() and Flush() for you. Nodes which can have children can be passed to `Writer.Start()`. This adds them to the stack and opens them, allowing children to be added. Becomes: <foo><bar><baz/></bar></foo> Nodes which have no children, or nodes which can be opened and fully closed with only a trivial amount of information, can be passed to `Writer.Write()`. If written nodes are put on to the stack, they will be popped before Write returns. Becomes: <foo/><bar/><baz/> Block takes a Startable and a variable number of Writable nodes. The Startable will be opened, the Writables will be written, then the Startable will be closed: Becomes: There are several ways to end an element. Choose the End that's right for you! Nodes as they are written can be in three states: StateOpen, StateOpened or StateEnd. StateOpen == "<elem". StateOpened == "<elem>". StateEnd == "<elem></elem>". Node structs are available for writing in the following hierarchy. Nodes which are "Startable" (passed to `writer.Start(n)`) are marked with an S. Nodes which are "Writable" (passed to `writer.Write(n)`) are marked with a W. - xmlwriter.Raw* (W) - xmlwriter.Doc (S) * `xmlwriter.Raw` can be written anywhere, at any time. If a node is in the "open" state but not in the "opened" state, for example you have started an element and written an attribute, writing "raw" will add the content to the inside of the element opening tag unless you call `w.Next()`. Every node has a corresponding NodeKind constant, which can be found by affixing "Node" to the struct name, i.e. "xmlwriter.Elem" becomes "xmlwriter.ElemNode".
These are used for calls to Writer.End(). xmlwriter.Attr{} values can be assigned from any Go primitive like so: xmlwriter supports encoders from the golang.org/x/text/encoding package. UTF-8 strings written from Go will be converted on the fly, and the document declaration will be written correctly. To write your XML using the windows-1252 encoder: The document line will look like this:
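A compact sketch of the heap-friendly/direct style, assuming Open, StartDoc, StartElem, WriteAttr, WriteElem and EndAllFlush roughly as described above, plus the ErrCollector helper; consult the package for the exact signatures:

    package example

    import (
        "os"

        xw "github.com/shabbyrobe/xmlwriter"
    )

    func write() {
        w := xw.Open(os.Stdout)
        ec := &xw.ErrCollector{}
        defer ec.Panic()
        ec.Do(
            w.StartDoc(xw.Doc{}),
            w.StartElem(xw.Elem{Name: "foo"}),
            w.WriteAttr(xw.Attr{Name: "a1", Value: "val1"}),
            w.WriteElem(xw.Elem{Name: "bar"}),
            w.EndAllFlush(), // closes foo and flushes the buffered output
        )
    }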
Package CloudForest implements ensembles of decision trees for machine learning in pure Go (golang to search engines). It allows for a number of related algorithms for classification, regression, feature selection and structure analysis on heterogeneous numerical/categorical data with missing values. These include: Breiman and Cutler's Random Forest for Classification and Regression Adaptive Boosting (AdaBoost) Classification Gradient Boosting Tree Regression Entropy and Cost driven classification L1 regression Feature selection with artificial contrasts Proximity and model structure analysis Roughly balanced bagging for unbalanced classification The API hasn't stabilized yet and may change rapidly. Tests and benchmarks have been performed only on embargoed data sets and cannot yet be released. Library documentation is in code and can be viewed with godoc or live at: http://godoc.org/github.com/ryanbressler/CloudForest Documentation of command line utilities and file formats can be found in README.md, which can be viewed formatted on github: http://github.com/ryanbressler/CloudForest Pull requests and bug reports are welcome. CloudForest was created by Ryan Bressler and is being developed in the Shumelivich Lab at the Institute for Systems Biology for use on genomic/biomedical data with partial support from The Cancer Genome Atlas and the Inova Translational Medicine Institute. CloudForest is intended to provide fast, comprehensible building blocks that can be used to implement ensembles of decision trees. CloudForest is written in Go to allow a data scientist to develop and scale new models and analysis quickly instead of having to modify complex legacy code. Data structures and file formats are chosen with use in multi-threaded and cluster environments in mind. Go's support for function types is used to provide an interface to run code as data is percolated through a tree. This method is flexible enough that it can extend the tree being analyzed. Growing a decision tree using Breiman and Cutler's method can be done in an anonymous function/closure passed to a tree's root node's Recurse method: This allows a researcher to include whatever additional analysis they need (importance scores, proximity etc) in tree growth. The same Recurse method can also be used to analyze existing forests to tabulate scores or extract structure. Utilities like leafcount and errorrate use this method to tabulate data about the tree in collection objects. Decision trees are grown with the goal of reducing "Impurity", which is usually defined as Gini Impurity for categorical targets or mean squared error for numerical targets. CloudForest grows trees against the Target interface which allows for alternative definitions of impurity. CloudForest includes several alternative targets: Additional targets can be stacked on top of these targets to add boosting functionality: Repeatedly splitting the data and searching for the best split at each node of a decision tree are the most computationally intensive parts of decision tree learning and CloudForest includes optimized code to perform these tasks. Go's slices are used extensively in CloudForest to make it simple to interact with optimized code. Many previous implementations of Random Forest have avoided reallocation by reordering data in place and keeping track of start and end indexes. In Go, slices pointing at the same underlying arrays make this sort of optimization transparent.
For example a function like: can return left and right slices that point to the same underlying array as the original slice of cases, but these slices should not have their values changed (a library-independent sketch of this pattern appears below). Functions used while searching for the best split also accept pointers to reusable slices and structs to maximize speed by keeping memory allocations to a minimum. BestSplitAllocs contains pointers to these items and its use can be seen in functions like: For categorical predictors, BestSplit will also attempt to intelligently choose between 4 different implementations depending on user input and the number of categories. These include exhaustive, random, and iterative searches for the best combination of categories implemented with bitwise operations against int and big.Int. See BestCatSplit, BestCatSplitIter, BestCatSplitBig and BestCatSplitIterBig. All numerical predictors are handled by BestNumSplit which relies on Go's sorting package. Training a Random Forest is an inherently parallel process and CloudForest is designed to allow parallel implementations that can tackle large problems while keeping memory usage low by writing and using data structures directly to/from disk. Trees can be grown in separate goroutines. The growforest utility provides an example of this that uses goroutines and channels to grow trees in parallel and write trees to disk as they are finished by the "worker" goroutines. The few summary statistics like mean impurity decrease per feature (importance) can be calculated using thread safe data structures like RunningMean. Trees can also be grown on separate machines. The .sf stochastic forest format allows several small forests to be combined by concatenation and the ForestReader and ForestWriter structs allow these forests to be accessed tree by tree (or even node by node) from disk. For data sets that are too big to fit in memory on a single machine Tree.Grow and FeatureMatrix.BestSplitter can be reimplemented to load candidate features from disk, distributed database etc. By default CloudForest uses a fast heuristic for missing values. When proposing a split on a feature with missing data the missing cases are removed and the impurity value is corrected to use three way impurity which reduces the bias towards features with lots of missing data: Missing values in the target variable are left out of impurity calculations. This has provided generally good results at a fraction of the computational cost of imputing data. Optionally, feature.ImputeMissing or featurematrix.ImputeMissing can be called before forest growth to impute missing values to the feature mean/mode which Breiman [2] suggests as a fast method for imputing values. This forest could also be analyzed for proximity (using leafcount or tree.GetLeaves) to do the more accurate proximity weighted imputation Breiman describes. Experimental support is provided for 3 way splitting which splits missing cases onto a third branch. [2] This has so far yielded mixed results in testing. At some point in the future support may be added for local imputing of missing values during tree growth as described in [3] [1] http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1 [2] https://code.google.com/p/rf-ace/ [3] http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.aoas/1223908043&page=record In CloudForest data is stored using the FeatureMatrix struct which contains Features.
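A library-independent sketch of the shared-backing-array partition pattern mentioned above (the function name and element type are illustrative): both returned slices alias the caller's slice, so nothing is reallocated, and callers must treat the returned views as read-only.

    // partitionCases reorders cases in place so that all cases satisfying
    // goesLeft come first, then returns two sub-slices of the same backing
    // array covering the left and right groups.
    func partitionCases(cases []int, goesLeft func(int) bool) (left, right []int) {
        i := 0
        for j, c := range cases {
            if goesLeft(c) {
                cases[i], cases[j] = cases[j], cases[i]
                i++
            }
        }
        return cases[:i], cases[i:]
    }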
The Feature struct implements storage and methods for both categorical and numerical data, calculations of impurity etc, and the search for the best split. The Target interface abstracts the methods of Feature that are needed for a feature to be predictable. This allows for the implementation of alternative types of regression and classification. Trees are built from Nodes and Splitters and stored within a Forest. Tree has a Grow method that implements Breiman and Cutler's method (see extract above) for growing a tree. A GrowForest method is also provided that implements the rest of the method, including sampling cases, but it may be faster to grow the forest to disk as in the growforest utility. Prediction and voting are done using Tree.Vote and the CatBallotBox and NumBallotBox types, which implement the VoteTallyer interface.
Package antlr implements the Go version of the ANTLR 4 runtime. ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build parse trees and also generates a listener interface (or visitor) that makes it easy to respond to the recognition of phrases of interest. ANTLR supports the generation of code in a number of target languages, and the generated code is supported by a runtime library, written specifically to support the generated code in the target language. This library is the runtime for the Go target. To generate code for the Go target, it is generally recommended to place the source grammar files in a package of their own, and use the `.sh` script method of generating code, using the go generate directive. In that same directory it is usual, though not required, to place the antlr tool that should be used to generate the code. That does mean that the antlr tool JAR file will be checked into your source code control though, so you are free to use any other way of specifying the version of the ANTLR tool to use, such as aliasing in `.zshrc` or equivalent, or a profile in your IDE, or configuration in your CI system. Here is a general template for an ANTLR based recognizer in Go: Make sure that the package statement in your grammar file(s) reflects the Go package they exist in. The generate.go file then looks like this: And the generate.sh file will look similar to this: depending on whether you want visitors or listeners or any other ANTLR options. From the command line at the root of your package “myproject” you can then simply issue the command: Copyright (c) 2012-2022 The ANTLR Project. All rights reserved. Use of this file is governed by the BSD 3-clause license, which can be found in the LICENSE.txt file in the project root.
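A hypothetical sketch of such a generate.go file (package and script names are placeholders); the referenced generate.sh would typically invoke the checked-in ANTLR tool JAR with -Dlanguage=Go, plus -visitor or -listener as needed, against the grammar files in the same directory:

    // Package parser holds the grammar files and the code ANTLR generates from them.
    package parser

    // Running `go generate ./...` from the module root executes the script
    // that invokes the ANTLR tool for the Go target.
    //go:generate ./generate.sh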
Package appdash provides a Go app performance tracing suite. Appdash allows you to trace the end-to-end performance of hierarchically structured applications. You can, for example, measure the time and see the detailed information of each HTTP request and SQL query made by an entire distributed web application. The cmd/appdash tool launches a web front-end which displays a web UI for viewing collected app traces. It is effectively a remote collector to which your application can connect and send events. Timing and application-specific metadata information can be viewed in a nice timeline view for each span (e.g. an HTTP request) and its children. The web front-end can also be embedded in your own Go HTTP server by utilizing the traceapp sub-package, which is effectively what cmd/appdash serves internally. Sub-packages for HTTP and SQL event tracing are provided for use with appdash, which allows it to function equivalently to Google's Dapper and Twitter's Zipkin performance tracing suites. The most high-level structure is a Trace, which represents the performance of an application from start to finish (in an HTTP application, for example, the loading of a web page). A Trace is a tree structure that is made up of several spans, which are just IDs (in an HTTP application, these IDs are passed through the stack via a few special headers). Each span ID has a set of Events that directly correspond to it inside a Collector. These events can be any combination of message, log, time-span, or time-stamped events (the cmd/appdash web UI displays these events as appropriate). Inside your application, a Recorder is used to send events to a Collector, which can be a remote HTTP(S) collector, a local in-memory or persistent collector, etc. Additionally, you can implement the Collector interface yourself and store events however you like.
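A rough sketch of wiring a Recorder to an in-memory Collector, assuming appdash's NewMemoryStore, NewLocalCollector, NewRootSpanID and NewRecorder constructors, the Recorder's Name and Msg methods, and the historical import path; exact signatures may differ, so treat this as orientation only:

    package example

    import "sourcegraph.com/sourcegraph/appdash"

    func record() {
        // A local, in-memory collector; a RemoteCollector would send events
        // to cmd/appdash instead.
        store := appdash.NewMemoryStore()
        collector := appdash.NewLocalCollector(store)

        // One trace with a root span, and a recorder that sends its events
        // to the collector.
        span := appdash.NewRootSpanID()
        rec := appdash.NewRecorder(span, collector)
        rec.Name("load home page")
        rec.Msg("fetched user profile")
    }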
Package avltree implements a height-balanced binary tree with array-like indexing capability. An AVL tree (Adel'son-Vel'skii & Landis) is a binary search tree in which the heights of the left and right subtrees of the root differ by at most one and in which the left and right subtrees are again AVL trees. With each node of an AVL tree is associated a balance factor that is Left High, Equal, or Right High according, respectively, as the left subtree has height greater than, equal to, or less than that of the right subtree. The AVL tree is, in practice, balanced quite well. It can (at the worst case) become skewed to the left or right, but never so much that it becomes inefficient. The balancing is done as items are added or deleted. This version is enhanced to allow "indexing" of values in the tree; however, the indexes are not stable as the tree could be resorted as items are added or removed. It is safe to iterate or search a tree from multiple threads provided that no threads are modifying the tree. See also: Robert L. Kruse, Data Structures and Program Design, 2nd Ed., Prentice-Hall
Package hamt provides a reference implementation of the IPLD HAMT used in the Filecoin blockchain. It includes some optional flexibility such that it may be used for other purposes outside of Filecoin. HAMT is a "hash array mapped trie" https://en.wikipedia.org/wiki/Hash_array_mapped_trie. This implementation extends the standard form by including buckets for the key/value pairs at storage leaves and CHAMP mutation semantics https://michael.steindorfer.name/publications/oopsla15.pdf. The CHAMP invariant and mutation rules provide us with the ability to maintain canonical forms given any set of keys and their values, regardless of insertion order and intermediate data insertion and deletion. Therefore, for any given set of keys and their values, a HAMT using the same parameters and CHAMP semantics should always produce the same root node and therefore the same content identifier (CID). The HAMT algorithm hashes incoming keys and uses incrementing subsections of that hash digest at each level of its tree structure to determine the placement of either the entry or a link to a child node of the tree. A `bitWidth` determines the number of bits of the hash to use for index calculation at each level of the tree, such that the root node takes the first `bitWidth` bits of the hash to calculate an index and, as we move lower in the tree, we move along the hash by `depth x bitWidth` bits. In this way, a sufficiently randomizing hash function will generate a hash that provides a new index at each level of the data structure. An index comprising `bitWidth` bits will generate index values of `[ 0, 2^bitWidth )`. So a `bitWidth` of 8 will generate indexes of 0 to 255 inclusive. Each node in the tree can therefore hold up to `2^bitWidth` elements of data, which we store in an array. In this HAMT and the IPLD HashMap, we store entries in buckets. For a `Set(key, value)` mutation where the index generated at the root node for the hash of the key denotes an array index that does not yet contain an entry, we create a new bucket and insert the key/value pair entry. In this way, a single node can theoretically hold up to `2^bitWidth x bucketSize` entries, where `bucketSize` is the maximum number of elements a bucket is allowed to contain ("collisions"). In practice, indexes do not distribute with perfect randomness, so this maximum is theoretical. Entries stored in the node's buckets are stored in key-sorted order. This HAMT implementation: • Fixes the `bucketSize` to 3. • Defaults the `bitWidth` to 8; within Filecoin it uses 5. • Defaults the hash algorithm to the 64-bit variant of Murmur3-x64. The algorithm used here is identical to that of the IPLD HashMap algorithm specified at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md. The specific parameters used by Filecoin and the DAG-CBOR block layout differ from the specification and are defined at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md#Appendix-Filecoin-hamt-variant.
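A minimal sketch of the per-level index calculation described above; the names are illustrative, and real implementations also define the exact bit order in which the digest is consumed:

    // indexAtDepth moves along the hash digest by depth*bitWidth bits and
    // masks off bitWidth bits, yielding a value in [0, 2^bitWidth).
    // It assumes (depth+1)*bitWidth <= 64.
    func indexAtDepth(hash uint64, depth, bitWidth uint) uint {
        return uint(hash>>(depth*bitWidth)) & ((1 << bitWidth) - 1)
    }

    // e.g. with bitWidth 8, depth 0 yields an index in 0..255 from the first
    // 8 bits consumed, depth 1 uses the next 8 bits, and so on.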
Package scapegoat implements a Scapegoat Tree, as described in the paper A scapegoat tree is an approximately-balanced binary search tree structure with worst-case O(lg n) lookup and amortized O(lg n) insert and delete. The worst-case cost of a single insert or delete is O(n). It is also relatively memory-efficient, as interior nodes do not require any ancillary metadata for balancing purposes, and the tree itself costs only a few words of bookkeeping overhead beyond the nodes. A rebalancing operation requires only a single contiguous vector allocation.
Package ki provides the base element of Goki Trees: Ki = Tree in Japanese, and "Key" in English -- powerful tree structures supporting scenegraphs, programs, parsing, etc. The Node struct implements the Ki interface and can be used as an embedded type (or a struct field) in other structs to provide core tree functionality, including: Parent / Child Tree structure -- each Node can ONLY have one parent. Node structs can also have Node fields -- these are functionally like fixed auto-named children. Paths for locating Nodes within the hierarchy -- key for many use-cases, including the ability to convert pointers to/from strings for IO and robust deep copy and move functions. The path separator is / for children and . for fields. Apply a function across nodes up or down a tree (natural "me first", breadth-first, depth-first) -- very flexible for tree walking. Generalized I/O -- can Save and Load the Tree as JSON, XML, etc -- including pointers which are saved using paths and automatically cached-out after loading -- enums are also bidirectionally convertible to strings using the enum type registry in the kit package. Robust deep copy, clone, move of nodes. Signal sending and receiving between Nodes (similar to Qt Signals / Slots) -- set up connections once and then emit signals to all receivers when a relevant event happens. Robust state updating -- wrap updates in UpdateStart / End, and signals are blocked until the final end, at the highest affected level in the tree, at which point a single update signal is sent -- automatically gives the minimal update. Properties (as a string-keyed map) with property inheritance, including type-level properties via the kit type registry. In general, the names of the children of a given node should all be unique. The following functions defined in the ki package can be used: * UniqueNameCheck(node) to check for unique names on node if uncertain. * UniqueNameCheckAll(node) to check the entire tree under a given node. * UniquifyNames(node) to add a suffix to a name to ensure uniqueness. * UniquifyNamesAll(node) to uniquify all names in the entire tree. The Ki interface is designed to support virtual method calling in Go and is only intended to be implemented once, by the ki.Node type (as opposed to interfaces that are used for hiding multiple different implementations of a common concept). Thus, all of the fields in ki.Node are exported (have capital names), to be accessed directly in types that embed and extend the ki.Node. The Ki interface has the "formal" name (e.g., Children) while the Node has the "nickname" (e.g., Kids). See the Naming Conventions on the Goki Wiki for more details. Each Node stores the Ki interface version of itself, as This() / Ths, which enables full virtual function calling by calling the method on that interface instead of directly on the receiver Node itself. This requires proper initialization via the Init method of the Ki interface.
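A minimal sketch of the embedding pattern described above, assuming the import path github.com/goki/ki/ki (adjust to the version in use); real nodes also need the type-registration and Init steps the package documents:

    package mynodes

    import "github.com/goki/ki/ki"

    // MyNode gets the full parent/child tree machinery by embedding ki.Node.
    type MyNode struct {
        ki.Node

        // Ordinary fields live alongside the embedded Node.
        Value int
    }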
The rbxfile package handles the decoding, encoding, and manipulation of Roblox instance data structures. This package can be used to manipulate Roblox instance trees outside of the Roblox client. Such data structures begin with a Root struct. A Root contains a list of child Instances, which in turn contain more child Instances, and so on, forming a tree of Instances. These Instances can be accessed and manipulated using an API similar to that of Roblox. Each Instance also has a set of "properties". Each property has a specific value of a certain type. Every available type implements the Value interface, and is prefixed with "Value". Root structures can be decoded from and encoded to various formats, including Roblox's native file formats. The two sub-packages "rbxl" and "rbxlx" implement Roblox's binary and XML formats, respectively. Root structures can also be encoded and decoded with the "json" package. Besides decoding from a format, root structures can also be created manually. The best way to do this is through the "declare" sub-package, which provides an easy way to generate root structures.
Package toml is a TOML parser and manipulation library. This version supports the specification as described in https://github.com/toml-lang/toml/blob/master/versions/en/toml-v0.4.0.md Go-toml can marshal and unmarshal TOML documents from and to data structures. Go-toml can operate on a TOML document as a tree. Use one of the Load* functions to parse TOML data and obtain a Tree instance, then one of its methods to manipulate the tree. The package github.com/pelletier/go-toml/query implements a system similar to JSONPath to quickly retrieve elements of a TOML document using a single expression. See the package documentation for more information.
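A small sketch of the Load-then-navigate flow, assuming the Load function and Tree.Get method referenced above:

    package config

    import toml "github.com/pelletier/go-toml"

    // readUser loads a small document and pulls a nested value out of it.
    func readUser() (string, error) {
        tree, err := toml.Load("[postgres]\nuser = \"pelletier\"\n")
        if err != nil {
            return "", err
        }
        // Dot-separated key paths navigate nested tables.
        return tree.Get("postgres.user").(string), nil
    }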
Package toml is a TOML markup language parser. This version supports the specification as described in https://github.com/toml-lang/toml/blob/master/versions/en/toml-v0.4.0.md TOML data may be parsed in two ways: by file, or by string. Either way, the result is a TomlTree object that can be used to navigate the structure and data within the original document. After parsing TOML data with Load() or LoadFile(), use the Has() and Get() methods on the returned TomlTree to find your way through the document data. Go-toml has support for basic dot-separated key paths on the Has(), Get(), Set() and GetDefault() methods. These are the same kind of key paths used within the TOML specification for table names. TOML allows keys to contain '.', which can cause this syntax to be problematic for some documents. In such cases, use the GetPath(), HasPath(), and SetPath() methods to explicitly define the path. This form is also faster, since it avoids having to parse the passed key for '.' delimiters. Note that this is distinct from the heavyweight query syntax supported by TomlTree.Query() and the Query() struct (see below). Each element within the TomlTree is stored with position metadata, which is invaluable for providing semantic feedback to a user. This helps in situations where the TOML file parses correctly, but contains data that is not correct for the application. In such cases, an error message can be generated that indicates the problem line and column number in the source TOML document. The TOML query path implementation is based loosely on the JSONPath specification: http://goessner.net/articles/JsonPath/ The idea behind a query path is to allow quick access to any element, or set of elements, within a TOML document, with a single expression. This is roughly equivalent to: err is non-nil if a parsing exception occurs. If no node in the tree matches the query, result will simply contain an empty list of items. As illustrated above, the query path is much more efficient, especially since the structure of the TOML file can vary. Rather than making assumptions about a document's structure, a query allows the programmer to make structured requests into the document, and get zero or more values as a result. The syntax of a query begins with a root token, followed by any number of sub-expressions: Index expressions perform no bounds checking, and will contribute no values to the result set if the provided index or index range is invalid. Negative indexes represent values from the end of the array, counting backwards. Slice expressions are supported, by using ':' to separate a start/end index pair. Slice expressions also allow negative indexes for the start and stop arguments. Slice expressions may have an optional stride/step parameter: Slice start and end parameters are also optional: Query filters are used within a Union [,] or single Filter [] expression. A filter only allows nodes that qualify through to the next expression, and/or into the result set. There are several filters provided with the library: An executed query returns a QueryResult object. This contains the nodes in the TOML tree that qualify the query expression. Position information is also available for each value in the set. Queries may be executed directly on a TomlTree object, or compiled ahead of time and executed discretely. The former is more convenient, but has the penalty of having to recompile the query expression each time. Filter expressions may also be user defined by using the SetFilter() function on the Query object.
The function must return true or false, signifying whether the passed node is kept or discarded, respectively.
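A brief sketch contrasting the two path forms described above, assuming Load, Get and GetPath as referenced (the tree type in this version is TomlTree):

    // Assuming tree was obtained from Load() or LoadFile() as above:
    //   tree, err := toml.Load("[server]\nport = 8080\n")
    port1 := tree.Get("server.port")                  // dot-separated key path
    port2 := tree.GetPath([]string{"server", "port"}) // explicit path; no '.' parsing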