package bolt implements a low-level key/value store in pure Go. It supports fully serializable transactions, ACID semantics, and lock-free MVCC with multiple readers and a single writer. Bolt can be used for projects that want a simple data store without the need to add large dependencies such as Postgres or MySQL. Bolt is a single-level, zero-copy, B+tree data store. This means that Bolt is optimized for fast read access and does not require recovery in the event of a system crash. Transactions which have not finished committing will simply be rolled back in the event of a crash. The design of Bolt is based on Howard Chu's LMDB database project. Bolt currently works on Windows, Mac OS X, and Linux. There are only a few types in Bolt: DB, Bucket, Tx, and Cursor. The DB is a collection of buckets and is represented by a single file on disk. A bucket is a collection of unique keys that are associated with values. Transactions provide either read-only or read-write access to the database. Read-only transactions can retrieve key/value pairs and can use Cursors to iterate over the dataset sequentially. Read-write transactions can create and delete buckets and can insert and remove keys. Only one read-write transaction is allowed at a time. The database uses a read-only, memory-mapped data file to ensure that applications cannot corrupt the database, however, this means that keys and values returned from Bolt cannot be changed. Writing to a read-only byte slice will cause Go to panic. Keys and values retrieved from the database are only valid for the life of the transaction. When used outside the transaction, these byte slices can point to different data or can point to invalid memory which will cause a panic.
Package bolt implements a low-level key/value store in pure Go. It supports fully serializable transactions, ACID semantics, and lock-free MVCC with multiple readers and a single writer. Bolt can be used for projects that want a simple data store without the need to add large dependencies such as Postgres or MySQL. Bolt is a single-level, zero-copy, B+tree data store. This means that Bolt is optimized for fast read access and does not require recovery in the event of a system crash. Transactions which have not finished committing will simply be rolled back in the event of a crash. The design of Bolt is based on Howard Chu's LMDB database project. Bolt currently works on Windows, Mac OS X, and Linux. There are only a few types in Bolt: DB, Bucket, Tx, and Cursor. The DB is a collection of buckets and is represented by a single file on disk. A bucket is a collection of unique keys that are associated with values. Transactions provide either read-only or read-write access to the database. Read-only transactions can retrieve key/value pairs and can use Cursors to iterate over the dataset sequentially. Read-write transactions can create and delete buckets and can insert and remove keys. Only one read-write transaction is allowed at a time. The database uses a read-only, memory-mapped data file to ensure that applications cannot corrupt the database, however, this means that keys and values returned from Bolt cannot be changed. Writing to a read-only byte slice will cause Go to panic. Keys and values retrieved from the database are only valid for the life of the transaction. When used outside the transaction, these byte slices can point to different data or can point to invalid memory which will cause a panic.
shortinette is the core framework for managing and automating the process of grading coding bootcamps (Shorts). It provides a comprehensive set of tools for running and testing student submissions across various programming languages. The shortinette package is composed of several sub-packages, each responsible for a specific aspect of the grading pipeline: `logger`: Handles logging for the framework, including general informational messages, error reporting, and trace logging for feedback on individual submissions. This package ensures that all important events and errors are captured for debugging and auditing purposes. `requirements`: Validates the necessary environment variables and dependencies required by the framework. This includes checking for essential configuration values in a `.env` file and ensuring that all necessary tools (e.g., Docker images) are available before grading begins. `testutils`: Provides utility functions for compiling and running code submissions. This includes functions for compiling Rust code, running executables with various options (such as timeouts and real-time output), and manipulating files. The utility functions are designed to handle the intricacies of running untrusted student code safely and efficiently. `git`: Manages interactions with GitHub, including cloning repositories, managing collaborators, and uploading files. This package abstracts the GitHub API to simplify common tasks such as adding collaborators to repositories, creating branches, and pushing code or data to specific locations in a repository. `exercise`: Defines the structure and behavior of individual coding exercises. This includes specifying the files that students are allowed to submit, the expected output, and the functions to be tested. The `exercise` package provides the framework for setting up exercises, running tests, and reporting results. `module`: Organizes exercises into modules, allowing for the grouping of related exercises into a coherent curriculum. The `module` package handles the execution of all exercises within a module, aggregates results, and manages the overall grading process. `webhook`: Enables automatic grading triggered by GitHub webhooks. This allows for a fully automated workflow where student submissions are graded as soon as they are pushed to a specific branch in a GitHub repository. `short`: The central orchestrator of the grading process, integrating all sub-packages into a cohesive system. The `short` package handles the setup and teardown of grading environments, manages the execution of modules and exercises, and ensures that all results are properly recorded and reported.
Package mast provides an immutable, versioned, diffable map implementation of a Merkle Search Tree (MST). Masts can be huge (not limited to memory). Masts can be stored in anything, like a filesystem, KV store, or blob store. Masts are designed to be safely concurrently accessed by multiple threads/hosts. And like Merkle DAGs, Masts are designed to be easy to cache and synchronize. - Efficient storage of multiple versions of materialized views - Diffing of versions integrates CDC/streaming - Efficient copy-on-write alternative to Go builtin map Mast is an implementation of the structure described in the awesome paper, "Merkle Search Trees: Efficient State-Based CRDTs in Open Networks", by Alex Auvolat and François Taïani, 2019 (https://hal.inria.fr/hal-02303490/document). MSTs are similar to persistent B-Trees, except node splits are chosen deterministically based on key and tree height, rather than node size, so MSTs eliminate the dependence on insertion/deletion order that B-Trees have for equivalence, making MSTs an interesting choice for conflict-free replicated data types (CRDTs). (MSTs are also simpler to implement than B-Trees, needing less complicated rebalancing, and no rotations.) MSTs are like other Merkle structures in that two instances can easily be compared to confirm equality or find differences, since equal node hashes indicate equal contents. A Mast can be Clone()d for sharing between threads. Cloning creates a new version that can evolve independently, yet shares all the unmodified subtrees with its parent, and as such are relatively cheap to create. The immutable data types in Clojure, Haskell, ML and other functional languages really do make it easier to "reason about" systems; easier to test, provide a foundation to build more quickly on. https://github.com/bodil/im-rs, "Blazing fast immutable collection datatypes for Rust", by Bodil Stokke, is an exemplar: the diff algorithm and use of property testing, are instructive, and Chunks and PoolRefs fill gaps in understanding of Rust's ownership model for library writers coming from GC'd languages.
nxadm/tail provides a Go library that emulates the features of the BSD `tail` program. The library comes with full support for truncation/move detection as it is designed to work with log rotation tools. The library works on all operating systems supported by Go, including POSIX systems like Linux and *BSD, and MS Windows. Go 1.9 is the oldest compiler release supported.
Package gnark provides fast Zero Knowledge Proofs (ZKP) systems and a high level APIs to design ZKP circuits. gnark supports the following ZKP schemes: gnark supports the following curves: User documentation https://docs.gnark.consensys.net
Package cron implements a cron spec parser and job runner. To download the specific tagged release, run: Import it in your program as: It requires Go 1.11 or later due to usage of Go Modules. Callers may register Funcs to be invoked on a given schedule. Cron will run them in their own goroutines. A cron expression represents a set of times, using 5 space-separated fields. Month and Day-of-week field values are case insensitive. "SUN", "Sun", and "sun" are equally accepted. The specific interpretation of the format is based on the Cron Wikipedia page: https://en.wikipedia.org/wiki/Cron Alternative Cron expression formats support other fields like seconds. You can implement that by creating a custom Parser as follows. Since adding Seconds is the most common modification to the standard cron spec, cron provides a builtin function to do that, which is equivalent to the custom parser you saw earlier, except that its seconds field is REQUIRED: That emulates Quartz, the most popular alternative Cron schedule format: http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/crontrigger.html Asterisk ( * ) The asterisk indicates that the cron expression will match for all values of the field; e.g., using an asterisk in the 5th field (month) would indicate every month. Slash ( / ) Slashes are used to describe increments of ranges. For example 3-59/15 in the 1st field (minutes) would indicate the 3rd minute of the hour and every 15 minutes thereafter. The form "*\/..." is equivalent to the form "first-last/...", that is, an increment over the largest possible range of the field. The form "N/..." is accepted as meaning "N-MAX/...", that is, starting at N, use the increment until the end of that specific range. It does not wrap around. Comma ( , ) Commas are used to separate items of a list. For example, using "MON,WED,FRI" in the 5th field (day of week) would mean Mondays, Wednesdays and Fridays. Hyphen ( - ) Hyphens are used to define ranges. For example, 9-17 would indicate every hour between 9am and 5pm inclusive. Question mark ( ? ) Question mark may be used instead of '*' for leaving either day-of-month or day-of-week blank. You may use one of several pre-defined schedules in place of a cron expression. You may also schedule a job to execute at fixed intervals, starting at the time it's added or cron is run. This is supported by formatting the cron spec like this: where "duration" is a string accepted by time.ParseDuration (http://golang.org/pkg/time/#ParseDuration). For example, "@every 1h30m10s" would indicate a schedule that activates after 1 hour, 30 minutes, 10 seconds, and then every interval after that. Note: The interval does not take the job runtime into account. For example, if a job takes 3 minutes to run, and it is scheduled to run every 5 minutes, it will have only 2 minutes of idle time between each run. By default, all interpretation and scheduling is done in the machine's local time zone (time.Local). You can specify a different time zone on construction: Individual cron schedules may also override the time zone they are to be interpreted in by providing an additional space-separated field at the beginning of the cron spec, of the form "CRON_TZ=Asia/Tokyo". For example: The prefix "TZ=(TIME ZONE)" is also supported for legacy compatibility. Be aware that jobs scheduled during daylight-savings leap-ahead transitions will not be run! A Cron runner may be configured with a chain of job wrappers to add cross-cutting functionality to all submitted jobs. For example, they may be used to achieve the following effects: Install wrappers for all jobs added to a cron using the `cron.WithChain` option: Install wrappers for individual jobs by explicitly wrapping them: Since the Cron service runs concurrently with the calling code, some amount of care must be taken to ensure proper synchronization. All cron methods are designed to be correctly synchronized as long as the caller ensures that invocations have a clear happens-before ordering between them. Cron defines a Logger interface that is a subset of the one defined in github.com/go-logr/logr. It has two logging levels (Info and Error), and parameters are key/value pairs. This makes it possible for cron logging to plug into structured logging systems. An adapter, [Verbose]PrintfLogger, is provided to wrap the standard library *log.Logger. For additional insight into Cron operations, verbose logging may be activated which will record job runs, scheduling decisions, and added or removed jobs. Activate it with a one-off logger as follows: Cron entries are stored in an array, sorted by their next activation time. Cron sleeps until the next job is due to be run. Upon waking:
Package hord provides a simple and extensible interface for interacting with various database systems in a uniform way. Hord is designed to be a database-agnostic library that provides a common interface for interacting with different database systems. It allows developers to write code that is decoupled from the underlying database technology, making it easier to switch between databases or support multiple databases in the same application. To use Hord, import it as follows: To create a database client, you need to import and use the appropriate driver package along with the `hord` package. For example, to use the Redis driver: Each driver provides its own `Dial` function to establish a connection to the database. Refer to the specific driver documentation for more details. Once you have a database client, you can use it to perform various database operations. The API is consistent across different drivers. Refer to the `hord.Database` interface documentation for a complete list of available methods. Hord provides common error types and constants for consistent error handling across drivers. Refer to the `hord` package documentation for more information on error handling. Contributions to Hord are welcome! If you want to add support for a new database driver or improve the existing codebase, please refer to the contribution guidelines in the project's repository.
Package multitenancy provides a Go framework for building multi-tenant applications, streamlining tenant management and model migrations. It abstracts multitenancy complexities through a unified, database-agnostic API compatible with GORM. The framework supports two primary multitenancy strategies: The following sections provide an overview of the package's features and usage instructions for implementing multitenancy in Go applications. The package supports multitenancy for both PostgreSQL and MySQL databases, offering three methods for establishing a new database connection with multitenancy support: OpenDB allows opening a database connection using a URL-like DSN string, providing a flexible and easy way to switch between drivers. This method abstracts the underlying driver mechanics, offering a straightforward connection process and a unified, database-agnostic API through the returned *DB instance, which embeds the gorm.DB instance. For constructing the DSN string, refer to the driver-specific documentation for the required parameters and formats. This approach is useful for applications that need to dynamically switch between different database drivers or configurations without changing the codebase. Postgres: MySQL: Open with a supported driver offers a unified, database-agnostic API for managing tenant-specific and shared data, embedding the gorm.DB instance. This method allows developers to initialize and configure the dialect themselves before opening the database connection, providing granular control over the database connection and configuration. It facilitates seamless switching between database drivers while maintaining access to GORM's full functionality. Postgres: MySQL: For direct access to the gorm.DB API and multitenancy features for specific tasks, this approach allows invoking driver-specific functions directly. It provides a lower level of abstraction, requiring manual management of database-specific operations. Prior to version 8, this was the only method available for using the package. Postgres: MySQL: All models must implement driver.TenantTabler, which extends GORM's Tabler interface. This extension allows models to define their table name and indicate whether they are shared across tenants. Public Models: These are models which are shared across tenants. driver.TenantTabler.TableName should return the table name prefixed with 'public.'. driver.TenantTabler.IsSharedModel should return true. Tenant-Specific Models: These models are specific to a single tenant and should not be shared across tenants. driver.TenantTabler.TableName should return the table name without any prefix. driver.TenantTabler.IsSharedModel should return false. Tenant Model: This package provides a TenantModel struct that can be embedded in any public model that requires tenant scoping, enriching it with essential fields for managing tenant-specific information. This structure incorporates fields for the tenant's domain URL and schema name, facilitating the linkage of tenant-specific models to their respective schemas. Before performing any migrations or operations on tenant-specific models, the models must be registered with the DB instance using DB.RegisterModels. Postgres Adapter: Use postgres.RegisterModels to register models. MySQL Adapter: Use mysql.RegisterModels to register models. To ensure data integrity and schema isolation across tenants,gorm.DB.AutoMigrate has been disabled. Instead, use the provided shared and tenant-specific migration methods. driver.ErrInvalidMigration is returned if the `AutoMigrate` method is called directly. To ensure tenant isolation and facilitate concurrent migrations, driver-specific locking mechanisms are used. These locks prevent concurrent migrations from interfering with each other, ensuring that only one migration process can run at a time for a given tenant. Consult the driver-specific documentation for more information. After registering models, shared models are migrated using DB.MigrateSharedModels. Postgres Adapter: Use postgres.MigrateSharedModels to migrate shared models. MySQL Adapter: Use mysql.MigrateSharedModels to migrate shared models. After registering models, tenant-specific models are migrated using DB.MigrateTenantModels. Postgres Adapter: Use postgres.MigrateTenantModels to migrate tenant-specific models. MySQL Adapter: Use mysql.MigrateTenantModels to migrate tenant-specific models. When a tenant is removed from the system, the tenant-specific schema and associated tables should be cleaned up using DB.OffboardTenant. Postgres Adapter: Use postgres.DropSchemaForTenant to offboard a tenant. MySQL Adapter: Use mysql.DropDatabaseForTenant to offboard a tenant. DB.UseTenant configures the database for operations specific to a tenant, abstracting database-specific operations for tenant context configuration. This method returns a reset function to revert the database context and an error if the operation fails. Postgres Adapter: Use postgres.SetSearchPath to set the search path for a tenant. MySQL Adapter: Use mysql.UseDatabase function to set the database for a tenant. For the most part, foreign key constraints work as expected, but there are some restrictions and considerations to keep in mind when working with multitenancy. The following guidelines outline the supported relationships between tables: Between Tables in the Public Schema: From Public Schema Tables to Tenant-Specific Tables: From Tenant-Specific Tables to Public Schema Tables: Between Tenant-Specific Tables: See the example application for a more comprehensive demonstration of the framework's capabilities. Always sanitize input to prevent SQL injection vulnerabilities. This framework does not perform any validation on the database name or schema name parameters. It is the responsibility of the caller to ensure that these parameters are sanitized. To facilitate this, the framework provides the following utilities: For a detailed technical overview of SQL design strategies adopted by the framework, see the STRATEGY.md file.
Package ed implements an event dispatcher that is meant to allow codebases to organize concerns around events. Instead of placing all concerns about a particular business logic event in one function, this package allows you to declare events by type and dispatach those events to other code so that other concerns may be maintained in separate areas of the codebase. This module is tagged as v0, thus complies with Go's definition and rules about v0 modules (https://go.dev/doc/modules/version-numbers#v0-number). In short, it means that the API of this module may change without incrementing the major version number. Each releasable version will simply increment the patch number. Given the surface area of this module is quite small, this should not be a huge issue if used in production code. ed uses Go's own type system to dispatch events to the correct event handlers. By simply registering an event handler that accepts an event of a particular type: That handler will be triggered when an event of the same type is dispatched from somewhere else in the program: Because it is Go's type system, event handlers can also be registered against interfaces instead of just concrete types. Thus, creating an event handler that will be triggered for all events is as simple as: Overall, this package's goal is to mostly be an aid in allowing application business logic to remain clear of ancillary conerns/activities that might otherwise pile up in certain code areas. For example, as a SaaS company grows, a lot of cross-cutting concerns can build up around a customer signing up for a service. In a monolithic codebase/service, the code that handles a new customer signup would become full of "do this check", "send this data to marketing systems", "screen this signup for abusive behavior" logic. This is where ed is meant to be: taking all of those concerns/side-effects that are ancillary to the core act of signing up a user, and placing them in there own code areas and allowing them to be communicated with via events. This package is not trying to be a job queue, or a messaging system. The dispatching of events is done synchronously and the events are designed to allow errors to be returned so that use-cases like synchronous validation are supported. That being said, a common use-case might be to use this package to allow the publishing of events/messages to a Kafka/SQS-like system, an event handler: Then in your main business logic for user signups:
package bbolt implements a low-level key/value store in pure Go. It supports fully serializable transactions, ACID semantics, and lock-free MVCC with multiple readers and a single writer. Bolt can be used for projects that want a simple data store without the need to add large dependencies such as Postgres or MySQL. Bolt is a single-level, zero-copy, B+tree data store. This means that Bolt is optimized for fast read access and does not require recovery in the event of a system crash. Transactions which have not finished committing will simply be rolled back in the event of a crash. The design of Bolt is based on Howard Chu's LMDB database project. Bolt currently works on Windows, Mac OS X, and Linux. There are only a few types in Bolt: DB, Bucket, Tx, and Cursor. The DB is a collection of buckets and is represented by a single file on disk. A bucket is a collection of unique keys that are associated with values. Transactions provide either read-only or read-write access to the database. Read-only transactions can retrieve key/value pairs and can use Cursors to iterate over the dataset sequentially. Read-write transactions can create and delete buckets and can insert and remove keys. Only one read-write transaction is allowed at a time. The database uses a read-only, memory-mapped data file to ensure that applications cannot corrupt the database, however, this means that keys and values returned from Bolt cannot be changed. Writing to a read-only byte slice will cause Go to panic. Keys and values retrieved from the database are only valid for the life of the transaction. When used outside the transaction, these byte slices can point to different data or can point to invalid memory which will cause a panic.
Gaby is an experimental new bot running in the Go issue tracker as @gabyhelp, to try to help automate various mundane things that a machine can do reasonably well, as well as to try to discover new things that a machine can do reasonably well. The name gaby is short for “Go AI Bot”, because one of the purposes of the experiment is to learn what LLMs can be used for effectively, including identifying what they should not be used for. Some of the gaby functionality will involve LLMs; other functionality will not. The guiding principle is to create something that helps maintainers and that maintainers like, which means to use LLMs when they make sense and help but not when they don't. In the long term, the intention is for this code base or a successor version to take over the current functionality of “gopherbot” and become @gopherbot, at which point the @gabyhelp account will be retired. At the moment we are not accepting new code contributions or PRs. We hope to move this code to somewhere more official soon, at which point we will accept contributions. The GitHub Discussion is a good place to leave feedback about @gabyhelp. The bot functionality is implemented in internal packages in subdirectories. This comment gives a brief tour of the structure. An explicit goal for the Gaby code base is that it run well in many different environments, ranging from a maintainer's home server or even Raspberry Pi all the way up to a hosted cloud. (At the moment, Gaby runs on a Linux server in my basement.) Due to this emphasis on portability, Gaby defines its own interfaces for all the functionality it needs from the surrounding environment and then also defines a variety of implementations of those interfaces. Another explicit goal for the Gaby code base is that it be very well tested. (See my [Go Testing talk] for more about why this is so important.) Abstracting the various external functionality into interfaces also helps make testing easier, and some packages also provide explicit testing support. The result of both these goals is that Gaby defines some basic functionality like time-ordered indexing for itself instead of relying on some specific other implementation. In the grand scheme of things, these are a small amount of code to maintain, and the benefits to both portability and testability are significant. Code interacting with services like GitHub and code running on cloud servers is typically difficult to test and therefore undertested. It is an explicit requirement this repo to test all the code, even (and especially) when testing is difficult. A useful command to have available when working in the code is rsc.io/uncover, which prints the package source lines not covered by a unit test. A useful invocation is: The first “go test” command checks that the test passes. The second repeats the test with coverage enabled. Running the test twice this way makes sure that any syntax or type errors reported by the compiler are reported without coverage, because coverage can mangle the error output. After both tests pass and second writes a coverage profile, running “uncover /tmp/c.out” prints the uncovered lines. In this output, there are three error paths that are untested. In general, error paths should be tested, so tests should be written to cover these lines of code. In limited cases, it may not be practical to test a certain section, such as when code is unreachable but left in case of future changes or mistaken assumptions. That part of the code can be labeled with a comment beginning “// Unreachable” or “// unreachable” (usually with explanatory text following), and then uncover will not report it. If a code section should be tested but the test is being deferred to later, that section can be labeled “// Untested” or “// untested” instead. The rsc.io/gaby/internal/testutil package provides a few other testing helpers. The overview of the code now proceeds from bottom up, starting with storage and working up to the actual bot. Gaby needs to manage a few secret keys used to access services. The rsc.io/gaby/internal/secret package defines the interface for obtaining those secrets. The only implementations at the moment are an in-memory map and a disk-based implementation that reads $HOME/.netrc. Future implementations may include other file formats as well as cloud-based secret storage services. Secret storage is intentionally separated from the main database storage, described below. The main database should hold public data, not secrets. Gaby defines the interface it expects from a large language model. The llm.Embedder interface abstracts an LLM that can take a collection of documents and return their vector embeddings, each of type llm.Vector. The only real implementation to date is rsc.io/gaby/internal/gemini. It would be good to add an offline implementation using Ollama as well. For tests that need an embedder but don't care about the quality of the embeddings, llm.QuoteEmbedder copies a prefix of the text into the vector (preserving vector unit length) in a deterministic way. This is good enough for testing functionality like vector search and simplifies tests by avoiding a dependence on a real LLM. At the moment, only the embedding interface is defined. In the future we expect to add more interfaces around text generation and tool use. As noted above, Gaby defines interfaces for all the functionality it needs from its external environment, to admit a wide variety of implementations for both execution and testing. The lowest level interface is storage, defined in rsc.io/gaby/internal/storage. Gaby requires a key-value store that supports ordered traversal of key ranges and atomic batch writes up to a modest size limit (at least a few megabytes). The basic interface is storage.DB. storage.MemDB returns an in-memory implementation useful for testing. Other implementations can be put through their paces using storage.TestDB. The only real storage.DB implementation is rsc.io/gaby/internal/pebble, which is a LevelDB-derived on-disk key-value store developed and used as part of CockroachDB. It is a production-quality local storage implementation and maintains the database as a directory of files. In the future we plan to add an implementation using Google Cloud Firestore, which provides a production-quality key-value lookup as a Cloud service without fixed baseline server costs. (Firestore is the successor to Google Cloud Datastore.) The storage.DB makes the simplifying assumption that storage never fails, or rather that if storage has failed then you'd rather crash your program than try to proceed through typically untested code paths. As such, methods like Get and Set do not return errors. They panic on failure, and clients of a DB can call the DB's Panic method to invoke the same kind of panic if they notice any corruption. It remains to be seen whether this decision is kept. In addition to the usual methods like Get, Set, and Delete, storage.DB defines Lock and Unlock methods that acquire and release named mutexes managed by the database layer. The purpose of these methods is to enable coordination when multiple instances of a Gaby program are running on a serverless cloud execution platform. So far Gaby has only run on an underground basement server (the opposite of cloud), so these have not been exercised much and the APIs may change. In addition to the regular database, package storage also defines storage.VectorDB, a vector database for use with LLM embeddings. The basic operations are Set, Get, and Search. storage.MemVectorDB returns an in-memory implementation that stores the actual vectors in a storage.DB for persistence but also keeps a copy in memory and searches by comparing against all the vectors. When backed by a storage.MemDB, this implementation is useful for testing, but when backed by a persistent database, the implementation suffices for small-scale production use (say, up to a million documents, which would require 3 GB of vectors). It is possible that the package ordering here is wrong and that VectorDB should be defined in the llm package, built on top of storage, and not the current “storage builds on llm”. Because Gaby makes minimal demands of its storage layer, any structure we want to impose must be implemented on top of it. Gaby uses the rsc.io/ordered encoding format to produce database keys that order in useful ways. For example, ordered.Encode("issue", 123) < ordered.Encode("issue", 1001), so that keys of this form can be used to scan through issues in numeric order. In contrast, using something like fmt.Sprintf("issue%d", n) would visit issue 1001 before issue 123 because "1001" < "123". Using this kind of encoding is common when using NoSQL key-value storage. See the rsc.io/ordered package for the details of the specific encoding. One of the implied jobs Gaby has is to collect all the relevant information about an open source project: its issues, its code changes, its documentation, and so on. Those sources are always changing, so derived operations like adding embeddings for documents need to be able to identify what is new and what has been processed already. To enable this, Gaby implements time-stamped—or just “timed”—storage, in which a collection of key-value pairs also has a “by time” index of ((timestamp, key), no-value) pairs to make it possible to scan only the key-value pairs modified after the previous scan. This kind of incremental scan only has to remember the last timestamp processed and then start an ordered key range scan just after that timestamp. This convention is implemented by rsc.io/gaby/internal/timed, along with a [timed.Watcher] that formalizes the incremental scan pattern. Various package take care of downloading state from issue trackers and the like, but then all that state needs to be unified into a common document format that can be indexed and searched. That document format is defined by rsc.io/gaby/internal/docs. A document consists of an ID (conventionally a URL), a document title, and document text. Documents are stored using timed storage, enabling incremental processing of newly added documents . The next stop for any new document is embedding it into a vector and storing that vector in a vector database. The rsc.io/gaby/internal/embeddocs package does this, and there is very little to it, given the abstractions of a document store with incremental scanning, an LLM embedder, and a vector database, all of which are provided by other packages. None of the packages mentioned so far involve network operations, but the next few do. It is important to test those but also equally important not to depend on external network services in the tests. Instead, the package rsc.io/gaby/internal/httprr provides an HTTP record/replay system specifically designed to help testing. It can be run once in a mode that does use external network servers and records the HTTP exchanges, but by default tests look up the expected responses in the previously recorded log, replaying those responses. The result is that code making HTTP request can be tested with real server traffic once and then re-tested with recordings of that traffic afterward. This avoids having to write entire fakes of services but also avoids needing the services to stay available in order for tests to pass. It also typically makes the tests much faster than using the real servers. Gaby uses GitHub in two main ways. First, it downloads an entire copy of the issue tracker state, with incremental updates, into timed storage. Second, it performs actions in the issue tracker, like editing issues or comments, applying labels, or posting new comments. These operations are provided by rsc.io/gaby/internal/github. Gaby downloads the issue tracker state using GitHub's REST API, which makes incremental updating very easy but does not provide access to a few newer features such as project boards and discussions, which are only available in the GraphQL API. Sync'ing using the GraphQL API is left for future work: there is enough data available from the REST API that for now we can focus on what to do with that data and not that a few newer GitHub features are missing. The github package provides two important aids for testing. For issue tracker state, it also allows loading issue data from a simple text-based issue description, avoiding any actual GitHub use at all and making it easier to modify the test data. For issue tracker actions, the github package defaults in tests to not actually making changes, instead diverting edits into an in-memory log. Tests can then check the log to see whether the right edits were requested. The rsc.io/gaby/internal/githubdocs package takes care of adding content from the downloaded GitHub state into the general document store. Currently the only GitHub-derived documents are one document per issue, consisting of the issue title and body. It may be worth experimenting with incorporating issue comments in some way, although they bring with them a significant amount of potential noise. Gaby will need to download and store Gerrit state into the database and then derive documents from it. That code has not yet been written, although rsc.io/gerrit/reviewdb provides a basic version that can be adapted. Gaby will also need to download and store project documentation into the database and derive documents from it corresponding to cutting the page at each heading. That code has been written but is not yet tested well enough to commit. It will be added later. The simplest job Gaby has is to go around fixing new comments, including issue descriptions (which look like comments but are a different kind of GitHub data). The rsc.io/gaby/internal/commentfix package implements this, watching GitHub state incrementally and applying a few kinds of rewrite rules to each new comment or issue body. The commentfix package allows automatically editing text, automatically editing URLs, and automatically hyperlinking text. The next job Gaby has is to respond to new issues with related issues and documents. The rsc.io/gaby/internal/related package implements this, watching GitHub state incrementally for new issues, filtering out ones that should be ignored, and then finding related issues and documents and posting a list. This package was originally intended to identify and automatically close duplicates, but the difference between a duplicate and a very similar or not-quite-fixed issue is too difficult a judgement to make for an LLM. Even so, the act of bringing forward related context that may have been forgotten or never known by the people reading the issue has turned out to be incredibly helpful. All of these pieces are put together in the main program, this package, rsc.io/gaby. The actual main package has no tests yet but is also incredibly straightforward. It does need tests, but we also need to identify ways that the hard-coded policies in the package can be lifted out into data that a natural language interface can manipulate. For example the current policy choices in package main amount to: These could be stored somewhere as data and manipulated and added to by the LLM in response to prompts from maintainers. And other features could be added and configured in a similar way. Exactly how to do this is an important thing to learn in future experimentation. As mentioned above, the two jobs Gaby does already are both fairly simple and straightforward. It seems like a general approach that should work well is well-written, well-tested deterministic traditional functionality such as the comment fixer and related-docs poster, configured by LLMs in response to specific directions or eventually higher-level goals specified by project maintainers. Other functionality that is worth exploring is rules for automatically labeling issues, rules for identifying issues or CLs that need to be pinged, rules for identifying CLs that need maintainer attention or that need submitting, and so on. Another stretch goal might be to identify when an issue needs more information and ask for that information. Of course, it would be very important not to ask for information that is already present or irrelevant, so getting that right would be a very high bar. There is no guarantee that today's LLMs work well enough to build a useful version of that. Another important area of future work will be running Gaby on top of cloud databases and then moving Gaby's own execution into the cloud. Getting it a server with a URL will enable GitHub callbacks instead of the current 2-minute polling loop, which will enable interactive conversations with Gaby. Overall, we believe that there are a few good ideas for ways that LLM-based bots can help make project maintainers' jobs easier and less monotonous, and they are waiting to be found. There are also many bad ideas, and they must be filtered out. Understanding the difference will take significant care, thought, and experimentation. We have work to do.
Package Authaus is an authentication and authorization system. Authaus brings together the following pluggable components: Any of these five components can be swapped out, and in fact the fourth, and fifth ones (Role Groups and User Store) are entirely optional. A typical setup is to use LDAP as an Authenticator, and Postgres as a Session, Permit, and Role Groups database. Your session database does not need to be particularly performant, since Authaus maintains an in-process cache of session keys and their associated tokens. Authaus was NOT designed to be a "Facebook Scale" system. The target audience is a system of perhaps 100,000 users. There is nothing fundamentally limiting about the API of Authaus, but the internals certainly have not been built with millions of users in mind. The intended usage model is this: Authaus is intended to be embedded inside your security system, and run as a standalone HTTP service (aka a REST service). This HTTP service CAN be open to the wide world, but it's also completely OK to let it listen only to servers inside your DMZ. Authaus only gives you the skeleton and some examples of HTTP responders. It's up to you to flesh out the details of your authentication HTTP interface, and whether you'd like that to face the world, or whether it should only be accessible via other services that you control. At startup, your services open an HTTP connection to the Authaus service. This connection will typically live for the duration of the service. For every incoming request, you peel off whatever authentication information is associated with that request. This is either a session key, or a username/password combination. Let's call it the authorization information. You then ask Authaus to tell you WHO this authorization information belongs to, as well as WHAT this authorization information allows the requester to do (ie Authentication and Authorization). Authaus responds either with a 401 (Unauthorized), 403 (Forbidden), or a 200 (OK) and a JSON object that tells you the identity of the agent submitting this request, as well the permissions that this agent posesses. It's up to your individual services to decide what to do with that information. It should be very easy to expose Authaus over a protocol other than HTTP, since Authaus is intended to be easy to embed. The HTTP API is merely an illustrative example. A `Session Key` is the long random number that is typically stored as a cookie. A `Permit` is a set of roles that has been granted to a user. Authaus knows nothing about the contents of a permit. It simply treats it as a binary blob, and when writing it to an SQL database, encodes it as base64. The interpretation of the permit is application dependent. Typically, a Permit will hold information such as "Allowed to view billing information", or "Allowed to paint your bathroom yellow". Authaus does have a built-in module called RoleGroupDB, which has its own interpretation of what a Permit is, but you do not need to use this. A `Token` is the result of a successful authentication. It stores the identity of a user, an expiry date, and a Permit. A token will usually be retrieved by a session key. However, you can also perform a once-off authentication, which also yields you a token, which you will typically throw away when you are finished with it. All public methods of the `Central` object are callable from multiple threads. Reader-Writer locks are used in all of the caching systems. The number of concurrent connections is limited only by the limits of the Go runtime, and the performance limits that are inherent to the simple reader-writer locks used to protect shared state. Authaus must be deployed as a single process (which implies running on a single logical machine). The sole reason why it must run on only one process and not more, is because of the state that lives inside the various Authaus caches. Were it not for these caches, then there would be nothing preventing you from running Authaus on as many machines as necessary. The cached state stored inside the Authaus server is: If you wanted to make Authaus runnable across multiple processes, then you would need to implement a cache invalidation system for these caches. Authaus makes no attempt to mitigate DOS attacks. The most sane approach in this domain seems to be this (http://security.stackexchange.com/questions/12101/prevent-denial-of-service-attacks-against-slow-hashing-functions). The password database (created via NewAuthenticationDB_SQL) stores password hashes using the scrypt key derivation system (http://www.tarsnap.com/scrypt.html). Internally, we store our hash in a format that can later be extended, should we wish to double-hash the passwords, etc. The hash is 65 bytes and looks like this: The first byte of the hash is a version number of the hash. The remaining 64 bytes are the salt and the hash itself. At present, only one version is supported, which is version 1. It consists of 32 bytes of salt, and 32 bytes of scrypt'ed hash, with scrypt parameters N=256 r=8 p=1. Note that the parameter N=256 is quite low, meaning that it is possible to compute this in approximately 1 millisecond (1,000,000 nanoseconds) on a 2009-era Intel Core i7. This is a deliberate tradeoff. On the same CPU, a SHA256 hash takes about 500 nanoseconds to compute, so we are still making it 2000 times harder to brute force the passwords than an equivalent system storing only a SHA256 salted hash. This discussion is only of relevance in the event that the password table is compromised. No cookie signing mechanism is implemented. Cookies are not presently transmitted with Secure:true. This must change. The LDAP Authenticator is extremely simple, and provides only one function: Authenticate a user against an LDAP system (often this means Active Directory, AKA a Windows Domain). It calls the LDAP "Bind" method, and if that succeeds for the given identity/password, then the user is considered authenticated. We take care not to allow an "anonymous bind", which many LDAP servers allow when the password is blank. The Session Database runs on Postgres. It stores a table of sessions, where each row contains the following information: When a permit is altered with Authaus, then all existing sessions have their permits altered transparently. For example, imagine User X is logged in, and his administrator grants him a new permission. User X does not need to log out and log back in again in order for his new permissions to be reflected. His new permissions will be available immediately. Similarly, if a password is changed with Authaus, then all sessions are invalidated. Do take note though, that if a password is changed through an external mechanism (such as with LDAP), then Authaus will have no way of knowing this, and will continue to serve up sessions that were authenticated with the old password. This is a problem that needs addressing. You can limit the number of concurrent sessions per user to 1, by setting MaxActiveSessions.ConfigSessionDB to 1. This setting may only be zero or one. Zero, which is the default, means an unlimited number of concurrent sessions per user. Authaus will always place your Session Database behind its own Session Cache. This session cache is a very simple single-process in-memory cache of recent sessions. The limit on the number of entries in this cache is hard-coded, and that should probably change. The Permit database runs on Postgres. It stores a table of permits, which is simply a 1:1 mapping from Identity -> Permit. The Permit is just an array of bytes, which we store base64 encoded, inside a text field. This part of the system doesn't care how you interpret that blob. The Role Group Database is an entirely optional component of Authaus. The other components of Authaus (Authenticator, PermitDB, SessionDB) do not understand your Permits. To them, a Permit is simply an arbitrary array of bytes. The Role Group Database is a component that adds a specific meaning to a permit blob. Let's see what that specific meaning looks like... The built-in Role Group Database interprets a permit blob as a string of 32-bit integer IDs: These 32-bit integer IDs refer to "role groups" inside a database table. The "role groups" table might look like this: The Role Group IDs use 32-bit indices, because we assume that you are not going to create more than 2^32 different role groups. The worst case we assume here is that of an automated system that creates 100,000 roles per day. Such a system would run for more than 100 years, given a 32-bit ID. These constraints are extraordinary, suggesting that we do not even need 32 bits, but could even get away with just a 16-bit group ID. However, we expect the number of groups to be relatively small. Our aim here, arbitrary though it may be, is to fit the permit and identity into a single ethernet packet, which one can reasonably peg at 1500 bytes. 1500 / 4 = 375. We assume that no sane human administrator will assign 375 security groups to any individual. We expect the number of groups assigned to any individual to be in the range of 1 to 20. This makes 375 a gigantic buffer. OAuth support in Authaus is limited to a very simple scenario: * You wish to allow your users to login using an OAuth service - thereby outsourcing the Authentication to that external service, and using it to populate the email address of your users. OAuth was developed in order to work with Microsoft Azure Active Directory, however it should be fairly easy to extend the code to be able to handle other OAuth providers. Inside the database are two tables related to OAuth: oauthchallenge: The challenge table holds OAuth sessions which have been started, and which are expected to either succeed or fail within the next few minutes. The default timeout for a challenge is 5 minutes. A challenge record is usually created the moment the user clicks on the "Sign in with Microsoft" button on your site, and it tracks that authentication attempt. oauthsession: The session table holds OAuth sessions which have successfully authenticated, and also the token that was retrieved by a successful authorization. If a token has expired, then it is refreshed and updated in-place, inside the oauthsession table. An OAuth login follows this sequence of events: 1. User clicks on a "Signin with X" button on your login page 2. A record is created in the oauthchallenge table, with a unique ID. This ID is a secret known only to the authaus server and the OAuth server. It is used as the `state` parameter in the OAuth login mechanism. 3. The HTTP call which prompts #2 return a redirect URL (eg via an HTTP 302 response), which redirects the user's browser to the OAuth website, so that the user can either grant or refuse access. If the user refuses, or fails to login, then the login sequence ends here. 4. Upon successful authorization with the OAuth system, the OAuth website redirects the user back to your website, to a URL such as example.com/auth/oauth/finish, and you'll typically want Authaus to handle this request directly (via HttpHandlerOAuthFinish). Authaus will extract the secrets from the URL, perform any validations necessary, and then move the record from the oauthchallenge table, into the oauthsession table. While 'moving' the record over, it will also add any additional information that was provided by the successful authentication, such as the token provided by the OAuth provider. 5. Authaus makes an API call to the OAuth system, to retrieve the email address and name of the person that just logged in, using the token just received. 6. If that email address does not exist inside authuserstore, then create a new user record for this identity. 7. Log the user into Authaus, by creating a record inside authsession, for the relevant identity. Inside the authsession table, store a link to the oauthsession record, so that there is a 1:1 link from the authsession table, to the oauthsession table (ie Authaus Session to OAuth Token). 8. Return an Authaus session cookie to the browser, thereby completing the login. Although we only use our OAuth token a single time, during login, to retrieve the user's email address and name, we retain the OAuth token, and so we maintain the ability to make other API calls on behalf of that user. This hasn't proven necessary yet, but it seems like a reasonable bit of future-proofing. See the guidelines at the top of all_test.go for testing instructions.
Package raft provides a simple, easy-to-understand, and reliable implementation of Raft. Raft is a consensus protocol designed to manage replicated logs in a distributed system. Its purpose is to ensure fault-tolerant coordination and consistency among a group of nodes, making it suitable for building reliable systems. Potential use cases include distributed file systems, consistent key-value stores, and service discovery.
Package metrics is a telemetry client designed for Uber's software networking team. It prioritizes performance on the hot path and integration with both push- and pull-based collection systems. Like Prometheus and Tally, it supports metrics tagged with arbitrary key-value pairs. Like Prometheus, but unlike Tally, metric names should be relatively long and descriptive - generally speaking, metrics from the same process shouldn't share names. (See the documentation for the Root struct below for a longer explanation of the uniqueness rules.) For example, prefer "grpc_successes_by_procedure" over "successes", since "successes" is common and vague. Where relevant, metric names should indicate their unit of measurement (e.g., "grpc_success_latency_ms"). Counters represent monotonically increasing values, like a car's odometer. Gauges represent point-in-time readings, like a car's speedometer. Both counters and gauges expose not only write operations (set, add, increment, etc.), but also atomic reads. This makes them easy to integrate directly into your business logic: you can use them anywhere you'd otherwise use a 64-bit atomic integer. This package doesn't support analogs of Tally's timer or Prometheus's summary, because they can't be accurately aggregated at query time. Instead, it approximates distributions of values with histograms. These require more up-front work to set up, but are typically more accurate and flexible when queried. See https://prometheus.io/docs/practices/histograms/ for a more detailed discussion of the trade-offs involved. Plain counters, gauges, and histograms have a fixed set of tags. However, it's common to encounter situations where a subset of a metric's tags vary constantly. For example, you might want to track the latency of your database queries by table: you know the database cluster, application name, and hostname at process startup, but you need to specify the table name with each query. To model these situations, this package uses vectors. Each vector is a local cache of metrics, so accessing them is quite fast. Within a vector, all metrics share a common set of constant tags and a list of variable tags. In our database query example, the constant tags are cluster, application, and hostname, and the only variable tag is table name. Usage examples are included in the documentation for each vector type. This package integrates with StatsD- and M3-based collection systems by periodically pushing differential updates. (Users can integrate with other push-based systems by implementing the push.Target interface.) It integrates with pull-based collectors by exposing an HTTP handler that supports Prometheus's text and protocol buffer exposition formats. Examples of both push and pull integration are included in the documentation for the root struct's Push and ServeHTTP methods. If you're unfamiliar with Tally and Prometheus, you may want to consult their documentation:
Package skogul is a framework for receiving, processing and forwarding data, typically metric data or event-oriented data, at high throughput. It is designed to be as agnostic as possible with regards to how it transmits data and how it receives it, and the processors in between need not worry about how the data got there or how it will be treated in the next chain. This means you can use Skogul to receive data on a influxdb-like line-based TCP interface and send it on to postgres - or influxdb - without having to write explicit support, just set up the chain. The guiding principles of Skogul is: - Make as few assumptions as possible about how data is received - Be stupid fast End users should only need to worry about the cmd/skogul tool, which comes fully equipped with self-contained documentation. Adding new logic to Skogul should also be fairly easy. New developers should focus on understanding two things: 1. The skogul.Container data structure - which is the heart of Skogul. 2. The relationship from receiver to handler to sender. The Container is documented in this very package. Receivers are where data originates within Skogul. The typical Receiver will receive data from the outside world, e.g. by other tools posting data to a HTTP endpoint. Receivers can also be used to "create" data, either test data or, for example, log data. When skogul starts, it will start all receivers that are configured. Handlers determine what is done with the data once received. They are responsible for parsing raw data and optionally transform it. This is the only place where it is allowed to _modify_ data. Today, the only transformer is the "templater", which allows a collection of metrics which share certain attributes (e.g.: all collected at the same time and from the same machine) to provide these shared attributes in a template which the "templater" transformer then applies to all metrics. Other examples of transformations that make sense are: - Adding a metadata field - Removing a metadata field - Removing all but a specific set of fields - Converting nested metrics to multiple metrics or flatten them Once a handler has done its deed, it sends the Container to the sender, and this is where "the fun begins" so to speak. Senders consist of just a data structure that implements the Send() interface. They are not allowed to change the container, but besides that, they can do "whatever". The most obvious example is to send the container to a suitable storage system - e.g., a time series database. So if you want to add support for a new time series database in Skogul, you will write a sender. In addition to that, many senders serve only to add internal logic and pass data on to other senders. Each sender should only do one specific thing. For example, if you want to write data both to InfluxDB and MySQL, you need three senders: The "MySQL" and "InfluxDB" senders, and the "dupe" sender, which just takes a list of other senders and sends whatever it receives on to all of them. Today, Senders and Receivers both have an identical "Auto"-system, found in auto.go of the relevant directories. This is how the individual implementations are made discoverable to the configuration system, and how documentation is provided. Documentation for the settings of a sender/receiver is handled as struct tags. Once more parsers/transformers are added, they will likely also use a similar system.
The bookpipeline package contains various tools and functions for the OCR of books, with a focus on distributed OCR using short-lived virtual servers. It also contains several tools that are useful standalone; read the accompanying README for more details. The book pipeline is a way to split the different processes that for book OCR into small jobs, which can be processed when a computer is ready for them. It is currently implemented with Amazon's AWS cloud systems, and can scale from zero to many computers, with jobs being processed faster when more servers are available. Central to the bookpipeline in terms of software is the bookpipeline command, which is part of the rescribe.xyz/bookpipeline package. Presuming you have the go tools installed, you can install it, and useful tools to control the system, with this command: All of the tools provided in the bookpipeline package will give information on what they do and how they work with the '-h' flag, so for example to get usage information on the booktopipeline tool simply run the following: To get the pipeline tools to work for you, you'll need to change the settings in cloudsettings.go, and set up your ~/.aws/credentials appropriately. Most of the time the bookpipeline is expected to be run from potentially short-lived servers on Amazon's EC2 system. EC2 provides servers which have no guaranteed of stability (though in practice they seem to be), called "Spot Instances", which we use for bookpipeline. bookpipeline can handle a process or server being suddenly destroyed without warning (more on this later), so Spot Instances are perfect for us. We have set up a machine image with bookpipeline preinstalled which will launch at bootup, which is all that's needed to launch an bookpipeline instance. Presuming the bookpipeline package has been installed on your computer (see above), the spot instance can be started with the command: You can keep an eye on the servers (spot or otherwise) that are running, and the jobs left to do and in progress, with the "lspipeline" tool (which is also part of the bookpipeline package). It's recommended to use this with the ssh private key for the servers, so that it can also report on what each server is currently doing, but it can run successfully without it. It takes a little while to run, so be patient. It can be run with the command: Spot instances can be terminated with ssh, using their ip address which can be found with lspipeline, like so: The bookpipeline program is run as a service managed by systemd on the servers. The system is fully resiliant in the face of unexpected failures. See the section "How the pipeline works" for details on this. bookpipeline can be managed like any other systemd service. A few examples: Books can be added to the pipeline using the "booktopipeline" tool. This takes a directory of page images as input, and uploads them all to S3, adding a job to the pipeline queue to start processing them. So it can be used like this: Once a book has been finished, it can be downloaded using the "getpipelinebook" tool. This has several options to download specific parts of a book, but the default case will download the best hOCR for each page, PDFs, and the best, conf and graph.png files. Use it like this: To get the plain text from the book, use the hocrtotxt tool, which is part of the rescribe.xyz/utils package. You can get the package, and run the tool, like this: The central part of the book pipeline is several SQS queues, which contain jobs which need to be done by a server running bookpipeline. The exact content of the SQS messages vary according to each queue, as some jobs need more information than others. Each queue is checked at least once every couple of minutes on any server that isn't currently processing a job. When a job is taken from the queue by a process, it is hidden from the queue for 2 minutes so that no other process can take it. Once per minute when processing a job the process sends a message updating the queue, to tell it to keep the job hidden for two minutes. This is called the "heartbeat", as if the process fails for any reason the heartbeat will stop, and in 2 minutes the job will reappear on the queue for another process to have a go at. Once a job is completed successfully it is deleted from the queue. Queue names are defined in cloudsettings.go. queuePreProc Each message in the queuePreProc queue is a bookname, optionally followed by a space and the name of the training to use. Each page of the bookname will be binarised with several different parameters, and then wiped, with each version uploaded to S3, with the path of the preprocessed page, plus the training name if it was provided, will be added to the queueOcrPage queue. The pages are binarised with different parameters as it can be difficult to determine which binarisation level will be best prior to OCR, so several different options are used, and in the queueAnalyse step the best one is chosen, based on the confidence of the OCR output. queueWipeOnly This queue works the same as queuePreProc, except that it doesn't binarise the pages, only runs the wiper. Hence it is designed for books which have been prebinarised. queuePreNoWipe This queue works the same as queuePreProc, except that it doesn'T wipe the pages, only runs the binarisation. It is designed for books which don't have tricky gutters or similar noise around the edges, but do have marginal content which might be inadventently removed by the wiper. queueOcrPage This queue contains the path of individual pages, optionally followed by a space and the name of the training to use. Each page is OCRed, and the results are uploaded to S3. After each page is OCRed, a check is made to see whether all pages that look like they were preprocessed have corresponding .hocr files. If so, the bookname is added to the queueAnalyse queue. queueAnalyse A message on the queueAnalyse queue contains only a book name. The confidences for each page are calculated and saved in the 'conf' file, and the best version of each page is decided upon and saved in the 'best' file. PDFs are then generated, and the confidence graph is generated. The queues should generally only be messed with by the bookpipeline and booktopipeline tools, but if you're feeling ambitious you can take a look at the `addtoqueue` tool. Remember that messages in a queue are hidden for a few minutes when they are read, so for example you couldn't straightforwardly delete a message which was currently being processed by a server, as you wouldn't be able to see it. At present the bookpipeline has some silly limitations of file names for book pages to be recognised. This is something which will be fixed in due course. While bookpipeline was built with cloud based operation in mind, there is also a local mode that can be used to run OCR jobs from a single computer, with all the benefits of preprocessing, choosing the best threshold for each image, graph creation, PDF creation, and so on that the pipeline provides. Several of the commands accept a `-c local` flag for local operation, but now there is also a new command, named rescribe, that is designed to make things much simpler for people just wanting to do some OCR on their local computer. Note that the local mode is not as well tested as the core cloud modes; please report any bugs you find with it.
Package kyber provides a toolbox of advanced cryptographic primitives, for applications that need more than straightforward signing and encryption. This top level package defines the interfaces to cryptographic primitives designed to be independent of specific cryptographic algorithms, to facilitate upgrading applications to new cryptographic algorithms or switching to alternative algorithms for experimentation purposes. This toolkits public-key crypto API includes a kyber.Group interface supporting a broad class of group-based public-key primitives including DSA-style integer residue groups and elliptic curve groups. Users of this API can write higher-level crypto algorithms such as zero-knowledge proofs without knowing or caring exactly what kind of group, let alone which precise security parameters or elliptic curves, are being used. The kyber.Group interface supports the standard algebraic operations on group elements and scalars that nontrivial public-key algorithms tend to rely on. The interface uses additive group terminology typical for elliptic curves, such that point addition is homomorphically equivalent to adding their (potentially secret) scalar multipliers. But the API and its operations apply equally well to DSA-style integer groups. As a trivial example, generating a public/private keypair is as simple as: The first statement picks a private key (Scalar) from a the suites's source of cryptographic random or pseudo-random bits, while the second performs elliptic curve scalar multiplication of the curve's standard base point (indicated by the 'nil' argument to Mul) by the scalar private key 'a'. Similarly, computing a Diffie-Hellman shared secret using Alice's private key 'a' and Bob's public key 'B' can be done via: Note that we use 'Mul' rather than 'Exp' here because the library uses the additive-group terminology common for elliptic curve crypto, rather than the multiplicative-group terminology of traditional integer groups - but the two are semantically equivalent and the interface itself works for both elliptic curve and integer groups. Various sub-packages provide several specific implementations of these cryptographic interfaces. In particular, the 'group/mod' sub-package provides implementations of modular integer groups underlying conventional DSA-style algorithms. The `group/nist` package provides NIST-standardized elliptic curves built on the Go crypto library. The 'group/edwards25519' sub-package provides the kyber.Group interface using the popular Ed25519 curve. Other sub-packages build more interesting high-level cryptographic tools atop these primitive interfaces, including: - share: Polynomial commitment and verifiable Shamir secret splitting for implementing verifiable 't-of-n' threshold cryptographic schemes. This can be used to encrypt a message so that any 2 out of 3 receivers must work together to decrypt it, for example. - proof: An implementation of the general Camenisch/Stadler framework for discrete logarithm knowledge proofs. This system supports both interactive and non-interactive proofs of a wide variety of statements such as, "I know the secret x associated with public key X or I know the secret y associated with public key Y", without revealing anything about either secret or even which branch of the "or" clause is true. - sign: The sign directory contains different signature schemes. - sign/anon provides anonymous and pseudonymous public-key encryption and signing, where the sender of a signed message or the receiver of an encrypted message is defined as an explicit anonymity set containing several public keys rather than just one. For example, a member of an organization's board of trustees might prove to be a member of the board without revealing which member she is. - sign/cosi provides collective signature algorithm, where a bunch of signers create a unique, compact and efficiently verifiable signature using the Schnorr signature as a basis. - sign/eddsa provides a kyber-native implementation of the EdDSA signature scheme. - sign/schnorr provides a basic vanilla Schnorr signature scheme implementation. - shuffle: Verifiable cryptographic shuffles of ElGamal ciphertexts, which can be used to implement (for example) voting or auction schemes that keep the sources of individual votes or bids private without anyone having to trust more than one of the shuffler(s) to shuffle votes/bids honestly. As should be obvious, this library is intended to be used by developers who are at least moderately knowledgeable about cryptography. If you want a crypto library that makes it easy to implement "basic crypto" functionality correctly - i.e., plain public-key encryption and signing - then [NaCl secretbox](https://godoc.org/golang.org/x/crypto/nacl/secretbox) may be a better choice. This toolkit's purpose is to make it possible - and preferably easy - to do slightly more interesting things that most current crypto libraries don't support effectively. The one existing crypto library that this toolkit is probably most comparable to is the Charm rapid prototyping library for Python (https://charm-crypto.com/category/charm). This library incorporates and/or builds on existing code from a variety of sources, as documented in the relevant sub-packages. This library is offered as-is, and without a guarantee. It will need an independent security review before it should be considered ready for use in security-critical applications. If you integrate Kyber into your application it is YOUR RESPONSIBILITY to arrange for that audit. If you notice a possible security problem, please report it to dedis-security@epfl.ch.
Package mspec is a BDD context/specification testing package for Go(Lang) with a strong emphases on spec'ing your feature(s) and scenarios first, before any code is written using as little syntax noise as possible. This leaves you free to think of your project and features as a whole without the distraction of writing any code with the added benefit of having tests ready for your project. [![GoDoc](https://godoc.org/github.com/ddspog/mspec?status.svg)](https://godoc.org/github.com/ddspog/mspec) holds the source documentation (where else?) * Uses natural language (Given/When/Then) * Stubbing * Human-readable outputs * HTML output (coming soon) * Use custom Assertions * Configuration options * Uses Testify's rich assertions * Uses Go's built-in testing.T package Install it with one line of code: There are no external dependencies and it is built against Go's internal packages. The only dependency is that you have [GOPATH setup normaly](https://golang.org/doc/code.html). Create a new file to hold your specs. Using Dan North's original BDD definitions, you spec code using the Given/When/Then storyline similar to: But this is just a static example. Let's take a real example from one of my projects: You represent these thoughts in code like this: Note that `Given`, `when` and `it` all have optional variadic parameters. This allows you to spec things out with as little or as far as you want. That's it. Now run it: Print it out and stick it on your office door for everyone to see what you are working on. This is actually colored output in Terminal: It is not uncommon to go back and tweak your stories over time as you talk with your domain experts, modifying exactly the scenarios and specifications that should happen. `GoMSpec` is a testing package for the Go framework that extends Go's built-in testing package. It is modeled after the BDD Feature Specification story workflow such as: Currently it has an included `Expectation` struct that mimics basic assertion behaviors. Future plans may allow for custom assertion packages (like testify). Getting it Importing it Writing Specs Testing it Which outputs the following: Nice eh? There is nothing like using a testing package to test itself. There is some nice rich information available. ## Examples Be sure to check out more examples in the examples/ folder. Or just open the files and take a look. That's the most important part anyways. When evaluating several BDD frameworks, [Pranavraja's Zen](https://github.com/pranavraja/zen) package for Go came close - really close; but, it was lacking the more "story" overview I've been accustomed to over the years with [Machine.Specifications](https://github.com/machine/machine.specifications) in C# (.NET land). Do note that there is something to be said for simple testing in Go (and simple coding); therefore, if you are the type to keep it short and sweet and just code, then you may want to use Pranavraja's framework as it is just the context (Desc) and specs writing. I forked his code and submitted a few bug tweaks at first. But along the way, I started to have grand visions of my soul mate [Machine.Specifications](https://github.com/machine/machine.specifications) (which is called MSpec for short) for BDD testing. The ease of defining complete stories right down to the scenarios without having to implement them intrigued me in C#. It freed me from worrying about implementation details and just focus on the feature I was writing: What did it need to do? What context was I given to start with? What should it do? So while using Pranavraja's Zen framework, I kept asking myself: Could I bring those MSpec practices to Go, using a bare-bones framework? Ok, done. And since it was so heavily inspired by Aaron's MSpec project, I kept the name going here: `GoMSpec`. While keeping backwards compatibility with his existing Zen framework, I defined several goals for this package: * Had to stay simple with Give/When/Then definitions. No complex coding. * Keep the low syntax noise from the existing Zen package. * I had to be able to write features, scenarios and specs with no implementation details needed. That last goal above is key and I think is what speaks truly about what BDD is: focus on the story, feature and/or context you are designing - focus on the Behavior! I tended to design my C# code using Machine.Specifications in this BDD-style by writing entire stories and grand specs up front - designing the system I was building, or the feature I was extending. In C# land, it's not unheard of me hitting 50 to 100 specs across a single feature and a few different contexts in an hour or two, before writing any code. Which at that point, I had everything planned out pretty much the way it should behave. So with this framework, I came up with a simple method name, `NA()`, to keep the syntax noise down. Therefore, you are free to code specs with just a little syntax noise:
Package bolt implements a low-level key/value store in pure Go. It supports fully serializable transactions, ACID semantics, and lock-free MVCC with multiple readers and a single writer. Bolt can be used for projects that want a simple data store without the need to add large dependencies such as Postgres or MySQL. Bolt is a single-level, zero-copy, B+tree data store. This means that Bolt is optimized for fast read access and does not require recovery in the event of a system crash. Transactions which have not finished committing will simply be rolled back in the event of a crash. The design of Bolt is based on Howard Chu's LMDB database project. Bolt currently works on Windows, Mac OS X, and Linux. There are only a few types in Bolt: DB, Bucket, Tx, and Cursor. The DB is a collection of buckets and is represented by a single file on disk. A bucket is a collection of unique keys that are associated with values. Transactions provide either read-only or read-write access to the database. Read-only transactions can retrieve key/value pairs and can use Cursors to iterate over the dataset sequentially. Read-write transactions can create and delete buckets and can insert and remove keys. Only one read-write transaction is allowed at a time. The database uses a read-only, memory-mapped data file to ensure that applications cannot corrupt the database, however, this means that keys and values returned from Bolt cannot be changed. Writing to a read-only byte slice will cause Go to panic. Keys and values retrieved from the database are only valid for the life of the transaction. When used outside the transaction, these byte slices can point to different data or can point to invalid memory which will cause a panic.
Package amqp is an AMQP 0.9.1 client with RabbitMQ extensions Understand the AMQP 0.9.1 messaging model by reviewing these links first. Much of the terminology in this library directly relates to AMQP concepts. Most other broker clients publish to queues, but in AMQP, clients publish Exchanges instead. AMQP is programmable, meaning that both the producers and consumers agree on the configuration of the broker, instead requiring an operator or system configuration that declares the logical topology in the broker. The routing between producers and consumer queues is via Bindings. These bindings form the logical topology of the broker. In this library, a message sent from publisher is called a "Publishing" and a message received to a consumer is called a "Delivery". The fields of Publishings and Deliveries are close but not exact mappings to the underlying wire format to maintain stronger types. Many other libraries will combine message properties with message headers. In this library, the message well known properties are strongly typed fields on the Publishings and Deliveries, whereas the user defined headers are in the Headers field. The method naming closely matches the protocol's method name with positional parameters mapping to named protocol message fields. The motivation here is to present a comprehensive view over all possible interactions with the server. Generally, methods that map to protocol methods of the "basic" class will be elided in this interface, and "select" methods of various channel mode selectors will be elided for example Channel.Confirm and Channel.Tx. The library is intentionally designed to be synchronous, where responses for each protocol message are required to be received in an RPC manner. Some methods have a noWait parameter like Channel.QueueDeclare, and some methods are asynchronous like Channel.Publish. The error values should still be checked for these methods as they will indicate IO failures like when the underlying connection closes. Clients of this library may be interested in receiving some of the protocol messages other than Deliveries like basic.ack methods while a channel is in confirm mode. The Notify* methods with Connection and Channel receivers model the pattern of asynchronous events like closes due to exceptions, or messages that are sent out of band from an RPC call like basic.ack or basic.flow. Any asynchronous events, including Deliveries and Publishings must always have a receiver until the corresponding chans are closed. Without asynchronous receivers, the sychronous methods will block. It's important as a client to an AMQP topology to ensure the state of the broker matches your expectations. For both publish and consume use cases, make sure you declare the queues, exchanges and bindings you expect to exist prior to calling Channel.Publish or Channel.Consume. SSL/TLS - Secure connections When Dial encounters an amqps:// scheme, it will use the zero value of a tls.Config. This will only perform server certificate and host verification. Use DialTLS when you wish to provide a client certificate (recommended), include a private certificate authority's certificate in the cert chain for server validity, or run insecure by not verifying the server certificate dial your own connection. DialTLS will use the provided tls.Config when it encounters an amqps:// scheme and will dial a plain connection when it encounters an amqp:// scheme. SSL/TLS in RabbitMQ is documented here: http://www.rabbitmq.com/ssl.html
Package log15 provides an opinionated, simple toolkit for best-practice logging that is both human and machine readable. It is modeled after the standard library's io and net/http packages. This package enforces you to only log key/value pairs. Keys must be strings. Values may be any type that you like. The default output format is logfmt, but you may also choose to use JSON instead if that suits you. Here's how you log: This will output a line that looks like: To get started, you'll want to import the library: Now you're ready to start logging: Because recording a human-meaningful message is common and good practice, the first argument to every logging method is the value to the *implicit* key 'msg'. Additionally, the level you choose for a message will be automatically added with the key 'lvl', and so will the current timestamp with key 't'. You may supply any additional context as a set of key/value pairs to the logging function. log15 allows you to favor terseness, ordering, and speed over safety. This is a reasonable tradeoff for logging functions. You don't need to explicitly state keys/values, log15 understands that they alternate in the variadic argument list: If you really do favor your type-safety, you may choose to pass a log.Ctx instead: Frequently, you want to add context to a logger so that you can track actions associated with it. An http request is a good example. You can easily create new loggers that have context that is automatically included with each log line: This will output a log line that includes the path context that is attached to the logger: The Handler interface defines where log lines are printed to and how they are formated. Handler is a single interface that is inspired by net/http's handler interface: Handlers can filter records, format them, or dispatch to multiple other Handlers. This package implements a number of Handlers for common logging patterns that are easily composed to create flexible, custom logging structures. Here's an example handler that prints logfmt output to Stdout: Here's an example handler that defers to two other handlers. One handler only prints records from the rpc package in logfmt to standard out. The other prints records at Error level or above in JSON formatted output to the file /var/log/service.json This package implements three Handlers that add debugging information to the context, CallerFileHandler, CallerFuncHandler and CallerStackHandler. Here's an example that adds the source file and line number of each logging call to the context. This will output a line that looks like: Here's an example that logs the call stack rather than just the call site. This will output a line that looks like: The "%+v" format instructs the handler to include the path of the source file relative to the compile time GOPATH. The github.com/go-stack/stack package documents the full list of formatting verbs and modifiers available. The Handler interface is so simple that it's also trivial to write your own. Let's create an example handler which tries to write to one handler, but if that fails it falls back to writing to another handler and includes the error that it encountered when trying to write to the primary. This might be useful when trying to log over a network socket, but if that fails you want to log those records to a file on disk. This pattern is so useful that a generic version that handles an arbitrary number of Handlers is included as part of this library called FailoverHandler. Sometimes, you want to log values that are extremely expensive to compute, but you don't want to pay the price of computing them if you haven't turned up your logging level to a high level of detail. This package provides a simple type to annotate a logging operation that you want to be evaluated lazily, just when it is about to be logged, so that it would not be evaluated if an upstream Handler filters it out. Just wrap any function which takes no arguments with the log.Lazy type. For example: If this message is not logged for any reason (like logging at the Error level), then factorRSAKey is never evaluated. The same log.Lazy mechanism can be used to attach context to a logger which you want to be evaluated when the message is logged, but not when the logger is created. For example, let's imagine a game where you have Player objects: You always want to log a player's name and whether they're alive or dead, so when you create the player object, you might do: Only now, even after a player has died, the logger will still report they are alive because the logging context is evaluated when the logger was created. By using the Lazy wrapper, we can defer the evaluation of whether the player is alive or not to each log message, so that the log records will reflect the player's current state no matter when the log message is written: If log15 detects that stdout is a terminal, it will configure the default handler for it (which is log.StdoutHandler) to use TerminalFormat. This format logs records nicely for your terminal, including color-coded output based on log level. Becasuse log15 allows you to step around the type system, there are a few ways you can specify invalid arguments to the logging functions. You could, for example, wrap something that is not a zero-argument function with log.Lazy or pass a context key that is not a string. Since logging libraries are typically the mechanism by which errors are reported, it would be onerous for the logging functions to return errors. Instead, log15 handles errors by making these guarantees to you: - Any log record containing an error will still be printed with the error explained to you as part of the log record. - Any log record containing an error will include the context key LOG15_ERROR, enabling you to easily (and if you like, automatically) detect if any of your logging calls are passing bad values. Understanding this, you might wonder why the Handler interface can return an error value in its Log method. Handlers are encouraged to return errors only if they fail to write their log records out to an external source like if the syslog daemon is not responding. This allows the construction of useful handlers which cope with those failures like the FailoverHandler. log15 is intended to be useful for library authors as a way to provide configurable logging to users of their library. Best practice for use in a library is to always disable all output for your logger by default and to provide a public Logger instance that consumers of your library can configure. Like so: Users of your library may then enable it if they like: The ability to attach context to a logger is a powerful one. Where should you do it and why? I favor embedding a Logger directly into any persistent object in my application and adding unique, tracing context keys to it. For instance, imagine I am writing a web browser: When a new tab is created, I assign a logger to it with the url of the tab as context so it can easily be traced through the logs. Now, whenever we perform any operation with the tab, we'll log with its embedded logger and it will include the tab title automatically: There's only one problem. What if the tab url changes? We could use log.Lazy to make sure the current url is always written, but that would mean that we couldn't trace a tab's full lifetime through our logs after the user navigate to a new URL. Instead, think about what values to attach to your loggers the same way you think about what to use as a key in a SQL database schema. If it's possible to use a natural key that is unique for the lifetime of the object, do so. But otherwise, log15's ext package has a handy RandId function to let you generate what you might call "surrogate keys" They're just random hex identifiers to use for tracing. Back to our Tab example, we would prefer to set up our Logger like so: Now we'll have a unique traceable identifier even across loading new urls, but we'll still be able to see the tab's current url in the log messages. For all Handler functions which can return an error, there is a version of that function which will return no error but panics on failure. They are all available on the Must object. For example: All of the following excellent projects inspired the design of this library: code.google.com/p/log4go github.com/op/go-logging github.com/technoweenie/grohl github.com/Sirupsen/logrus github.com/kr/logfmt github.com/spacemonkeygo/spacelog golang's stdlib, notably io and net/http https://xkcd.com/927/
Package tusd provides ways to accept tus 1.0 calls using HTTP. tus is a protocol based on HTTP for resumable file uploads. Resumable means that an upload can be interrupted at any moment and can be resumed without re-uploading the previous data again. An interruption may happen willingly, if the user wants to pause, or by accident in case of an network issue or server outage (http://tus.io). tusd was designed in way which allows an flexible and customizable usage. We wanted to avoid binding this package to a specific storage system – particularly a proprietary third-party software. Therefore tusd is an abstract layer whose only job is to accept incoming HTTP requests, validate them according to the specification and finally passes them to the data store. The data store is another important component in tusd's architecture whose purpose is to do the actual file handling. It has to write the incoming upload to a persistent storage system and retrieve information about an upload's current state. Therefore it is the only part of the system which communicates directly with the underlying storage system, whether it be the local disk, a remote FTP server or cloud providers such as AWS S3. The only hard requirements for a data store can be found in the DataStore interface. It contains methods for creating uploads (NewUpload), writing to them (WriteChunk) and retrieving their status (GetInfo). However, there are many more features which are not mandatory but may still be used. These are contained in their own interfaces which all share the *DataStore suffix. For example, GetReaderDataStore which enables downloading uploads or TerminaterDataStore which allows uploads to be terminated. The store composer offers a way to combine the basic data store - the core - implementation and these additional extensions: The corresponding methods for adding an extension to the composer are prefixed with Use* followed by the name of the corresponding interface. However, most data store provide multiple extensions and adding all of them manually can be tedious and error-prone. Therefore, all data store distributed with tusd provide an UseIn() method which does this job automatically. For example, this is the S3 store in action (see S3Store.UseIn): Finally, once you are done with composing your data store, you can pass it inside the Config struct in order to create create a new tusd HTTP handler: This handler can then be mounted to a specific path, e.g. /files:
Package spacelog is a collection of interface lego bricks designed to help you build a flexible logging system. spacelog is loosely inspired by the Python logging library. The basic interaction is between a Logger and a Handler. A Logger is what the programmer typically interacts with for creating log messages. A Logger will be at a given log level, and if log messages can clear that specific logger's log level filter, they will be passed off to the Handler. Loggers are instantiated from GetLogger and GetLoggerNamed. A Handler is a very generic interface for handling log events. You can provide your own Handler for doing structured JSON output or colorized output or countless other things. Provided are a simple TextHandler with a variety of log event templates and TextOutput sinks, such as io.Writer, Syslog, and so forth. Make sure to see the source of the setup subpackage for an example of easy and configurable logging setup at process start:
Package log15 provides an opinionated, simple toolkit for best-practice logging that is both human and machine readable. It is modeled after the standard library's io and net/http packages. This package enforces you to only log key/value pairs. Keys must be strings. Values may be any type that you like. The default output format is logfmt, but you may also choose to use JSON instead if that suits you. Here's how you log: This will output a line that looks like: To get started, you'll want to import the library: Now you're ready to start logging: Because recording a human-meaningful message is common and good practice, the first argument to every logging method is the value to the *implicit* key 'msg'. Additionally, the level you choose for a message will be automatically added with the key 'lvl', and so will the current timestamp with key 't'. You may supply any additional context as a set of key/value pairs to the logging function. log15 allows you to favor terseness, ordering, and speed over safety. This is a reasonable tradeoff for logging functions. You don't need to explicitly state keys/values, log15 understands that they alternate in the variadic argument list: If you really do favor your type-safety, you may choose to pass a log.Ctx instead: Frequently, you want to add context to a logger so that you can track actions associated with it. An http request is a good example. You can easily create new loggers that have context that is automatically included with each log line: This will output a log line that includes the path context that is attached to the logger: The Handler interface defines where log lines are printed to and how they are formated. Handler is a single interface that is inspired by net/http's handler interface: Handlers can filter records, format them, or dispatch to multiple other Handlers. This package implements a number of Handlers for common logging patterns that are easily composed to create flexible, custom logging structures. Here's an example handler that prints logfmt output to Stdout: Here's an example handler that defers to two other handlers. One handler only prints records from the rpc package in logfmt to standard out. The other prints records at Error level or above in JSON formatted output to the file /var/log/service.json This package implements three Handlers that add debugging information to the context, CallerFileHandler, CallerFuncHandler and CallerStackHandler. Here's an example that adds the source file and line number of each logging call to the context. This will output a line that looks like: Here's an example that logs the call stack rather than just the call site. This will output a line that looks like: The "%+v" format instructs the handler to include the path of the source file relative to the compile time GOPATH. The github.com/go-stack/stack package documents the full list of formatting verbs and modifiers available. The Handler interface is so simple that it's also trivial to write your own. Let's create an example handler which tries to write to one handler, but if that fails it falls back to writing to another handler and includes the error that it encountered when trying to write to the primary. This might be useful when trying to log over a network socket, but if that fails you want to log those records to a file on disk. This pattern is so useful that a generic version that handles an arbitrary number of Handlers is included as part of this library called FailoverHandler. Sometimes, you want to log values that are extremely expensive to compute, but you don't want to pay the price of computing them if you haven't turned up your logging level to a high level of detail. This package provides a simple type to annotate a logging operation that you want to be evaluated lazily, just when it is about to be logged, so that it would not be evaluated if an upstream Handler filters it out. Just wrap any function which takes no arguments with the log.Lazy type. For example: If this message is not logged for any reason (like logging at the Error level), then factorRSAKey is never evaluated. The same log.Lazy mechanism can be used to attach context to a logger which you want to be evaluated when the message is logged, but not when the logger is created. For example, let's imagine a game where you have Player objects: You always want to log a player's name and whether they're alive or dead, so when you create the player object, you might do: Only now, even after a player has died, the logger will still report they are alive because the logging context is evaluated when the logger was created. By using the Lazy wrapper, we can defer the evaluation of whether the player is alive or not to each log message, so that the log records will reflect the player's current state no matter when the log message is written: If log15 detects that stdout is a terminal, it will configure the default handler for it (which is log.StdoutHandler) to use TerminalFormat. This format logs records nicely for your terminal, including color-coded output based on log level. Becasuse log15 allows you to step around the type system, there are a few ways you can specify invalid arguments to the logging functions. You could, for example, wrap something that is not a zero-argument function with log.Lazy or pass a context key that is not a string. Since logging libraries are typically the mechanism by which errors are reported, it would be onerous for the logging functions to return errors. Instead, log15 handles errors by making these guarantees to you: - Any log record containing an error will still be printed with the error explained to you as part of the log record. - Any log record containing an error will include the context key LOG15_ERROR, enabling you to easily (and if you like, automatically) detect if any of your logging calls are passing bad values. Understanding this, you might wonder why the Handler interface can return an error value in its Log method. Handlers are encouraged to return errors only if they fail to write their log records out to an external source like if the syslog daemon is not responding. This allows the construction of useful handlers which cope with those failures like the FailoverHandler. log15 is intended to be useful for library authors as a way to provide configurable logging to users of their library. Best practice for use in a library is to always disable all output for your logger by default and to provide a public Logger instance that consumers of your library can configure. Like so: Users of your library may then enable it if they like: The ability to attach context to a logger is a powerful one. Where should you do it and why? I favor embedding a Logger directly into any persistent object in my application and adding unique, tracing context keys to it. For instance, imagine I am writing a web browser: When a new tab is created, I assign a logger to it with the url of the tab as context so it can easily be traced through the logs. Now, whenever we perform any operation with the tab, we'll log with its embedded logger and it will include the tab title automatically: There's only one problem. What if the tab url changes? We could use log.Lazy to make sure the current url is always written, but that would mean that we couldn't trace a tab's full lifetime through our logs after the user navigate to a new URL. Instead, think about what values to attach to your loggers the same way you think about what to use as a key in a SQL database schema. If it's possible to use a natural key that is unique for the lifetime of the object, do so. But otherwise, log15's ext package has a handy RandId function to let you generate what you might call "surrogate keys" They're just random hex identifiers to use for tracing. Back to our Tab example, we would prefer to set up our Logger like so: Now we'll have a unique traceable identifier even across loading new urls, but we'll still be able to see the tab's current url in the log messages. For all Handler functions which can return an error, there is a version of that function which will return no error but panics on failure. They are all available on the Must object. For example: All of the following excellent projects inspired the design of this library: code.google.com/p/log4go github.com/op/go-logging github.com/technoweenie/grohl github.com/Sirupsen/logrus github.com/kr/logfmt github.com/spacemonkeygo/spacelog golang's stdlib, notably io and net/http https://xkcd.com/927/
Package ratchet is a library for performing data pipeline / ETL tasks in Go. The main construct in Ratchet is Pipeline. A Pipeline has a series of PipelineStages, which will each perform some type of data processing, and then send new data on to the next stage. Each PipelineStage consists of one or more DataProcessors, which are responsible for receiving, processing, and then sending data on to the next stage of processing. DataProcessors each run in their own goroutine, and therefore all data processing can be executing concurrently. Here is a conceptual drawing of a fairly simple Pipeline: In this example, we have a Pipeline consisting of 3 PipelineStages. The first stage has a DataProcessor that runs queries on a SQL database, the second is doing custom transformation work on that data, and the third stage branches into 2 DataProcessors, one writing the resulting data to a CSV file, and the other inserting into another SQL database. In the example above, Stage 1 and Stage 3 are using built-in DataProcessors (see the "processors" package/subdirectory). However, Stage 2 is using a custom implementation of DataProcessor. By using a combination of built-in processors, and supporting the writing of any Go code to process data, Ratchet makes it possible to write very custom and fast data pipeline systems. See the DataProcessor documentation to learn more. Since each DataProcessor is running in it's own goroutine, SQLReader can continue pulling and sending data while each subsequent stage is also processing data. Optimally-designed pipelines have processors that can each run in an isolated fashion, processing data without having to worry about what's coming next down the pipeline. All data payloads sent between DataProcessors are of type data.JSON ([]byte). This provides a good balance of consistency and flexibility. See the "data" package for details and helper functions for dealing with data.JSON. Another good read for handling JSON data in Go is http://blog.golang.org/json-and-go. Note that many of the concepts in Ratchet were taken from the Golang blog's post on pipelines (http://blog.golang.org/pipelines). While the details discussed in that blog post are largely abstracted away by Ratchet, it is still an interesting read and will help explain the general concepts being applied. There are two ways to construct and run a Pipeline. The first is a basic, non-branching Pipeline. For example: This is a 3-stage Pipeline that queries some SQL data in stage 1, does some custom data transformation in stage 2, and then writes the resulting data to a SQL table in stage 3. The code to create and run this basic Pipeline would look something like: The second way to construct a Pipeline is using a PipelineLayout. This method allows for more complex Pipeline configurations that support branching between stages that are running multiple DataProcessors. Here is a (fairly complex) example: This Pipeline consists of 4 stages where each DataProcessor is choosing which DataProcessors in the subsequent stage should receive the data it sends. The SQLReader in stage 2, for example, is sending data to only 2 processors in the next stage, while the Custom DataProcessor in stage 2 is sending it's data to 3. The code for constructing and running a Pipeline like this would look like: This example is only conceptual, the main points being to explain the flexibility you have when designing your Pipeline's layout and to demonstrate the syntax for constructing a new PipelineLayout.
Package elmobd provides communication with cars OBD-II system using ELM327 based USB-devices. Using this library and a ELM327-based USB-device you can communicate with your cars on-board diagnostics system to read sensor data. Reading trouble codes and resetting them is not yet implemented. All assumptions this library makes are based on the official Elm Electronics datasheet of the ELM327 IC: https://www.elmelectronics.com/wp-content/uploads/2017/01/ELM327DS.pdf After that introduction - Welcome! I hope you'll find this library useful and its documentation easy to digest. If that's not the case, please create a GitHub-issue: https://github.com/rzetterberg/elmobd/issues You'll note that this package has the majority its types and functions exported. The reason for that is that there are many different commands you can send to a ELM327 device, and this library will never we able to define all commands, so most functionality is exported so that you can easily extend it. You'll also note that there are A LOT of types. The reason for this is each command that can be sent to the ELM327 device is defined as it's own type. The approach I wanted to take with this library was to get as much type safety as possible. With that said, as an end user of this library you'll only need to know about two kinds types: the Device type and types that implement the OBDCommand interface. The Device type represents an active connection with an ELM327 device. You plug in your device into your computer, get the path to the device and initialize a new device: This library is design to be as high level as possible, you shouldn't need to handle baud rates, setting protocols or anything like that. After the Device has been initialized you can just start sending commands. The function you will be using the most is the Device.RunOBDCommand, which accepts a command, sends the command to the device, waits for a response, parses the response and gives it back to you. Which means this command is blocking execution until it has been finished. The default timeout is currently set to 5 seconds. The RunOBDCommand accepts a single argument, a type that implements the OBDCommand interface. That interface has a couple of functions that needs to exist in order to be able to generate the low-level request for the device and parse the low-level response. Basically you send in an empty OBDCommand and get back a OBDCommand with the processed value retrieved from the device. Suppose that after checking the device connection (in our example above) we would like to retrieve the vehicles speed. There's a type called VehicleSpeed that implements the OBDCommand interface that we can use. We start by creating a new VehicleSpeed using its constructor NewVehicleSpeed that we then give to RunOBDCommand: At the moment there are around 15 sensor commands defined in this library. They are all defined in the file commands.go of the library (https://github.com/rzetterberg/elmobd/blob/master/commands.go). If you look in the end of that file you'll find an internal static variable with all the sensor commands called sensorCommands. That's the basics of this library - you create a device connection and then run commands. Documentation of more advanced use-cases is in the works, so don't worry! If there's something that you wish would be documented, or something that is not clear in the current documentation, please create a GitHub-issue.
Package libstorage provides a vendor agnostic storage orchestration model, API, and reference client and server implementations. libStorage enables storage consumption by leveraging methods commonly available, locally and/or externally, to an operating system (OS). The libStorage project and its architecture represents a culmination of experience gained from the project authors' building of several (http://bit.ly/1HIAet6) different storage (http://bit.ly/1Ya9Uft) orchestration tools (https://github.com/codedellemc/rexray). While created using different languages and targeting disparate storage platforms, all the tools were architecturally aligned and embedded functionality directly inside the tools and affected storage platforms. This shared design goal enabled tools that natively consumed storage, sans external dependencies. Today libStorage focuses on adding value to container runtimes and storage orchestration tools such as Docker and Mesos, however the libStorage framework is available abstractly for more general usage across: The client side implementation, focused on operating system activities, has a minimal set of dependencies in order to avoid a large, runtime footprint.