Package goworker is a Resque-compatible, Go-based background worker. It allows you to push jobs into a queue using an expressive language like Ruby while harnessing the efficiency and concurrency of Go to minimize job latency and cost. goworker workers can run alongside Ruby Resque clients so that you can keep all but your most resource-intensive jobs in Ruby. To create a worker, write a function matching the signature and register it using Here is a simple worker that prints its arguments: To create workers that share a database pool or other resources, use a closure to share variables. goworker worker functions receive the queue they are serving and a slice of interfaces. To use them as parameters to other functions, use Go type assertions to convert them into usable types. For testing, it is helpful to use the redis-cli program to insert jobs onto the Redis queue: will insert 100 jobs for the MyClass worker onto the myqueue queue. It is equivalent to: After building your workers, you will have an executable that you can run which will automatically poll a Redis server and call your workers as jobs arrive. There are several flags which control the operation of the goworker client. -queues="comma,delimited,queues" — This is the only required flag. The recommended practice is to separate your Resque workers from your goworkers with different queues. Otherwise, Resque worker classes that have no goworker analog will cause the goworker process to fail the jobs. Because of this, there is no default queue, nor is there a way to select all queues (à la Resque's * queue). Queues are processed in the order they are specififed. If you have multiple queues you can assign them weights. A queue with a weight of 2 will be checked twice as often as a queue with a weight of 1: -queues='high=2,low=1'. -interval=5.0 — Specifies the wait period between polling if no job was in the queue the last time one was requested. -concurrency=25 — Specifies the number of concurrently executing workers. This number can be as low as 1 or rather comfortably as high as 100,000, and should be tuned to your workflow and the availability of outside resources. -connections=2 — Specifies the maximum number of Redis connections that goworker will consume between the poller and all workers. There is not much performance gain over two and a slight penalty when using only one. This is configurable in case you need to keep connection counts low for cloud Redis providers who limit plans on maxclients. -uri=redis://localhost:6379/ — Specifies the URI of the Redis database from which goworker polls for jobs. Accepts URIs of the format redis://user:pass@host:port/db or unix:///path/to/redis.sock. The flag may also be set by the environment variable $($REDIS_PROVIDER) or $REDIS_URL. E.g. set $REDIS_PROVIDER to REDISTOGO_URL on Heroku to let the Redis To Go add-on configure the Redis database. -namespace=resque: — Specifies the namespace from which goworker retrieves jobs and stores stats on workers. -exit-on-complete=false — Exits goworker when there are no jobs left in the queue. This is helpful in conjunction with the time command to benchmark different configurations. -use-number=false — Uses json.Number when decoding numbers in the job payloads. This will avoid issues that occur when goworker and the json package decode large numbers as floats, which then get encoded in scientific notation, losing pecision. This will default to true soon. You can also configure your own flags for use within your workers. Be sure to set them before calling goworker.Main(). It is okay to call flags.Parse() before calling goworker.Main() if you need to do additional processing on your flags. To stop goworker, send a QUIT, TERM, or INT signal to the process. This will immediately stop job polling. There can be up to $CONCURRENCY jobs currently running, which will continue to run until they are finished. Like Resque, goworker makes no guarantees about the safety of jobs in the event of process shutdown. Workers must be both idempotent and tolerant to loss of the job in the event of failure. If the process is killed with a KILL or by a system failure, there may be one job that is currently in the poller's buffer that will be lost without any representation in either the queue or the worker variable. If you are running Goworker on a system like Heroku, which sends a TERM to signal a process that it needs to stop, ten seconds later sends a KILL to force the process to stop, your jobs must finish within 10 seconds or they may be lost. Jobs will be recoverable from the Redis database under as a JSON object with keys queue, run_at, and payload, but the process is manual. Additionally, there is no guarantee that the job in Redis under the worker key has not finished, if the process is killed before goworker can flush the update to Redis.
Package que-go is a fully interoperable Golang port of Chris Hanks' Ruby Que queueing library for PostgreSQL. Que uses PostgreSQL's advisory locks for speed and reliability. See the original Que documentation for more details: https://github.com/chanks/que Because que-go is an interoperable port of Que, you can enqueue jobs in Ruby (i.e. from a Rails app) and write your workers in Go. Or if you have a limited set of jobs that you want to write in Go, you can leave most of your workers in Ruby and just add a few Go workers on a different queue name. Instead of using database/sql and the more popular pq PostgreSQL driver, this package uses the pgx driver: https://github.com/jackc/pgx Because Que uses session-level advisory locks, we have to hold the same connection throughout the process of getting a job, working it, deleting it, and removing the lock. Pq and the built-in database/sql interfaces do not offer this functionality, so we'd have to implement our own connection pool. Fortunately, pgx already has a perfectly usable one built for us. Even better, it offers better performance than pq due largely to its use of binary encoding. que-go relies on prepared statements for performance. As of now these have to be initialized manually on your connection pool like so: If you have suggestions on how to cleanly do this automatically, please open an issue! Here is a complete example showing worker setup and two jobs enqueued, one with a delay:
Package amboy provides basic infrastructure for running and describing tasks and task workflows with, potentially, minimal overhead and additional complexity. Amboy works with 4 basic logical objects: jobs, or descriptions of tasks; runnners, which are responsible for executing tasks; queues, that represent pipelines and offline workflows of tasks (e.g. not real time, processes that run outside of the primary execution path of a program); and dependencies that represent relationships between jobs. The inspiration for amboy was to be able to provide a unified way to define and run jobs, that would feel equally "native" for distributed applications and distributed web application, and move easily between different architectures. While amboy users will generally implement their own Job and dependency implementations, Amboy itself provides several example Queue implementations, as well as several generic examples and prototypes of Job and dependency.Manager objects. Generally speaking you should be able to use included amboy components to provide the queue and runner components, in conjunction with custom and generic job and dependency variations. Consider the following example: The amboy package proves a number of generic methods that, using the Queue.Stats() method, block until all jobs are complete. They provide different semantics, which may be useful in different circumstances. All of these functions wait until the total number of jobs submitted to the queue is equal to the number of completed jobs, and as a result these methods don't prevent other threads from adding jobs to the queue after beginning to wait. Additionally, there are a set of methods that allow callers to wait for a specific job to complete.
Package amboy provides basic infrastructure for running and describing jobs and job workflows with, potentially, minimal overhead and additional complexity. Amboy works with 4 basic logical objects: jobs representing work; runners, which are responsible for executing jobs; queues, that represent pipelines and offline workflows of jobs (i.e. not real time, processes that run outside of the primary execution path of a program); and dependencies that represent relationships between jobs. The inspiration for amboy was to be able to provide a unified way to define and run jobs, that would feel equally "native" for distributed applications and distributed web application, and move easily between different architectures. While amboy users will generally implement their own Job and dependency implementations, Amboy itself provides several example Queue implementations, as well as several generic examples and prototypes of Job and dependency.Manager objects. Generally speaking you should be able to use included amboy components to provide the queue and runner components, in conjunction with custom and generic job and dependency variations. Consider the following example: The amboy package proves a number of generic methods that, using the Queue.Stats() method, block until all jobs are complete. They provide different semantics, which may be useful in different circumstances. All of the Wait* functions wait until the total number of jobs submitted to the queue is equal to the number of completed jobs, and as a result these methods don't prevent other threads from adding jobs to the queue after beginning to wait. As a special case, retryable queues will also wait until there are no retrying jobs remaining. Additionally, there are a set of methods, WaitJob*, that allow callers to wait for a specific job to complete.
Package gue implements Golang queues on top of PostgreSQL. It uses transaction-level locks for concurrent work. Package supports several PostgreSQL drivers using adapter interface internally. Currently, adapters for the following drivers have been implemented: Here is a complete example showing worker setup for pgx/v4 and two jobs enqueued, one with a delay:
Package gue implements Golang queue on top of PostgreSQL. It uses transaction-level locks for concurrent work. Package supports several PostgreSQL drivers using adapter interface internally. Currently, adapters for the following drivers have been implemented: Here is a complete example showing worker setup for pgx/v4 and two jobs enqueued, one with a delay:
Package anser provides a document transformation and processing tool to support data migrations. The anser.Application is the primary interface in which migrations are defined and executed. Applications are constructed with a list of MigrationGenerators, and relevant operations. Then the Setup method configures the application, with an anser.Environment, which sets up and collects dependency information. Finally, the Run method executes the migrations in two phases: first by generating migration jobs, and finally by running all migration jobs. The ordering of migrations is derived from the dependency information between generators and the jobs that they generate. When possible jobs are executed in parallel, but the execution of migration operations is a property of the queue object configured in the anser.Environment. The anser package provides a custom amboy/dependency.Manager object, which allows migrations to express dependencies to other migrations. The State() method ensures that all migration IDs specified as edges are satisfied before reporting as "ready" for work. Anser provides the Environment interface, with a global instance accessible via the exported GetEnvironment() function to provide access to runtime configuration state: database connections; amboy.Queue objects, and registries for task implementations. The Environment is an interface: you can build a mock, or use one provided for testing purposes by anser (coming soon). Generators create migration operations and are the first step in an anser Migration. They are supersets of amboy.Job interfaces. The current limitation is that the generated jobs must be stored within the implementation of the generator job, which means they must either all fit in memory *or* be serializable independently (e.g. fit in the 16mb document limit if using a MongoDB backed queue.)
Package worker accepts Jobs and places them in a queue to be executed N at a time.
Package goworker is a Resque-compatible, Go-based background worker. It allows you to push jobs into a queue using an expressive language like Ruby while harnessing the efficiency and concurrency of Go to minimize job latency and cost. goworker workers can run alongside Ruby Resque clients so that you can keep all but your most resource-intensive jobs in Ruby. To create a worker, write a function matching the signature and register it using Here is a simple worker that prints its arguments: To create workers that share a database pool or other resources, use a closure to share variables. goworker worker functions receive the queue they are serving and a slice of interfaces. To use them as parameters to other functions, use Go type assertions to convert them into usable types. For testing, it is helpful to use the redis-cli program to insert jobs onto the Redis queue: will insert 100 jobs for the MyClass worker onto the myqueue queue. It is equivalent to: After building your workers, you will have an executable that you can run which will automatically poll a Redis server and call your workers as jobs arrive. There are several flags which control the operation of the goworker client. -queues="comma,delimited,queues" — This is the only required flag. The recommended practice is to separate your Resque workers from your goworkers with different queues. Otherwise, Resque worker classes that have no goworker analog will cause the goworker process to fail the jobs. Because of this, there is no default queue, nor is there a way to select all queues (à la Resque's * queue). Queues are processed in the order they are specififed. If you have multiple queues you can assign them weights. A queue with a weight of 2 will be checked twice as often as a queue with a weight of 1: -queues='high=2,low=1'. -interval=5.0 — Specifies the wait period between polling if no job was in the queue the last time one was requested. -concurrency=25 — Specifies the number of concurrently executing workers. This number can be as low as 1 or rather comfortably as high as 100,000, and should be tuned to your workflow and the availability of outside resources. -connections=2 — Specifies the maximum number of Redis connections that goworker will consume between the poller and all workers. There is not much performance gain over two and a slight penalty when using only one. This is configurable in case you need to keep connection counts low for cloud Redis providers who limit plans on maxclients. -uri=redis://localhost:6379/ — Specifies the URI of the Redis database from which goworker polls for jobs. Accepts URIs of the format redis://user:pass@host:port/db or unix:///path/to/redis.sock. The flag may also be set by the environment variable $($REDIS_PROVIDER) or $REDIS_URL. E.g. set $REDIS_PROVIDER to REDISTOGO_URL on Heroku to let the Redis To Go add-on configure the Redis database. -namespace=resque: — Specifies the namespace from which goworker retrieves jobs and stores stats on workers. -exit-on-complete=false — Exits goworker when there are no jobs left in the queue. This is helpful in conjunction with the time command to benchmark different configurations. -use-number=false — Uses json.Number when decoding numbers in the job payloads. This will avoid issues that occur when goworker and the json package decode large numbers as floats, which then get encoded in scientific notation, losing pecision. This will default to true soon. You can also configure your own flags for use within your workers. Be sure to set them before calling goworker.Main(). It is okay to call flags.Parse() before calling goworker.Main() if you need to do additional processing on your flags. To stop goworker, send a QUIT, TERM, or INT signal to the process. This will immediately stop job polling. There can be up to $CONCURRENCY jobs currently running, which will continue to run until they are finished. Like Resque, goworker makes no guarantees about the safety of jobs in the event of process shutdown. Workers must be both idempotent and tolerant to loss of the job in the event of failure. If the process is killed with a KILL or by a system failure, there may be one job that is currently in the poller's buffer that will be lost without any representation in either the queue or the worker variable. If you are running Goworker on a system like Heroku, which sends a TERM to signal a process that it needs to stop, ten seconds later sends a KILL to force the process to stop, your jobs must finish within 10 seconds or they may be lost. Jobs will be recoverable from the Redis database under as a JSON object with keys queue, run_at, and payload, but the process is manual. Additionally, there is no guarantee that the job in Redis under the worker key has not finished, if the process is killed before goworker can flush the update to Redis.
Package gue implements Golang queues on top of PostgreSQL. It uses transaction-level locks for concurrent work. Package supports several PostgreSQL drivers using adapter interface internally. Currently, adapters for the following drivers have been implemented: Here is a complete example showing worker setup for pgx/v4 and two jobs enqueued, one with a delay:
Package amboy provides basic infrastructure for running and describing tasks and task workflows with, potentially, minimal overhead and additional complexity. Amboy works with 4 basic logical objects: jobs, or descriptions of tasks; runnners, which are responsible for executing tasks; queues, that represent pipelines and offline workflows of tasks (e.g. not real time, processes that run outside of the primary execution path of a program); and dependencies that represent relationships between jobs. The inspiration for amboy was to be able to provide a unified way to define and run jobs, that would feel equally "native" for distributed applications and distributed web application, and move easily between different architectures. While amboy users will generally implement their own Job and dependency implementations, Amboy itself provides several example Queue implementations, as well as several generic examples and prototypes of Job and dependency.Manager objects. Generally speaking you should be able to use included amboy components to provide the queue and runner components, in conjunction with custom and generic job and dependency variations. Consider the following example: The amboy package proves a number of generic methods that, using the Queue.Stats() method, block until all jobs are complete. They provide different semantics, which may be useful in different circumstances. All of these functions wait until the total number of jobs submitted to the queue is equal to the number of completed jobs, and as a result these methods don't prevent other threads from adding jobs to the queue after beginning to wait. Additionally, there are a set of methods that allow callers to wait for a specific job to complete.
Package main is a stub for wr's command line interface, with the actual implementation in the cmd package. wr is a workflow runner. You use it to run the commands in your workflow easily, automatically, reliably, with repeatability, and while making optimal use of your available computing resources. wr is implemented as a polling-free in-memory job queue with an on-disk acid transactional embedded database, written in go. Its main benefits over other software workflow management systems are its very low latency and overhead, its high performance at scale, its real-time status updates with a view on all your workflows on one screen, its permanent searchable history of all the commands you have ever run, and its "live" dependencies enabling easy automation of on-going projects. Start up the manager daemon, which gives you a url you can view the web interface on: In addition to the "local" scheduler, which will run your commands on all available cores of the local machine, you can also have it run your commands on your LSF cluster or in your OpenStack environment (where it will scale the number of servers needed up and down automatically). Now, stick the commands you want to run in a text file and: Arbitrarily complex workflows can be formed by specifying command dependencies. Use the --help option of `wr add` for details. wr's core is implemented in the queue package. This is the in-memory job queue that holds commands that still need to be run. Its multiple sub-queues enable certain guarantees: a given command will only get run by a single client at any one time; if a client dies, the command will get run by another client instead; if a command cannot be run, it is buried until the user takes action; if a command has a dependency, it won't run until its dependencies are complete. The jobqueue package provides client+server code for interacting with the in-memory queue from the queue package, and by storing all new commands in an on-disk database, provides an additional guarantee: that (dynamic) workflows won't break because a job that was added got "lost" before it got run. It also retains all completed jobs, enabling searching through of past workflows and allowing for "live" dependencies, triggering the rerunning of previously completed commands if their dependencies change. The jobqueue package is also what actually does the main "work" of the system: the server component knows how many commands need to be run and what their resource requirements (memory, time, cpus etc.) are, and submits the appropriate number of jobqueue runner clients to the job scheduler. The jobqueue/scheduler package has the scheduler-specific code that ensures that these runner clients get run on the configured system in the most efficient way possible. Eg. for LSF, if we have 10 commands that need 2GB of memory to run, we will submit a job array of size 10 with 2GB of memory reservation to LSF. The most limited (and therefore potentially least contended) queue capable of running the commands will be chosen. For OpenStack, the cheapest server (in terms of cores and memory) that can run the commands will be spawned, and once there is no more work to do on those servers, they get terminated to free up resources. The cloud package implements methods for interacting with cloud environments such as OpenStack. The corresponding jobqueue/scheduler package uses these methods to do their work. The static subdirectory contains the html, css and javascript needed for the web interface. See jobqueue/serverWebI.go for how the web interface backend is implemented. The internal package contains general utility functions, and most notably config.go holds the code for how the command line interface deals with config options.
Package pgq provides an implementation of a Postgres-backed job queue. Safe concurrency is built on top of the SKIP LOCKED functionality introduced in Postgres 9.5. Retries and exponential backoff are supported.
Package workerpool provides a workerpool. It also can expand and shrink dynamically. Jobs can be queued using the Queue() method which also accepts a timeout parameter for timing out queuing and if all workers are too busy. For expanding the queue, Expand() method can be used, which increases the number of workers. If a timeout is provided, these extra workers will stop, if there are not enough jobs to do. It is also possible to explicitly stop extra workers by providing a quit channel.
Package workerpool provides a workerpool. It also can expand and shrink dynamically. Jobs can be queued using the Queue() method which also accepts a timeout parameter for timing out queuing and if all workers are too busy. For expanding the queue, Expand() method can be used, which increases the number of workers. If a timeout is provided, these extra workers will stop, if there are not enough jobs to do. It is also possible to explicitly stop extra workers by providing a quit channel.
Package workerpool provides a workerpool. It also can expand and shrink dynamically. Jobs can be queued using the Queue() method which also accepts a timeout parameter for timing out queuing and if all workers are too busy. For expanding the queue, Expand() method can be used, which increases the number of workers. If a timeout is provided, these extra workers will stop, if there are not enough jobs to do. It is also possible to explicitly stop extra workers by providing a quit channel.
Package rickover contains logic for a scheduler and job queue backed by Postgres.
Package gue implements Golang queues on top of PostgreSQL. It uses transaction-level locks for concurrent work. Package supports several PostgreSQL drivers using adapter interface internally. Currently, adapters for the following drivers have been implemented: Here is a complete example showing worker setup for pgx/v4 and two jobs enqueued, one with a delay:
Package ed implements an event dispatcher that is meant to allow codebases to organize concerns around events. Instead of placing all concerns about a particular business logic event in one function, this package allows you to declare events by type and dispatach those events to other code so that other concerns may be maintained in separate areas of the codebase. This module is tagged as v0, thus complies with Go's definition and rules about v0 modules (https://go.dev/doc/modules/version-numbers#v0-number). In short, it means that the API of this module may change without incrementing the major version number. Each releasable version will simply increment the patch number. Given the surface area of this module is quite small, this should not be a huge issue if used in production code. ed uses Go's own type system to dispatch events to the correct event handlers. By simply registering an event handler that accepts an event of a particular type: That handler will be triggered when an event of the same type is dispatched from somewhere else in the program: Because it is Go's type system, event handlers can also be registered against interfaces instead of just concrete types. Thus, creating an event handler that will be triggered for all events is as simple as: Overall, this package's goal is to mostly be an aid in allowing application business logic to remain clear of ancillary conerns/activities that might otherwise pile up in certain code areas. For example, as a SaaS company grows, a lot of cross-cutting concerns can build up around a customer signing up for a service. In a monolithic codebase/service, the code that handles a new customer signup would become full of "do this check", "send this data to marketing systems", "screen this signup for abusive behavior" logic. This is where ed is meant to be: taking all of those concerns/side-effects that are ancillary to the core act of signing up a user, and placing them in there own code areas and allowing them to be communicated with via events. This package is not trying to be a job queue, or a messaging system. The dispatching of events is done synchronously and the events are designed to allow errors to be returned so that use-cases like synchronous validation are supported. That being said, a common use-case might be to use this package to allow the publishing of events/messages to a Kafka/SQS-like system, an event handler: Then in your main business logic for user signups:
Package wsqueue provides a framework over gorilla/mux and gorilla/websocket to operate kind of AMQP but over websockets. It offers a Server, a Client and two king of messaging protocols : Topics and Queues. Topics : Publish & Subscribe Pattern Publish and subscribe semantics are implemented by Topics. When you publish a message it goes to all the subscribers who are interested - so zero to many subscribers will receive a copy of the message. Only subscribers who had an active subscription at the time the broker receives the message will get a copy of the message. Start a server and handle a topic Start a client and listen on a topic Queues : Work queues Pattern The main idea behind Work Queues (aka: Task Queues) is to avoid doing a resource-intensive task immediately and having to wait for it to complete. Instead we schedule the task to be done later. We encapsulate a task as a message and send it to the queue. A worker process running in the background will pop the tasks and eventually execute the job. When you run many workers the tasks will be shared between them. Queues implement load balancer semantics. A single message will be received by exactly one consumer. If there are no consumers available at the time the message is sent it will be kept until a consumer is available that can process the message. If a consumer receives a message and does not acknowledge it before closing then the message will be redelivered to another consumer. A queue can have many consumers with messages load balanced across the available consumers. see samples/queue/main.go, samples/topic/main.go
The bookpipeline package contains various tools and functions for the OCR of books, with a focus on distributed OCR using short-lived virtual servers. It also contains several tools that are useful standalone; read the accompanying README for more details. The book pipeline is a way to split the different processes that for book OCR into small jobs, which can be processed when a computer is ready for them. It is currently implemented with Amazon's AWS cloud systems, and can scale from zero to many computers, with jobs being processed faster when more servers are available. Central to the bookpipeline in terms of software is the bookpipeline command, which is part of the rescribe.xyz/bookpipeline package. Presuming you have the go tools installed, you can install it, and useful tools to control the system, with this command: All of the tools provided in the bookpipeline package will give information on what they do and how they work with the '-h' flag, so for example to get usage information on the booktopipeline tool simply run the following: To get the pipeline tools to work for you, you'll need to change the settings in cloudsettings.go, and set up your ~/.aws/credentials appropriately. Most of the time the bookpipeline is expected to be run from potentially short-lived servers on Amazon's EC2 system. EC2 provides servers which have no guaranteed of stability (though in practice they seem to be), called "Spot Instances", which we use for bookpipeline. bookpipeline can handle a process or server being suddenly destroyed without warning (more on this later), so Spot Instances are perfect for us. We have set up a machine image with bookpipeline preinstalled which will launch at bootup, which is all that's needed to launch an bookpipeline instance. Presuming the bookpipeline package has been installed on your computer (see above), the spot instance can be started with the command: You can keep an eye on the servers (spot or otherwise) that are running, and the jobs left to do and in progress, with the "lspipeline" tool (which is also part of the bookpipeline package). It's recommended to use this with the ssh private key for the servers, so that it can also report on what each server is currently doing, but it can run successfully without it. It takes a little while to run, so be patient. It can be run with the command: Spot instances can be terminated with ssh, using their ip address which can be found with lspipeline, like so: The bookpipeline program is run as a service managed by systemd on the servers. The system is fully resiliant in the face of unexpected failures. See the section "How the pipeline works" for details on this. bookpipeline can be managed like any other systemd service. A few examples: Books can be added to the pipeline using the "booktopipeline" tool. This takes a directory of page images as input, and uploads them all to S3, adding a job to the pipeline queue to start processing them. So it can be used like this: Once a book has been finished, it can be downloaded using the "getpipelinebook" tool. This has several options to download specific parts of a book, but the default case will download the best hOCR for each page, PDFs, and the best, conf and graph.png files. Use it like this: To get the plain text from the book, use the hocrtotxt tool, which is part of the rescribe.xyz/utils package. You can get the package, and run the tool, like this: The central part of the book pipeline is several SQS queues, which contain jobs which need to be done by a server running bookpipeline. The exact content of the SQS messages vary according to each queue, as some jobs need more information than others. Each queue is checked at least once every couple of minutes on any server that isn't currently processing a job. When a job is taken from the queue by a process, it is hidden from the queue for 2 minutes so that no other process can take it. Once per minute when processing a job the process sends a message updating the queue, to tell it to keep the job hidden for two minutes. This is called the "heartbeat", as if the process fails for any reason the heartbeat will stop, and in 2 minutes the job will reappear on the queue for another process to have a go at. Once a job is completed successfully it is deleted from the queue. Queue names are defined in cloudsettings.go. queuePreProc Each message in the queuePreProc queue is a bookname, optionally followed by a space and the name of the training to use. Each page of the bookname will be binarised with several different parameters, and then wiped, with each version uploaded to S3, with the path of the preprocessed page, plus the training name if it was provided, will be added to the queueOcrPage queue. The pages are binarised with different parameters as it can be difficult to determine which binarisation level will be best prior to OCR, so several different options are used, and in the queueAnalyse step the best one is chosen, based on the confidence of the OCR output. queueWipeOnly This queue works the same as queuePreProc, except that it doesn't binarise the pages, only runs the wiper. Hence it is designed for books which have been prebinarised. queuePreNoWipe This queue works the same as queuePreProc, except that it doesn'T wipe the pages, only runs the binarisation. It is designed for books which don't have tricky gutters or similar noise around the edges, but do have marginal content which might be inadventently removed by the wiper. queueOcrPage This queue contains the path of individual pages, optionally followed by a space and the name of the training to use. Each page is OCRed, and the results are uploaded to S3. After each page is OCRed, a check is made to see whether all pages that look like they were preprocessed have corresponding .hocr files. If so, the bookname is added to the queueAnalyse queue. queueAnalyse A message on the queueAnalyse queue contains only a book name. The confidences for each page are calculated and saved in the 'conf' file, and the best version of each page is decided upon and saved in the 'best' file. PDFs are then generated, and the confidence graph is generated. The queues should generally only be messed with by the bookpipeline and booktopipeline tools, but if you're feeling ambitious you can take a look at the `addtoqueue` tool. Remember that messages in a queue are hidden for a few minutes when they are read, so for example you couldn't straightforwardly delete a message which was currently being processed by a server, as you wouldn't be able to see it. At present the bookpipeline has some silly limitations of file names for book pages to be recognised. This is something which will be fixed in due course. While bookpipeline was built with cloud based operation in mind, there is also a local mode that can be used to run OCR jobs from a single computer, with all the benefits of preprocessing, choosing the best threshold for each image, graph creation, PDF creation, and so on that the pipeline provides. Several of the commands accept a `-c local` flag for local operation, but now there is also a new command, named rescribe, that is designed to make things much simpler for people just wanting to do some OCR on their local computer. Note that the local mode is not as well tested as the core cloud modes; please report any bugs you find with it.
Package que-go is a fully interoperable Golang port of Chris Hanks' Ruby Que queueing library for PostgreSQL. Que uses PostgreSQL's advisory locks for speed and reliability. See the original Que documentation for more details: https://github.com/chanks/que Because que-go is an interoperable port of Que, you can enqueue jobs in Ruby (i.e. from a Rails app) and write your workers in Go. Or if you have a limited set of jobs that you want to write in Go, you can leave most of your workers in Ruby and just add a few Go workers on a different queue name. Instead of using database/sql and the more popular pq PostgreSQL driver, this package uses the pgx driver: https://github.com/jackc/pgx Because Que uses session-level advisory locks, we have to hold the same connection throughout the process of getting a job, working it, deleting it, and removing the lock. Pq and the built-in database/sql interfaces do not offer this functionality, so we'd have to implement our own connection pool. Fortunately, pgx already has a perfectly usable one built for us. Even better, it offers better performance than pq due largely to its use of binary encoding. que-go relies on prepared statements for performance. As of now these have to be initialized manually on your connection pool like so: If you have suggestions on how to cleanly do this automatically, please open an issue! Here is a complete example showing worker setup and two jobs enqueued, one with a delay:
Package gue implements Golang queues on top of PostgreSQL. It uses transaction-level locks for concurrent work. Package supports several PostgreSQL drivers using adapter interface internally. Currently, adapters for the following drivers have been implemented: Here is a complete example showing worker setup for pgx/v4 and two jobs enqueued, one with a delay:
Package bqueue is a "in memory" queue. it Collects jobs via the `CollectJob` func who takes a interface `Job`
Package parallelizer is a library for enabling the addition of controlled parallelization utilizing a pool of worker goroutines in a simple manner. This is not intended as an external job queue, where outside programs may submit jobs, although it could easily be used to implement such a tool. The parallelizer package provides a Runner interface, which is for client applications to implement. Instances of the Runner interface may then be passed to the constructor functions NewSynchronousWorker or NewGoWorker, which construct objects conforming to the Worker interface. Data items may then be passed to the Worker instances via the Worker.Call method, and the processing completed and the final result obtained by calling Worker.Wait. A Runner implementation must provide a Runner.Run method, which will actually process the data in a goroutine and return a result; the result is then passed to the Runner.Integrate method, which is run synchronously with other Runner.Integrate calls, and which can submit additional data items for processing. Once all data is processed, and the client code has called Worker.Wait, the Worker will call the Runner.Result method to obtain the result. The Runner.Result method will be called exactly once; the returned value is cached in the Worker to be returned by future calls to Worker.Wait. The Worker.Call method may not be called again after Worker.Wait has been called. The parallelizer package also provides a Doer interface, which is for client applications to implement. Instances of the Doer interface may then be passed to the constructor function NewSerializer, which constructs objects conforming to the Serializer interface. Data items may then be passed to the Serializer objects to be executed via Doer.Do in a synchronous manner without necessarily blocking the calling goroutine. A Doer implementation must provide a Doer.Do method, which will actually process the data in a separate goroutine; each call will be executed synchronously, so thread-safety in Doer.Do is not a concern. When the code using the wrapping Serializer is done, it will call Serializer.Wait, which will call the Doer.Finish method and return its result to the caller. Note that none of the Serializer.Call methods may be called again after calling Serializer.Wait.
Package goworker is a Resque-compatible, Go-based background worker. It allows you to push jobs into a queue using an expressive language like Ruby while harnessing the efficiency and concurrency of Go to minimize job latency and cost. goworker workers can run alongside Ruby Resque clients so that you can keep all but your most resource-intensive jobs in Ruby. To create a worker, write a function matching the signature and register it using Here is a simple worker that prints its arguments: To create workers that share a database pool or other resources, use a closure to share variables. goworker worker functions receive the queue they are serving and a slice of interfaces. To use them as parameters to other functions, use Go type assertions to convert them into usable types. For testing, it is helpful to use the redis-cli program to insert jobs onto the Redis queue: will insert 100 jobs for the MyClass worker onto the myqueue queue. It is equivalent to: After building your workers, you will have an executable that you can run which will automatically poll a Redis server and call your workers as jobs arrive. There are several flags which control the operation of the goworker client. -queues="comma,delimited,queues" — This is the only required flag. The recommended practice is to separate your Resque workers from your goworkers with different queues. Otherwise, Resque worker classes that have no goworker analog will cause the goworker process to fail the jobs. Because of this, there is no default queue, nor is there a way to select all queues (à la Resque's * queue). Queues are processed in the order they are specififed. If you have multiple queues you can assign them weights. A queue with a weight of 2 will be checked twice as often as a queue with a weight of 1: -queues='high=2,low=1'. -interval=5.0 — Specifies the wait period between polling if no job was in the queue the last time one was requested. -concurrency=25 — Specifies the number of concurrently executing workers. This number can be as low as 1 or rather comfortably as high as 100,000, and should be tuned to your workflow and the availability of outside resources. -connections=2 — Specifies the maximum number of Redis connections that goworker will consume between the poller and all workers. There is not much performance gain over two and a slight penalty when using only one. This is configurable in case you need to keep connection counts low for cloud Redis providers who limit plans on maxclients. -uri=redis://localhost:6379/ — Specifies the URI of the Redis database from which goworker polls for jobs. Accepts URIs of the format redis://user:pass@host:port/db or unix:///path/to/redis.sock. The flag may also be set by the environment variable $($REDIS_PROVIDER) or $REDIS_URL. E.g. set $REDIS_PROVIDER to REDISTOGO_URL on Heroku to let the Redis To Go add-on configure the Redis database. -namespace=resque: — Specifies the namespace from which goworker retrieves jobs and stores stats on workers. -exit-on-complete=false — Exits goworker when there are no jobs left in the queue. This is helpful in conjunction with the time command to benchmark different configurations. -use-number=false — Uses json.Number when decoding numbers in the job payloads. This will avoid issues that occur when goworker and the json package decode large numbers as floats, which then get encoded in scientific notation, losing pecision. This will default to true soon. You can also configure your own flags for use within your workers. Be sure to set them before calling goworker.Main(). It is okay to call flags.Parse() before calling goworker.Main() if you need to do additional processing on your flags. To stop goworker, send a QUIT, TERM, or INT signal to the process. This will immediately stop job polling. There can be up to $CONCURRENCY jobs currently running, which will continue to run until they are finished. Like Resque, goworker makes no guarantees about the safety of jobs in the event of process shutdown. Workers must be both idempotent and tolerant to loss of the job in the event of failure. If the process is killed with a KILL or by a system failure, there may be one job that is currently in the poller's buffer that will be lost without any representation in either the queue or the worker variable. If you are running Goworker on a system like Heroku, which sends a TERM to signal a process that it needs to stop, ten seconds later sends a KILL to force the process to stop, your jobs must finish within 10 seconds or they may be lost. Jobs will be recoverable from the Redis database under as a JSON object with keys queue, run_at, and payload, but the process is manual. Additionally, there is no guarantee that the job in Redis under the worker key has not finished, if the process is killed before goworker can flush the update to Redis.