Job Queues

github.com/benmanns/goworker

Package goworker is a Resque-compatible, Go-based background worker. It allows you to push jobs into a queue using an expressive language like Ruby while harnessing the efficiency and concurrency of Go to minimize job latency and cost. goworker workers can run alongside Ruby Resque clients so that you can keep all but your most resource-intensive jobs in Ruby. To create a worker, write a function matching the signature and register it using Here is a simple worker that prints its arguments: To create workers that share a database pool or other resources, use a closure to share variables. goworker worker functions receive the queue they are serving and a slice of interfaces. To use them as parameters to other functions, use Go type assertions to convert them into usable types. For testing, it is helpful to use the redis-cli program to insert jobs onto the Redis queue: will insert 100 jobs for the MyClass worker onto the myqueue queue. It is equivalent to: After building your workers, you will have an executable that you can run which will automatically poll a Redis server and call your workers as jobs arrive. There are several flags which control the operation of the goworker client. -queues="comma,delimited,queues" — This is the only required flag. The recommended practice is to separate your Resque workers from your goworkers with different queues. Otherwise, Resque worker classes that have no goworker analog will cause the goworker process to fail the jobs. Because of this, there is no default queue, nor is there a way to select all queues (à la Resque's * queue). Queues are processed in the order they are specififed. If you have multiple queues you can assign them weights. A queue with a weight of 2 will be checked twice as often as a queue with a weight of 1: -queues='high=2,low=1'. -interval=5.0 — Specifies the wait period between polling if no job was in the queue the last time one was requested. -concurrency=25 — Specifies the number of concurrently executing workers. This number can be as low as 1 or rather comfortably as high as 100,000, and should be tuned to your workflow and the availability of outside resources. -connections=2 — Specifies the maximum number of Redis connections that goworker will consume between the poller and all workers. There is not much performance gain over two and a slight penalty when using only one. This is configurable in case you need to keep connection counts low for cloud Redis providers who limit plans on maxclients. -uri=redis://localhost:6379/ — Specifies the URI of the Redis database from which goworker polls for jobs. Accepts URIs of the format redis://user:pass@host:port/db or unix:///path/to/redis.sock. The flag may also be set by the environment variable $($REDIS_PROVIDER) or $REDIS_URL. E.g. set $REDIS_PROVIDER to REDISTOGO_URL on Heroku to let the Redis To Go add-on configure the Redis database. -namespace=resque: — Specifies the namespace from which goworker retrieves jobs and stores stats on workers. -exit-on-complete=false — Exits goworker when there are no jobs left in the queue. This is helpful in conjunction with the time command to benchmark different configurations. -use-number=false — Uses json.Number when decoding numbers in the job payloads. This will avoid issues that occur when goworker and the json package decode large numbers as floats, which then get encoded in scientific notation, losing pecision. This will default to true soon. You can also configure your own flags for use within your workers. Be sure to set them before calling goworker.Main(). It is okay to call flags.Parse() before calling goworker.Main() if you need to do additional processing on your flags. To stop goworker, send a QUIT, TERM, or INT signal to the process. This will immediately stop job polling. There can be up to $CONCURRENCY jobs currently running, which will continue to run until they are finished. Like Resque, goworker makes no guarantees about the safety of jobs in the event of process shutdown. Workers must be both idempotent and tolerant to loss of the job in the event of failure. If the process is killed with a KILL or by a system failure, there may be one job that is currently in the poller's buffer that will be lost without any representation in either the queue or the worker variable. If you are running Goworker on a system like Heroku, which sends a TERM to signal a process that it needs to stop, ten seconds later sends a KILL to force the process to stop, your jobs must finish within 10 seconds or they may be lost. Jobs will be recoverable from the Redis database under as a JSON object with keys queue, run_at, and payload, but the process is manual. Additionally, there is no guarantee that the job in Redis under the worker key has not finished, if the process is killed before goworker can flush the update to Redis.

v0.1.3 • 8 years ago

gopkg.in/benmanns/goworker.v0

Package goworker is a Resque-compatible, Go-based background worker. It allows you to push jobs into a queue using an expressive language like Ruby while harnessing the efficiency and concurrency of Go to minimize job latency and cost. goworker workers can run alongside Ruby Resque clients so that you can keep all but your most resource-intensive jobs in Ruby. To create a worker, write a function matching the signature and register it using Here is a simple worker that prints its arguments: To create workers that share a database pool or other resources, use a closure to share variables. goworker worker functions receive the queue they are serving and a slice of interfaces. To use them as parameters to other functions, use Go type assertions to convert them into usable types. For testing, it is helpful to use the redis-cli program to insert jobs onto the Redis queue: will insert 100 jobs for the MyClass worker onto the myqueue queue. It is equivalent to: After building your workers, you will have an executable that you can run which will automatically poll a Redis server and call your workers as jobs arrive. There are several flags which control the operation of the goworker client. -queues="comma,delimited,queues" — This is the only required flag. The recommended practice is to separate your Resque workers from your goworkers with different queues. Otherwise, Resque worker classes that have no goworker analog will cause the goworker process to fail the jobs. Because of this, there is no default queue, nor is there a way to select all queues (à la Resque's * queue). Queues are processed in the order they are specififed. If you have multiple queues you can assign them weights. A queue with a weight of 2 will be checked twice as often as a queue with a weight of 1: -queues='high=2,low=1'. -interval=5.0 — Specifies the wait period between polling if no job was in the queue the last time one was requested. -concurrency=25 — Specifies the number of concurrently executing workers. This number can be as low as 1 or rather comfortably as high as 100,000, and should be tuned to your workflow and the availability of outside resources. -connections=2 — Specifies the maximum number of Redis connections that goworker will consume between the poller and all workers. There is not much performance gain over two and a slight penalty when using only one. This is configurable in case you need to keep connection counts low for cloud Redis providers who limit plans on maxclients. -uri=redis://localhost:6379/ — Specifies the URI of the Redis database from which goworker polls for jobs. Accepts URIs of the format redis://user:pass@host:port/db or unix:///path/to/redis.sock. The flag may also be set by the environment variable $($REDIS_PROVIDER) or $REDIS_URL. E.g. set $REDIS_PROVIDER to REDISTOGO_URL on Heroku to let the Redis To Go add-on configure the Redis database. -namespace=resque: — Specifies the namespace from which goworker retrieves jobs and stores stats on workers. -exit-on-complete=false — Exits goworker when there are no jobs left in the queue. This is helpful in conjunction with the time command to benchmark different configurations. -use-number=false — Uses json.Number when decoding numbers in the job payloads. This will avoid issues that occur when goworker and the json package decode large numbers as floats, which then get encoded in scientific notation, losing pecision. This will default to true soon. You can also configure your own flags for use within your workers. Be sure to set them before calling goworker.Main(). It is okay to call flags.Parse() before calling goworker.Main() if you need to do additional processing on your flags. To stop goworker, send a QUIT, TERM, or INT signal to the process. This will immediately stop job polling. There can be up to $CONCURRENCY jobs currently running, which will continue to run until they are finished. Like Resque, goworker makes no guarantees about the safety of jobs in the event of process shutdown. Workers must be both idempotent and tolerant to loss of the job in the event of failure. If the process is killed with a KILL or by a system failure, there may be one job that is currently in the poller's buffer that will be lost without any representation in either the queue or the worker variable. If you are running Goworker on a system like Heroku, which sends a TERM to signal a process that it needs to stop, ten seconds later sends a KILL to force the process to stop, your jobs must finish within 10 seconds or they may be lost. Jobs will be recoverable from the Redis database under as a JSON object with keys queue, run_at, and payload, but the process is manual. Additionally, there is no guarantee that the job in Redis under the worker key has not finished, if the process is killed before goworker can flush the update to Redis.

v0.1.3 • 8 years ago

github.com/VertebrateResequencing/wr

Package main is a stub for wr's command line interface, with the actual implementation in the cmd package. wr is a workflow runner. You use it to run the commands in your workflow easily, automatically, reliably, with repeatability, and while making optimal use of your available computing resources. wr is implemented as a polling-free in-memory job queue with an on-disk acid transactional embedded database, written in go. Its main benefits over other software workflow management systems are its very low latency and overhead, its high performance at scale, its real-time status updates with a view on all your workflows on one screen, its permanent searchable history of all the commands you have ever run, and its "live" dependencies enabling easy automation of on-going projects. Start up the manager daemon, which gives you a url you can view the web interface on: In addition to the "local" scheduler, which will run your commands on all available cores of the local machine, you can also have it run your commands on your LSF cluster or in your OpenStack environment (where it will scale the number of servers needed up and down automatically). Now, stick the commands you want to run in a text file and: Arbitrarily complex workflows can be formed by specifying command dependencies. Use the --help option of `wr add` for details. wr's core is implemented in the queue package. This is the in-memory job queue that holds commands that still need to be run. Its multiple sub-queues enable certain guarantees: a given command will only get run by a single client at any one time; if a client dies, the command will get run by another client instead; if a command cannot be run, it is buried until the user takes action; if a command has a dependency, it won't run until its dependencies are complete. The jobqueue package provides client+server code for interacting with the in-memory queue from the queue package, and by storing all new commands in an on-disk database, provides an additional guarantee: that (dynamic) workflows won't break because a job that was added got "lost" before it got run. It also retains all completed jobs, enabling searching through of past workflows and allowing for "live" dependencies, triggering the rerunning of previously completed commands if their dependencies change. The jobqueue package is also what actually does the main "work" of the system: the server component knows how many commands need to be run and what their resource requirements (memory, time, cpus etc.) are, and submits the appropriate number of jobqueue runner clients to the job scheduler. The jobqueue/scheduler package has the scheduler-specific code that ensures that these runner clients get run on the configured system in the most efficient way possible. Eg. for LSF, if we have 10 commands that need 2GB of memory to run, we will submit a job array of size 10 with 2GB of memory reservation to LSF. The most limited (and therefore potentially least contended) queue capable of running the commands will be chosen. For OpenStack, the cheapest server (in terms of cores and memory) that can run the commands will be spawned, and once there is no more work to do on those servers, they get terminated to free up resources. The cloud package implements methods for interacting with cloud environments such as OpenStack. The corresponding jobqueue/scheduler package uses these methods to do their work. The static subdirectory contains the html, css and javascript needed for the web interface. See jobqueue/serverWebI.go for how the web interface backend is implemented. The internal package contains general utility functions, and most notably config.go holds the code for how the command line interface deals with config options.

v0.36.0 • 3 months ago

code.nkcmr.net/ed

Package ed implements an event dispatcher that is meant to allow codebases to organize concerns around events. Instead of placing all concerns about a particular business logic event in one function, this package allows you to declare events by type and dispatach those events to other code so that other concerns may be maintained in separate areas of the codebase. This module is tagged as v0, thus complies with Go's definition and rules about v0 modules (https://go.dev/doc/modules/version-numbers#v0-number). In short, it means that the API of this module may change without incrementing the major version number. Each releasable version will simply increment the patch number. Given the surface area of this module is quite small, this should not be a huge issue if used in production code. ed uses Go's own type system to dispatch events to the correct event handlers. By simply registering an event handler that accepts an event of a particular type: That handler will be triggered when an event of the same type is dispatched from somewhere else in the program: Because it is Go's type system, event handlers can also be registered against interfaces instead of just concrete types. Thus, creating an event handler that will be triggered for all events is as simple as: Overall, this package's goal is to mostly be an aid in allowing application business logic to remain clear of ancillary conerns/activities that might otherwise pile up in certain code areas. For example, as a SaaS company grows, a lot of cross-cutting concerns can build up around a customer signing up for a service. In a monolithic codebase/service, the code that handles a new customer signup would become full of "do this check", "send this data to marketing systems", "screen this signup for abusive behavior" logic. This is where ed is meant to be: taking all of those concerns/side-effects that are ancillary to the core act of signing up a user, and placing them in there own code areas and allowing them to be communicated with via events. This package is not trying to be a job queue, or a messaging system. The dispatching of events is done synchronously and the events are designed to allow errors to be returned so that use-cases like synchronous validation are supported. That being said, a common use-case might be to use this package to allow the publishing of events/messages to a Kafka/SQS-like system, an event handler: Then in your main business logic for user signups:

v0.0.2 • last year

rescribe.xyz/bookpipeline

The bookpipeline package contains various tools and functions for the OCR of books, with a focus on distributed OCR using short-lived virtual servers. It also contains several tools that are useful standalone; read the accompanying README for more details. The book pipeline is a way to split the different processes that for book OCR into small jobs, which can be processed when a computer is ready for them. It is currently implemented with Amazon's AWS cloud systems, and can scale from zero to many computers, with jobs being processed faster when more servers are available. Central to the bookpipeline in terms of software is the bookpipeline command, which is part of the rescribe.xyz/bookpipeline package. Presuming you have the go tools installed, you can install it, and useful tools to control the system, with this command: All of the tools provided in the bookpipeline package will give information on what they do and how they work with the '-h' flag, so for example to get usage information on the booktopipeline tool simply run the following: To get the pipeline tools to work for you, you'll need to change the settings in cloudsettings.go, and set up your ~/.aws/credentials appropriately. Most of the time the bookpipeline is expected to be run from potentially short-lived servers on Amazon's EC2 system. EC2 provides servers which have no guaranteed of stability (though in practice they seem to be), called "Spot Instances", which we use for bookpipeline. bookpipeline can handle a process or server being suddenly destroyed without warning (more on this later), so Spot Instances are perfect for us. We have set up a machine image with bookpipeline preinstalled which will launch at bootup, which is all that's needed to launch an bookpipeline instance. Presuming the bookpipeline package has been installed on your computer (see above), the spot instance can be started with the command: You can keep an eye on the servers (spot or otherwise) that are running, and the jobs left to do and in progress, with the "lspipeline" tool (which is also part of the bookpipeline package). It's recommended to use this with the ssh private key for the servers, so that it can also report on what each server is currently doing, but it can run successfully without it. It takes a little while to run, so be patient. It can be run with the command: Spot instances can be terminated with ssh, using their ip address which can be found with lspipeline, like so: The bookpipeline program is run as a service managed by systemd on the servers. The system is fully resiliant in the face of unexpected failures. See the section "How the pipeline works" for details on this. bookpipeline can be managed like any other systemd service. A few examples: Books can be added to the pipeline using the "booktopipeline" tool. This takes a directory of page images as input, and uploads them all to S3, adding a job to the pipeline queue to start processing them. So it can be used like this: Once a book has been finished, it can be downloaded using the "getpipelinebook" tool. This has several options to download specific parts of a book, but the default case will download the best hOCR for each page, PDFs, and the best, conf and graph.png files. Use it like this: To get the plain text from the book, use the hocrtotxt tool, which is part of the rescribe.xyz/utils package. You can get the package, and run the tool, like this: The central part of the book pipeline is several SQS queues, which contain jobs which need to be done by a server running bookpipeline. The exact content of the SQS messages vary according to each queue, as some jobs need more information than others. Each queue is checked at least once every couple of minutes on any server that isn't currently processing a job. When a job is taken from the queue by a process, it is hidden from the queue for 2 minutes so that no other process can take it. Once per minute when processing a job the process sends a message updating the queue, to tell it to keep the job hidden for two minutes. This is called the "heartbeat", as if the process fails for any reason the heartbeat will stop, and in 2 minutes the job will reappear on the queue for another process to have a go at. Once a job is completed successfully it is deleted from the queue. Queue names are defined in cloudsettings.go. queuePreProc Each message in the queuePreProc queue is a bookname, optionally followed by a space and the name of the training to use. Each page of the bookname will be binarised with several different parameters, and then wiped, with each version uploaded to S3, with the path of the preprocessed page, plus the training name if it was provided, will be added to the queueOcrPage queue. The pages are binarised with different parameters as it can be difficult to determine which binarisation level will be best prior to OCR, so several different options are used, and in the queueAnalyse step the best one is chosen, based on the confidence of the OCR output. queueWipeOnly This queue works the same as queuePreProc, except that it doesn't binarise the pages, only runs the wiper. Hence it is designed for books which have been prebinarised. queuePreNoWipe This queue works the same as queuePreProc, except that it doesn'T wipe the pages, only runs the binarisation. It is designed for books which don't have tricky gutters or similar noise around the edges, but do have marginal content which might be inadventently removed by the wiper. queueOcrPage This queue contains the path of individual pages, optionally followed by a space and the name of the training to use. Each page is OCRed, and the results are uploaded to S3. After each page is OCRed, a check is made to see whether all pages that look like they were preprocessed have corresponding .hocr files. If so, the bookname is added to the queueAnalyse queue. queueAnalyse A message on the queueAnalyse queue contains only a book name. The confidences for each page are calculated and saved in the 'conf' file, and the best version of each page is decided upon and saved in the 'best' file. PDFs are then generated, and the confidence graph is generated. The queues should generally only be messed with by the bookpipeline and booktopipeline tools, but if you're feeling ambitious you can take a look at the `addtoqueue` tool. Remember that messages in a queue are hidden for a few minutes when they are read, so for example you couldn't straightforwardly delete a message which was currently being processed by a server, as you wouldn't be able to see it. At present the bookpipeline has some silly limitations of file names for book pages to be recognised. This is something which will be fixed in due course. While bookpipeline was built with cloud based operation in mind, there is also a local mode that can be used to run OCR jobs from a single computer, with all the benefits of preprocessing, choosing the best threshold for each image, graph creation, PDF creation, and so on that the pipeline provides. Several of the commands accept a `-c local` flag for local operation, but now there is also a new command, named rescribe, that is designed to make things much simpler for people just wanting to do some OCR on their local computer. Note that the local mode is not as well tested as the core cloud modes; please report any bugs you find with it.

v1.3.0 • last year

github.com/benmanns/goworker

github.com/bgentry/que-go

github.com/cdr/amboy

github.com/mongodb/amboy

github.com/vgarvardt/gue/v5

github.com/progfay/go-job-queue

github.com/vgarvardt/gue/v3

github.com/mongodb/anser

github.com/jimmysawczuk/worker

gopkg.in/benmanns/goworker.v0

github.com/vgarvardt/gue/v4

github.com/deciduosity/amboy

github.com/VertebrateResequencing/wr

github.com/btubbs/pgq

github.com/dc0d/workerpool

github.com/dc0d/workerpool/v4

github.com/PAFFx/job-poll-queue

github.com/SergeyPanov/job-queue

github.com/vantt/go-QCoordinator

github.com/Lakshamana/job-queues

github.com/dc0d/workerpool/v5

github.com/bobstrecansky/HighPerformanceWithGo/14-clustering-and-job-queueing-in-go/jobqueues/kafka

github.com/bobstrecansky/HighPerformanceWithGo/14-clustering-and-job-queueing-in-go/jobqueues/goroutine

github.com/bobstrecansky/HighPerformanceWithGo/14-clustering-and-job-queueing-in-go/clustering/kmeans

github.com/mananbatraa/queue-job-golang-workflow

github.com/chrisdmacrae/goworker

github.com/hnakamur/etcdjobq

github.com/michaelkleyn/grpc-job-queue

gopkg.in/jobman.v0

bitbucket.org/amotus/queue

github.com/puneet105/job-queue

github.com/kevinburke/rickover

github.com/clerk/gue/v4

code.nkcmr.net/ed

github.com/fsamin/go-wsqueue

github.com/saladtechnologies/saladcloud-job-queue-worker-sdk

rescribe.xyz/bookpipeline

github.com/yuanyu90221/go_channel_with_job_queue

github.com/siangyeh8818/golang-job-queue

github.com/bkrebsbach/simple-job-queue

github.com/varungujarathi9/job-queue

github.com/jz222/rest-api-with-job-queue/api

github.com/moveaxlab/oracle-db-job-queue

github.com/phongtv-1971/go-job-queue

github.com/jz222/rest-api-with-job-queue/worker

github.com/yuanyu90221/go_job_queue

github.com/saladtechnologies/salad-cloud-job-queue-worker

github.com/flynn/que-go

github.com/vortex14/gue/v7

github.com/amongil/gh-action-push-workflow-last-job-status-to-azure-queue