Abraxas
A gearman client, worker and server module implemented on top of gearman-protocol
for full end-to-end streaming support.
Synopsis
var Gearman = require('abraxas');
var client = Gearman.Client.connect({ servers: ['127.0.0.1:4730'], defaultEncoding:'utf8' });
client.registerWorker("toUpper", function(task) {
return task.payload.toUpperCase();
});
client.registerWorker("toUpper", function(task) {
task.end(task.payload.toUpperCase());
});
var through = require('through2');
client.registerWorkerStream("toUpper", function(task) {
task.pipe(through(function(data,enc,done) { this.push(data.toUpperCase(),enc); done() })).pipe(task);
});
client.submitJob('toUpper', 'test string', function(error, result) {
if (error) console.error(error);
console.log("Upper:", result);
});
client.submitJob('toUpper', 'test string').then(function (result) {
console.log("Upper:", result);
});
client.submitJob('toUpper', 'test string').pipe(process.stdout);
process.stdin.pipe(client.submitJob('toUpper')).pipe(process.stdout);
Purpose
Abraxas is aiming to be a streaming Gearman client/worker/server
implementation for Node.js. It's built with an eye toward the ease of use
of the API for end users. This means supporting streams and promises in an
intuitive and transparent fashion, in addition to a traditional callback
based API.
The Abraxas server implementation:
- Aims to both provide a much easier to install Gearman server. (The C++
version requires recent versions of Boost.)
- Allow for apps using Gearman for APIs to be entirely self contained when
an external Gearman server is not provided.
- Act as a test bed for experimental features--
- Fully functional SUBMIT_JOB_EPOCH and SUBMIT_JOB_SCHED implementations
- Client streaming
- Background job queue replication to support redudency across servers
API
Connecting
var Gearman = require('abraxas');
var client = Gearman.Client.connect({ host:'127.0.0.1', port:4730, defaultEncoding:'utf8' });
var client = Gearman.Client.connect({ servers:[{host:'127.0.0.1', port:4730}], defaultEncoding:'utf8' });
var client = Gearman.Client.connect({ servers:['127.0.0.1:4730'], defaultEncoding:'utf8' });
-
var client = Gearman.Client([options][,callback])
options (optional) is an object with properties of:
-
Connection options:
- host (default: 127.0.0.1)
- port (default: 4730)
OR
- servers -- An array of gearman servers to connect to. These can either
be objects with host and port (and any of the other options below, to set
them per-server), or a string of 'host:port'.
-
streaming (default: false) -- Requests the Abraxas server's streaming
mode. Makes workers streaming data back over WORK_DATA safe. If you
request this with the C++ gearmand you'll get a connection error.
-
defaultEncoding (default: buffer) -- The stream encoding to use for
client and worker payloads, unless otherwise specified.
-
maxJobs (default: 1) -- The maximum number of simultaneous jobs
that workers on this connection will execute. This is the concurrency tuning
paramter for registerWorker. This has no effect on the number of jobs handled
via submitJob.
-
submitTimeout (default: ∞) Default time to wait for a gearman server
to become available when submitting a job. Jobs who fail with this
timeout are guaranteed to not have been submitted to any servers or workers.
-
responseTimeout (default: ∞) Default time to wait for a job to complete.
-
debug -- If true, unknown or unexpected packets will be logged with
console.error. You can achieve the same result by listening for the
'unknown-packet' event.
-
trafficDump -- If true, emits read and write events for the raw
buffers being sent over the wire. If no listeners for these events
are configured the buffers will be printed with console.error.
-
packetDump -- If true, behaves the same as trafficDump but instead emits
the parsed packets.
callback (optional) will be called once a connection is established to
any of the servers. There is, however, no requirement that you wait for the
connection-- any commands issued prior to the connection being established
will be buffered.
-
Reconnection and Multiple Servers
Abraxas will begin attempting to connect to the servers its started with
immediately. If a connection drops, it will attempt to reconnect
automatically. Connections back off following the fibonacci sequence with a
maximum delay of 10 seconds between retries and a randomization factor of
15%.
Details on how each command handles multiple server connections is detailed
in the command's documentation. Generally however, worker related commands
go to ALL available servers and all future servers. As such, worker related
commands return immediately. By contrast, client related commands go to ANY
ONE server and if no server is available they'll wait for a connection to be
established.
-
Streaming Mode
The Abraxas server supports "streaming" mode which modifies the semantics to
support streaming clients. (See the included SEMANTICS
document.)
Specifically, when the worker is in streaming mode:
-
If it disconnects in the middle of a job, the server will send a
WORK_FAIL response instead of requeing the job.
-
Writes are not buffered and are immediately sent with WORK_DATA packets.
(Ordinarily writes are buffered and only sent when the worker ends with
a WORK_COMPLETE packet.)
When the client is in streaming mode:
- Submitting uniqueid foreground jobs is an error and your job will
immediately fail without being submitted.
-
client.on('connect', function(client) { ... })
Called after a connection is established. NOTE: This can and will be
called more than once.
-
client.on('disconnect', function(client) { ... })
Called after the connection drops for any reason.
-
client.on('connection-error', function(error,client) { ... })
Called any time there's an error in one of the connections. This is not
fatal and connections will be automatically reestablished.
-
client.disconnect()
Disconnects the client after flushing the current buffer.
-
var task = client.echo([options][,data][,callback])
Sends data to the server which the server then sends back. This is
useful as a "ping" type utility to verify that the connection is still
live and the server responding.
options (optional) is an object with properties of:
- encoding (default: client.options.defaultEncoding) -- This is the
stream encoding to use for the data and the response.
- accept -- This is the options to pass to the response stream constructor.
- transmit -- This is the options to pass to the payload stream constructor.
data (optional) is a buffer or string to get echoed back to you by the
server. If data is passed in then the task cannot be written to.
callback (optional) is a function (err, data) that will be called
with the result from the server. If the callback is passed in then the
task cannot be read from.
-
var task = client.getStatus(jobid[,callback])
Fetches the status of a running job. This task is read only-- if you read
from it as a stream, it will emit the status object. This runs against
all connected servers.
callback (optional) is a function (err, status) that will be called
with the result from the server; see details on the status object
below. If the callback is passed in then the task cannot be read from.
The status object has the following properties:
- known Number of servers that know about the job, typically 1 or 0.
- running Number of servers that are running the job, typically 1 or 0.
- complete Percent job completion, if the job has been updating its status.
-
client.setClientId(id)
Sets the id for this connection to the arbitrary string you provide. This
is returned by the workers command.
Tasks
Client API calls return Task objects and Workers are passed Tasks when new
work is acquired. Tasks are duplex streams. Tasks also proxy to bluebird
Promises.
With client Tasks, data written to the stream is sent as the payload of the
job. When reading from a stream, the result from the worker is read.
With worker Tasks, this is reversed-- data read from the stream is the
payload, data written to the stream is the result.
Tasks have a jobid
property. On client Tasks this won't be set until the
created
event is emitted.
When a task is the result of submitting a job, it will emit a created
event when we've been notified that the server has accepted the job.
Exceptions / failures from the worker will be emitted as error
events.
Warnings from the worker will be emitted as warn
events with a single
string argument containing the warning.
Status updates from the worker will be emitted as status
events with
percentage completion as the argument.
Using a task as a promise will result in the promise being resolved with the
concatenated value of the stream. Exceptions and job failures will result
in the promise being rejected.
Client
-
var task = client.submitJob(func[,options][,data][,callback])
Submit a job to the gearman server-- write to the task
to send your
payload. As described above, the task can be read from as a stream to
retreive your result, or you can use it as a promise with .then
to get
its value. Tasks can also emit error
, warn
and status
events, see
the tasks section for details.
Jobs are submitted to only one server. The server to use is selected on a
round-robin basis amongst active connections.
func The name of the function you want to call.
options (optional) is an object with properties of:
- priority (default: null; normal priority) -- Can be
high
or low
,
these effect the priority of this item in the job queue when there's a backlog.
(Note: Exact semantics are determined by the gearman server, so you'll need
to check its documentation.) - encoding (default: client.options.defaultEncoding) -- The
stream encoding to use for the response stream.
- accept -- This is the options to pass to the response stream constructor.
- transmit -- This is the options to pass to the payload stream constructor.
data (optional) is the payload to be submitted to the func worker.
If it is passed in the task cannot be written to.
callback (optional) is a function (err, data) that will be called
with the result from the worker.
-
var task = client.submitJobBg(func[,options][,data][,callback])
Submit a background job to the gearman server. This is a job where you
don't care about the result. You can disconnect from the server and the
job will still be executed. The result of the task is the jobid
the
task was created with.
Jobs are submitted to only one server. The server to use is selected on a
round-robin basis amongst active connections.
func The name of the function you want to call.
options (optional) is an object with properties of:
- priority (default: null; normal priority) -- Can be
high
or low
,
these effect the priority of this item in the job queue when there's a backlog.
(Note: Exact semantics are determined by the gearman server, so you'll need
to check its documentation.) - encoding (default: client.options.defaultEncoding) -- The
stream encoding to use for the response stream.
- accept -- This is the options to pass to the response stream constructor.
- transmit -- This is the options to pass to the payload stream constructor.
data (optional) is the payload to be submitted to the func worker.
If it is passed in the task cannot be written to.
callback (optional) is a function (err, jobid) that will be called
with the jobid.
-
var task = client.submitJobAt(func,date[,options][,data][,callback])
EXPERIMENTAL. Submit a background job to happen at a specific time.
Jobs are submitted to only one server. The server to use is selected on a
round-robin basis amongst active connections.
func The name of the function you want to call.
date Either a Date
object or a unix epoch time (seconds since 1970).
options (optional) is an object with properties of:
- encoding (default: client.options.defaultEncoding) -- The
stream encoding to use for the response stream.
- accept -- This is the options to pass to the response stream constructor.
- transmit -- This is the options to pass to the payload stream constructor.
data (optional) is the payload to be submitted to the func worker.
If it is passed in the task cannot be written to.
callback (optional) is a function (err, jobid) that will be called
with the jobid.
-
var task = client.submitJobSched(func,schedule[,options][,data][,callback])
WARNING: Not implemented in any existing gearman server, but in the protocol documentation.
Submit a background job to happen on a schedule
Jobs are submitted to only one server. The server to use is selected on a
round-robin basis amongst active connections.
func The name of the function you want to call.
schedule is an object with properties of:
options (optional) is an object with properties of:
- encoding (default: client.options.defaultEncoding) -- The
stream encoding to use for the response stream.
- accept -- This is the options to pass to the response stream constructor.
- transmit -- This is the options to pass to the payload stream constructor.
data (optional) is the payload to be submitted to the func worker.
If it is passed in the task cannot be written to.
callback (optional) is a function (err, jobid) that will be called
with the jobid.
Worker
-
var worker = client.registerWorkerStream(func[,options],workercb)
-
var worker = client.registerWorker(func[,options],workercb)
Register a handler for func. workercb is passed a task when a
client submits a job.
With registerWorker
, the task will have a payload property. With
registerWorkerStream
the task a stream that can be read from in the
usual ways to get the payload. See the section on Tasks for details.
Writing to the task will send that as the response to the client. What's
written will be buffered and sent as a WORK_COMPLETE packet, unless you
connected with the streaming option, in which case WORK_DATA packets
will be sent as data is written.
Functions will be registered on ALL servers. Any servers connected or
reconnected to later will have functions reregistered with them.
If you emit an error event this will result in a WORK_EXCEPTION packet if
supported by the server, otherwise it will emit a WORK_WARNING followed by
a WORK_FAIL.
If you throw an exception, it will result in a WORK_EXCEPTION packet.
If you return a value, the job will be completed with that value. If you
return a stream, that stream will be piped to the client as the result.
If you return a promise, that promise will be resolved and its resolved
value will be treated as above.
If you don't return anything then you're expected to have written to the
task yourself.
func The name of the function that we'll handle.
options (optional) is an object with properties of:
- timeout (default: none) If included, instructs the server that if more
then timeout seconds pass without the work completing then it should
give up and resubmit the work for processing by a different worker.
- encoding (default: client.options.defaultEncoding) -- The stream
encoding to use for the response stream.
- accept -- This is the options to pass to the response stream constructor.
- transmit -- This is the options to pass to the payload stream constructor.
workercb is a function (task)
that's called when there's new work to do. The task
object has following additional methods:
- task.status(percent) A float that represents how much work has been
completed so far (as a decimal, eg, .25 = 25%).
- task.warn(msg) Where
msg
is a buffer, string or a stream. This
warning will be sent to the client. (Clients interpret msg
as
strings.) - task.end(data) Completes the task, sending a WORK_COMPLETE packet with
data
. data
can be a buffer, string or stream. - task.error(data) Fails the task, sending a WORK_FAIL packet with
data
. data
can be a buffer, string or stream. Also sends either a
WORK_EXCEPTION or WORK_WARNING with he same data before WORK_FAIL
depending on wether the server supports the exception extension.
The worker object returned has the property:
- function (value) The name of the funciton this worker object is tied to.
And methods:
-
var task = worker.unregister() Short cut for client.unregisterWorker(worker.function)
-
var task = worker.maxqueue([maxsize][,callback]) Short cut for client.maxqueue(worker.function,maxsize,callback)
-
var task = worker.status() Resolves with a status object with the properties:
- inqueue The number of jobs in queue for this function.
- running How many of those jobs are currently running.
- workers How many workers are available to run jobs. (Sometimes low,
due to workers being able to handle multiple jobs simultaneously.)
-
var task = client.maxqueue(func[,maxsize][,callback])
Sets the maximum number of jobs that may be queued at one time for a
specific function. Like other worker functions, this is sent to all
server connections, current and future.
func is the function to set or clear this limit of.
maxsize (default: unlimited) is the maximum number of jobs to be queued at a time for this funciton.
callback (optional) is a function (err)
-
client.unregisterWorker(func)
Notifies all servers that we are no longer handling requests for the func job.
func The name of the function unregister.
-
client.forgetAllWorkers()
Tells all the server that we are no longer handling any functions at all.
Admin
-
var task = client.status([callback])
Fetches the current status of all functions that all connected gearman
servers are aware of. If no gearman servers are connected then it will
immediately return an empty object. It is resolved with a functionstatus object.
callback (optional) is a function (functionstatus)
.
The functionstatus object is keyed on function name and has values
that are objects with the properties:
- function - The name of a function.
- inqueue The number of jobs in queue for this function.
- running How many of those jobs are currently running.
- workers How many worker connections are available to run jobs.
(Sometimes lower than running, due to workers being able to handle
multiple jobs per connection.)
-
var task = client.workers([callback])
Fetches a list of all connections and what workers, if any, they have
registered. It is resolved with a workerlist array.
callback (optional) is a function (workerlist)
.
The workerlist array is made up of objects with the properties:
- server An object with either host and port properties or a path
property describing how we connected to the server this worker is
associated with.
- fd The file descriptor of this connection on the server.
- ip The ip address that the connection came from.
- clientid The client id of the connection, if any. Defaults to null.
- functions An array of all of the function names.
-
var task = client.shutdown([gracefully][,callback])
Requests that all connected servers shutdown. If you wanted to shutdown a
bunch of gearman servers you might do this:
['server1:port','server2:port','server3:port'].forEach(function(server){
Gearman.Client.connect({servers:[server]},function(err,client) {
client.shutdown(true,function(err){
if (err) console.error(err);
client.disconnect();
});
});
});
gracefully (default: false) If true, stops listening for new connections but waits for running jobs to complete before shutting down.
callback (optional) is a function (err)
- var task = client.getpid([callback])
TO BE IMPLEMENTED
- var task = client.createfunction(func,[callback])
TO BE IMPLEMENTED
- var task = client.dropfunction(func,[callback])
TO BE IMPLEMENTED
- var task = client.canceljob(jobid)
TO BE IMPLEMENTED
- var task = client.getjobs()
TO BE IMPLEMENTED
- var task = client.getuniquejobs()
TO BE IMPLEMENTED
Server
var Gearman = require('abraxas');
Gearman.Server.listen({port: 4730});
WARNING
The server is known to have memory leaks.
TO BE DOCUMENTED
But really, that above is about all there is to it right now. It takes the same
types of debugging options as the client, eg, trafficDump, packetDump. It should
work, but see TODO.md.
Glossary
- server - A Gearman server instance. This is responsible for queueing
jobs from clients, dispatching them to workers and sending results
back to clients.
- function - A kind of work that the Gearman is aware of that workers
may do. It can become aware of a new function in three ways, first, a
worker could register to handle one, second a client could request
one be run or third via the createfunction admin command.
- payload - Functions only receive one argument, the payload, and this is
it. The protocol leaves it an unspecified blob, it's up to you to impose
more structure. With this library, setting an encoding will get you
strings.
- job - Jobs are the record in the server that work needs to be done.
- task - Tasks the client and worker side representation of work to be
completed. They hold job information and provide stream and promise
interfaces.
It's worth noting that all other gearman libraries I'm aware of run client
and worker commands through their own classes and via their own connections.
There's no technical reason for this, and this library does not make that
distinction-- you can submit jobs and register workers from the same
connection and so one program cn be both client and worker.
- client - Any program that connects to the gearman server and submits jobs.
- worker - Any program that connects to the gearman server and registers workers.
What's not here
The various undocumented extensions to the protocol that the C++ gearmand
(from gearman.org) has introduced. That is, the TO BE IMPLEMENTED admin
commands, fetching status by unique id and the explicit support for reduce
jobs. The last, I'm dubious about the utility of. You could already
implement map/reduce with gearman trivially and extending the protocol
doesn't seem to gain anything other than complexity.
See the TODO document for details on other things I'd like to add.