TaskBatcher
Some tasks, like database inserts, are much more efficient to process in a
batch. However, we generally want our tasks to be processed "soon" even if
there's only one task. The TaskBatcher gem groups tasks by a taskname
parameter, and starts a timer when the first task comes in. After the batch
timer expires, it processes all tasks that it received in that time. (The
caller provides the block to process the tasks.)
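The timing rule above can be pictured with a small, deterministic model. This is only a sketch under stated assumptions: `ToyBatcher`, its `tick` method, and the hand-supplied `now` values are invented for illustration and are not part of the gem, which schedules the timer through EventMachine instead.

```ruby
# Hypothetical, simplified model of the batching behaviour described above.
# It is NOT the gem's implementation: there is no EventMachine here, and the
# caller advances time by hand so the example stays deterministic.
class ToyBatcher
  def initialize(duration, &processor)
    @duration  = duration    # seconds to wait after the first task of a batch
    @processor = processor   # caller-supplied block that handles a whole batch
    @pending   = {}          # taskname => { deadline:, tasks: [] }
  end

  # Queue one task. The first task for a taskname starts that batch's timer.
  def task(taskname, data, now)
    batch = (@pending[taskname] ||= { deadline: now + @duration, tasks: [] })
    batch[:tasks] << data
  end

  # Process every batch whose timer has expired by time `now`.
  def tick(now)
    due = @pending.select { |_name, batch| batch[:deadline] <= now }
    due.each do |name, batch|
      @pending.delete(name)
      @processor.call(name, batch[:tasks])
    end
  end
end

processed = []
batcher = ToyBatcher.new(15) { |name, tasks| processed << [name, tasks] }

batcher.task('db-insert', { name: 'Alice', pet: 'moa' }, 0)      # timer starts at t=0
batcher.task('db-insert', { name: 'Bob', pet: 'cassowary' }, 5)  # joins the same batch
batcher.tick(10)   # timer has not expired yet, nothing is processed
batcher.tick(15)   # timer expired, both tasks are handed to the block together
```

Note that the second task does not restart the timer: in this model the deadline is fixed by the first task of the batch, matching the description above.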
Uses EventMachine under the hood. May be combined with Messenger for
durability guarantees.
Tested under Ruby 1.9.3 and 2.0.0.
Released under the three-clause BSD open source license.
http://opensource.org/licenses/BSD-3-Clause
See the LICENSE file.
Usage
You can either use the TaskBatcher module to make procedural calls, or
instantiate a BatchManager object. BatchManager objects offer a cleaner API
when all of your tasks are processed in the same scope.
Using a BatchManager
    taskname = 'db-insert'
    duration = 15   # seconds
    callback = lambda { |result| print "The return value was #{result}.\n" }
    mgr = TaskBatcher::BatchManager.new(taskname, callback, duration) do |tasks|
      # Build a single multi-row INSERT for the whole batch. Illustration
      # only: real code should escape or bind the values.
      rows = tasks.map { |task| "('#{task[:name]}', '#{task[:pet]}')" }
      sql = "INSERT INTO pet_owners VALUES #{rows.join(', ')}"
      # execute_sql is a placeholder for your database client's call.
      result = execute_sql(sql)
    end
    mgr.task name: 'Alice', pet: 'moa'
    mgr.task name: 'Bob', pet: 'cassowary'
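The example above interpolates task values straight into the SQL string, which keeps it short but is unsafe for untrusted input. A possible alternative is to build the statement with placeholders and pass the values separately. The helper name `build_pet_owner_insert` is hypothetical, and the `?` placeholder style is an assumption that depends on your database driver.

```ruby
# Hypothetical helper, not part of TaskBatcher: build one multi-row INSERT
# with "?" placeholders, plus the flat list of values to bind to them.
# Assumes the batch contains at least one task.
def build_pet_owner_insert(tasks)
  placeholders = tasks.map { '(?, ?)' }.join(', ')
  sql    = "INSERT INTO pet_owners VALUES #{placeholders}"
  params = tasks.flat_map { |task| [task[:name], task[:pet]] }
  [sql, params]
end

tasks = [{ name: 'Alice', pet: 'moa' }, { name: 'Bob', pet: 'cassowary' }]
sql, params = build_pet_owner_insert(tasks)
# sql and params can now be handed to whatever client executes the statement.
```

Inside a batch block, the returned pair would replace the hand-built SQL string.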
Using the TaskBatcher module
    taskname = 'db-insert'
    TaskBatcher.set_batch_duration(taskname, 15)
    callback = lambda { |retval| print "The return value was #{retval}\n" }

    def db_insert(data_list)
      # Build a single multi-row INSERT for the whole batch. Illustration
      # only: real code should escape or bind the values.
      rows = data_list.map { |data| "('#{data[:name]}', '#{data[:pet]}')" }
      sql = "INSERT INTO pet_owners VALUES #{rows.join(', ')}"
      # execute_sql is a placeholder for your database client's call.
      retval = execute_sql(sql)
    end

    pet_owner_1 = {name: 'Alice', pet: 'moa'}
    TaskBatcher.task(taskname, pet_owner_1, callback) do |tasks|
      db_insert(tasks)
    end

    pet_owner_2 = {name: 'Bob', pet: 'cassowary'}
    TaskBatcher.task(taskname, pet_owner_2, callback) do |tasks|
      db_insert(tasks)
    end
    # ... etc. ...
Setting batch durations
    TaskBatcher.default_batch_duration            # initially 60 (seconds)
    mytask = 'task name 1'
    TaskBatcher.set_batch_duration(mytask, 120)
    TaskBatcher.batch_duration(mytask)            # 120, set explicitly
    TaskBatcher.batch_duration('your task')       # nothing set, so the default
    TaskBatcher.set_default_batch_duration(30)
    TaskBatcher.batch_duration('another task')    # nothing set, so the new default
    TaskBatcher.batch_duration('your task')       # nothing set, so the default
    TaskBatcher.batch_duration(mytask)            # still 120
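One way to picture how these calls interact is a table of per-task overrides with a fallback default. The sketch below is a hypothetical model, not the gem's source: `DurationTable`, `set`, and `lookup` are invented names, and it assumes the default is consulted at lookup time rather than copied when a task name is first seen.

```ruby
# Hypothetical model of the lookup rule, not the gem's source code:
# a per-task override wins, otherwise the current default applies.
class DurationTable
  attr_accessor :default

  def initialize(default = 60)
    @default   = default   # seconds
    @overrides = {}        # taskname => seconds
  end

  def set(taskname, seconds)
    @overrides[taskname] = seconds
  end

  def lookup(taskname)
    @overrides.fetch(taskname, @default)
  end
end

table = DurationTable.new
table.set('task name 1', 120)
before = table.lookup('your task')    # no override, so the default of 60
table.default = 30
after  = table.lookup('your task')    # still no override, so the new default of 30
mine   = table.lookup('task name 1')  # the explicit override of 120 is unaffected
```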
Notes
- Batches are grouped by +taskname+. ('db-insert' in the first example.)
- If no batch duration is given, the default batch duration is used. The
default batch duration is initially 60 seconds, but clients can change
the default.
- Batch parameters may be of any type, though hashes seem an obvious choice.
The batched function block must accept a data-list, where a single
data-item constitutes the parameters of a single call within the batch.
- The batched function block can return any data type. If a callback is
provided, it must accept the data type returned by the block. A
callback value of nil indicates that the return value may be discarded.
- TaskBatcher uses EventMachine. Event-driven programming is tricky, and
EventMachine is complex on top of that. Due to fundamental
limitations, TaskBatcher cannot promise an exact processing time; it can
only guarantee that a batch will be processed after a delay of at least
the batch duration.
- Since Ruby's threading has limitations, TaskBatcher performs best when
most or all of the client code is event-driven and uses EventMachine.
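The block and callback contract from the notes above can be sketched in plain Ruby. `run_batch` is a hypothetical stand-in for what happens when a batch timer fires; it is not a TaskBatcher method, and the real gem performs this step from inside the EventMachine loop.

```ruby
# Hypothetical illustration of the block/callback contract described in the
# notes. The block receives the whole data-list; its return value goes to the
# callback, or is discarded when the callback is nil.
def run_batch(data_list, callback = nil)
  result = yield(data_list)
  callback.call(result) unless callback.nil?
  result
end

seen = []
callback = lambda { |count| seen << "inserted #{count} rows" }

# The block returns an Integer, so the callback must accept an Integer.
run_batch([{ name: 'Alice' }, { name: 'Bob' }], callback) { |list| list.length }

# No callback given: the block still runs, but nothing receives its result.
run_batch([{ name: 'Carol' }]) { |list| list.length }
```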