laconia-batch
🛡️ Laconia Batch: reads a large number of records without a time limit.
Reads a large number of records without hitting the Lambda time limit.
AWS Lambda caps the execution duration of each request (300 seconds at the
time of writing), so a single invocation cannot run a long-running task to
completion. laconia-batch handles your batch processing needs by providing a
beautifully designed API which abstracts the time limitation problem.
FAQ
Check out FAQ
Usage
Install laconia-batch using yarn:
yarn add laconia-batch
Or via npm:
npm install --save laconia-batch
These are the currently supported input sources:
- DynamoDB
- S3
Example of batch processing by scanning a DynamoDB table:
const laconiaBatch = require("laconia-batch");

module.exports.handler = laconiaBatch(
  _ =>
    laconiaBatch.dynamoDb({
      operation: "SCAN",
      dynamoDbParams: { TableName: "Music" }
    }),
  { itemsPerSecond: 2 }
  // processItem is your own function; it receives the Lambda event and the item read
).on("item", ({ event }, item) => processItem(event, item));
Rate limiting is supported out of the box by setting the batchOptions.itemsPerSecond
option.
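One way to read itemsPerSecond is as a minimum gap between two consecutive items. The sketch below shows only that arithmetic. The helper name delayBetweenItemsMillis is hypothetical and not part of laconia-batch, and the library's actual throttling may be implemented differently.

```javascript
// Hypothetical helper, not part of laconia-batch: converts an
// itemsPerSecond setting into the gap between two consecutive items.
const delayBetweenItemsMillis = itemsPerSecond =>
  itemsPerSecond === undefined ? 0 : 1000 / itemsPerSecond;

console.log(delayBetweenItemsMillis(2)); // 500
console.log(delayBetweenItemsMillis(0.5)); // 2000
console.log(delayBetweenItemsMillis(undefined)); // 0, no rate limit applied
```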
How it works
laconia-batch works around Lambda's time limit by using recursion.
It automatically recurses when the Lambda is about to time out, then resumes
from where it left off in the new invocation.
Imagine you are about to process the array [1, 2, 3, 4, 5] and each request can only
handle two items. The following will happen:
- request 1: Process 1
- request 1: Process 2
- request 1: Not enough time, recursing with current cursor
- request 2: Process 3
- request 2: Process 4
- request 2: Not enough time, recursing with current cursor
- request 3: Process 5
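The sequence above can be reproduced with a small simulation. This is a sketch, not the library's implementation: each "request" is modelled as a loop iteration rather than a new Lambda invocation, the time budget is modelled as an item count, and processInRequests is a hypothetical name.

```javascript
// Simulation only: each "request" is a loop iteration instead of a new
// Lambda invocation, and the time budget is modelled as an item count.
const processInRequests = (items, itemsPerRequest) => {
  const log = [];
  let cursor = 0; // index of the next item to process
  let request = 1;
  while (cursor < items.length) {
    let handled = 0;
    while (cursor < items.length && handled < itemsPerRequest) {
      log.push(`request ${request}: Process ${items[cursor]}`);
      cursor += 1;
      handled += 1;
    }
    if (cursor < items.length) {
      log.push(`request ${request}: Not enough time, recursing with current cursor`);
      request += 1; // a real run would invoke the Lambda again at this point
    }
  }
  return log;
};

console.log(processInRequests([1, 2, 3, 4, 5], 2).join("\n"));
```

The cursor is the only state carried from one request to the next, which is why a new invocation can resume without reprocessing earlier items.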
API
laconiaBatch(readerFn, batchOptions)
readerFn(laconiaContext)
- This Function is called when your Lambda is invoked
- The function must return a reader object, e.g. dynamoDb(), s3()
- Will be called with a laconiaContext object, which can be destructured to {event, context}
batchOptions
itemsPerSecond
- Optional
- Rate limit will not be applied if value is not set
- Can be set to a decimal, e.g. 0.5 equates to 1 item every 2 seconds.
timeNeededToRecurseInMillis
- Optional
- The value set here is used to decide whether the current execution should be stopped
- If item processing is very slow, the batch processor might not have enough time
to recurse and your Lambda execution might time out. Increase this value to
improve the chance that the recursion happens
Example:
laconiaBatch(_ => dynamoDb());

laconiaBatch(_ => dynamoDb(), {
  itemsPerSecond: 2,
  timeNeededToRecurseInMillis: 10000
});
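To illustrate what timeNeededToRecurseInMillis controls, here is a minimal sketch of a stop check. It assumes the value is compared against the Lambda context's getRemainingTimeInMillis(), which is a real method on the AWS Lambda context object. The shouldStop and fakeContext names are hypothetical, and the library's actual check may differ.

```javascript
// Hypothetical check, not the library's actual code: stop processing once
// the remaining time drops to the time reserved for recursing.
const shouldStop = (context, timeNeededToRecurseInMillis) =>
  context.getRemainingTimeInMillis() <= timeNeededToRecurseInMillis;

// Stubbed Lambda context for illustration; a real one is supplied by AWS.
const fakeContext = remaining => ({ getRemainingTimeInMillis: () => remaining });

console.log(shouldStop(fakeContext(60000), 10000)); // false, keep processing
console.log(shouldStop(fakeContext(8000), 10000)); // true, recurse now
```

Under this reading, a larger value makes the processor stop earlier, leaving more headroom when a single item takes a long time to process.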
Events
There are events that you can listen to when laconia-batch is working.
- item: laconiaContext, item
  - Fired on every item read.
  - item is an object found during the read
  - laconiaContext can be destructured to {event, context}
- start: laconiaContext
  - Fired when the batch process is started for the very first time
  - laconiaContext can be destructured to {event, context}
- stop: laconiaContext, cursor
  - Fired when the current execution is timing out and about to be recursed
  - cursor contains the information of how the last item is being read
  - laconiaContext can be destructured to {event, context}
- end: laconiaContext
  - Fired when the batch processor can no longer find any more records
  - laconiaContext can be destructured to {event, context}
Example:
laconiaBatch({ ... })
  .on('start', (laconiaContext) => ... )
  .on('item', (laconiaContext, item) => ... )
  .on('stop', (laconiaContext, cursor) => ... )
  .on('end', (laconiaContext) => ... )
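The sketch below shows how the four listeners might fit together across two invocations. It uses Node's built-in EventEmitter as a stand-in for the object returned by laconiaBatch() and emits the events by hand, so it runs without AWS. The cursor shape { index: 0 } is invented for illustration and is not the library's real cursor format.

```javascript
const EventEmitter = require("events");

// Stand-in for the object returned by laconiaBatch(); only used to show
// how the four listeners fit together.
const batch = new EventEmitter();
const laconiaContext = { event: {}, context: {} };
const seen = [];

batch
  .on("start", () => seen.push("start"))
  .on("item", (_, item) => seen.push(`item ${item}`))
  .on("stop", (_, cursor) => seen.push(`stop at ${JSON.stringify(cursor)}`))
  .on("end", () => seen.push("end"));

// Emit by hand what the batch processor would fire over two invocations.
batch.emit("start", laconiaContext);
batch.emit("item", laconiaContext, "a");
batch.emit("stop", laconiaContext, { index: 0 }); // hypothetical cursor shape
batch.emit("item", laconiaContext, "b");
batch.emit("end", laconiaContext);

console.log(seen);
```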
dynamoDb(readerOptions)
Creates a reader for a DynamoDB table.
operation
- Mandatory
- Valid values are: 'SCAN' and 'QUERY'
dynamoDbParams
- Mandatory
- This parameter is used when documentClient's operations are called
- The ExclusiveStartKey param can't be used as it will be overridden during processing!
documentClient = new AWS.DynamoDB.DocumentClient()
- Optional
- Set this option if there's a need to customise the AWS.DynamoDB.DocumentClient instantiation
- Used for DynamoDB operation
Example:
dynamoDb({
  operation: "SCAN",
  dynamoDbParams: { TableName: "Music" }
});

dynamoDb({
  operation: "QUERY",
  dynamoDbParams: {
    TableName: "Music",
    Limit: 1,
    ExpressionAttributeValues: {
      ":a": "Bar"
    },
    // A Query needs a key condition; this assumes Artist is the partition key
    KeyConditionExpression: "Artist = :a"
  }
});
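The reason ExclusiveStartKey gets overridden is that DynamoDB pagination resumes from the LastEvaluatedKey of the previous page. The sketch below illustrates that loop against a stubbed document client, so it runs without AWS. The fakeDocumentClient, its index-based keys, and readAll are all hypothetical. The real reader also stops and recurses between pages, which is not shown here.

```javascript
// Stub standing in for AWS.DynamoDB.DocumentClient#scan: returns one item
// per page and a LastEvaluatedKey until the table is exhausted.
const table = [{ Artist: "Foo" }, { Artist: "Bar" }, { Artist: "Baz" }];
const fakeDocumentClient = {
  scan: params => ({
    promise: async () => {
      const start = params.ExclusiveStartKey ? params.ExclusiveStartKey.index + 1 : 0;
      const last = start < table.length - 1 ? { index: start } : undefined;
      return { Items: table.slice(start, start + 1), LastEvaluatedKey: last };
    }
  })
};

// Reads every page; any ExclusiveStartKey in dynamoDbParams is overridden
// with the cursor taken from the previous page.
const readAll = async (documentClient, dynamoDbParams) => {
  const items = [];
  let cursor;
  do {
    const page = await documentClient
      .scan({ ...dynamoDbParams, ExclusiveStartKey: cursor })
      .promise();
    items.push(...page.Items);
    cursor = page.LastEvaluatedKey;
  } while (cursor);
  return items;
};

readAll(fakeDocumentClient, { TableName: "Music" }).then(items =>
  console.log(items.length) // 3
);
```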
s3(readerOptions)
Creates a reader for an array stored in S3.
path
- Mandatory
- The path to the array to be processed
- Set to '.' if the object stored in S3 is the array
- Set to a path if an object is stored in S3 and the array is a property of the object
  - lodash.get is used to retrieve the array
s3Params
- Mandatory
- This parameter is used when s3.getObject is called to retrieve the array stored in S3
s3 = new AWS.S3()
- Optional
- Set this option if there's a need to customise the AWS.S3 instantiation
- Used for S3 operation
Example:
s3({
  path: ".",
  s3Params: {
    Bucket: "MyBucket",
    Key: "array.json"
  }
});

s3({
  path: 'database.music[0]["category"].list',
  s3Params: {
    Bucket: "MyBucket",
    Key: "object.json"
  }
});
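To make the two path forms concrete, the sketch below resolves an array from an already parsed object. getByPath is a simplified stand-in for lodash.get, written inline so the example has no dependencies, and it ignores lodash's edge cases. resolveArray is a hypothetical helper, not part of laconia-batch.

```javascript
// Simplified stand-in for lodash.get; handles dot and bracket notation
// but none of lodash's edge cases.
const getByPath = (object, path) =>
  path
    .match(/[^.[\]"']+/g)
    .reduce((value, key) => (value == null ? undefined : value[key]), object);

// Hypothetical helper: '.' means the stored object itself is the array.
const resolveArray = (storedObject, path) =>
  path === "." ? storedObject : getByPath(storedObject, path);

const stored = { database: { music: [{ category: { list: ["a", "b"] } }] } };

console.log(resolveArray(["x", "y"], ".")); // [ 'x', 'y' ]
console.log(resolveArray(stored, 'database.music[0]["category"].list')); // [ 'a', 'b' ]
```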