Amazon Kinesis Client Library for Node.js
This package provides an interface to the Amazon Kinesis Client Library (KCL) MultiLangDaemon for the Node.js framework.
Developers can use the KCL to build distributed applications that process streaming data reliably at scale. The KCL takes care of many of the complex tasks associated with distributed computing, such as load-balancing across multiple instances, responding to instance failures, checkpointing processed records, and reacting to changes in stream volume.
This package wraps and manages the interaction with the MultiLangDaemon, which is provided as part of the Amazon KCL for Java, so that developers can focus on implementing their record processing logic.
A record processor in Node.js typically looks like the following:
var kcl = require('aws-kcl');
var util = require('util');

var recordProcessor = {
  // Called once before any records are delivered.
  initialize: function(initializeInput, completeCallback) {
    completeCallback();
  },

  // Called with each batch of records.
  processRecords: function(processRecordsInput, completeCallback) {
    if (!processRecordsInput || !processRecordsInput.records) {
      completeCallback();
      return;
    }
    var records = processRecordsInput.records;
    var record, sequenceNumber, partitionKey, data;
    for (var i = 0; i < records.length; ++i) {
      record = records[i];
      sequenceNumber = record.sequenceNumber;
      partitionKey = record.partitionKey;
      // Record data arrives base64-encoded. Buffer.from replaces the
      // deprecated Buffer constructor.
      data = Buffer.from(record.data, 'base64').toString();
      // Custom record processing logic goes here.
    }
    if (!sequenceNumber) {
      completeCallback();
      return;
    }
    // Checkpoint at the last sequence number seen in this batch.
    processRecordsInput.checkpointer.checkpoint(sequenceNumber,
      function(err, checkpointedSequenceNumber) {
        // Error handling is omitted in this example.
        completeCallback();
      }
    );
  },

  // Called when processing stops. Checkpoint only when the reason is TERMINATE.
  shutdown: function(shutdownInput, completeCallback) {
    if (shutdownInput.reason !== 'TERMINATE') {
      completeCallback();
      return;
    }
    shutdownInput.checkpointer.checkpoint(function(err) {
      completeCallback();
    });
  }
};

kcl(recordProcessor).run();
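To see the order in which these callbacks fire without starting the MultiLangDaemon, a processor can be driven by hand. The following is a minimal sketch under stated assumptions: `makeStubCheckpointer` and `driveProcessor` are hypothetical helpers written for this illustration and are not part of aws-kcl, and the input objects carry only the fields used in the example above (the `shardId` field passed to `initialize` is assumed).

```javascript
// Hypothetical harness, not part of aws-kcl: drives a record processor by hand.

// Stub checkpointer that records every checkpoint request it receives.
function makeStubCheckpointer() {
  var stub = {
    checkpoints: [],
    checkpoint: function(sequenceNumber, callback) {
      // Support both call shapes used above: checkpoint(seq, cb) and checkpoint(cb).
      if (typeof sequenceNumber === 'function') {
        callback = sequenceNumber;
        sequenceNumber = '(latest)'; // label invented for this sketch
      }
      stub.checkpoints.push(sequenceNumber);
      callback(null, sequenceNumber);
    }
  };
  return stub;
}

// Calls initialize, processRecords and shutdown in order, then reports back.
function driveProcessor(processor, records, done) {
  var checkpointer = makeStubCheckpointer();
  // The shardId field is an assumption made for illustration.
  processor.initialize({ shardId: 'shardId-000000000000' }, function() {
    processor.processRecords({ records: records, checkpointer: checkpointer }, function() {
      processor.shutdown({ reason: 'TERMINATE', checkpointer: checkpointer }, function() {
        done(checkpointer);
      });
    });
  });
}

// A small processor following the same pattern as the example above.
var decoded = [];
var sampleProcessor = {
  initialize: function(initializeInput, completeCallback) {
    completeCallback();
  },
  processRecords: function(processRecordsInput, completeCallback) {
    var lastSequenceNumber;
    processRecordsInput.records.forEach(function(record) {
      decoded.push(Buffer.from(record.data, 'base64').toString());
      lastSequenceNumber = record.sequenceNumber;
    });
    processRecordsInput.checkpointer.checkpoint(lastSequenceNumber, function() {
      completeCallback();
    });
  },
  shutdown: function(shutdownInput, completeCallback) {
    shutdownInput.checkpointer.checkpoint(function() {
      completeCallback();
    });
  }
};

var seen = null;
driveProcessor(sampleProcessor, [
  { sequenceNumber: '1', partitionKey: 'pk-1', data: Buffer.from('hello').toString('base64') },
  { sequenceNumber: '2', partitionKey: 'pk-1', data: Buffer.from('world').toString('base64') }
], function(checkpointer) {
  seen = checkpointer.checkpoints;
});
```

After the call, `decoded` holds the two decoded strings, and `seen` holds the batch checkpoint at sequence number '2' followed by the marker for the shutdown checkpoint. A harness like this is only useful for unit-testing processing logic; it does not reproduce the daemon's retry or lease behavior.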
Before You Get Started
Prerequisites
Before you begin, Node.js and NPM must be installed on your system. For download instructions for your platform, see http://nodejs.org/download/.
To get the sample KCL application and bootstrap script, you need git.
Amazon KCL for Node.js uses MultiLangDaemon provided by Amazon KCL for Java. You also need Java version 1.7 or higher installed.
Setting Up the Environment
Before running the samples, make sure that your environment is configured to allow the samples to use your AWS Security Credentials, which are used by MultiLangDaemon to interact with AWS services.
By default, the MultiLangDaemon uses the DefaultAWSCredentialsProviderChain, so make your credentials available to one of the credentials providers in that provider chain. There are several ways to do this. You can provide credentials through a ~/.aws/credentials file or through environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY). If you're running on Amazon EC2, you can associate an IAM role with your instance with appropriate access.
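Before starting the samples, it can help to check which of these sources is present on the machine. The following is a minimal sketch, assuming a hypothetical helper named `findCredentialSources` (it is not part of aws-kcl or the AWS SDK). It only reports whether the environment variables or the credentials file exist; it does not validate the credentials and cannot detect an IAM role.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical preflight helper (not part of aws-kcl or the AWS SDK).
// Reports which of the two local credential sources named above look present.
function findCredentialSources(env, homeDir) {
  var sources = [];
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) {
    sources.push('environment variables');
  }
  if (fs.existsSync(path.join(homeDir, '.aws', 'credentials'))) {
    sources.push('~/.aws/credentials file');
  }
  return sources;
}

var found = findCredentialSources(process.env, os.homedir());
console.log(found.length > 0
  ? 'Credential sources found: ' + found.join(', ')
  : 'No local credential source found; on Amazon EC2 an IAM role may still apply.');
```

Passing the environment and home directory as arguments keeps the helper easy to test with made-up values.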
For more information about Amazon Kinesis and the client libraries, see the Amazon Kinesis documentation as well as the Amazon Kinesis forums.
Running the Sample
The Amazon KCL for Node.js repository contains source code for the KCL, a sample data producer and data consumer (processor) application, and the bootstrap script.
To run sample applications, you need to get all required NPM modules. From the root of the repository, execute the following command:
npm install
This downloads all dependencies for running the bootstrap script as well as the sample application.
The sample application consists of two components:
- A data producer (samples/basic_sample/producer/sample_kinesis_producer_app.js): this script creates an Amazon Kinesis stream and starts putting 10 random records into it.
- A data processor (samples/basic_sample/consumer/sample_kcl_app.js): this script is invoked by the MultiLangDaemon, consumes the data from the Amazon Kinesis stream, and stores received data into files (1 file per shard).
The following defaults are used in the sample application:
- Stream name: kclnodejssample
- Number of shards: 2
- Amazon KCL application name: kclnodejssample
- Amazon DynamoDB table for Amazon KCL application: kclnodejssample
Running the Data Producer
To run the data producer, execute the following commands from the root of the repository:
cd samples/basic_sample/producer
node sample_kinesis_producer_app.js
Notes
- The script samples/basic_sample/producer/sample_kinesis_producer_app.js takes several parameters that you can use to customize its behavior. To change default parameters, change values in the file samples/basic_sample/producer/config.js.
Running the Data Processor
To start the data processor, run the following commands from the root of the repository:
cd samples/basic_sample/consumer
../../../bin/kcl-bootstrap --java /usr/bin/java -e -p ./sample.properties
Notes
- The Amazon KCL for Node.js uses stdin/stdout to interact with the MultiLangDaemon. Do not point your application logs to stdout/stderr: if you do, log output is interleaved with MultiLangDaemon output, which makes consumer-specific log events hard to find. This consumer uses a logging library to redirect all application logs to a file called application.log. Follow a similar pattern when developing consumer applications with the Amazon KCL for Node.js. For more information about the protocol between the MultiLangDaemon and the Amazon KCL for Node.js, go to MultiLangDaemon.
- The bootstrap script downloads MultiLangDaemon and its dependencies.
- The bootstrap script invokes the MultiLangDaemon, which starts the Node.js consumer application as its child process. By default:
  - The file samples/basic_sample/consumer/sample.properties controls which Amazon KCL for Node.js application is run. You can specify your own properties file with the -p or --properties argument.
  - The bootstrap script uses JAVA_HOME to locate the java binary. To specify your own java home path, use the -j or --java argument when invoking the bootstrap script.
- To only print commands on the console to run the KCL application without actually running the KCL application, leave out the -e or --execute argument to the bootstrap script.
- You can also add REPOSITORY_ROOT/bin to your PATH so you can access kcl-bootstrap from anywhere.
- To find out all the options you can override when running the bootstrap script, run the following command:
kcl-bootstrap --help
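The logging note above can be illustrated with a file-only logger. The following is a minimal sketch using only Node.js built-in modules; `createFileLogger` is a hypothetical helper and is not the logging library the sample consumer uses. The log path in the temp directory is also an assumption chosen so the sketch has no side effects in the repository.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical minimal logger: appends lines to a file and never writes to
// stdout, which is reserved for communication with the MultiLangDaemon.
function createFileLogger(filePath) {
  function write(level, message) {
    var line = new Date().toISOString() + ' [' + level + '] ' + message + '\n';
    fs.appendFileSync(filePath, line);
  }
  return {
    info: function(message) { write('INFO', message); },
    error: function(message) { write('ERROR', message); }
  };
}

// Path chosen for this sketch only; a real consumer would pick its own location.
var logFile = path.join(os.tmpdir(), 'kcl-sketch-' + process.pid + '.log');
var log = createFileLogger(logFile);
log.info('record processor initialized');
```

Synchronous appends keep the sketch short and preserve line order, but they block the event loop on every call; a consumer with high record volume would likely prefer a buffered or asynchronous logging library.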
Cleaning Up
This sample application creates an Amazon Kinesis stream, sends data to it, and creates a DynamoDB table to track the KCL application state. These resources incur nominal costs to your AWS account, and continue to do so even after the sample app has finished. To stop being charged, delete these resources. Specifically, the sample application creates the following AWS resources:
- An Amazon Kinesis stream named kclnodejssample
- An Amazon DynamoDB table named kclnodejssample
You can delete these using the AWS Management Console.
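The same cleanup could also be scripted. The following is a sketch only, assuming clients that expose `deleteStream(params, callback)` and `deleteTable(params, callback)` as the AWS SDK for JavaScript (v2) Kinesis and DynamoDB clients do. `deleteSampleResources` is a hypothetical helper, and the region shown in the trailing comment is an assumption. The clients are passed in so the function can be exercised with stubs before it is pointed at a real account.

```javascript
// Sketch only: deletes the two sample resources through injected clients.
// `kinesis` is assumed to expose deleteStream(params, cb) and `dynamodb`
// deleteTable(params, cb), as the AWS SDK for JavaScript (v2) clients do.
var SAMPLE_NAME = 'kclnodejssample';

function deleteSampleResources(kinesis, dynamodb, done) {
  kinesis.deleteStream({ StreamName: SAMPLE_NAME }, function(streamErr) {
    dynamodb.deleteTable({ TableName: SAMPLE_NAME }, function(tableErr) {
      // Report both outcomes so one failure does not hide the other.
      done({ streamError: streamErr || null, tableError: tableErr || null });
    });
  });
}

// With the real SDK (assumes `npm install aws-sdk`; the region is an assumption):
//   var AWS = require('aws-sdk');
//   AWS.config.update({ region: 'us-east-1' });
//   deleteSampleResources(new AWS.Kinesis(), new AWS.DynamoDB(), console.log);
```

Deleting the table is attempted even when deleting the stream fails, so a partial cleanup still removes whatever it can.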
Running on Amazon EC2
Log into an Amazon EC2 instance running Amazon Linux, then perform the following steps to prepare your environment for running the sample application. Note that the Java binary that ships with Amazon Linux can be found at /usr/bin/java and should be version 1.7 or greater.
sudo yum install nodejs npm --enablerepo=epel
sudo yum install git
git clone https://github.com/awslabs/amazon-kinesis-client-nodejs.git kclnodejs
cd kclnodejs
npm install
# Use an absolute path so kcl-bootstrap resolves from any directory.
export PATH=$PATH:$(pwd)/bin
cd samples/basic_sample/producer/
node sample_kinesis_producer_app.js &
cd ../consumer/
kcl-bootstrap --java /usr/bin/java -e -p ./sample.properties > consumer.out 2>&1 &
NPM module
To get the Amazon KCL for Node.js module from NPM, use the following command:
npm install aws-kcl
Under the Hood: Supplemental information about the MultiLangDaemon
Amazon KCL for Node.js uses Amazon KCL for Java internally. We have implemented a Java-based daemon, called the MultiLangDaemon, that does all the heavy lifting. The daemon launches the user-defined record processor script/program as a sub-process, and then communicates with this sub-process over standard input/output using a simple protocol. Because any language can read and write these streams, this approach keeps the Amazon KCL language-agnostic while providing identical features and a similar parallel processing model across all languages.
At runtime, there will always be a one-to-one correspondence between a record processor, a child process, and an Amazon Kinesis shard. The MultiLangDaemon ensures that, without any developer intervention.
In this release, we have abstracted these implementation details away and exposed an interface that enables you to focus on writing record processing logic in Node.js.
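For readers curious about what the module hides, the exchange over standard input/output can be pictured as one JSON message per line. The following is a simplified illustration, not the module's implementation: `buildStatusReply` is a hypothetical function, and the field names (`action`, `responseFor`, `shardId`) reflect the MultiLangDaemon protocol as documented for the KCL for Java; treat them as assumptions and check that documentation before relying on them.

```javascript
// Simplified illustration of the line-oriented JSON exchange. The aws-kcl
// module handles this for you; field names here are assumptions taken from
// the MultiLangDaemon protocol description in the KCL for Java.
function buildStatusReply(line) {
  var message = JSON.parse(line);
  // Acknowledge the action that was just handled.
  return JSON.stringify({ action: 'status', responseFor: message.action });
}

var reply = buildStatusReply('{"action":"initialize","shardId":"shardId-000000000000"}');
```

In this sketch the daemon's `initialize` message is acknowledged with a `status` message naming the action it responds to. The real module also dispatches to your record processor and handles checkpoint messages, which are left out here.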
See Also
Release Notes
Release 0.6.0 (December 12, 2016)
Release 0.5.0 (March 26, 2015)
- The aws-kcl npm module allows implementation of record processors in Node.js using the Amazon KCL MultiLangDaemon.
- The samples directory contains a sample producer and processing applications using the Amazon KCL for Node.js.