# Lightship 🚢

(Please read the Best practices section.)

Abstracts readiness, liveness and startup checks and graceful shutdown of Node.js services running in Kubernetes.
## Behaviour

Creates an HTTP service used to check container probes.

Refer to the Kubernetes documentation on container probes for information about the readiness and liveness checks.
### Local-mode

If Lightship detects that it is running in a non-Kubernetes environment (e.g. your local machine), it starts the HTTP service on any available HTTP port. This is done to avoid port collisions when multiple services using Lightship are being developed on the same machine. This behaviour can be changed using the `detectKubernetes` and `port` configuration options.
### `/health`

The `/health` endpoint describes the current state of a Node.js service.

The endpoint responds:

- `200` status code, message "SERVER_IS_READY" when the server is accepting new connections.
- `500` status code, message "SERVER_IS_NOT_READY" when the server is initialising.
- `500` status code, message "SERVER_IS_SHUTTING_DOWN" when the server is shutting down.

Used for human inspection.
### `/live`

The endpoint responds:

- `200` status code, message "SERVER_IS_NOT_SHUTTING_DOWN".
- `500` status code, message "SERVER_IS_SHUTTING_DOWN".

Used to configure the liveness probe.
### `/ready`

The endpoint responds:

- `200` status code, message "SERVER_IS_READY".
- `500` status code, message "SERVER_IS_NOT_READY".

Used to configure the readiness probe.
## Timeouts

Lightship has two timeout configurations: `gracefulShutdownTimeout` and `shutdownHandlerTimeout`.

`gracefulShutdownTimeout` (default: 60 seconds) is the number of milliseconds Lightship waits for the Node.js process to exit gracefully after it receives a shutdown signal (either via `process` or by calling `lightship.shutdown()`) before killing the process using `process.exit(1)`. This timeout should be sufficiently large to allow the Node.js process to complete any tasks that are active at the time the shutdown signal is received (e.g. serving responses to all in-flight HTTP requests). Note: you must explicitly inform Lightship about active tasks using beacons.

`shutdownHandlerTimeout` (default: 5 seconds) is the number of milliseconds Lightship waits for shutdown handlers (see `registerShutdownHandler`) to complete before killing the process using `process.exit(1)`.

If the Node.js process does not exit gracefully after all beacons are dead and all shutdown handlers have resolved, Lightship will force-terminate the process with an error. Refer to "How to detect what is holding the Node.js process alive?".
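Both timeouts can be set when creating the Lightship instance. A sketch, using the `createLightship` factory shown in the Usage section (the values here are illustrative, in milliseconds):

```javascript
import {
  createLightship
} from 'lightship';

const lightship = createLightship({
  // Wait up to 30 seconds for in-flight work after a shutdown signal.
  gracefulShutdownTimeout: 30 * 1000,
  // Allow registered shutdown handlers up to 10 seconds to resolve.
  shutdownHandlerTimeout: 10 * 1000,
});
```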
## Usage

Use `createLightship` to create an instance of Lightship.

```js
import {
  createLightship
} from 'lightship';

const configuration: ConfigurationInputType = {};

const lightship: LightshipType = createLightship(configuration);
```

The following types describe the configuration shape and the resulting Lightship instance interface.
```js
type ShutdownHandlerType = () => Promise<void> | void;

export type ConfigurationInputType = {|
  +detectKubernetes?: boolean,
  +gracefulShutdownTimeout?: number,
  +port?: number,
  +shutdownDelay?: number,
  +shutdownHandlerTimeout?: number,
  +signals?: $ReadOnlyArray<string>,
  +terminate?: () => void,
|};

type LightshipType = {|
  +createBeacon: (context?: BeaconContextType) => BeaconControllerType,
  +isServerReady: () => boolean,
  +isServerShuttingDown: () => boolean,
  +queueBlockingTask: (blockingTask: Promise<any>) => void,
  +registerShutdownHandler: (shutdownHandler: ShutdownHandlerType) => void,
  +server: http$Server,
  +shutdown: () => Promise<void>,
  +signalNotReady: () => void,
  +signalReady: () => void,
  +whenFirstReady: () => Promise<void>,
|};
```
## Kubernetes container probe configuration

This is an example of a reasonable container probe configuration to use with Lightship.
```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 9000
  failureThreshold: 1
  initialDelaySeconds: 5
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 5
livenessProbe:
  httpGet:
    path: /live
    port: 9000
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 5
startupProbe:
  httpGet:
    path: /live
    port: 9000
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 5
```
## Logging

`lightship` uses Roarr for logging.

Set the `ROARR_LOG=true` environment variable to enable logging.
## Queueing service blocking tasks

Your service may not be ready until some asynchronous operation is complete, e.g. waiting for webpack-dev-middleware's `waitUntilValid`. In this case, use `queueBlockingTask` to queue blocking tasks. This way, the Lightship status will remain `SERVER_IS_NOT_READY` until all blocking tasks are resolved (and `signalReady` has been called).
```js
import express from 'express';
import {
  createLightship
} from 'lightship';

const lightship = createLightship();

lightship.queueBlockingTask(new Promise((resolve) => {
  setTimeout(() => {
    resolve();
  }, 1000);
}));

const app = express();

app.get('/', (req, res) => {
  res.send('Hello, World!');
});

const server = app.listen(8080, () => {
  lightship.signalReady();
});
```
## Waiting for the server to become ready

`whenFirstReady` can be used to wait until the first time the service becomes ready. The promise returned by `whenFirstReady` is resolved only once. Use this function to delay execution of tasks that depend on the server being ready.
```js
import express from 'express';
import {
  createLightship
} from 'lightship';

const lightship = createLightship();

const app = express();

app.get('/', (req, res) => {
  res.send('Hello, World!');
});

const server = app.listen(8080, () => {
  lightship.signalReady();
});

(async () => {
  await lightship.whenFirstReady();

  // e.g. a task (illustrative) that requires the server to be ready.
  await runIntegrationTests();
})();
```
## Usage examples

### Using with Express.js

Suppose that you have an Express.js application that simply responds "Hello, World!".
```js
import express from 'express';

const app = express();

app.get('/', (req, res) => {
  res.send('Hello, World!');
});

app.listen(8080);
```
To create a liveness and readiness check, simply create an instance of Lightship and use `registerShutdownHandler` to register a server shutdown handler, e.g.
```js
import express from 'express';
import {
  createLightship
} from 'lightship';

const lightship = createLightship();

const app = express();

app.get('/', (req, res) => {
  res.send('Hello, World!');
});

const server = app
  .listen(8080, () => {
    lightship.signalReady();
  })
  .on('error', () => {
    lightship.shutdown();
  });

lightship.registerShutdownHandler(() => {
  server.close();
});
```
Suppose that a requirement has been added that you must ensure that you do not say "Hello, World!" more often than 100 times per minute.

Use the `signalNotReady` method to change the server state to "SERVER_IS_NOT_READY" and use `signalReady` to revert the server state to "SERVER_IS_READY".
```js
import express from 'express';
import {
  createLightship
} from 'lightship';

const lightship = createLightship();

const app = express();

const minute = 60 * 1000;

let runningTotal = 0;

app.get('/', (req, res) => {
  runningTotal++;

  setTimeout(() => {
    runningTotal--;

    if (runningTotal < 100) {
      lightship.signalReady();
    } else {
      lightship.signalNotReady();
    }
  }, minute);

  res.send('Hello, World!');
});

const server = app.listen(8080);

lightship.registerShutdownHandler(() => {
  server.close();
});

lightship.signalReady();
```
How quickly Kubernetes observes that the server state has changed depends on the probe configuration, specifically `periodSeconds`, `successThreshold` and `failureThreshold`, i.e. expect requests to continue coming through for a while after the server state has changed.
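As a rough rule of thumb (an approximation that ignores probe `timeoutSeconds` and scheduling jitter), the worst-case time for Kubernetes to observe a state change is about `periodSeconds × threshold`:

```javascript
// Approximate worst-case observation delay, in seconds, for a probe:
// up to `threshold` consecutive probes, one per `periodSeconds`.
const worstCaseObservationSeconds = (periodSeconds, threshold) => {
  return periodSeconds * threshold;
};

// Using the probe configuration from the example above:
console.log(worstCaseObservationSeconds(5, 1));  // readiness: 5
console.log(worstCaseObservationSeconds(30, 3)); // liveness: 90
```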
Suppose that a requirement has been added that the server must shut down after saying "Hello, World!" 1000 times.

Use the `shutdown` method to change the server state to "SERVER_IS_SHUTTING_DOWN", e.g.
```js
import express from 'express';
import delay from 'delay';
import {
  createLightship
} from 'lightship';

const lightship = createLightship();

const app = express();

const minute = 60 * 1000;

let total = 0;
let runningTotal = 0;

app.get('/', (req, res) => {
  total++;
  runningTotal++;

  if (total === 1000) {
    lightship.shutdown();
  }

  setTimeout(() => {
    runningTotal--;

    if (runningTotal < 100) {
      lightship.signalReady();
    } else {
      lightship.signalNotReady();
    }
  }, minute);

  res.send('Hello, World!');
});

const server = app.listen(8080);

lightship.registerShutdownHandler(async () => {
  await delay(minute);

  server.close();
});

lightship.signalReady();
```
Do not call `process.exit()` in a shutdown handler – Lightship calls `process.exit()` after all registered shutdown handlers have run to completion.

If for whatever reason a registered shutdown handler hangs, then (subject to the Pod's restart policy) Kubernetes will forcefully restart the Container after the `livenessProbe` deems the service to have failed.
## Beacons

Beacons are used to delay the registered shutdown handler routine.

A beacon can be created using the `createBeacon()` method, e.g.
```js
const lightship = createLightship();

const beacon = lightship.createBeacon();
```
A beacon is live upon creation. Shutdown handlers are suspended until there are no live beacons.

To signal that a beacon is dead, use the `die()` method:
```js
beacon.die();
```
After a beacon has died, it cannot be revived.

Use beacons to suspend the registered shutdown handler routine while you are processing a job queue, e.g.
```js
for (const job of jobs) {
  if (lightship.isServerShuttingDown()) {
    log.info('detected that the service is shutting down; terminating the event loop');

    break;
  }

  const beacon = lightship.createBeacon();

  // Do the job, then release the beacon.
  await beacon.die();
}
```
Additionally, you can provide beacons with context, e.g.
```js
for (const job of jobs) {
  if (lightship.isServerShuttingDown()) {
    log.info('detected that the service is shutting down; terminating the event loop');

    break;
  }

  const beacon = lightship.createBeacon({
    jobId: job.id
  });

  await beacon.die();
}
```
The logs will include messages describing the beacons that are holding up termination, e.g.

```json
{"context":{"package":"lightship","namespace":"factories/createLightship","logLevel":30,"beacons":[{"context":{"id":1}}]},"message":"program termination is on hold because there are live beacons","sequence":2,"time":1563892493825,"version":"1.0.0"}
```
## Best practices

### Add a delay before you stop handling incoming requests

It is important that you do not cease to handle new incoming requests immediately after receiving the shutdown signal. This is because there is a high probability of the SIGTERM signal being sent well before the iptables rules are updated on all nodes. The result is that the pod may still receive client requests after it has received the termination signal. If the app stops accepting connections immediately, clients receive "connection refused" types of errors.
Properly shutting down an application includes these steps:
- Wait for a few seconds, then stop accepting new connections,
- Close all keep-alive connections that aren't in the middle of a request,
- Wait for all active requests to finish, and then
- Shut down completely.
See Handling Client Requests Properly with Kubernetes for more information.
## FAQ

### What is the reason that my liveness / readiness endpoints are intermittently failing?

You may discover that your service health checks are failing intermittently, e.g.

```
Warning  Unhealthy  4m17s (x3 over 4m27s)   kubelet, f95a4d94-jwfr  Liveness probe failed: Get http://10.24.7.155:9000/live: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  3m28s (x15 over 4m38s)  kubelet, f95a4d94-jwfr  Readiness probe failed: Get http://10.24.7.155:9000/ready: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
```

This may happen if you are performing event-loop blocking tasks for extended durations, e.g.
```js
const startTime = Date.now();

let index0 = 1000;

while (index0--) {
  let index1 = 1000;

  while (index1--) {
    console.log(index0 + ':' + index1);
  }
}

console.log(Date.now() - startTime);
```
If executed, the above operation would block the event loop for a couple of seconds (e.g. 8 seconds on my machine). During this time Lightship is going to be unresponsive.

Your options are to break long-running synchronous tasks into smaller chunks that yield back to the event loop, or to move them off the main thread (e.g. into worker threads).
### What is the reason for having separate `/live` and `/ready` endpoints?

Distinct endpoints are needed if you want your Container to be able to take itself down for maintenance (as done in the Using with Express.js usage example). Otherwise, you can use `/health`.
### How to detect what is holding the Node.js process alive?

You may get a log message saying that your process did not exit on its own, e.g.

```
[2019-11-10T21:11:45.452Z] DEBUG (20) (@lightship) (#factories/createLightship): all shutdown handlers have run to completion; proceeding to terminate the Node.js process
[2019-11-10T21:11:46.455Z] WARN (40) (@lightship) (#factories/createLightship): process did not exit on its own; investigate what is keeping the event loop active
```

This means that there is some work scheduled to happen (e.g. a referenced `setTimeout`).
In order to understand what is keeping your Node.js process from exiting on its own, you need to identify all active handles and requests. This can be done with the help of utilities such as `wtfnode` and `why-is-node-running`, e.g.
```js
import whyIsNodeRunning from 'why-is-node-running';
import express from 'express';
import {
  createLightship
} from 'lightship';

const app = express();

app.get('/', (req, res) => {
  res.send('Hello, World!');
});

const server = app.listen(8080);

const lightship = createLightship();

lightship.registerShutdownHandler(() => {
  server.close();

  whyIsNodeRunning();
});

lightship.signalReady();
```
In the above example, calling `whyIsNodeRunning` will print a list of all active handles that are keeping the process alive.
## Related projects

- Iapetus – Prometheus metrics server.
- Preoom – Retrieves & observes Kubernetes Pod resource (CPU, memory) utilisation.