s3-upload-cleaner - abort stale S3 multipart uploads
Synopsis
var s3UploadCleaner = require('s3-upload-cleaner');
var AWS = require('aws-sdk');
var config = {
    bucket_location_match: ".*",
    bucket_name_match: ".*",
    key_match: ".*",
    dry_run: false,
    threshold_date: new Date(new Date() - 1000 * 60 * 60 * 24 * 7),
};
var s3Client = new AWS.S3({ region: 'eu-west-1' });
var ispyContext = new s3UploadCleaner.IspyContextLite();
var cleaner = new s3UploadCleaner.AccountCleaner(s3Client, config, ispyContext);
cleaner.run().done();
The problem
To upload data to AWS S3, several methods are possible, one of which is
Multipart Uploads: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
To perform a Multipart Upload, the client initiates the upload, adds one or
more parts, then completes the upload. Alternatively, an initiated upload can
be aborted. Completing the upload takes the parts and uses them to create
the desired object in S3; aborting the upload discards the parts.
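The lifecycle described above can be sketched roughly as follows. This is an illustration only: uploadInParts is a hypothetical helper that is not part of s3-upload-cleaner, and `s3` is assumed to be an aws-sdk (v2) S3 client such as the one created in the Synopsis.

```javascript
// Sketch of the multipart upload lifecycle: initiate, add parts, then
// complete; abort if anything fails so that the parts are discarded.
// uploadInParts is a hypothetical helper; `s3` is assumed to be an
// aws-sdk v2 S3 client, and `bodies` an array of part contents.
function uploadInParts(s3, bucket, key, bodies) {
    return s3.createMultipartUpload({ Bucket: bucket, Key: key }).promise()
        .then(function (created) {
            var target = { Bucket: bucket, Key: key, UploadId: created.UploadId };
            var parts = [];
            // Upload the parts one after another; part numbers start at 1.
            var chain = bodies.reduce(function (previous, body, index) {
                return previous.then(function () {
                    var params = Object.assign({ PartNumber: index + 1, Body: body }, target);
                    return s3.uploadPart(params).promise();
                }).then(function (uploaded) {
                    parts.push({ ETag: uploaded.ETag, PartNumber: index + 1 });
                });
            }, Promise.resolve());
            return chain.then(function () {
                var params = Object.assign({ MultipartUpload: { Parts: parts } }, target);
                return s3.completeMultipartUpload(params).promise();
            }).catch(function (err) {
                // Without this abort, the uploaded parts would stay in S3.
                return s3.abortMultipartUpload(target).promise()
                    .then(function () { throw err; });
            });
        });
}
```

If the process crashes between the first and the last call, neither the complete nor the abort step runs, which is how an upload ends up stranded.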
The problem is that it is very easy (for example, after a crash) never to
complete or abort a multipart upload, and by default there is no timeout.
That is, an incomplete multipart upload remains incomplete forever, until it
is either completed or aborted.
Storage for data uploaded to a not-yet-complete multipart upload is billable.
The solution
The S3 Upload Cleaner finds incomplete multipart uploads in each of your S3
buckets, and aborts any which are "stale", that is, those which were started
before a configurable threshold date (in example/minimal.js, 1 week ago).
Therefore, periodically running the S3 Upload Cleaner on your account can save
you money.
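As a rough sketch of the idea for a single bucket (not the cleaner's actual code): abortStaleUploads below is a hypothetical helper, `s3` is assumed to be an aws-sdk (v2) S3 client for the bucket's region, and only the first page of results is handled.

```javascript
// Sketch only: abortStaleUploads is a hypothetical helper, not the cleaner's
// actual code. `s3` is assumed to be an aws-sdk v2 S3 client. Only the first
// page of results is handled; a fuller version would follow IsTruncated,
// NextKeyMarker and NextUploadIdMarker to fetch further pages.
function abortStaleUploads(s3, bucket, thresholdDate) {
    return s3.listMultipartUploads({ Bucket: bucket }).promise()
        .then(function (page) {
            var stale = (page.Uploads || []).filter(function (upload) {
                // Initiated is the time the multipart upload was started.
                return upload.Initiated < thresholdDate;
            });
            // Abort the stale uploads one at a time.
            return stale.reduce(function (previous, upload) {
                return previous.then(function () {
                    return s3.abortMultipartUpload({
                        Bucket: bucket,
                        Key: upload.Key,
                        UploadId: upload.UploadId
                    }).promise();
                });
            }, Promise.resolve()).then(function () {
                // Resolve with the keys whose uploads were aborted.
                return stale.map(function (upload) { return upload.Key; });
            });
        });
}
```

Aborting sequentially keeps the sketch simple; it also means the first failed call rejects the whole promise and the remaining uploads are left alone.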
In general, any error encountered will cause the cleaner itself to stop. For
example, if an AWS permissions error is encountered, then some stale
multipart uploads might not be cleaned.
Configuration
The config object (see above) recognises a number of keys:
- bucket_name_match: a regular expression string; only buckets whose names
  match this will be included.
- bucket_location_match: a regular expression string; only buckets whose
  locations (aka region) match this will be searched.
- key_match: a regular expression string; only multipart uploads going to keys
  that match this will be included.
- threshold_date: a Date object; multipart uploads initiated after this
  date will be ignored.
- dry_run: boolean; if true, then don't actually abort the uploads (but
  do find the upload and emit the relevant iSpy events).
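To illustrate how these keys might interact, here is a minimal sketch. The helper names (matches, bucketIsIncluded, actionForUpload) and the `name` / `location` fields on the bucket object are assumptions made for this example, and whether the real cleaner anchors its patterns is not stated; the sketch uses an unanchored RegExp test.

```javascript
// Hypothetical helpers showing one way the config keys could be applied.
// None of these names come from s3-upload-cleaner itself.
function matches(pattern, value) {
    // Unanchored match: ".*" accepts everything, "^eu-" only a prefix.
    return new RegExp(pattern).test(value);
}

// `bucket` is assumed to be { name: ..., location: ... }.
function bucketIsIncluded(config, bucket) {
    return matches(config.bucket_name_match, bucket.name) &&
        matches(config.bucket_location_match, bucket.location);
}

// `upload` is assumed to carry Key and Initiated, as in aws-sdk listings.
function actionForUpload(config, upload) {
    if (!matches(config.key_match, upload.Key)) { return 'ignore'; }
    if (upload.Initiated > config.threshold_date) { return 'ignore'; }
    return config.dry_run ? 'report only' : 'abort';
}

// Example: only ".tmp" keys in "logs-" buckets located in EU regions,
// and nothing is actually aborted because dry_run is true.
var exampleConfig = {
    bucket_location_match: "^eu-",
    bucket_name_match: "^logs-",
    key_match: "\\.tmp$",
    dry_run: true,
    threshold_date: new Date('2020-01-08T00:00:00Z')
};
```

Starting with dry_run set to true and narrow patterns is a cautious way to check what a given configuration would select before letting it abort anything.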
iSpy events
The cleaner emits "iSpy events" (an event is basically a String/String map)
for each multipart upload. In example/minimal.js, these are logged to the
console in JSON form. Supply your own "sink" to IspyContextLite to change
this behaviour; see ispy-context-lite.js.
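As a loose illustration of what a custom sink could look like, the sketch below assumes a sink is simply a function that receives one event object. That shape is an assumption, as are the field names in the sample event; how a sink is actually handed to IspyContextLite should be checked in ispy-context-lite.js.

```javascript
// Hypothetical sink: a function that receives one event (a plain object
// with string keys and string values) and stores it as a JSON line.
// The real sink interface may differ; see ispy-context-lite.js.
function makeCollectingSink(lines) {
    return function (event) {
        lines.push(JSON.stringify(event));
    };
}

var lines = [];
var sink = makeCollectingSink(lines);
// The field names below are made up for illustration.
sink({ action: 'abort', bucket: 'example-bucket', key: 'big-file.bin' });
console.log(lines[0]);
```

Collecting events into an array like this can be handy in tests; a production sink would more likely forward them to a log file or a logging service.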
IspyContextLite is a (hopefully temporary) placeholder for the full
IspyContext module, which should be open sourced soon.
Bugs
Unit testing is incomplete.
Proxy and other settings should be (but aren't) propagated from the initial S3
client to the client created for each S3 location.
Author
Rachel Evans (http://rve.org.uk/)