@overleaf/object-persistor
Module for storing objects in multiple backends, with fallback on 404 to assist migration between them
Stores arbitrary objects in multiple backends, with support for falling back to a secondary backend if the object can't be found in the primary.
The GCS backend contains a workaround that allows objects to be kept for a set period of time after deletion, which can't currently be accomplished with GCS's own lifecycle rules. (See the GCS-specific notes under Configuration.)
// import the module
const ObjectPersistor = require('@overleaf/object-persistor')
const config = {
// see 'Configuration' section below
}
// create a new persistor
const Persistor = ObjectPersistor(config)
Errors returned by persistor methods are all derived from OError (@overleaf/o-error). To perform instanceof checks, you can use the Errors object from the persistor module:
const ObjectPersistor = require('@overleaf/object-persistor')
const { Errors } = ObjectPersistor
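For example, a read can distinguish a missing object from other failures. This is a minimal sketch; it assumes the Persistor instance from the example above, and that a NotFoundError class is exported on Errors (check your version's exports):
// return the stream if the object exists, or null if it is missing
async function readIfPresent(bucketName, key) {
  try {
    return await Persistor.getObjectStream(bucketName, key)
  } catch (err) {
    if (err instanceof Errors.NotFoundError) return null // missing in all backends
    throw err
  }
}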
async function sendStream(bucketName, key, readStream, opts = {})
Uploads a stream to the backend.
- bucketName: The name of the bucket to upload to
- key: The key for the uploaded object
- readStream: The data stream to upload
- opts (optional):
  - sourceMd5: The MD5 hash of the source data, if known. The uploaded data will be compared against this and the operation will fail if it does not match. If omitted, the MD5 is calculated as the data is uploaded instead, and verified against the backend.
  - contentType: The content type to write in the object metadata
  - contentEncoding: The content encoding to write in the object metadata

When using a secondary persistor, this method uploads only to the primary.
If an object already exists at the specified key, it will be overwritten.
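A minimal upload sketch, assuming the Persistor instance from the example above (bucket and key names are illustrative):
const fs = require('fs')

async function uploadReport() {
  const readStream = fs.createReadStream('/tmp/report.pdf')
  await Persistor.sendStream('my-bucket', 'reports/report.pdf', readStream, {
    contentType: 'application/pdf', // stored in the object metadata
  })
}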
async function getObjectStream(bucketName, key, opts = {})
Retrieves a stream from the backend, for reading.
- bucketName: The name of the bucket to download from
- key: The key for the object
- opts (optional):
  - start, end: Downloads a byte range from the object. Specify both start and end; end is inclusive.

Returns a stream.Readable from which to read the data.
When using a secondary persistor, this method will fall back to retrieving the object from the secondary if it does not exist on the primary.
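For example, to read only the first kilobyte of an object (a sketch using the Persistor instance from above; remember that end is inclusive):
async function readFirstKilobyte(bucketName, key) {
  const stream = await Persistor.getObjectStream(bucketName, key, {
    start: 0,
    end: 1023, // inclusive: bytes 0 through 1023
  })
  const chunks = []
  for await (const chunk of stream) chunks.push(chunk)
  return Buffer.concat(chunks)
}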
async function getRedirectUrl(bucketName, key)
Gets a signed link directly to the backend, if possible. This can be used to download the data directly, instead of proxying it.
- bucketName: The name of the bucket to download from
- key: The key for the object

Returns a string containing the signed link, or null if a link cannot be generated.

If null is returned, you should fall back to getObjectStream, as signed links cannot always be generated.
Do not use this method if you are using a secondary persistor: this mechanism does not check whether the object actually exists, so it cannot provide a fallback.
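A sketch of the recommended pattern (an illustrative handler, not part of the module's API):
async function getDownload(bucketName, key) {
  const url = await Persistor.getRedirectUrl(bucketName, key)
  if (url !== null) {
    return { redirectTo: url } // let the client download directly
  }
  // no signed link could be generated: proxy the data instead
  const stream = await Persistor.getObjectStream(bucketName, key)
  return { stream }
}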
async function getObjectSize(bucketName, key)
Returns the size of the stored data.
- bucketName: The name of the bucket to download from
- key: The key for the object

Returns an integer containing the size, in bytes.

When using a secondary persistor, this method returns the size from the secondary if the object is not found on the primary.
async function getObjectMd5Hash(bucketName, key)
Returns the MD5 hash of the stored data.
- bucketName: The name of the bucket to download from
- key: The key for the object

Returns a string containing the hex representation of the MD5 hash.

When using a secondary persistor, this method returns the hash from the secondary if the object is not found on the primary.
async function deleteFile(bucketName, key)
Deletes an object.
- bucketName: The name of the bucket to delete from
- key: The key for the object

When using a secondary persistor, this deletes the object from both persistors.
async function deleteDirectory(bucketName, key)
Deletes a directory (all objects whose keys start with the supplied key).
- bucketName: The name of the bucket to delete from
- key: The key prefix for the objects

When using a secondary persistor, this deletes the objects from both persistors.
async function directorySize(bucketName, key)
Returns the size of a directory (all objects whose keys start with the supplied key).
- bucketName: The name of the bucket to examine
- key: The key prefix for the objects

Returns an integer containing the size, in bytes.

When using a secondary persistor, this returns the value from the secondary if no objects are found on the primary.
async function checkIfObjectExists(bucketName, key)
Returns whether an object exists.
- bucketName: The name of the bucket to examine
- key: The key for the object

Returns a boolean representing whether the object exists.

When using a secondary persistor, returns true if the object exists on either the primary or the secondary.
async function copyObject(bucketName, sourceKey, destKey)
Copies an object to another key, within a bucket.
- bucketName: The name of the bucket in which to copy the object
- sourceKey: The key for the object to be copied
- destKey: The key to which the object should be copied

This can only copy objects within a single bucket. To copy objects in any other way, pass the stream returned from getObjectStream to sendStream.
If an object already exists at the specified key, it will be overwritten.
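For example, a cross-bucket copy can be composed from the two streaming methods (a sketch; names are illustrative):
async function copyAcrossBuckets(srcBucket, srcKey, destBucket, destKey) {
  const readStream = await Persistor.getObjectStream(srcBucket, srcKey)
  await Persistor.sendStream(destBucket, destKey, readStream)
}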
async function sendFile(bucketName, key, fsPath)
Uploads a file from the local disk.
- bucketName: The name of the bucket to upload to
- key: The key for the uploaded object
- fsPath: The path on disk to the file for uploading

When using a secondary persistor, this method uploads only to the primary.
If an object already exists at the specified key, it will be overwritten.
This method is designed for applications which may write temporary data out to the disk before uploading.
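A sketch of that pattern, assuming the Persistor instance from above (paths are illustrative):
const os = require('os')
const path = require('path')
const fsPromises = require('fs/promises')

async function writeThenUpload(bucketName, key, contents) {
  const tmpPath = path.join(os.tmpdir(), `upload-${Date.now()}`)
  await fsPromises.writeFile(tmpPath, contents)
  try {
    await Persistor.sendFile(bucketName, key, tmpPath)
  } finally {
    await fsPromises.unlink(tmpPath) // remove the temporary file
  }
}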
Configuration
An object with the relevant configuration should be passed to the main function returned from the module. The object contains both common and backend-specific parameters.
- backend (required): String specifying the primary persistor to use as the storage backend. Must be one of s3, gcs or fs.
- signedUrlExpiryInMs: Time before expiry (in milliseconds) of signed URLs
- path.uploadFolder (required): Location for temporary files that are being uploaded

For the FS persistor, the bucketName should be the full path to the folder on disk where the files are stored.
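A minimal example configuration using the FS backend (values are illustrative; the backend-specific parameters for s3 and gcs follow below):
const ObjectPersistor = require('@overleaf/object-persistor')

const Persistor = ObjectPersistor({
  backend: 'fs',
  signedUrlExpiryInMs: 15 * 60 * 1000, // signed URLs expire after 15 minutes
  path: {
    uploadFolder: '/tmp/uploads', // temporary files during upload
  },
})

// with the FS backend, the bucketName is a folder on disk:
// await Persistor.sendFile('/var/lib/objects', 'some-key', '/tmp/some-file')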
S3-specific parameters
- s3.key (required): The AWS access key ID
- s3.secret (required): The AWS secret access key
- s3.partSize: The part size for S3 uploads. Defaults to 100 megabytes.
- s3.httpOptions: HTTP options passed directly to the S3 constructor
- s3.maxRetries: The number of times the S3 client will retry in case of an error
- s3.endpoint: For testing - overrides the S3 endpoint to use a different service (e.g. a fake S3 server)
- s3.pathStyle: For testing - use old path-style URLs, for services that do not support subdomain-based access
- s3BucketCreds: A JSON-encoded string specifying different S3 credentials for accessing different buckets, in the following format. These credentials override the default ones configured in the main s3 settings:
{
"bucketName": {
"auth_key": "your aws access key ID",
"auth_secret": "your aws secret access key"
}
}
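Because s3BucketCreds is a JSON-encoded string, it can be convenient to build it with JSON.stringify (a sketch with placeholder credentials):
const config = {
  backend: 's3',
  path: { uploadFolder: '/tmp/uploads' },
  s3: { key: 'DEFAULT_ACCESS_KEY_ID', secret: 'DEFAULT_SECRET_ACCESS_KEY' },
  s3BucketCreds: JSON.stringify({
    'special-bucket': {
      auth_key: 'SPECIAL_ACCESS_KEY_ID', // overrides s3.key for this bucket
      auth_secret: 'SPECIAL_SECRET_ACCESS_KEY', // overrides s3.secret for this bucket
    },
  }),
}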
In order for server-side MD5 generation to work, uploads must be below the partSize; otherwise a multipart upload is used, and the S3 eTag (which is used to retrieve the MD5) will not be the MD5 hash of the uploaded object. In these cases, we download the data and calculate the MD5 manually.
For verification during upload, we use S3's checksum mechanism to verify the integrity of the uploaded data. When explicitly retrieving the MD5 hash, however, this means the entire object will be downloaded if its size is above the part size.
GCS-specific parameters
GCS authentication is configured automatically via the local service account, or the GOOGLE_APPLICATION_CREDENTIALS environment variable.
- gcs.unlockBeforeDelete: Unlock an event-based hold before deleting. Default false (see notes below).
- gcs.deletedBucketSuffix: If present, copy the object to a bucket with this suffix before deletion (see notes below).
- gcs.deleteConcurrency: When recursively deleting a directory, the maximum number of delete requests that will be used at once (default 50)
- gcs.unsignedUrls: For testing - do not sign GCS download URLs
- gcs.endpoint.apiEndpoint: For testing - specify a different GCS endpoint to use
- gcs.endpoint.apiScheme: For testing - specify a scheme to use for the GCS endpoint (http or https)
- gcs.endpoint.projectId: For testing - the GCS project ID to supply to the overridden backend

In order to support deletion after a period, the GCS persistor allows usage of a two-bucket system. The main bucket contains the live objects, and on delete the objects are first copied to a 'deleted' bucket, and then deleted from the main one. The 'deleted' bucket is then expected to have a lifecycle policy applied to delete objects after a set period.
In order to prevent accidental deletion from outside this mechanism, an event-based hold can be applied by default on the main bucket. This hold is unlocked after the object has been copied to the 'deleted' bucket, so that the object can then be deleted from the main bucket.
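A sketch of a configuration using this two-bucket mechanism (bucket names and the suffix are illustrative; the lifecycle policy on the 'deleted' bucket is configured in GCS itself, not here):
const config = {
  backend: 'gcs',
  path: { uploadFolder: '/tmp/uploads' },
  gcs: {
    unlockBeforeDelete: true, // release the event-based hold before deleting
    deletedBucketSuffix: '-deleted', // copy to '<bucket>-deleted' before deleting
    deleteConcurrency: 50,
  },
}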
Contributing
Contributions should pass lint, formatting and unit test checks. To run these, use
npm run test
There are no acceptance tests in this module, but https://github.com/overleaf/filestore/ contains a comprehensive set of acceptance tests that use this module. These should also pass with your changes.