@cumulus/checksum
Checksum
The @cumulus/checksum library provides checksum functionality used by Cumulus packages and tasks. Currently the supported input includes file streams, and supported checksum algorithms include cksum and the algorithms available to the Node.js crypto package, as documented for crypto.createHash().
Usage
const fs = require('fs');
const { generateChecksumFromStream } = require('@cumulus/checksum');

const stream = fs.createReadStream('myDataFile.hdf');
// generateChecksumFromStream returns a Promise, so wait for it to resolve
generateChecksumFromStream('cksum', stream)
  .then((myCksum) => console.log(myCksum));
API
checksum
checksum.generateChecksumFromStream(algorithm, stream, [options]) ⇒ Promise.<(number|string)>
Create file checksum from readable stream
Kind: static method of checksum
Returns: Promise.<(number|string)> - the file checksum

| Param | Type | Description |
| --- | --- | --- |
| algorithm | string | Checksum algorithm type |
| stream | stream.Readable | A readable file stream |
| [options] | Object | Checksum options, see crypto.createHash() |
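To illustrate what hashing a readable stream involves, here is a minimal sketch that uses only Node's built-in crypto and stream modules. The helper name hashStream is hypothetical and this is not the library's implementation. It covers only the crypto algorithms (not cksum) and always returns a hex string.

```javascript
const crypto = require('crypto');
const { Readable } = require('stream');

// Hypothetical helper (not part of @cumulus/checksum): hash a readable
// stream with an algorithm supported by crypto.createHash().
function hashStream(algorithm, stream, options = {}) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash(algorithm, options);
    stream.on('error', reject);
    stream.on('data', (chunk) => hash.update(chunk));
    stream.on('end', () => resolve(hash.digest('hex')));
  });
}

// Usage: an in-memory stream stands in for fs.createReadStream(...)
hashStream('sha256', Readable.from([Buffer.from('hello')]))
  .then((sum) => console.log(sum));
```

Because the digest is only known once the stream ends, the result has to be delivered asynchronously, which is why the documented method returns a Promise.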
checksum.validateChecksumFromStream(algorithm, stream, expectedSum, [options]) ⇒ Promise.<boolean>
Validate expected checksum against calculated checksum
Kind: static method of checksum
Returns: Promise.<boolean> - whether expectedSum === calculatedSum

| Param | Type | Description |
| --- | --- | --- |
| algorithm | string | Checksum algorithm |
| stream | stream.Readable | A readable file stream |
| expectedSum | number \| string | Expected checksum |
| [options] | Object | Checksum options |
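The validation step amounts to computing a checksum and comparing it with the expected value. The sketch below shows that idea with Node's built-in modules only. The function matchesExpected is a hypothetical stand-in, not the library's code, and it assumes a hex-encoded digest from a crypto algorithm.

```javascript
const crypto = require('crypto');
const { Readable } = require('stream');

// Hypothetical helper, not the library's code: hash the stream, then
// compare the result with the expected value.
async function matchesExpected(algorithm, stream, expectedSum) {
  const hash = crypto.createHash(algorithm);
  // Readable streams are async iterable, so chunks can be read in a loop
  for await (const chunk of stream) {
    hash.update(chunk);
  }
  return hash.digest('hex') === String(expectedSum);
}

// Usage: compute a reference value, then check matching and altered data
const expected = crypto.createHash('sha256').update('granule-bytes').digest('hex');
matchesExpected('sha256', Readable.from([Buffer.from('granule-bytes')]), expected)
  .then((ok) => console.log('unchanged data valid:', ok));
matchesExpected('sha256', Readable.from([Buffer.from('corrupted')]), expected)
  .then((ok) => console.log('altered data valid:', ok));
```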
About Cumulus
Cumulus is a cloud-based data ingest, archive, distribution and management
prototype for NASA's future Earth science data streams.
Cumulus Documentation
Generated automatically using npm run build-docs
[v20.0.0] 2025-02-04
Phase 2 Release
Breaking Changes
- CUMULUS-3934
- Removed ecs_cluster_instance_allow_ssh resource.
- The ecs_cluster_instance_allow_ssh resource was implemented before SSM hosts were deployed to NGAP accounts and allowed for SSHing into an instance from an SSH bastion, which no longer exists.
- Tunneling into an EC2 via SSM is still supported. Users relying solely on SSH will need to transition to SSM.
- CUMULUS-2564
- Updated sync-granule task to add useGranIdPath as a configuration flag. This modifies the task behavior to stage granules to <staging_path>/<collection_id>/<md5_granuleIdHash> to allow for better S3 partitioning/performance for large collections. Because of this benefit the default has been set to true. However, as sync-granule relies on object name collision, this configuration changes the duplicate collision behavior of sync-granule to be per-granule-id instead of per-collection when active. If the prior behavior is desired, please add "useGranIdPath": false to your task config in your workflow definitions that use sync-granule.
- CUMULUS-3698
- GranuleSearch retrieving files/execution is toggled by setting "includeFullRecord" field to 'true' in relevant api endpoint params
- GranuleSearch does not retrieve files/execution by default unless includeFullRecord is set to 'true'
- @cumulus/db function getExecutionArnByGranuleCumulusId is removed. To replace this function use getExecutionInfoByGranuleCumulusId with parameter executionColumns set to ['arn'] or unset (['arn'] is the default argument)
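The staging layout described for CUMULUS-2564 above can be pictured with a short sketch. It assumes the granule ID is hashed with MD5 and hex-encoded, as the <md5_granuleIdHash> placeholder suggests. The helper stagingPrefix and the sample values are hypothetical, and the actual sync-granule path logic may differ in detail.

```javascript
const crypto = require('crypto');

// Hypothetical helper: builds the staging prefix for a granule, with or
// without the per-granule hash segment.
function stagingPrefix(stagingPath, collectionId, granuleId, useGranIdPath = true) {
  const base = `${stagingPath}/${collectionId}`;
  if (!useGranIdPath) return base; // prior behavior: one prefix per collection
  const granuleIdHash = crypto.createHash('md5').update(granuleId).digest('hex');
  return `${base}/${granuleIdHash}`;
}

// Made-up sample values for illustration only
console.log(stagingPrefix('file-staging/my-stack', 'MOD09GQ___006', 'granule-0001'));
console.log(stagingPrefix('file-staging/my-stack', 'MOD09GQ___006', 'granule-0001', false));
```

With the flag on, two granules in the same collection land under different prefixes, which is why duplicate detection becomes per-granule-id rather than per-collection.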
Migration Notes
CUMULUS-3833 Migration of ReconciliationReports from DynamoDB to Postgres after Cumulus is upgraded.
To invoke the Lambda and start the ReconciliationReport migration, you can use the AWS Console or CLI:
aws lambda invoke --function-name $PREFIX-ReconciliationReportMigration $OUTFILE
PREFIX is your Cumulus deployment prefix.
OUTFILE (optional) is the filepath where the Lambda output will be saved.
CUMULUS-3967
External tooling making use of searchContext in the GET /granules/ endpoint will need to update to make use of standard pagination via limit and page scrolling, as searchContext is no longer supported/is an ES specific feature.
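For tooling that previously relied on searchContext, a paging loop over limit and page might look like the sketch below. Everything here is an assumption for illustration: fetchAllGranules and fakeFetchPage are hypothetical names, pages are assumed to start at 1, and each response is assumed to carry a results array. The fake fetcher stands in for a real HTTP call to GET /granules.

```javascript
// Hypothetical page fetcher signature: ({ limit, page }) => Promise<{ results: [] }>
async function fetchAllGranules(fetchPage, limit = 100) {
  const all = [];
  for (let page = 1; ; page += 1) {
    const { results } = await fetchPage({ limit, page });
    all.push(...results);
    if (results.length < limit) break; // a short page means nothing is left
  }
  return all;
}

// Stand-in for an HTTP call such as GET /granules?limit=2&page=1
const fakeGranules = ['g1', 'g2', 'g3', 'g4', 'g5'].map((granuleId) => ({ granuleId }));
const fakeFetchPage = async ({ limit, page }) => ({
  results: fakeGranules.slice((page - 1) * limit, page * limit),
});

fetchAllGranules(fakeFetchPage, 2).then((granules) => console.log(granules.length));
```

When the total is an exact multiple of the limit, the loop makes one extra request that returns an empty page before stopping.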
Replace ElasticSearch Phase 2
- CUMULUS-3967
- Remove searchContext from API granules GET /granules endpoint.
- Update relevant tests to validate expected behavior utilizing postgres pagination
- CUMULUS-3229
- Remove ElasticSearch queries from Rule LIST endpoint
- CUMULUS-3230
- Remove ElasticSearch dependency from Rule Endpoints
- CUMULUS-3231
- Updated API pdrs LIST endpoint to query postgres
- CUMULUS-3232
- Update API PDR endpoints DEL and GET to not update Elasticsearch
- CUMULUS-3233
- Updated providers list api endpoint and added ProviderSearch class to query postgres
- Removed Elasticsearch dependency from providers endpoints
- CUMULUS-3235
- Updated asyncOperations api endpoint to query postgres
- CUMULUS-3236
- Update API AsyncOperation endpoints POST and DEL to not update Elasticsearch
- Update @cumulus/api/ecs/async-operation to not update Elasticsearch index when reporting status of async operation
- CUMULUS-3698
- GranuleSearch now can retrieve associated files for granules
- GranuleSearch now can retrieve latest associated execution for granules
- CUMULUS-3806
- Update @cumulus/db/search to allow for ordered collation as a dbQueryParameter
- Update @cumulus/db/search to allow dbQueryParameters.limit to be set to null to allow for optional unlimited page sizes in search results
- Update/add type annotations/logic fixes to @cumulus/api reconciliation report code
- Annotation/typing fixes to @cumulus/cmr-client
- Typing fixes to @cumulus/db
- Re-enable Reconciliation Report integration tests
- Update @cumulus/client/CMR.getToken to throw if a non-launchpad token is requested without a username
- Update Inventory and Granule Not Found reports to query postgreSQL database instead of elasticsearch
- Update @cumulus/db/lib/granule.getGranulesByApiPropertiesQuery to allow order by collation to be optionally specified
- Update @cumulus/db/lib/granule.getGranulesByApiPropertiesQuery to be parameterized and include a modifier on temporalBoundByCreatedAt
- Remove endpoint call to and all tests for Internal Reconciliation Reports and updated API to throw an error if report is requested
- Update Orca reconciliation reports to pull granules for comparison from postgres via getGranulesByApiPropertiesQuery
- CUMULUS-3837
- Added reconciliation_reports table in RDS, including indexes
- Created pg model, types, and translation for reconciliationReports in @cumulus/db
- CUMULUS-3833
- Created api types for reconciliation_reports in @cumulus/types/api
- Updated reconciliation reports lambda to write to new RDS table instead of Dynamo
- Updated @cumulus/api/endpoints/reconciliation-reports getReport and deleteReport to work with the new RDS table instead of Dynamo
- CUMULUS-3718
- Updated reconciliation_reports list api endpoint and added ReconciliationReportSearch class to query postgres
- Added reconciliationReports type to stats endpoint, so aggregate query will work for reconciliation reports
- CUMULUS-3859
- Updated @cumulus/api/bin/serveUtils to no longer add records to ElasticSearch
- Removed ElasticSearch from local API server code
- Updated CollectionSearch to filter granule fields in addition to time frame for active collections
- CUMULUS-3847
- remove remaining ES indexing in code and tests
- for asyncOperations test data, change any ES related values to other options
- remove code from @cumulus/api/lambdas/cleanExecutions leaving a dummy handler, as the code worked with ES. lambda will be rewritten with CUMULUS-3982
- remove @cumulus/api/endpoints/elasticsearch, @cumulus/api/lambdas/bootstrap, and @cumulus/api/lambdas/index-from-database
- CUMULUS-3983
- Removed elasticsearch references used in cumulus tf-modules
Added
- CUMULUS-3757
- Added a /granules endpoint PATCH/bulkPatchGranuleCollection which updates the collectionId of a batch of granule records to a new collectionId. This endpoint takes a list of granules and a collectionId, and updates the granules in postgres to the collectionId passed with the payload.
- Added a /granules endpoint PATCH/bulkPatch which applies PATCH to a list of granules. For its payload, this endpoint takes a list of granules (the updates to be made to each granule, similar to the pre-existing PATCH), plus dbConcurrency and dbMaxPool variables for configuring concurrency and database throughput for postgres to tailor to performance and database needs.
- CUMULUS-3919
- Added terraform variables disableSSL and rejectUnauthorized to tf-modules/cumulus-rds-tf module.
- CUMULUS-3959
- Added documentation to help DAACs troubleshoot database migration issues.
- CUMULUS-3978
- Added iops and throughput options to elasticsearch_config variable in tf-modules/data-persistence. These two options are necessary for gp3 EBS volume type.
Changed
- CUMULUS-3947
- Bump @cumulus/cumulus-message-adapter-js to version 2.3.0. This will explicitly put the Python cumulus-message-adapter spawn into UTF-8 mode. See https://github.com/nasa/cumulus-message-adapter-js/releases/tag/v2.3.0
- CUMULUS-3967
- Pinned @aws-sdk/client-s3 in @cumulus/aws-client to 3.726.0 to address breaking changes/incompatibility in releases > 3.726.0
- Pinned @aws-sdk/client-s3 in @cumulus/lib-storage to 3.726.0 to address breaking changes/incompatibility in releases > 3.726.0
- CUMULUS-3940
- Added 'dead_letter_recovery_cpu' and 'dead_letter_recovery_memory' to cumulus and archive module configuration to allow configuration of the dead_letter_recovery_operation task definition to better allow configuration of the tool's operating environment.
- Updated the dead letter recovery tool to utilize its own log group "${var.prefix}-DeadLetterRecoveryEcsLogs"
- Added batchSize, concurrency and dbMaxPool options to /endpoints/recoverCumulusMessage (note these values are correct at time of this release only):
- batchSize - specifies how many DLA objects to read from S3 and hold in memory. Defaults to 1000.
- concurrency - specifies how many messages to process at the same time. Defaults to 30.
- dbMaxPool - specifies how many database connections to allow the process to utilize. Defaults to 30. This should be at minimum the value set for concurrency.
- Add API memory-constrained performance test to test minimum functionality under default+ configuration
- Updated @cumulus/async-operations.startAsyncOperation to take containerName as a parameter name, allowing it to specify a container other than the default 'AsyncOperations' container
- CUMULUS-3759
- Migrated tf-modules/cumulus/ecs_cluster ECS Autoscaling group from launch configurations to launch templates
- CUMULUS-3955
- Removed VACUUM statements from db migrations. In cases where the PG database is very large, these queries can take a long time and exceed the Lambda timeout, causing failures on deployment.
- CUMULUS-3931
- Add force_new_deployment to cumulus_ecs_service to allow users to force new task deployment on terraform redeploy. See docs for more details: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ecs_service#force_new_deployment
- CUMULUS-3941
- Updated SendPan task to generate short pan with FAILED disposition.
- CUMULUS-3936,CUMULUS-3948
- Updated tf-modules/cumulus/ecs_cluster_instance_autoscaling_cf_template.yml.tmpl user-data for compatibility with Amazon Linux 2023 AMI
- Fixed tf-modules/cumulus scripts to use Instance Metadata Service V2
- Updated fake-provider-cf.yml to work for Amazon Linux 2023 AMI
- CUMULUS-3960
- Updated PostToCmr task to be able to republish granules
- CUMULUS-3965
- Updated tf-modules/cumulus/ecs_cluster and fake-provider-cf.yml launch templates to require IMDSv2
- CUMULUS-3990
- Upgraded localstack from 3.0.0 to 4.0.3
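For the recoverCumulusMessage options listed under CUMULUS-3940 above, the relationship between the defaults and the pool-size guidance can be sketched as follows. The helper buildRecoveryOptions is hypothetical and not part of Cumulus. The default values are those stated in this release note and may change later, and reading the guidance as "dbMaxPool should be at least concurrency" is an interpretation.

```javascript
// Defaults as listed in the release note; they may change in later releases.
const RECOVERY_DEFAULTS = { batchSize: 1000, concurrency: 30, dbMaxPool: 30 };

// Hypothetical helper: merges overrides into the defaults and checks that
// the connection pool is not smaller than the processing concurrency.
function buildRecoveryOptions(overrides = {}) {
  const options = { ...RECOVERY_DEFAULTS, ...overrides };
  if (options.dbMaxPool < options.concurrency) {
    throw new Error(
      `dbMaxPool (${options.dbMaxPool}) should be at least concurrency (${options.concurrency})`
    );
  }
  return options;
}

// Usage: raise concurrency and the pool size together
console.log(buildRecoveryOptions({ concurrency: 50, dbMaxPool: 50 }));
```

Checking the two values together before sending a request avoids a configuration in which workers outnumber the database connections available to them.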
Fixed
- CUMULUS-3933
- Update example/bamboo/integration-tests.sh to properly exit if lock-stack errors/detects another stack lock
- CUMULUS-3876
- Fixed s3-replicator lambda cross region write failure
- Added target_region variable to tf-modules/s3-replicator module
- CUMULUS-3981
- Added required $metadata field when creating new instance of ServiceException.
- Security Vulnerabilities
- Updated @octokit/graphql from 2.1.1 to ^2.3.0 to address CVE-2024-21538 (https://github.com/advisories/GHSA-3xgq-45jj-v275)