Changelog
[v9.9.0] 2021-11-03
MoveGranules step
isISOFile to check if a given file object is an ISO file
granuleToCmrFileObject and granulesToCmrFileObjects now take a filterFunc argument
filterFunc's default value is isCMRFile, so the previous behavior is maintained if no value is given for this argument
MoveGranules passes a custom filter function to granulesToCmrFileObjects to check for isISOFile in addition to isCMRFile, so that metadata from .iso.xml files can be used in the urlPathTemplate
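The filterFunc behavior described above can be pictured with a small sketch. The bodies of isCMRFile, isISOFile, and granulesToCmrFileObjects below are hypothetical stand-ins written for illustration (the real helpers may inspect different fields and return richer objects); only the idea of a filter argument that defaults to isCMRFile comes from the notes above.

```javascript
// Hypothetical stand-ins: the real isCMRFile/isISOFile may check other fields.
const isCMRFile = (file) => /\.cmr\.(xml|json)$/.test(file.key || '');
const isISOFile = (file) => /\.iso\.xml$/.test(file.key || '');

// Sketch of a helper whose filterFunc argument defaults to isCMRFile,
// so callers that pass nothing keep the previous behavior.
const granulesToCmrFileObjects = (granules, filterFunc = isCMRFile) =>
  granules.flatMap((granule) =>
    granule.files
      .filter((file) => filterFunc(file))
      .map((file) => ({ granuleId: granule.granuleId, key: file.key })));

// Made-up sample data for the illustration.
const sampleGranules = [{
  granuleId: 'g1',
  files: [{ key: 'g1.cmr.xml' }, { key: 'g1.iso.xml' }, { key: 'g1.hdf' }],
}];

// Default filter: only CMR metadata files are selected.
const cmrOnly = granulesToCmrFileObjects(sampleGranules);

// Custom filter: CMR files plus ISO files.
const cmrAndIso = granulesToCmrFileObjects(
  sampleGranules,
  (file) => isCMRFile(file) || isISOFile(file)
);
```

Keeping isCMRFile as the default means existing callers need no changes, while a caller such as MoveGranules can opt in to the wider filter.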
elasticsearch_client_config tfvars to the archive and cumulus terraform modules.
default_s3_multipart_chunksize_mb setting to the move-granules lambda function.
default_s3_multipart_chunksize_mb tfvars to the cumulus and ingest terraform modules.
chunkSize to @cumulus/aws-client/S3.moveObject and @cumulus/aws-client/S3.multipartCopyObject to set the chunk size of the S3 multipart uploads.
maxChunkSize to chunkSize in @cumulus/aws-client/lib/S3MultipartUploads.createMultipartChunks.
@cumulus/cumulus-message-adapter-js version 2.0.1
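As a rough illustration of how a chunkSize value could translate into the byte ranges of a multipart copy, consider the sketch below. It borrows the name createMultipartChunks from the entry above, but the body, the 250 MB default, and the shape of the returned objects are assumptions made for this example rather than the library's documented behavior.

```javascript
const MB = 1024 * 1024;

// Sketch: split an object of `objectSize` bytes into inclusive byte ranges
// of at most `chunkSize` bytes each. The default value is an assumption.
const createMultipartChunks = (objectSize, chunkSize = 250 * MB) => {
  const chunks = [];
  for (let start = 0; start < objectSize; start += chunkSize) {
    chunks.push({ start, end: Math.min(start + chunkSize, objectSize) - 1 });
  }
  return chunks;
};

// A setting expressed in megabytes (as default_s3_multipart_chunksize_mb is)
// would be converted to bytes before use.
const chunkSizeMb = 10;
const chunks = createMultipartChunks(25 * MB, chunkSizeMb * MB);
```

With a 25 MB object and a 10 MB chunk size this yields three ranges, the last one shorter than the others.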
@cumulus/api/launchpadSaml.launchpadPublicCertificate to correctly retrieve certificate from launchpad IdP metadata with and without namespace prefix.
[v9.8.0] 2021-10-19
36 of cumuluss/async-operation to Docker Hub for compatibility with upgrades to knex package and to address security vulnerabilities.
Added @cumulus/db/createRejectableTransaction() to handle creating a Knex transaction that will throw an error if the transaction rolls back. As of Knex 0.95+, promise rejection on transaction rollback is no longer the default behavior.
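A minimal sketch of what such a wrapper could look like follows. It assumes Knex's doNotRejectOnRollback transaction config option and is not necessarily how @cumulus/db implements it. The stubKnex object is a hypothetical stand-in so the sketch runs without a database.

```javascript
// Sketch: ask Knex to reject the returned promise when the transaction
// rolls back, by explicitly turning off doNotRejectOnRollback.
const createRejectableTransaction = (knex, handler) =>
  knex.transaction(handler, { doNotRejectOnRollback: false });

// Hypothetical stub that records the config it receives; a real Knex
// client would open a database transaction here instead.
const stubKnex = {
  receivedConfigs: [],
  transaction(handler, config) {
    this.receivedConfigs.push(config);
    return Promise.resolve(handler({ isStub: true }));
  },
};

const pendingResult = createRejectableTransaction(stubKnex, (trx) => trx.isStub);
```

Callers can then rely on a rejected promise (or a thrown error under await) to detect a rollback, instead of inspecting the transaction state afterwards.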
CUMULUS-2639
CUMULUS-2670
lambda_timeouts string map variable for cumulus module to accept an update_granules_cmr_metadata_file_links_task_timeout property
CUMULUS-2598
knex version from 0.23.11 to 0.95.11 to address security vulnerabilities
cumuluss/async-operation:36
queued when scheduling the granule.
buckets.json out of the s3://internal-bucket/workflows directory into s3://internal-bucket/buckets.
[v9.7.0] 2021-10-01
queue-granules task now updates granule status to queued when a granule is queued. In order to prevent issues with the private API endpoint and Lambda API request and concurrency limits, this functionality runs with limited concurrency, which may increase the task's overall runtime when large numbers of granules are being queued. If you are facing Lambda timeout errors with this task, we recommend converting your queue-granules task to an ECS activity. This concurrency is configurable via the task config's concurrency value.
discover-granules task has been updated to limit concurrency on checks to identify and skip already ingested granules in order to prevent issues with the private API endpoint and Lambda API request and concurrency limits. This may increase the task's overall runtime when large numbers of granules are discovered. If you are facing Lambda timeout errors with this task, we recommend converting your discover-granules task to an ECS activity. This concurrency is configurable via the task config's concurrency value.
<prefix>-sfEventSqsToDbRecords Lambda to 1024MB
@cumulus/queue-granules to respect a new config parameter: preferredQueueBatchSize. Queue-granules will respect this batch size as best as it can to batch granules into workflow payloads. As workflows generally rely on information such as collection and provider expected to be shared across all granules in a workflow, queue-granules will break batches up by collection, as well as provider if there is a provider field on the granule. This may result in batches that are smaller than the preferred size, but never larger ones. The default value is 1, which preserves current behavior of queueing 1 granule per workflow.
DiscoverGranulesToThrottledQueue that discovers and writes granules to a throttled background queue. This allows discovery and ingest of larger numbers of granules without running into limits with lambda concurrency.
archive_api_reserved_concurrency from 8 to 5 to use fewer reserved lambda functions. If you see throttling errors on the <stack>-apiEndpoints you should increase this value.
archive_api_reserved_concurrency from 8 to 15 to prevent throttling on the dashboard for default deployments.
api/endpoints/execution-status.js get method to include associated granules, as an array, for the provided execution.
getExecutionArnsByGranuleCumulusId returning a list of executionArns sorted by most recent first, for an input Granule Cumulus ID in support of the move of translatePostgresGranuleToApiGranule from RDS-Phase2 feature branch
getApiExecutionCumulusIds returning cumulus IDs for a given list of executions
eraseDynamoTables(). Changed the call Promise.all() to Promise.allSettled() to ensure all dynamo records (provider records in particular) are deleted prior to reseeding.
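The preferredQueueBatchSize behavior described in this release (batches split by collection and provider, never larger than the preferred size) can be sketched roughly as follows. The function name batchGranules and the field names dataType, version, and provider are assumptions chosen for illustration. The actual task may group and key granules differently.

```javascript
// Sketch: group granules by collection (and provider, when present), then
// split each group into batches of at most preferredQueueBatchSize granules.
const batchGranules = (granules, preferredQueueBatchSize = 1) => {
  const batchSize = Math.max(1, preferredQueueBatchSize);

  // Group granules that share the same collection and provider.
  const groups = new Map();
  for (const granule of granules) {
    const key = JSON.stringify([granule.dataType, granule.version, granule.provider]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(granule);
  }

  // Chunk each group; a group's final batch may be smaller than batchSize.
  const batches = [];
  for (const group of groups.values()) {
    for (let i = 0; i < group.length; i += batchSize) {
      batches.push(group.slice(i, i + batchSize));
    }
  }
  return batches;
};

// Made-up sample: three granules in collection A, two in collection B.
const queuedGranules = [
  { granuleId: 'a1', dataType: 'A', version: '001' },
  { granuleId: 'a2', dataType: 'A', version: '001' },
  { granuleId: 'b1', dataType: 'B', version: '001' },
  { granuleId: 'a3', dataType: 'A', version: '001' },
  { granuleId: 'b2', dataType: 'B', version: '001' },
];
const batches = batchGranules(queuedGranules, 2);
```

Here the five granules produce three batches (two for collection A, one for collection B), and with the default size of 1 every granule would get its own batch.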
[v9.6.0] 2021-09-20
PUT /granules API endpoint to update a granule
updateGranule to @cumulus/api-client/granules
POST /granules/{granuleId}/executions API endpoint to associate an execution with a granule
associateExecutionWithGranule to @cumulus/api-client/granules
queued as option for granule's status field
Moved ssh2 package from @cumulus/common to @cumulus/sftp-client and upgraded package from ^0.8.7 to ^1.0.0 to address security vulnerability issue in previous version.
CUMULUS-2583
QueueGranules task now updates granule status to queued once it is added to the queue.
CUMULUS-2617
Authorization header for CMR Launchpad authentication instead of the deprecated Echo-Token header.
Added missing permission for <prefix>_ecs_cluster_instance_role IAM role (used when running ECS services/tasks) to allow kms:Decrypt on the KMS key used to encrypt provider credentials. Adding this permission fixes the sync-granule task when run as an ECS activity in a Step Function, which previously failed trying to decrypt credentials for providers.
CUMULUS-2576
[v9.5.0] 2021-09-07
logs record type from mappings from Elasticsearch. This change should not have any adverse impact on existing deployments, even those which still contain logs records, but technically it is a breaking change to the Elasticsearch mappings.
@cumulus/api-client/asyncOperations.getAsyncOperation to return parsed JSON body of response and not the raw API endpoint response
cumulus module to take lambda_timeouts string map variable that allows timeouts of ingest tasks to be configurable. Allowed properties for the mapping include:
POST /granules API endpoint to create a granule
createGranule to @cumulus/api-client
POST /executions endpoint to create an execution
PUT /executions endpoint to update an execution
delete method for granules-executions.ts implemented as part of CUMULUS-2306 from the RDS-Phase-2 feature branch in support of CUMULUS-2644.
erasePostgresTables method in serve.js implemented as part of CUMULUS-2644, and CUMULUS-2306 from the RDS-Phase-2 feature branch in support of CUMULUS-2644
resetPostgresDb method to support resetting between integration test suite runs
Updated processDeadLetterArchive Lambda to return an object where processingSucceededKeys is an array of the S3 keys for successfully processed objects and processingFailedKeys is an array of S3 keys for objects that could not be processed
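To make the returned object concrete, here is a small sketch of how a caller might consume it. The sample keys and the summarizeDeadLetterResult helper are hypothetical. Only the two property names come from the entry above.

```javascript
// Hypothetical example of the Lambda's output; the S3 keys are made up.
const exampleOutput = {
  processingSucceededKeys: ['example-prefix/message-1.json', 'example-prefix/message-2.json'],
  processingFailedKeys: ['example-prefix/message-3.json'],
};

// Sketch of a caller-side helper that reports counts and the keys to retry.
const summarizeDeadLetterResult = ({
  processingSucceededKeys = [],
  processingFailedKeys = [],
}) => ({
  succeeded: processingSucceededKeys.length,
  failed: processingFailedKeys.length,
  keysToRetry: [...processingFailedKeys],
});

const summary = summarizeDeadLetterResult(exampleOutput);
```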
Updated async operations to handle writing records to the databases when output of the operation is undefined
CUMULUS-2644
migration directory from the db-migration-lambda to the db package and updated unit test references to migrationDir to be pulled from @cumulus/db
@cumulus/api/bin/serveUtils to write records to PostgreSQL tables
CUMULUS-2575
@cumulus/db/src/model/granules functions get and exists to enforce parameter checking so that requests include either (granule_id and collection_cumulus_id) or (cumulus_id) to prevent incorrect results.
@cumulus/message/src/Collections.deconstructCollectionId has been modified to throw a descriptive error if the input collectionId is undefined rather than TypeError: Cannot read property 'split' of undefined. This function has also been updated to throw descriptive errors if an incorrectly formatted collectionId is input.
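The kind of validation described for deconstructCollectionId might look something like the sketch below. It assumes collection IDs take the form name___version (three underscores as separator), and the error messages are invented for illustration. The real function's checks and wording may differ.

```javascript
// Sketch: validate the input before splitting, so callers see a descriptive
// error instead of "Cannot read property 'split' of undefined".
const deconstructCollectionId = (collectionId) => {
  if (typeof collectionId !== 'string') {
    throw new TypeError(`collectionId must be a string, received: ${collectionId}`);
  }
  // Assumption: exactly one "___" separator between name and version.
  const parts = collectionId.split('___');
  if (parts.length !== 2 || parts.some((part) => part.length === 0)) {
    throw new Error(`"${collectionId}" is not a valid collectionId, expected <name>___<version>`);
  }
  const [name, version] = parts;
  return { name, version };
};

const parsed = deconstructCollectionId('MOD09GQ___006');
```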
[v9.4.0] 2021-08-16
@cumulus/sync-granule task should now properly handle syncing files from HTTP/HTTPS providers where basic auth is required and involves a redirect to a different host (e.g. downloading files protected by Earthdata Login)
failedExecutionStepName to failed execution's jsonb error records. This is the name of the Step Function step for the last failed event in the execution's event history.
allowed_redirects field to PostgreSQL providers table
allowedRedirects field to DynamoDB <prefix>-providers table
@cumulus/aws-client/S3.streamS3Upload to handle uploading the contents of a readable stream to S3 and returning a promise
replaySqsMessages lambda to replay archived incoming SQS messages from S3.
/replays/sqs endpoint to trigger an async operation for the replaySqsMessages lambda.
getS3PrefixForArchivedMessage to ingest/sqs package to get prefix for an archived message.
async_operation type SQS Replay.
POST /executions/workflows-by-granules for retrieving workflow names common to a set of granules
workflowsByGranules to @cumulus/api-client/executions
@cumulus/db/translate/file/translateApiPdrToPostgresPdr
@cumulus/ingest/HttpProviderClient.sync to properly handle basic auth when redirecting to a different host and/or host with a different port
execution field
data-migration2 to migrate PDRs before migrating granules.
data-migration2 unit tests testing granules migration to reference PDR records to better model the DB schema.
migratePdrRecord to use translateApiPdrToPostgresPdr function.
getS3KeyForArchivedMessage in ingest/sqs to store SQS messages by queueName.
archive_api_reserved_concurrency from 2 to 8 to prevent throttling with the dashboard.
[v9.2.2] 2021-08-06 - [BACKPORT]
Please note changes in 9.2.2 may not yet be released in future versions, as this is a backport and patch release on the 9.2.x series of releases. Updates that are included in the future will have a corresponding CHANGELOG entry in future releases.
@cumulus/db/translate/file/translateApiPdrToPostgresPdr
data-migration2 to migrate PDRs before migrating granules.
data-migration2 unit tests testing granules migration to reference PDR records to better model the DB schema.
migratePdrRecord to use translateApiPdrToPostgresPdr function.
[v8.1.2] 2021-07-29
Please note changes in 8.1.2 may not yet be released in future versions, as this is a backport/patch release on the 8.x series of releases. Updates that are included in the future will have a corresponding CHANGELOG entry in future releases.
cmr_custom_host variable for cumulus module can now be used to configure Cumulus to integrate with a custom CMR host name and protocol (e.g. http://custom-cmr-host.com). Note that you must include a protocol (http:// or https://) if specifying a value for this variable.
@cumulus/sync-granule task should now properly handle syncing files from HTTP/HTTPS providers where basic auth is required and involves a redirect to a different host (e.g. downloading files protected by Earthdata Login)
allowed_redirects field to PostgreSQL providers table
allowedRedirects field to DynamoDB <prefix>-providers table
@cumulus/aws-client/S3.streamS3Upload to handle uploading the contents of a readable stream to S3 and returning a promise
cmr_custom_host variable to accept a full protocol and host name (e.g. http://cmr-custom-host.com), whereas it previously only accepted a host name
cmr_custom_host variable was not properly forwarded into archive, ingest, and sqs-message-remover modules from cumulus module
@cumulus/ingest/HttpProviderClient.sync to properly handle basic auth when redirecting to a different host and/or host with a different port
[v9.2.1] 2021-07-29 - [BACKPORT]
execution field
[v9.3.0] 2021-07-26
@cumulus/api-client will now throw an error if the status code does not match the expected response (200 for most requests and 202 for a few requests that trigger async operations). Previously the helpers in this package would return the response regardless of the status code, so you may need to update any code using helpers from this package to catch or to otherwise handle errors that you may encounter.
archive_api_reserved_concurrency terraform variable on the Cumulus module and increased if you are seeing throttling errors. The default reserved concurrency value is 8.
cmr_custom_host variable for cumulus module can now be used to configure Cumulus to integrate with a custom CMR host name and protocol (e.g. http://custom-cmr-host.com). Note that you must include a protocol (http:// or https://) if specifying a value for this variable.
rds_connection_heartbeat and its behavior has been replaced by a more robust database connection 'retry' solution. Users can remove this value from their configuration, regardless of value. See the Changed section notes on CUMULUS-2528 for more details.
Added user doc describing new features related to the Cumulus dead letter archive.
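Because @cumulus/api-client helpers in this release throw when the status code is not the expected one, calling code may need a try/catch. The sketch below shows one possible pattern. ApiClientError, checkResponse, and parseBodyOrNull are hypothetical names defined here for illustration, standing in for the package's error class and helpers rather than reproducing them.

```javascript
// Hypothetical stand-in for an API client error that exposes the HTTP
// status code and the API's message directly on the error object.
class ApiClientError extends Error {
  constructor(message, statusCode, apiMessage) {
    super(message);
    this.name = 'ApiClientError';
    this.statusCode = statusCode;
    this.apiMessage = apiMessage;
  }
}

// Sketch of a helper that throws unless the response has an expected status.
const checkResponse = (response, expectedStatusCodes = [200]) => {
  if (!expectedStatusCodes.includes(response.statusCode)) {
    throw new ApiClientError(
      `Unexpected status code ${response.statusCode}`,
      response.statusCode,
      response.body
    );
  }
  return response;
};

// Caller-side handling: treat 404 as "not found", rethrow everything else.
const parseBodyOrNull = (response) => {
  try {
    return JSON.parse(checkResponse(response).body);
  } catch (error) {
    if (error instanceof ApiClientError && error.statusCode === 404) return null;
    throw error;
  }
};
```

Code that previously inspected the returned status code itself would move that logic into the catch branch.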
CUMULUS-2327
CUMULUS-2460
POST /executions/search-by-granules for retrieving executions from a list of granules or granule query
searchExecutionsByGranules to @cumulus/api-client/executions
CUMULUS-2475
GET endpoint to distribution API
CUMULUS-2463
PUT /granules reingest action allows a user to override the default execution to use by providing an optional workflowName or executionArn parameter on the request body.
PUT /granules/bulkReingest action allows a user to override the default execution/workflow combination to reingest with by providing an optional workflowName on the request body.
Adds workflowName and executionArn params to @cumulus/api-client/reingestGranules
CUMULUS-2476
HEAD Distribution requests replicating current behavior of TEA
CUMULUS-2478
CUMULUS-2486
CUMULUS-2487
CUMULUS-2569
CUMULUS-2568
deletePdr/PDR deletion functionality to @cumulus/api-client/pdrs
removeCollectionAndAllDependencies to integration test helpers
example/spec/apiUtils.waitForApiStatus to wait for a record to be returned by the API with a specific value for status
example/spec/discoverUtils.uploadS3GranuleDataForDiscovery to upload granule data fixtures to S3 with a randomized granule ID for discover-granules based integration tests
example/spec/Collections.removeCollectionAndAllDependencies to remove a collection and all dependent objects (e.g. PDRs, granules, executions) from the database via the API
@cumulus/api-client:
pdrs.deletePdr - Delete a PDR via the API
replays.postKinesisReplays - Submit a POST request to the /replays endpoint for replaying Kinesis messages
@cumulus/api-client/granules.getGranuleResponse to return the raw endpoint response from the GET /granules/<granuleId> endpoint
@cumulus/integration-tests to example/spec/helpers/workflowUtils:
startWorkflowExecution
startWorkflow
executeWorkflow
buildWorkflow
testWorkflow
buildAndExecuteWorkflow
buildAndStartWorkflow
example/spec/helpers/workflowUtils.executeWorkflow now uses waitForApiStatus to ensure that the execution is completed or failed before resolving
example/spec/helpers/testUtils.updateAndUploadTestFileToBucket now accepts an object of parameters rather than positional arguments
payload in the input payload test fixture for reconciliation report integration tests
example/spec/parallel/ingest/ingestFromPdrSpec.js
example/spec/parallel/ingest/ingestFromPdrWithChildWorkflowMetaSpec.js
example/spec/parallel/ingest/ingestFromPdrWithExecutionNamePrefixSpec.js
example/spec/parallel/ingest/ingestPdrWithNodeNameSpec.js
@cumulus/api-client/CumulusApiClientError error class to include new properties that can be accessed directly on the error object:
statusCode - The HTTP status code of the API response
apiMessage - The message from the API response
params.pRetryOptions parameter to @cumulus/api-client/granules.deleteGranule to control the retry behavior
cmr_custom_host variable to accept a full protocol and host name (e.g. http://cmr-custom-host.com), whereas it previously only accepted a host name
example/cumulus-tf deployment to the new Cumulus Distribution
example/README.md
rds_connection_heartbeat as a configuration option from all Cumulus terraform modules
dbHeartBeat as an environmental switch from @cumulus/db.getKnexClient in favor of more comprehensive general db connect retry solution
rds_connection_timing_configuration string map to allow for configuration and tuning of Core's internal database retry/connection timeout behaviors. These values map to connection pool configuration values for tarn (https://github.com/vincit/tarn.js/) which Core's database module / knex (https://www.npmjs.com/package/knex) use for this purpose:
@cumulus/db and all terraform modules to set default retry configuration values for the database module to cover existing database heartbeat connection failures as well as all other knex/tarn connection creation failures.
cmr_custom_host variable was not properly forwarded into archive, ingest, and sqs-message-remover modules from cumulus module
parse-pdr set a granule's provider to the entire provider record when a NODE_NAME is present. Expected behavior consistent with other tasks is to set the provider name in that field.
@cumulus/api-client/pdrs.getPdr to request correct endpoint for returning a PDR from the API
published: true and with a CMR link in the Dynamo/PostgreSQL databases. Now, the CMR deletion and the Dynamo/PostgreSQL record updates will all succeed or fail together, preventing the database records from being out of sync with CMR.