Changelog
[v11.1.5] 2022-08-10 [BACKPORT]
Please note changes in 11.1.5 may not yet be released in future versions, as this is a backport and patch release on the 11.1.x series of releases. Updates that are included in the future will have a corresponding CHANGELOG entry in future releases.
granule_cumulus_id instead of cumulus_id. Previous logic removed files by matching file.cumulus_id to granule.cumulus_id.
Changelog
[v13.2.0] 2022-8-04
cumulus module
ingestPdrWithNodeNameSpec.js to use deleteProvidersAndAllDependenciesByHost function.
deleteProvidersByHost function.
collectionId in workflow input and updated task to use said collectionId to look up the corresponding collection record in RDS.
Changelog
[v13.1.0] 2022-7-22
The changes introduced in CUMULUS-2962 will re-introduce a files_granule_cumulus_id_index on the files table in the RDS database. This index will be automatically created as part of the bootstrap lambda function on deployment of the data-persistence module.
In cases where the index is already applied, this update will have no effect.
Please note: In some cases where ingest is occurring at high volume and/or the files table has more than 150M file records, the migration may fail on deployment because of the time required both to acquire the table state needed for the migration and to create the index with the resources available.
For reference, a rx.5 large Aurora/RDS database with no activity took roughly 6 minutes to create the index for a files table with 300M records and no active ingest; however, the same migration timed out when attempted in production with possible activity on the table.
If you believe you are subject to the above consideration, you may opt to manually create the files table index prior to deploying this version of Core with the following procedure:
select * from pg_indexes where tablename = 'files';
schemaname | tablename | indexname | tablespace | indexdef
------------+-----------+-------------------------+------------+---------------------------------------------------------------------------------------
public | files | files_pkey | | CREATE UNIQUE INDEX files_pkey ON public.files USING btree (cumulus_id)
public | files | files_bucket_key_unique | | CREATE UNIQUE INDEX files_bucket_key_unique ON public.files USING btree (bucket, key)
In this instance you should not see an indexname row with files_granule_cumulus_id_index as the value. If you do, the index already exists and you should be clear to proceed with the installation.
Stop all ingest operations in Cumulus Core according to your operational procedures. As a secondary check of the database state, validate that no active queries appear to be inserting granules/files into the database:
select pid, query, state, wait_event_type, wait_event from pg_stat_activity where state = 'active';
If query rows are returned with a query value that involves the files table, make sure ingest is halted and no other granule-update activity is running on the system.
Note: In rare instances, if there are hung queries that are unable to resolve, it may be necessary to manually use the PostgreSQL Server Signaling Functions pg_cancel_backend and/or pg_terminate_backend if the migration will not complete in the next step.
Run the following query to create the index. Depending on the situation this may take many minutes to complete, and you will note your CPU load and disk I/O rates increase on your cluster:
CREATE INDEX files_granule_cumulus_id_index ON files (granule_cumulus_id);
You should see a response like:
CREATE INDEX
and can verify the index files_granule_cumulus_id_index was created:
=> select * from pg_indexes where tablename = 'files';
schemaname | tablename | indexname | tablespace | indexdef
------------+-----------+--------------------------------+------------+----------------------------------------------------------------------------------------------
public | files | files_pkey | | CREATE UNIQUE INDEX files_pkey ON public.files USING btree (cumulus_id)
public | files | files_bucket_key_unique | | CREATE UNIQUE INDEX files_bucket_key_unique ON public.files USING btree (bucket, key)
public | files | files_granule_cumulus_id_index | | CREATE INDEX files_granule_cumulus_id_index ON public.files USING btree (granule_cumulus_id)
(3 rows)
CONCURRENTLY option for CREATE INDEX. This can have significant impacts on CPU/write IO, particularly if you are already using a significant amount of your cluster resources, and may result in failed writes or an unexpected index/database state.
PostgreSQL's documentation provides more information on this option. Please be aware it is unsupported by Cumulus at this time, so community members that opt to go this route should proceed with caution.
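The pre-flight checks in the procedure above (is the index already present, and are any active queries touching the files table) can also be scripted. The following is a minimal TypeScript sketch, not part of Cumulus: it assumes the rows from pg_indexes and pg_stat_activity have already been fetched with a database client of your choice, and the row types and the helper names hasFilesGranuleIndex and activeQueriesTouchingFiles are hypothetical.

```typescript
// Hypothetical row shapes mirroring columns returned by the queries above.
interface PgIndexRow {
  tablename: string;
  indexname: string;
}

interface PgActivityRow {
  pid: number;
  query: string;
  state: string;
}

const INDEX_NAME = 'files_granule_cumulus_id_index';

// True when the index already exists on the files table.
function hasFilesGranuleIndex(rows: PgIndexRow[]): boolean {
  return rows.some(
    (row) => row.tablename === 'files' && row.indexname === INDEX_NAME
  );
}

// Active queries whose text mentions "files" as a whole word.
// This is a text heuristic only; review the matches by hand.
function activeQueriesTouchingFiles(rows: PgActivityRow[]): PgActivityRow[] {
  return rows.filter(
    (row) => row.state === 'active' && /\bfiles\b/i.test(row.query)
  );
}

// Sample data for illustration (not real output from a Cumulus database).
const indexRows: PgIndexRow[] = [
  { tablename: 'files', indexname: 'files_pkey' },
  { tablename: 'files', indexname: 'files_bucket_key_unique' },
];
const activityRows: PgActivityRow[] = [
  { pid: 101, query: 'INSERT INTO files (bucket, key) VALUES ($1, $2)', state: 'active' },
  { pid: 102, query: 'SELECT 1', state: 'active' },
];

console.log(hasFilesGranuleIndex(indexRows)); // false
console.log(activeQueriesTouchingFiles(activityRows).map((r) => r.pid)); // [ 101 ]
```

With the sample rows, the index is reported as missing and pid 101 is flagged, which corresponds to the case where ingest should be halted before CREATE INDEX is run manually.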
files table to add an index on granule_cumulus_id
move-granule task to check the optional collection configuration parameter meta.granuleMetadataFileExtension to determine the granule metadata file. If none is specified, the granule CMR metadata or ISO metadata file is used.
CUMULUS-2995
CUMULUS-2863
@cumulus/api validateAndUpdateSqsRule method to allow 0 retries and 0 visibilityTimeout in rule's meta.
CUMULUS-2959
@cumulus/api granules module to convert numeric productVolume to string when an old granule record is retrieved from DynamoDB
Fixed the following links on Cumulus docs' Getting Started page:
Also corrected the How to Deploy Cumulus link in the Glossary
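The validateAndUpdateSqsRule entry above concerns accepting 0 as a legitimate value for retries and visibilityTimeout. The changelog does not state what caused the original behavior; a frequent source of this class of bug in JavaScript is defaulting with ||, which treats 0 as missing. The sketch below illustrates that general pitfall and the nullish-coalescing alternative. It is not the Cumulus implementation: the RuleMeta shape, the function names, and the default values are assumptions chosen for illustration.

```typescript
// Hypothetical shape of the SQS-related fields in a rule's meta.
interface RuleMeta {
  retries?: number;
  visibilityTimeout?: number;
}

// Illustrative defaults only; not taken from the Cumulus source.
const DEFAULT_RETRIES = 3;
const DEFAULT_VISIBILITY_TIMEOUT = 300;

// Pitfall: || falls back on any falsy value, so an explicit 0 is overwritten.
function withDefaultsUsingOr(meta: RuleMeta): Required<RuleMeta> {
  return {
    retries: meta.retries || DEFAULT_RETRIES,
    visibilityTimeout: meta.visibilityTimeout || DEFAULT_VISIBILITY_TIMEOUT,
  };
}

// Alternative: ?? falls back only on null or undefined, so 0 is preserved.
function withDefaultsUsingNullish(meta: RuleMeta): Required<RuleMeta> {
  return {
    retries: meta.retries ?? DEFAULT_RETRIES,
    visibilityTimeout: meta.visibilityTimeout ?? DEFAULT_VISIBILITY_TIMEOUT,
  };
}

const zeroMeta: RuleMeta = { retries: 0, visibilityTimeout: 0 };
console.log(withDefaultsUsingOr(zeroMeta)); // { retries: 3, visibilityTimeout: 300 }
console.log(withDefaultsUsingNullish(zeroMeta)); // { retries: 0, visibilityTimeout: 0 }
console.log(withDefaultsUsingNullish({})); // { retries: 3, visibilityTimeout: 300 }
```

The second function keeps a rule configured with zero retries and a zero visibility timeout intact, while still supplying defaults when the fields are absent.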
Changelog
[v11.1.4] 2022-07-18
Please note changes in 11.1.4 may not yet be released in future versions, as this is a backport and patch release on the 11.1.x series of releases. Updates that are included in the future will have a corresponding CHANGELOG entry in future releases.
The changes introduced in CUMULUS-2962 will re-introduce a files_granule_cumulus_id_index on the files table in the RDS database. This index will be automatically created as part of the bootstrap lambda function on deployment of the data-persistence module.
In cases where the index is already applied, this update will have no effect.
Please note: In some cases where ingest is occurring at high volume and/or the files table has more than 150M file records, the migration may fail on deployment because of the time required both to acquire the table state needed for the migration and to create the index with the resources available.
For reference, a rx.5 large Aurora/RDS database with no activity took roughly 6 minutes to create the index for a files table with 300M records and no active ingest; however, the same migration timed out when attempted in production with possible activity on the table.
If you believe you are subject to the above consideration, you may opt to manually create the files table index prior to deploying this version of Core with the following procedure:
select * from pg_indexes where tablename = 'files';
schemaname | tablename | indexname | tablespace | indexdef
------------+-----------+-------------------------+------------+---------------------------------------------------------------------------------------
public | files | files_pkey | | CREATE UNIQUE INDEX files_pkey ON public.files USING btree (cumulus_id)
public | files | files_bucket_key_unique | | CREATE UNIQUE INDEX files_bucket_key_unique ON public.files USING btree (bucket, key)
In this instance you should not see an indexname row with files_granule_cumulus_id_index as the value. If you do, the index already exists and you should be clear to proceed with the installation.
Stop all ingest operations in Cumulus Core according to your operational procedures. As a secondary check of the database state, validate that no active queries appear to be inserting granules/files into the database:
select pid, query, state, wait_event_type, wait_event from pg_stat_activity where state = 'active';
If query rows are returned with a query value that involves the files table, make sure ingest is halted and no other granule-update activity is running on the system.
Note: In rare instances, if there are hung queries that are unable to resolve, it may be necessary to manually use the PostgreSQL Server Signaling Functions pg_cancel_backend and/or pg_terminate_backend if the migration will not complete in the next step.
Run the following query to create the index. Depending on the situation this may take many minutes to complete, and you will note your CPU load and disk I/O rates increase on your cluster:
CREATE INDEX files_granule_cumulus_id_index ON files (granule_cumulus_id);
You should see a response like:
CREATE INDEX
and can verify the index files_granule_cumulus_id_index was created:
=> select * from pg_indexes where tablename = 'files';
schemaname | tablename | indexname | tablespace | indexdef
------------+-----------+--------------------------------+------------+----------------------------------------------------------------------------------------------
public | files | files_pkey | | CREATE UNIQUE INDEX files_pkey ON public.files USING btree (cumulus_id)
public | files | files_bucket_key_unique | | CREATE UNIQUE INDEX files_bucket_key_unique ON public.files USING btree (bucket, key)
public | files | files_granule_cumulus_id_index | | CREATE INDEX files_granule_cumulus_id_index ON public.files USING btree (granule_cumulus_id)
(3 rows)
CONCURRENTLY option for CREATE INDEX. This can have significant impacts on CPU/write IO, particularly if you are already using a significant amount of your cluster resources, and may result in failed writes or an unexpected index/database state.
PostgreSQL's documentation provides more information on this option. Please be aware it is unsupported by Cumulus at this time, so community members that opt to go this route should proceed with caution.
Changelog
[v12.0.1] 2022-07-18
Changelog
[v13.0.1] 2022-7-12
Changelog
[v10.1.3] 2022-06-28 [BACKPORT]
url_path in the collection configuration
Changelog
[v11.1.3] 2022-06-24
Please note changes in 11.1.3 may not yet be released in future versions, as this is a backport and patch release on the 11.1.x series of releases. Updates that are included in the future will have a corresponding CHANGELOG entry in future releases.
move-granule task to check the optional collection configuration parameter meta.granuleMetadataFileExtension to determine the granule metadata file. If none is specified, the granule CMR metadata or ISO metadata file is used.
meta.granuleMetadataFileExtension to specify CMR metadata file extension for tasks that utilize metadata file lookups
url_path in the collection configuration
@cumulus/api validateAndUpdateSqsRule method to allow 0 retries and 0 visibilityTimeout in rule's meta.
@cumulus/api granules module to convert numeric productVolume to string when an old granule record is retrieved from DynamoDB.
data-migration2 granule migration logic to allow for DynamoDB granules that have a null/empty string value for execution. The migration will now migrate them without a linked execution.
Changelog
[v13.0.0] 2022-06-13
The changes introduced in CUMULUS-2955 should result in removal of files_granule_cumulus_id_index from the files table (added in the v11.1.1 release). The success of this operation is dependent on system ingest load.
In rare cases where data-persistence deployment fails because the postgres-db-migration times out, it may be required to manually remove the index and then redeploy:
DROP INDEX IF EXISTS files_granule_cumulus_id_index;
CUMULUS-2931
cumulus-alias index that would collide with the required cumulus-alias alias. A configuration parameter elasticsearch_remove_index_alias_conflict on the cumulus and archive modules has been added to enable the original behavior that would remove the invalid index (and all its data).
@cumulus/es-client.bootstrapElasticSearch signature to be parameterized and accommodate a new parameter removeAliasConflict which allows/disallows the deletion of a conflicting cumulus-alias index
move-granule task to check the optional collection configuration parameter meta.granuleMetadataFileExtension to determine the granule metadata file. If none is specified, the granule CMR metadata or ISO metadata file is used.
CUMULUS-2929
meta.granuleMetadataFileExtension to specify CMR metadata file extension for tasks that utilize metadata file lookups
CUMULUS-2939
@cumulus/api/lambdas/start-async-operation to start an async operation
CUMULUS-2953
skipMetadataCheck flag to config for Hyrax metadata updates task.
true, and a granule has no CMR file, the task will simply return the input values.
CUMULUS-2966
url_path in the collection configuration
CUMULUS-2965
cumulus-rds-tf module to ignore engine_version lifecycle changes
CUMULUS-2967
CUMULUS-2955
20220126172008_files_granule_id_index to not create an index on granule_cumulus_id on the files table.
20220609024044_remove_files_granule_id_index migration to revert changes from 20220126172008_files_granule_id_index on any deployed stacks that might have the index to ensure consistency in deployed stacks
CUMULUS-2923
CUMULUS-2939
@cumulus/api granules/bulk*, elasticsearch/index-from-database and POST reconciliationReports endpoints to invoke StartAsyncOperation lambda
CUMULUS-2863
@cumulus/api validateAndUpdateSqsRule method to allow 0 retries and 0 visibilityTimeout in rule's meta.
CUMULUS-2961
data-migration2 granule migration logic to allow for DynamoDB granules that have a null/empty string value for execution. The migration will now migrate them without a linked execution.
@cumulus/api validateAndUpdateSqsRule method to allow 0 retries and 0 visibilityTimeout in rule's meta.
CUMULUS-2959
@cumulus/api granules module to convert numeric productVolume to string when an old granule record is retrieved from DynamoDB.
Changelog
[v11.1.2] 2022-06-13
Please note changes in 11.1.2 may not yet be released in future versions, as this is a backport and patch release on the 11.1.x series of releases. Updates that are included in the future will have a corresponding CHANGELOG entry in future releases.
The changes introduced in CUMULUS-2955 should result in removal of files_granule_cumulus_id_index from the files table (added in the v11.1.1 release). The success of this operation is dependent on system ingest load.
In rare cases where data-persistence deployment fails because the postgres-db-migration times out, it may be required to manually remove the index and then redeploy:
> DROP INDEX IF EXISTS files_granule_cumulus_id_index;
DROP INDEX
20220126172008_files_granule_id_index to not create an index on granule_cumulus_id on the files table.
20220609024044_remove_files_granule_id_index migration to revert changes from 20220126172008_files_granule_id_index on any deployed stacks that might have the index to ensure consistency in deployed stacks