The changes introduced in CUMULUS-2962 will re-introduce a
files_granules_cumulus_id_index
on the files
table in the RDS database.
This index will be automatically created as part of the bootstrap lambda
function on deployment of the data-persistence
module.
In cases where the index is already applied, this update will have no effect.
Please Note: In some cases where ingest is occurring at high volume levels and/or the
files table has > 150M file records, the migration may
fail on deployment due to timing required to both acquire the table state needed for the
migration and time to create the index given the resources available.
For reference a rx.5 large Aurora/RDS database
with no activity took roughly 6 minutes to create the index for a file table with 300M records and no active ingest, however timed out when the same migration was attempted
in production with possible activity on the table.
If you believe you are subject to the above consideration, you may opt to
manually create the files
table index prior to deploying this version of
Core with the following procedure:
- Verify you do not have the index:
select * from pg_indexes where tablename = 'files';
schemaname | tablename | indexname | tablespace | indexdef
------------+-----------+-------------------------+------------+---------------------------------------------------------------------------------------
public | files | files_pkey | | CREATE UNIQUE INDEX files_pkey ON public.files USING btree (cumulus_id)
public | files | files_bucket_key_unique | | CREATE UNIQUE INDEX files_bucket_key_unique ON public.files USING btree (bucket, key)
In this instance you should not see an indexname
row with
files_granules_cumulus_id_index
as the value. If you do, you should be
clear to proceed with the installation.
Stop all ingest operations in Cumulus Core according to your operational
procedures. You should validate that it appears there are no active queries that
appear to be inserting granules/files into the database as a secondary method
of evaluating the database system state:
select pid, query, state, wait_event_type, wait_event from pg_stat_activity where state = 'active';
If query rows are returned with a query
value that involves the files table,
make sure ingest is halted and no other granule-update activity is running on
the system.
Note: In rare instances if there are hung queries that are unable to resolve, it may be necessary to
manually use psql Server Signaling
Functions
pg_cancel_backend
and/or
pg_terminate_backend
if the migration will not complete in the next step.
Run the following query to create the index. Depending on the situation
this may take many minutes to complete, and you will note your CPU load and
disk I/O rates increase on your cluster:
CREATE INDEX files_granule_cumulus_id_index ON files (granule_cumulus_id);
You should see a response like:
CREATE INDEX
and can verify the index files_granule_cumulus_id_index
was created:
=> select * from pg_indexes where tablename = 'files';
schemaname | tablename | indexname | tablespace | indexdef
------------+-----------+--------------------------------+------------+----------------------------------------------------------------------------------------------
public | files | files_pkey | | CREATE UNIQUE INDEX files_pkey ON public.files USING btree (cumulus_id)
public | files | files_bucket_key_unique | | CREATE UNIQUE INDEX files_bucket_key_unique ON public.files USING btree (bucket, key)
public | files | files_granule_cumulus_id_index | | CREATE INDEX files_granule_cumulus_id_index ON public.files USING btree (granule_cumulus_id)
(3 rows)
- Once this is complete, you may deploy this version of Cumulus as you
normally would.
If you are unable to stop ingest for the above procedure and cannot
migrate with deployment, you may be able to manually create the index while
writes are ongoing using postgres's
CONCURRENTLY
option for CREATE INDEX
.
This can have significant impacts on CPU/write IO, particularly if you are
already using a significant amount of your cluster resources, and may result
in failed writes or an unexpected index/database state.
PostgreSQL's
documentation
provides more information on this option. Please be aware it is
unsupported by Cumulus at this time, so community members that opt to go
this route should proceed with caution.