I'm using an AWS-managed Aurora PostgreSQL v15 instance as catalog for a large number of S3 objects. The level1_dataset
table has about 2 billion rows and its schema has a metadata
JSONB column. An old software bug caused the string null
to be written to the metadata
column (instead of leaving it empty) when no metadata were supposed to be written. About a billion rows contain the sting null
and I want to clean it with:
UPDATE public.level1_dataset
SET "metadata" = NULL
WHERE "metadata"::text = 'null';
The database is hosted on a db.r6g.2xlarge
with 8 vCPU cores and 64 GB memory. With this setup, and leaving all tuning to defaults, I'm getting about 42 seconds/million rows. Temporarily changing CPU cores and memory for this cleanup task is possible.
What is the most efficient way to proceed?
UPDATE: One of @laurenz-albe's approaches is to do it in batches. This is how I did it because, in my case, "id" is UUID, not integer. The SELECT has a 10% penalty in my use-case.
UPDATE public.level1_dataset
SET "metadata" = NULL
WHERE "id" IN (SELECT "id"
FROM public.level1_dataset
WHERE "metadata"::text = 'null'
LIMIT 10000000);
1 Answer 1
The fastest way is probably
CREATE TABLE xy AS
SELECT NULLIF(metadata, 'null') AS metadata, ...
FROM level1_dataset;
DROP TABLE level1_dataset;
ALTER TABLE xy RENAME TO level1_dataset;
But that requires you to take down time.
Other than that, update in batches and VACUUM
in between:
UPDATE public.level1_dataset
SET "metadata" = NULL
WHERE "metadata"::text = 'null'
AND id BETWEEN 1 AND 10000000;
VACUUM public.level1_dataset;
UPDATE public.level1_dataset
SET "metadata" = NULL
WHERE "metadata"::text = 'null'
AND id BETWEEN 10000001 AND 20000000;
VACUUM public.level1_dataset;
...
-
Why does
DROP TABLE
+ALTER TABLE RENAME
imply downtime? Can't that be done in a transaction?Bergi– Bergi2024年05月18日 00:08:31 +00:00Commented May 18, 2024 at 0:08 -
@Bergi Yes, but it will take a long time, during which the table will be unavailable.Laurenz Albe– Laurenz Albe2024年05月18日 10:14:33 +00:00Commented May 18, 2024 at 10:14
Explore related questions
See similar questions with these tags.