
I'm using an AWS-managed Aurora PostgreSQL v15 instance as a catalog for a large number of S3 objects. The level1_dataset table has about 2 billion rows, and its schema has a JSONB metadata column. Because of an old software bug, the JSON value null (whose text form is the string null) was written to the metadata column, instead of leaving it empty (SQL NULL), whenever no metadata was supposed to be written. About a billion rows contain this null, and I want to clean them up with:

UPDATE public.level1_dataset
SET "metadata" = NULL
WHERE "metadata"::text = 'null';
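A side note on what is being matched: a jsonb column can hold the JSON value null, which is a real (non-NULL) value, so IS NULL does not find these rows; only SQL NULL means "no value". The sketch below uses a hypothetical scratch table, null_demo, to show the difference, then runs the same cleanup with a jsonb comparison (metadata = 'null'::jsonb) instead of a text cast. Avoiding the cast may save some CPU per row, but with a billion updated rows the cost is probably dominated by writing new row versions, so this alone is unlikely to change the timing much.

```sql
-- Hypothetical scratch table, only to show JSON null vs SQL NULL
CREATE TEMP TABLE null_demo (id int PRIMARY KEY, metadata jsonb);
INSERT INTO null_demo VALUES
  (1, 'null'::jsonb),      -- JSON null: what the bug wrote
  (2, NULL),               -- SQL NULL: what was intended
  (3, '{"a": 1}'::jsonb);  -- real metadata

-- JSON null is a real jsonb value, so IS NULL does not match it
SELECT id,
       metadata IS NULL          AS is_sql_null,
       jsonb_typeof(metadata)    AS json_type,
       metadata = 'null'::jsonb  AS is_json_null
FROM null_demo
ORDER BY id;

-- Same cleanup, comparing jsonb values instead of casting every row to text
UPDATE null_demo
SET metadata = NULL
WHERE metadata = 'null'::jsonb;
```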

The database is hosted on a db.r6g.2xlarge with 8 vCPU cores and 64 GB memory. With this setup, and leaving all tuning to defaults, I'm getting about 42 seconds/million rows. Temporarily changing CPU cores and memory for this cleanup task is possible.
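At that rate, and assuming it stays constant over the whole run, updating the roughly one billion affected rows works out to about half a day:

$$\frac{42\ \text{s}}{10^{6}\ \text{rows}} \times 10^{9}\ \text{rows} = 42\,000\ \text{s} \approx 11.7\ \text{h}$$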

What is the most efficient way to proceed?

UPDATE: One of @laurenz-albe's suggestions is to update in batches. In my case "id" is a UUID, not an integer, so integer ID ranges don't apply; this is how I did it instead. The subquery SELECT adds about 10% overhead in my use case:

UPDATE public.level1_dataset
SET "metadata" = NULL
WHERE "id" IN (SELECT "id"
 FROM public.level1_dataset
 WHERE "metadata"::text = 'null'
 LIMIT 10000000);
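Another way to batch a UUID-keyed table, not suggested in the answer and only sketched here under assumptions, is to walk the table by physical block ranges (ctid), which PostgreSQL 14 and later can read with a TID Range Scan. Each batch then touches a contiguous set of heap pages and does not rescan rows that earlier batches already fixed, which a LIMIT subquery like the one above may have to read past again and again. The table batch_demo and the batch size are hypothetical. COMMIT inside a DO block only works when the block is not run inside an explicit transaction, and VACUUM cannot run inside the block at all, so it would have to be run separately (or left to autovacuum).

```sql
-- Hypothetical UUID-keyed stand-in table: every second row holds JSON null
CREATE TEMP TABLE batch_demo (id uuid PRIMARY KEY, metadata jsonb);
INSERT INTO batch_demo
SELECT gen_random_uuid(),
       CASE WHEN g % 2 = 0 THEN 'null'::jsonb ELSE '{"k": 1}'::jsonb END
FROM generate_series(1, 1000) AS g;

DO $$
DECLARE
  pages_per_batch constant bigint := 2;  -- hypothetical; far larger on a real table
  total_pages bigint :=
    pg_relation_size('batch_demo') / current_setting('block_size')::bigint;
  start_page bigint := 0;
BEGIN
  WHILE start_page < total_pages LOOP
    -- Only rows stored in blocks [start_page, start_page + pages_per_batch).
    -- New row versions written by the UPDATE have metadata NULL, so even if they
    -- land in a later block range, the last condition skips them.
    UPDATE batch_demo
    SET metadata = NULL
    WHERE ctid >= format('(%s,0)', start_page)::tid
      AND ctid <  format('(%s,0)', start_page + pages_per_batch)::tid
      AND metadata = 'null'::jsonb;

    start_page := start_page + pages_per_batch;
    COMMIT;  -- one transaction per batch (PostgreSQL 11+, top-level DO only)
  END LOOP;
END $$;
```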
asked May 17, 2024 at 10:58

1 Answer


The fastest way is probably to rewrite the whole table:

CREATE TABLE xy AS
SELECT NULLIF(metadata, 'null') AS metadata, ...
FROM level1_dataset;
DROP TABLE level1_dataset;
ALTER TABLE xy RENAME TO level1_dataset;

But that requires downtime.
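A fuller sketch of that rebuild, on a hypothetical stand-in table (level1_demo, with an invented s3_key column in place of the elided column list) so that it runs on its own. It makes two points explicit: CREATE TABLE AS copies no indexes, constraints, or defaults, so they must be recreated on the new table (as would grants, foreign keys, and dependent views on a real table), and the old table should be locked against writes during the copy, or changes made in the meantime are lost. On the real table, the new table would be a permanent one, not TEMP.

```sql
-- Hypothetical stand-in for public.level1_dataset; s3_key is an invented column
CREATE TEMP TABLE level1_demo (
  id       uuid PRIMARY KEY,
  s3_key   text NOT NULL,
  metadata jsonb
);
INSERT INTO level1_demo (id, s3_key, metadata) VALUES
  (gen_random_uuid(), 'a', 'null'::jsonb),
  (gen_random_uuid(), 'b', '{"size": 1}'::jsonb),
  (gen_random_uuid(), 'c', NULL);

BEGIN;
-- SHARE mode blocks INSERT/UPDATE/DELETE but still allows reads during the copy
LOCK TABLE level1_demo IN SHARE MODE;

-- Copy every row once, turning JSON null into SQL NULL on the way
CREATE TEMP TABLE level1_demo_new AS
SELECT id, s3_key, NULLIF(metadata, 'null'::jsonb) AS metadata
FROM level1_demo;

-- CREATE TABLE AS copies no constraints or indexes: recreate them
ALTER TABLE level1_demo_new ADD PRIMARY KEY (id);
ALTER TABLE level1_demo_new ALTER COLUMN s3_key SET NOT NULL;

DROP TABLE level1_demo;
ALTER TABLE level1_demo_new RENAME TO level1_demo;
COMMIT;
```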

Otherwise, update in batches and run VACUUM between them, so that the space freed by each batch can be reused by the next:

UPDATE public.level1_dataset
SET "metadata" = NULL
WHERE "metadata"::text = 'null'
AND id BETWEEN 1 AND 10000000;
VACUUM public.level1_dataset;
UPDATE public.level1_dataset
SET "metadata" = NULL
WHERE "metadata"::text = 'null'
AND id BETWEEN 10000001 AND 20000000;
VACUUM public.level1_dataset;
...
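Typing out hundreds of these pairs by hand is tedious. In psql, one option is to generate them and run each with \gexec, which executes every value of the query result as a separate statement, so VACUUM runs outside a transaction block as long as autocommit is on (the psql default). The sketch uses a hypothetical integer-keyed table, vac_demo, and batches of 100 ids; for the real table the bounds would follow the 10,000,000-id steps above.

```sql
-- Hypothetical integer-keyed stand-in table: every third row holds JSON null
CREATE TEMP TABLE vac_demo (id bigint PRIMARY KEY, metadata jsonb);
INSERT INTO vac_demo
SELECT g, CASE WHEN g % 3 = 0 THEN 'null'::jsonb ELSE '{}'::jsonb END
FROM generate_series(1, 1000) AS g;

-- Emit "UPDATE batch n" followed by "VACUUM" for n = 0..9 (100 ids per batch,
-- hypothetical), in order; psql's \gexec then runs each row as its own statement.
SELECT s.stmt
FROM generate_series(0, 9) AS b(n)
CROSS JOIN LATERAL (VALUES
    (1, format('UPDATE vac_demo SET metadata = NULL '
               'WHERE metadata::text = %L AND id BETWEEN %s AND %s',
               'null', b.n * 100 + 1, (b.n + 1) * 100)),
    (2, 'VACUUM vac_demo')
) AS s(ord, stmt)
ORDER BY b.n, s.ord
\gexec
```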
answered May 17, 2024 at 11:17
  • Why does DROP TABLE + ALTER TABLE RENAME imply downtime? Can't that be done in a transaction? Commented May 18, 2024 at 0:08
  • @Bergi Yes, but it will take a long time, during which the table will be unavailable. Commented May 18, 2024 at 10:14
