Imagine the following table:

tmp_migration.asset

╔══════════╦══════════════════════════╗
║ id       ║ ...many other columns... ║
╠══════════╬══════════════════════════╣
║ 15       ║ ...                      ║
║ 16       ║ ...                      ║
║ 17       ║ ...                      ║
║ 18       ║ ...                      ║
║ 10020    ║ ...                      ║
║ 10021    ║ ...                      ║
╚══════════╩══════════════════════════╝

You see, the index doesn't start at 1, has gaps, etc.

Problem

I want to add a new column tempId with a continuous index. The table has 80m rows. How can I do that? I googled a lot and got nowhere.

Background

The table is part of a data migration project. tmp_migration is a temporary schema created as the source of the data migration. In the current step I'm trying to copy over from tmp_migration.asset to public.asset while doing data transformation. I'm using a combined INSERT INTO ... SELECT ... query for that.

The problem is that it takes several hours (80m rows) and I don't receive any progress notification during the run. To solve that, I wanted to use "pagination". In the bash script that calls psql with the insert/select script, I created a loop that sets the boundaries passed to the script.

I started by using LIMIT / OFFSET, adding

LIMIT :limit
OFFSET :offset;

to the script, but this slows down dramatically at higher "pages". The usual advice is to filter with WHERE on the PK instead of using LIMIT/OFFSET. However, for this I need a continuous PK, which I don't have. So I thought of adding a temporary continuous index.
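For comparison, here is a minimal sketch of what "WHERE on the PK" could look like when each batch starts after the last id seen in the previous one (often called keyset pagination). In this variant, gaps in id should not matter. Everything in it is an assumption for illustration: an in-memory SQLite database stands in for PostgreSQL, the tables src and dst and the column payload are hypothetical, and ids are assumed to be positive.

```python
import sqlite3


def copy_in_batches(conn, batch_size):
    """Copy src into dst in id order, one batch per transaction.

    Returns the number of rows copied in each batch.
    """
    last_id = 0  # assumption: all ids are positive
    sizes = []
    while True:
        # Filter on the primary key instead of using OFFSET, so each
        # batch can start from an index lookup rather than a rescan.
        rows = conn.execute(
            "SELECT id, payload FROM src WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany("INSERT INTO dst (id, payload) VALUES (?, ?)", rows)
        conn.commit()
        last_id = rows[-1][0]  # the next batch starts after this id
        sizes.append(len(rows))
    return sizes


def demo_keyset():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE dst (id INTEGER PRIMARY KEY, payload TEXT)")
    ids = [15, 16, 17, 18, 10020, 10021]  # same gaps as in the table above
    conn.executemany(
        "INSERT INTO src VALUES (?, ?)", [(i, f"row-{i}") for i in ids]
    )
    sizes = copy_in_batches(conn, 4)
    copied = [r[0] for r in conn.execute("SELECT id FROM dst ORDER BY id")]
    return sizes, copied


if __name__ == "__main__":
    print(demo_keyset())
```

The loop only needs to remember the last id it copied, so a progress message could be printed after each commit.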

Maybe there are other solutions that I don't see right now. I would be very happy about any assistance.

asked Jul 19, 2022 at 10:33

1 Answer

DEMO:

CREATE TABLE test (id INT PRIMARY KEY, other_field INT);
INSERT INTO test VALUES (3,333),(55,555),(777,777);
SELECT * FROM test;
id other_field
3 333
55 555
777 777
ALTER TABLE test ADD COLUMN continuous INT;
SELECT * FROM test;
id other_field continuous
3 333 null
55 555 null
777 777 null
-- Number the rows in id order and store that number in the new column.
UPDATE test
SET continuous = calculate_rownumber.rownumber
FROM ( SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownumber
       FROM test ) AS calculate_rownumber
WHERE test.id = calculate_rownumber.id;
SELECT * FROM test;
id other_field continuous
3 333 1
55 555 2
777 777 3
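Once the column is filled, fixed-size pages can be selected with a plain range filter on it. The following is only a sketch under assumptions: an in-memory SQLite database stands in for PostgreSQL, the UPDATE is written with a correlated subquery instead of UPDATE ... FROM so that it runs on older SQLite versions, two extra rows are added to the demo data, and the helper names number_rows and fetch_page are hypothetical.

```python
import sqlite3


def number_rows(conn):
    """Add the continuous column and fill it with 1..N in id order."""
    conn.execute("ALTER TABLE test ADD COLUMN continuous INTEGER")
    conn.execute(
        """
        UPDATE test
        SET continuous = (
            SELECT rownumber
            FROM ( SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownumber
                   FROM test ) AS calculate_rownumber
            WHERE calculate_rownumber.id = test.id
        )
        """
    )
    conn.commit()


def fetch_page(conn, page, page_size):
    """Return (id, continuous) for one zero-based page via a range filter."""
    lo = page * page_size + 1
    hi = lo + page_size - 1
    return conn.execute(
        "SELECT id, continuous FROM test "
        "WHERE continuous BETWEEN ? AND ? ORDER BY continuous",
        (lo, hi),
    ).fetchall()


def demo_continuous():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, other_field INTEGER)"
    )
    conn.executemany(
        "INSERT INTO test VALUES (?, ?)",
        [(3, 333), (55, 555), (777, 777), (900, 999), (1200, 1)],
    )
    number_rows(conn)
    return [fetch_page(conn, page, 2) for page in range(3)]


if __name__ == "__main__":
    for page in demo_continuous():
        print(page)
```

On a large table, the range filter would presumably only be fast with an index on continuous.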

db<>fiddle here

answered Jul 19, 2022 at 10:57
  • Wow, thanks a lot! I'm adding an index to this continuous column. But this works perfectly. Commented Jul 19, 2022 at 10:57
  • Hm, I'm having difficulty running this on the dataset with 80m rows. I'm receiving "could not write to file "base/pgsql_tmp/pgsql_tmp20298.2261": No space left on device". I am on AWS RDS with instance db.t3.medium, so I have no access to the underlying machine. Any idea why it consumes disk space? Commented Jul 19, 2022 at 11:29
  • @agoldev Sorry, I cannot help you with this problem. As for why it consumes disk space: I do not use PostgreSQL, and I'm too lazy to read the documentation to find the reason. Commented Jul 19, 2022 at 11:35
  • That's alright. I will do the research. However, maybe you know a different approach that I could test for comparison. Commented Jul 19, 2022 at 12:47
