Fill a column with a continuous index in Postgres

Question 1

Imagine the following table:

`tmp_migration.asset`

╔══════════╦══════════════════════════╗
║ id ║ ...many other columns... ║
╠══════════╬══════════════════════════╣
║ 15 ║ ... ║
║ 16 ║ ... ║
║ 17 ║ ... ║
║ 18 ║ ... ║
║ 10020 ║ ... ║
║ 10021 ║ ... ║
╚══════════╩══════════════════════════╝

As you can see, the ids don't start at 1 and they have gaps.

Problem

I want to add a new column tempId with a continuous index. The table has 80m rows. How can I do that? I googled a lot of things and ended up nowhere.

Background

The table is part of a data migration project. tmp_migration is a temporary schema created as the source of the data migration. In the current step I'm trying to copy the data from tmp_migration.asset to public.asset while transforming it. I'm using a combined INSERT INTO ... SELECT ... query for that.

The problem is that it takes several hours (80m rows) and I don't receive any progress notification during the run. To solve that, I wanted to use "pagination". In the bash script that calls psql with the insert/select script, I created a loop which sets the page boundaries and passes them to the script.

I started by using LIMIT / OFFSET, adding

LIMIT :limit
OFFSET :offset;

to the script, but this slows down dramatically at higher "pages". The usual advice is therefore to filter with WHERE on the PK instead of using LIMIT / OFFSET. However, for this I need a continuous PK, which I don't have. Thus, I thought of adding a temporary continuous index.

Maybe there are other solutions that I don't see right now. I would be very happy about any assistance.
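
One alternative that may be worth testing is keyset pagination, which does not need a gap-free key: each batch starts after the last id of the previous batch, so gaps only change which ids fall into a batch, not how many rows it holds. The sketch below is a minimal illustration, not a drop-in migration script. It uses Python with an in-memory SQLite database as a stand-in for Postgres so that it runs without a server. The table names `src_asset` and `dst_asset`, the `payload` column, and the `upper()` transformation are all hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the source and target tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_asset (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE dst_asset (id INTEGER PRIMARY KEY, payload TEXT)")
    # Same id pattern as in the question: does not start at 1 and has gaps.
    ids = [15, 16, 17, 18, 10020, 10021]
    conn.executemany(
        "INSERT INTO src_asset (id, payload) VALUES (?, ?)",
        [(i, f"row-{i}") for i in ids],
    )
    conn.commit()
    return conn


def copy_in_batches(conn, batch_size):
    """Copy src_asset to dst_asset in id order; return the row count of each batch."""
    last_id = -1  # assumption: all ids are non-negative
    batch_counts = []
    while True:
        # Find the highest id among the next batch_size rows after last_id.
        (upper_id,) = conn.execute(
            "SELECT max(id) FROM ("
            "  SELECT id FROM src_asset WHERE id > ? ORDER BY id LIMIT ?"
            ") AS page",
            (last_id, batch_size),
        ).fetchone()
        if upper_id is None:  # no rows left to copy
            break
        # Copy exactly that id range; upper() stands in for the real transformation.
        cur = conn.execute(
            "INSERT INTO dst_asset (id, payload) "
            "SELECT id, upper(payload) FROM src_asset WHERE id > ? AND id <= ?",
            (last_id, upper_id),
        )
        conn.commit()  # one transaction per batch, so progress is visible early
        batch_counts.append(cur.rowcount)
        print(f"copied ids ({last_id}, {upper_id}]: {cur.rowcount} rows")
        last_id = upper_id
    return batch_counts


conn = make_demo_db()
counts = copy_in_batches(conn, batch_size=4)
```

Both statements filter on the primary key, so every batch can use the primary-key index instead of skipping rows the way OFFSET does. Running the same two statements from the existing psql script, with psql variables in place of the `?` placeholders, is an assumption that has not been tested against the 80m-row table.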


Akina · Accepted Answer · 2022-07-19 10:57:10Z

2

DEMO:

CREATE TABLE test (id INT PRIMARY KEY, other_field INT);
INSERT INTO test VALUES (3,333),(55,555),(777,777);
SELECT * FROM test;

id	other_field
3	333
55	555
777	777

ALTER TABLE test ADD COLUMN continuous INT;
SELECT * FROM test;

id	other_field	continuous
3	333	null
55	555	null
777	777	null

-- Number the rows in id order, then copy each number back to its row.
UPDATE test
SET continuous = calculate_rownumber.rownumber
FROM ( SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownumber
       FROM test ) AS calculate_rownumber
WHERE test.id = calculate_rownumber.id;
SELECT * FROM test;

id	other_field	continuous
3	333	1
55	555	2
777	777	3

db<>fiddle here
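
Once `continuous` is filled, each page is a plain range on that column, and the bounds can be computed by arithmetic alone. The sketch below shows that calculation in Python. The helper name `page_bounds` and the half-open `lower < continuous <= upper` convention are assumptions for illustration, and the query in the comment refers to the demo table above.

```python
def page_bounds(total_rows, page_size):
    """Yield (lower, upper) pairs so that each page is lower < continuous <= upper."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    lower = 0  # ROW_NUMBER() starts at 1, so the first page is 0 < continuous <= page_size
    while lower < total_rows:
        upper = min(lower + page_size, total_rows)
        yield lower, upper
        lower = upper


# Each pair would parameterise a query such as:
#   SELECT * FROM test WHERE continuous > :lower AND continuous <= :upper;
for lower, upper in page_bounds(total_rows=3, page_size=2):
    print(f"page: {lower} < continuous <= {upper}")
```

Because the numbering has no gaps, every page except possibly the last holds exactly `page_size` rows, which is what makes the remaining run time easy to estimate.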


Wow, thanks a lot! I'm adding an index to this continuous column. But this works perfectly.
– bln_dev, Jul 19, 2022 at 10:57

Hm, I'm having difficulty running this on the dataset with 80m rows. I'm receiving "could not write to file "base/pgsql_tmp/pgsql_tmp20298.2261": No space left on device". I am on AWS RDS with a db.t3.medium instance, so I have no access to the underlying machine. Any idea why it consumes disk space?
– bln_dev, Jul 19, 2022 at 11:29

@agoldev Sorry, I cannot help you with the question of why it consumes disk space. I do not use Postgres, and I'm too lazy to read the documentation to find the reason.
– Akina, Jul 19, 2022 at 11:35

That's alright. I will do the research. However, maybe you know a different approach which I could test for comparison.
– bln_dev, Jul 19, 2022 at 12:47
