Imagine the following table:
tmp_migration.asset
╔═══════╦══════════════════════════╗
║ id    ║ ...many other columns... ║
╠═══════╬══════════════════════════╣
║ 15    ║ ...                      ║
║ 16    ║ ...                      ║
║ 17    ║ ...                      ║
║ 18    ║ ...                      ║
║ 10020 ║ ...                      ║
║ 10021 ║ ...                      ║
╚═══════╩══════════════════════════╝
You see, the index doesn't start at 1, has gaps, etc.
Problem
I want to add a new column tempId
with a continuous index. The table has 80m rows. How can I do that? I googled a lot of things and ended up nowhere.
Background
The table is part of a data migration project. tmp_migration
is a temporary schema created as the source of the data migration. In the current step I'm trying to copy the data over from tmp_migration.asset
to public.asset
while doing data transformation. I'm using a combined INSERT INTO ... SELECT ...
query for that.
The problem is that it takes several hours (80m rows) and I don't get any progress notification during the run. To solve that, I wanted to use "pagination". In the bash script that calls psql
with the insert/select script, I created a loop that sets the boundaries passed to the script.
I started with using limit / offset by adding
LIMIT :limit
OFFSET :offset;
to the script, but this slows down dramatically at higher "pages". The usual advice is to filter with WHERE
on your PK instead of using limit/offset. However, for this I need a continuous PK, which I don't have. Thus, I thought of adding a temporary consistent index.
Maybe there are other solutions that I don't see right now. I would be very happy about any assistance.
1 Answer
DEMO:
CREATE TABLE test (id INT PRIMARY KEY, other_field INT);
INSERT INTO test VALUES (3,333),(55,555),(777,777);
SELECT * FROM test;

 id  | other_field
-----+-------------
   3 |         333
  55 |         555
 777 |         777

ALTER TABLE test ADD COLUMN continuous INT;
SELECT * FROM test;

 id  | other_field | continuous
-----+-------------+------------
   3 |         333 | null
  55 |         555 | null
 777 |         777 | null

UPDATE test
SET continuous = calculate_rownumber.rownumber
FROM (
    SELECT id, ROW_NUMBER() OVER (ORDER BY id) rownumber
    FROM test
) calculate_rownumber
WHERE test.id = calculate_rownumber.id;
SELECT * FROM test;

 id  | other_field | continuous
-----+-------------+------------
   3 |         333 |          1
  55 |         555 |          2
 777 |         777 |          3
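Once the column is filled, the copy can be driven in fixed-size ranges over it, which also gives a natural place to report progress. What follows is only a sketch under assumptions: it uses Python's standard-library sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere, and the `target` table, the batch size, and the helper `copy_in_ranges` are hypothetical names. The real data transformation is left out.

```python
import sqlite3


def copy_in_ranges(conn, batch_size):
    """Copy test -> target in ranges over the gap-free `continuous` column.

    Returns the number of rows copied per batch (a stand-in for progress output).
    """
    (total,) = conn.execute(
        "SELECT COALESCE(MAX(continuous), 0) FROM test"
    ).fetchone()
    copied = []
    lower = 1
    while lower <= total:
        upper = lower + batch_size - 1
        cur = conn.execute(
            "INSERT INTO target (id, other_field) "
            "SELECT id, other_field FROM test "
            "WHERE continuous BETWEEN ? AND ?",
            (lower, upper),
        )
        conn.commit()  # one transaction per batch
        copied.append(cur.rowcount)
        lower = upper + 1
    return copied


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, other_field INT, continuous INT)"
)
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, other_field INT)")
# Same rows as the demo above, with `continuous` already filled in.
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(3, 333, 1), (55, 555, 2), (777, 777, 3)],
)
conn.commit()

batches = copy_in_ranges(conn, batch_size=2)
print(batches)
```

In the psql setup from the question, the two range bounds would presumably take the place of :limit and :offset as variables passed in from the bash loop.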
- Wow, thanks a lot! I'm adding an index to this continuous column. But this works perfectly. – bln_dev, Jul 19, 2022 at 10:57
- Hm, I'm having difficulties running this on the dataset with 80m rows. I'm receiving "could not write to file "base/pgsql_tmp/pgsql_tmp20298.2261": No space left on device". I am on AWS RDS with instance db.t3.medium, so I have no access to the underlying machine. Any idea why it consumes disk space? – bln_dev, Jul 19, 2022 at 11:29
- @agoldev Sorry, I cannot help you with this problem. Why does it consume disk space? I do not use PostgreSQL, and I'm too lazy to read the documentation to find the reason. – Akina, Jul 19, 2022 at 11:35
- That's alright. I will do the research. However, maybe you know a different approach which I could test for comparison. – bln_dev, Jul 19, 2022 at 12:47
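One approach that could be tested for comparison is keyset pagination on the existing id: each batch starts right after the last id of the previous batch, so gaps do not matter and neither an extra column nor a table-wide UPDATE is needed. The sketch below is an illustration, not a tested migration script. It uses Python's standard-library sqlite3 module as a stand-in for PostgreSQL, and the function name `copy_keyset`, the `target` table, the `payload` column, and the batch size are all assumptions.

```python
import sqlite3


def copy_keyset(conn, batch_size):
    """Copy asset -> target batch by batch, resuming after the last copied id."""
    last_id = -1  # assumption: ids are non-negative
    progress = []
    while True:
        # Look up the ids of the next batch via the primary-key index.
        ids = conn.execute(
            "SELECT id FROM asset WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not ids:
            break
        upper = ids[-1][0]
        # Copy exactly the rows in (last_id, upper]; gaps in id are irrelevant.
        conn.execute(
            "INSERT INTO target (id, payload) "
            "SELECT id, payload FROM asset WHERE id > ? AND id <= ?",
            (last_id, upper),
        )
        conn.commit()  # one transaction per batch
        progress.append((len(ids), upper))
        last_id = upper
    return progress


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE asset (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, payload TEXT)")
# The ids from the table in the question: no start at 1, and a large gap.
conn.executemany(
    "INSERT INTO asset VALUES (?, ?)",
    [(i, f"row {i}") for i in (15, 16, 17, 18, 10020, 10021)],
)
conn.commit()

progress = copy_keyset(conn, batch_size=4)
for rows, upper in progress:
    print(f"copied {rows} rows, up to id {upper}")
```

With the six ids from the question and a batch size of 4, this runs two batches: four rows up to id 18, then two rows up to id 10021. In a bash/psql loop, the last copied id would be the value carried from one iteration to the next instead of an offset.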