I have a PostgreSQL table (tab_A) with about 32 million records, and I have a second PostgreSQL table (tab_B) with about 4000 records. tab_B contains 3 fields whose values I intend to pass on to tab_A (field1, field2, field3).
tab_A
32 million records
Unique identifier (id)
BTREE index
tab_B
4000 records
Unique identifier (id)
BTREE index
I am trying to do the job with the following query (previous to do this I have already created the 3 new fields in tab_A to host the values):
UPDATE tab_A
SET field1 = t2.field1, field2 = t2.field2, field3 = t2.field3
FROM tab_A t1 JOIN tab_B t2
ON t1.uprn = t2.uprn;
This query runs on and on for more than 5 hours and I eventually need to stop it because it doesn't seem to me it should be taking that long (my understanding is that if both tables contain an index this should be pretty fast).
Any ideas on whether I am missing something here? Perhaps it's normal it takes that long taking into account tab_A contains 32 million records? Any other approach to run this more efficiently?
-
Excellent. It took less than a second. Just for the sake of understanding all this a bit better. Would the join have worked too if I had placed tab_B in the "FROM" and tab_A in the "JOIN" (but actually using "RIGHT JOIN") ?Pitrako Junior– Pitrako Junior2019年12月27日 11:49:42 +00:00Commented Dec 27, 2019 at 11:49
-
it's a good thing that you did stop it. it would have corrupted your data if it committed,Jasen– Jasen2019年12月28日 05:55:19 +00:00Commented Dec 28, 2019 at 5:55
1 Answer 1
Don't repeat the target table in the FROM clause:
UPDATE tab_A
SET field1 = t2.field1, field2 = t2.field2, field3 = t2.field3
FROM tab_B t2
WHERE tab_A.uprn = t2.uprn;
Quote from the manual
Note that the target table must not appear in the from_list, unless you intend a self-join
(emphasis mine)