I'm trying to debug a very slow UPDATE query in one of my tables. The table name is beepers. It has 2486 rows and weighs 1204 MB. It has 18 columns and two indices: one on id and the other on friendly_name (a string with a legible name, e.g. "correct staple").
Here's what I see through Postico:
CREATE UNIQUE INDEX beepers_pkey ON beepers(id int4_ops);
CREATE INDEX ix_beepers_friendly_name ON beepers(friendly_name text_ops);
I have a couple hundred beepers out in the field, and every 30 s each beeper reaches out to my server to record when it was last active.
The query being executed behind the scenes is:
UPDATE beepers SET updated_at=NOW()::timestamp, last_active=NOW()::timestamp WHERE beepers.id = 399;
Note that this query does not update either of the two indexed fields.
--
I'm using Python and flask-sqlalchemy to interact with this DB (hosted via Google Cloud SQL), running PostgreSQL 9.6.
Most of the time, this UPDATE happens in under 1 second. However, a lot of the time, this UPDATE takes over 20 seconds, sometimes even hitting >100 seconds.
If I try running this query back-to-back in a little script, I see a large spike in Lock Wait according to Google Cloud SQL's Query Insights tab.
I tried running the query with EXPLAIN ANALYZE at the front, but I get the strangest behavior!
import db
from sqlalchemy import text
sql = text("UPDATE beepers SET updated_at=NOW()::timestamp, last_active=NOW()::timestamp WHERE beepers.id = 399;")
db.engine.execute(sql)
# Querying the DB shows that the row updated! This is expected.
# Now let's add "EXPLAIN ANALYZE" to the front.
sql = text("EXPLAIN ANALYZE UPDATE beepers SET updated_at=NOW()::timestamp, db.engine.execute(sql)
db.engine.execute(sql)
db.engine.execute(sql)
# I can try running this over and over - but querying the DB shows that these rows DO NOT update!
# If I now change the query back to remove the "EXPLAIN ANALYZE"...
sql = text("UPDATE beepers SET updated_at=NOW()::timestamp, last_active=NOW()::timestamp WHERE beepers.id = 399;")
db.engine.execute(sql)
# Python completely hangs!
For some reason the queries with EXPLAIN ANALYZE don't actually perform the UPDATE, and the next call (minus the EXPLAIN) completely hangs!
How can I figure out what's causing this strangeness?
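One thing I plan to try the next time it hangs is querying pg_stat_activity from a separate psql session to see who is blocking whom. This is just a sketch I haven't verified against this exact setup; it relies on pg_blocking_pids(), which exists in PostgreSQL 9.6:
-- Run from a separate session while an UPDATE is stuck: lists each waiting
-- backend, the PIDs blocking it, and what those blockers are doing
-- (e.g. "idle in transaction" would be a common culprit).
SELECT waiting.pid                   AS waiting_pid,
       waiting.query                 AS waiting_query,
       pg_blocking_pids(waiting.pid) AS blocked_by,
       blocker.pid                   AS blocker_pid,
       blocker.state                 AS blocker_state,
       blocker.xact_start            AS blocker_xact_start,
       blocker.query                 AS blocker_last_query
FROM pg_stat_activity AS waiting
JOIN pg_stat_activity AS blocker
  ON blocker.pid = ANY (pg_blocking_pids(waiting.pid))
WHERE waiting.wait_event_type = 'Lock';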
When I do get some output from EXPLAIN ANALYZE, I never catch the super slow run:
Update on public.beepers  (cost=8.41..12.43 rows=1 width=587) (actual time=0.395..0.395 rows=0 loops=1)
  Buffers: shared hit=21 dirtied=1
  ->  Bitmap Heap Scan on public.beepers  (cost=8.41..12.43 rows=1 width=587) (actual time=0.013..0.015 rows=1 loops=1)
        Output: created_at, '2022-02-07 17:53:58.657941'::timestamp without time zone, id, friendly_name, '2022-02-07 17:53:58.652552'::timestamp without time zone, last_ip, ctid ...
        Recheck Cond: (beepers.id = 399)
        Heap Blocks: exact=1
        Buffers: shared hit=5
        ->  Bitmap Index Scan on beepers_pkey  (cost=0.00..8.41 rows=1 width=0) (actual time=0.009..0.009 rows=1 loops=1)
              Index Cond: (beepers.id = 399)
              Buffers: shared hit=4
Planning time: 0.348 ms
Execution time: 0.465 ms
UPDATE 1
I've run the command from this Stack Overflow answer, Measure the size of a PostgreSQL table row, both before and after running VACUUM (ANALYZE, FULL).
Before:
metric | bytes | bytes_pretty | bytes_per_row
-----------------------------------+------------+--------------+---------------
core_relation_size | 1262665728 | 1204 MB | 507910
visibility_map | 40960 | 40 kB | 16
free_space_map | 327680 | 320 kB | 131
table_size_incl_toast | 5456994304 | 5204 MB | 2195090
indexes_size | 60055552 | 57 MB | 24157
total_size_incl_toast_and_indexes | 5517049856 | 5261 MB | 2219247
live_rows_in_text_representation | 7994740 | 7807 kB | 3215
------------------------------ | | |
row_count | 2486 | |
live_tuples | 2338 | |
dead_tuples | 457 | |
After:
metric | bytes | bytes_pretty | bytes_per_row
-----------------------------------+---------+--------------+---------------
core_relation_size | 2359296 | 2304 kB | 949
visibility_map | 0 | 0 bytes | 0
free_space_map | 0 | 0 bytes | 0
table_size_incl_toast | 7708672 | 7528 kB | 3100
indexes_size | 196608 | 192 kB | 79
total_size_incl_toast_and_indexes | 7905280 | 7720 kB | 3179
live_rows_in_text_representation | 7994727 | 7807 kB | 3215
------------------------------ | | |
row_count | 2486 | |
live_tuples | 2486 | |
dead_tuples | 36 | |
Hot dang. Somehow 1.2GB shrank to 2.3MB.
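For quick re-checks going forward (instead of the full breakdown above), something like the built-in size functions should be enough; a minimal sketch:
-- Quick size re-check: heap incl. TOAST, indexes, and grand total.
SELECT pg_size_pretty(pg_table_size('beepers'))          AS table_incl_toast,
       pg_size_pretty(pg_indexes_size('beepers'))        AS indexes,
       pg_size_pretty(pg_total_relation_size('beepers')) AS total;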
This strikes me as kind of strange, because in the logs I see an "automatic vacuum" running regularly with the following output:
LOG: automatic vacuum of table "mydb.public.beepers": index scans: 0
pages: 0 removed, 154134 remain, 2 skipped due to pins, 152707 skipped frozen
tuples: 0 removed, 11085 remain, 6406 are dead but not yet removable
buffer usage: 16379 hits, 0 misses, 1 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.040 MB/s
system usage: CPU 0.00s/0.01u sec elapsed 0.19 sec
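As a cross-check on what autovacuum thinks about this table, I can also ask the statistics collector directly (a sketch using the standard pg_stat_user_tables view):
-- Dead-tuple estimate and last (auto)vacuum times for the table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       vacuum_count,
       autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'beepers';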
Otherwise, I think I've found what's going on: lots of lock waiting. The logs look something like this:
[23086]: [1-1] LOG: process 23086 still waiting for ShareLock on transaction 2432622359 after 1000.156 ms
[23086]: [2-1] DETAIL: Process holding the lock: 12946. Wait queue: 23086.
[23086]: [3-1] CONTEXT: while updating tuple (17,4) in relation "beepers"
[23086]: [4-1] STATEMENT: UPDATE beepers SET updated_at='2022-02-09T02:13:05.882135'::timestamp, last_active='2022-02-09T02:13:05.881215'::timestamp WHERE beepers.id = 1154
[20888]: [2-1] DETAIL: Process holding the lock: 23086. Wait queue: 20888.
[22744]: [2-1] DETAIL: Process holding the lock: 23086. Wait queue: 20888, 22744.
[22743]: [2-1] DETAIL: Process holding the lock: 23086. Wait queue: 20888, 22744, 22743.
...
[22527]: [2-1] DETAIL: Process holding the lock: 23086. Wait queue: 20888, 22744, 22743, 22931, 22902, 21294, 22693, 23277, 23276, 22695, 23275, 22243, 22692, 21005, 22653, 21898, 21734, 22527.
[23086]: [5-1] LOG: process 23086 acquired ShareLock on transaction 2432622359 after 238962.352 ms
[23086]: [6-1] CONTEXT: while updating tuple (17,4) in relation "beepers"
[23086]: [7-1] STATEMENT: UPDATE beepers SET updated_at='2022-02-09T02:13:05.882135'::timestamp, last_active='2022-02-09T02:13:05.881215'::timestamp WHERE beepers.id = 1154
[23086]: [8-1] LOG: duration: 238963.367 ms statement: UPDATE beepers SET updated_at='2022-02-09T02:13:05.882135'::timestamp, last_active='2022-02-09T02:13:05.881215'::timestamp WHERE beepers.id = 1154
In this case, process 12946 was holding the ShareLock on my beepers table for a long time. Once 12946 let go of it, 23086 acquired it and the backlog of waiting processes could be cleared. But in all the logs put out by process 12946, I never once see a mention of it querying or using the beepers table. Why would this process have a lock on the beepers table?
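If I catch it in the act again, my plan (a sketch; both queries only return anything while the blocking backend is still connected) is to look that PID up directly:
-- What is backend 12946 doing right now, and how long has its
-- transaction been open?
SELECT pid, state, xact_start, state_change, query
FROM pg_stat_activity
WHERE pid = 12946;

-- Which locks does it hold? The ShareLock the UPDATEs queue behind
-- shows up as a lock on a transactionid, not on the table itself.
SELECT locktype, relation::regclass AS relation, transactionid, mode, granted
FROM pg_locks
WHERE pid = 12946;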
-
Since you mention you didn't catch the super slow run: can you paste your PostgreSQL server log here? When the update takes longer than usual, what is happening in the database? Is it peak time, or is there some other transaction referring to the beepers table? – sihar, Feb 8, 2022 at 3:30
-
Turn on log_lock_waits. – jjanes, Feb 8, 2022 at 6:32
-
I've got log_min_duration_statement=300 and log_lock_waits=on. But I'm finding it incredibly difficult to find the relevant logs corresponding to a query in the GCP logs. Any tips there? – R. Gosman, Feb 8, 2022 at 21:08
1 Answer
1+ GB for a table with ~2k rows means either it has huge text columns or it is highly bloated. Why don't you try running VACUUM (ANALYZE, FULL) against the database in question (if you haven't already done that) and re-check the size of your table?
VACUUM reclaims storage occupied by dead tuples. In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables.
Reference: VACUUM (PostgreSQL | Docs)
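Something along these lines should do it (a sketch; be aware that VACUUM FULL rewrites the table under an ACCESS EXCLUSIVE lock, so your beepers' updates will block while it runs):
-- Rewrite the table to reclaim dead space and refresh planner statistics.
VACUUM (FULL, ANALYZE) beepers;

-- Then re-check the size.
SELECT pg_size_pretty(pg_total_relation_size('beepers'));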
-
Welcome to the DBA.SE community. I have allowed myself to update your answer so that it provides lasting quality for the community. In its initial form the answer was just a possible solution. In its current form, providing a reference to the original documentation for VACUUM, your answer now gives a possible reason why you are suggesting this as a solution. Consider reading up on How to Answer in the help centre. Thanks for participating in the community. – John K. N., Feb 8, 2022 at 7:15
-
Thanks John. Noted. – Sajith P Shetty, Feb 8, 2022 at 7:40
-
It does indeed have a large text column, of type JSONB. But I measured the size of an entry there to be ~18KB. 18KB * 2.4K should be around 43MB. So you're right, something is definitely strange here. I'll try running the VACUUM and see what happens to the size. – R. Gosman, Feb 8, 2022 at 21:05
-
Please see my updated question. The table shrank to 2.3MB after the VACUUM operation. – R. Gosman, Feb 9, 2022 at 4:14
-
You have mentioned that the "autovacuum vacuum" is running frequently. However, note that plain VACUUM does not shrink the data physically; it only removes dead tuples so that the freed space can be reused by new data. VACUUM FULL does the shrinking. – Sajith P Shetty, Feb 10, 2022 at 9:35