I'm trying to debug a very slow UPDATE query in one of my tables. The table name is beepers. It has 2486 rows and weighs 1204 MB. It has 18 columns and two indices: one on id and the other on friendly_name (a string with a legible name, e.g. "correct staple").
Here's what I see through Postico:
CREATE UNIQUE INDEX beepers_pkey ON beepers(id int4_ops);
CREATE INDEX ix_beepers_friendly_name ON beepers(friendly_name text_ops);
I have a couple hundred beepers out in the field, and every 30 s each beeper reaches out to my server to record when it was last active.
The query being executed behind the scenes is:
UPDATE beepers SET updated_at=NOW()::timestamp, last_active=NOW()::timestamp WHERE beepers.id = 399;
Note that this query does not update either of the two indexed fields.
--
I'm using Python and flask-sqlalchemy to interact with this DB (hosted via Google Cloud SQL), running PostgreSQL 9.6.
Most of the time, this UPDATE happens in under 1 second. However, a lot of the time, this UPDATE takes over 20 seconds, sometimes even hitting >100 seconds.
If I try running this query back-to-back in a little script, I see a large spike in Lock Wait according to Google Cloud SQL's Query Insights tab.
I tried running the query with EXPLAIN ANALYZE at the front, but I get the strangest behavior!
import db
from sqlalchemy import text
sql = text("UPDATE beepers SET updated_at=NOW()::timestamp, last_active=NOW()::timestamp WHERE beepers.id = 399;")
db.engine.execute(sql)
# Querying the DB shows that the row updated! This is expected.
# Now let's add "EXPLAIN ANALYZE" to the front.
sql = text("EXPLAIN ANALYZE UPDATE beepers SET updated_at=NOW()::timestamp, db.engine.execute(sql)
db.engine.execute(sql)
db.engine.execute(sql)
# I can try running this over and over - but querying the DB shows that these rows DO NOT update!
# If I now change the query back to remove the "EXPLAIN ANALYZE"...
sql = text("UPDATE beepers SET updated_at=NOW()::timestamp, last_active=NOW()::timestamp WHERE beepers.id = 399;")
db.engine.execute(sql)
# Python completely hangs!
For some reason the queries with EXPLAIN ANALYZE don't actually perform the UPDATE, and the next call (minus the EXPLAIN) completely hangs!
How can I figure out what's causing this strangeness?
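One thing I plan to try the next time it hangs is querying pg_stat_activity from a separate psql session to see who is blocking whom. This is just a sketch I haven't verified against this exact setup; it relies on pg_blocking_pids(), which exists in PostgreSQL 9.6:
-- Run from a separate session while an UPDATE is stuck: lists each waiting
-- backend, the PIDs blocking it, and what those blockers are doing
-- (e.g. "idle in transaction" would be a common culprit).
SELECT waiting.pid                   AS waiting_pid,
       waiting.query                 AS waiting_query,
       pg_blocking_pids(waiting.pid) AS blocked_by,
       blocker.pid                   AS blocker_pid,
       blocker.state                 AS blocker_state,
       blocker.xact_start            AS blocker_xact_start,
       blocker.query                 AS blocker_last_query
FROM pg_stat_activity AS waiting
JOIN pg_stat_activity AS blocker
  ON blocker.pid = ANY (pg_blocking_pids(waiting.pid))
WHERE waiting.wait_event_type = 'Lock';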
When I do get some output from EXPLAIN ANALYZE, I never catch the super slow run:
Update on public.beepers  (cost=8.41..12.43 rows=1 width=587) (actual time=0.395..0.395 rows=0 loops=1)
  Buffers: shared hit=21 dirtied=1
  ->  Bitmap Heap Scan on public.beepers  (cost=8.41..12.43 rows=1 width=587) (actual time=0.013..0.015 rows=1 loops=1)
        Output: created_at, '2022-02-07 17:53:58.657941'::timestamp without time zone, id, friendly_name, '2022-02-07 17:53:58.652552'::timestamp without time zone, last_ip, ctid ...
        Recheck Cond: (beepers.id = 399)
        Heap Blocks: exact=1
        Buffers: shared hit=5
        ->  Bitmap Index Scan on beepers_pkey  (cost=0.00..8.41 rows=1 width=0) (actual time=0.009..0.009 rows=1 loops=1)
              Index Cond: (beepers.id = 399)
              Buffers: shared hit=4
Planning time: 0.348 ms
Execution time: 0.465 ms
UPDATE 1
I've run the command from this Stack Overflow answer, Measure the size of a PostgreSQL table row, both before and after running VACUUM (ANALYZE, FULL).
Before:
metric | bytes | bytes_pretty | bytes_per_row
-----------------------------------+------------+--------------+---------------
core_relation_size | 1262665728 | 1204 MB | 507910
visibility_map | 40960 | 40 kB | 16
free_space_map | 327680 | 320 kB | 131
table_size_incl_toast | 5456994304 | 5204 MB | 2195090
indexes_size | 60055552 | 57 MB | 24157
total_size_incl_toast_and_indexes | 5517049856 | 5261 MB | 2219247
live_rows_in_text_representation | 7994740 | 7807 kB | 3215
------------------------------ | | |
row_count | 2486 | |
live_tuples | 2338 | |
dead_tuples | 457 | |
After:
metric | bytes | bytes_pretty | bytes_per_row
-----------------------------------+---------+--------------+---------------
core_relation_size | 2359296 | 2304 kB | 949
visibility_map | 0 | 0 bytes | 0
free_space_map | 0 | 0 bytes | 0
table_size_incl_toast | 7708672 | 7528 kB | 3100
indexes_size | 196608 | 192 kB | 79
total_size_incl_toast_and_indexes | 7905280 | 7720 kB | 3179
live_rows_in_text_representation | 7994727 | 7807 kB | 3215
------------------------------ | | |
row_count | 2486 | |
live_tuples | 2486 | |
dead_tuples | 36 | |
Hot dang. Somehow 1.2GB shrank to 2.3MB.
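For quick re-checks going forward (instead of the full breakdown above), something like the built-in size functions should be enough; a minimal sketch:
-- Quick size re-check: heap incl. TOAST, indexes, and grand total.
SELECT pg_size_pretty(pg_table_size('beepers'))          AS table_incl_toast,
       pg_size_pretty(pg_indexes_size('beepers'))        AS indexes,
       pg_size_pretty(pg_total_relation_size('beepers')) AS total;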
This strikes me as kind of strange, because in the logs I see an "automatic vacuum" running regularly with the following output:
LOG: automatic vacuum of table "mydb.public.beepers": index scans: 0
pages: 0 removed, 154134 remain, 2 skipped due to pins, 152707 skipped frozen
tuples: 0 removed, 11085 remain, 6406 are dead but not yet removable
buffer usage: 16379 hits, 0 misses, 1 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.040 MB/s
system usage: CPU 0.00s/0.01u sec elapsed 0.19 sec
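As a cross-check on what autovacuum thinks about this table, I can also ask the statistics collector directly (a sketch using the standard pg_stat_user_tables view):
-- Dead-tuple estimate and last (auto)vacuum times for the table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       vacuum_count,
       autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'beepers';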
Otherwise, I think I've found what's going on: lots of lock waiting. The logs look something like this:
[23086]: [1-1] LOG: process 23086 still waiting for ShareLock on transaction 2432622359 after 1000.156 ms
[23086]: [2-1] DETAIL: Process holding the lock: 12946. Wait queue: 23086.
[23086]: [3-1] CONTEXT: while updating tuple (17,4) in relation "beepers"
[23086]: [4-1] STATEMENT: UPDATE beepers SET updated_at='2022-02-09T02:13:05.882135'::timestamp, last_active='2022-02-09T02:13:05.881215'::timestamp WHERE beepers.id = 1154
[20888]: [2-1] DETAIL: Process holding the lock: 23086. Wait queue: 20888.
[22744]: [2-1] DETAIL: Process holding the lock: 23086. Wait queue: 20888, 22744.
[22743]: [2-1] DETAIL: Process holding the lock: 23086. Wait queue: 20888, 22744, 22743.
...
[22527]: [2-1] DETAIL: Process holding the lock: 23086. Wait queue: 20888, 22744, 22743, 22931, 22902, 21294, 22693, 23277, 23276, 22695, 23275, 22243, 22692, 21005, 22653, 21898, 21734, 22527.
[23086]: [5-1] LOG: process 23086 acquired ShareLock on transaction 2432622359 after 238962.352 ms
[23086]: [6-1] CONTEXT: while updating tuple (17,4) in relation "beepers"
[23086]: [7-1] STATEMENT: UPDATE beepers SET updated_at='2022-02-09T02:13:05.882135'::timestamp, last_active='2022-02-09T02:13:05.881215'::timestamp WHERE beepers.id = 1154
[23086]: [8-1] LOG: duration: 238963.367 ms statement: UPDATE beepers SET updated_at='2022-02-09T02:13:05.882135'::timestamp, last_active='2022-02-09T02:13:05.881215'::timestamp WHERE beepers.id = 1154
In this case, process 12946 was holding the ShareLock on my beepers table for a long time. Once 12946 let go of it, 23086 acquired it and the backlog of waiting processes could be cleared. But in all the logs put out by process 12946, I never once see a mention of it querying or using the beepers table. Why would this process have a lock on the beepers table?
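If I catch it in the act again, my plan (a sketch; both queries only return anything while the blocking backend is still connected) is to look that PID up directly:
-- What is backend 12946 doing right now, and how long has its
-- transaction been open?
SELECT pid, state, xact_start, state_change, query
FROM pg_stat_activity
WHERE pid = 12946;

-- Which locks does it hold? The ShareLock the UPDATEs queue behind
-- shows up as a lock on a transactionid, not on the table itself.
SELECT locktype, relation::regclass AS relation, transactionid, mode, granted
FROM pg_locks
WHERE pid = 12946;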
-
Since you mention you didn't catch the super slow run: can you paste your PostgreSQL server log here? When the update takes longer than usual, what is happening in the database? Is it peak time, or is there some other transaction referring to the beepers table? – sihar, Feb 8, 2022 at 3:30
-
Turn on log_lock_waits. – jjanes, Feb 8, 2022 at 6:32
-
I've got log_min_duration_statement=300 and log_lock_waits=on. But I'm finding it incredibly difficult to find the relevant logs corresponding to a query in the GCP logs. Any tips there? – R. Gosman, Feb 8, 2022 at 21:08
1 Answer
1+ GB for a table with ~2k rows means either it has huge text columns or it is highly bloated. Why don't you try running VACUUM (ANALYZE, FULL) against the database in question (if you haven't already done that) and re-check the size of your table?
VACUUM reclaims storage occupied by dead tuples. In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables.
Reference: VACUUM (PostgreSQL | Docs)
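Something along these lines should do it (a sketch; be aware that VACUUM FULL rewrites the table under an ACCESS EXCLUSIVE lock, so your beepers' updates will block while it runs):
-- Rewrite the table to reclaim dead space and refresh planner statistics.
VACUUM (FULL, ANALYZE) beepers;

-- Then re-check the size.
SELECT pg_size_pretty(pg_total_relation_size('beepers'));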
-
Welcome to the DBA.SE community. I have allowed myself to update your answer so that it provides lasting quality for the community. In its initial form the answer was just a possible solution. In its current form, providing a reference to the original documentation for VACUUM, your answer now gives a possible reason why you are suggesting this as a solution. Consider reading up on How to Answer in the help centre. Thanks for participating in the community. – John K. N., Feb 8, 2022 at 7:15
-
Thanks John. Noted. – Sajith P Shetty, Feb 8, 2022 at 7:40
-
It does indeed have a large text column, of type JSONB. But I measured the size of an entry there to be ~18KB. 18KB * 2.4K should be around 43MB. So you're right, something is definitely strange here. I'll try running the VACUUM and see what happens to the size. – R. Gosman, Feb 8, 2022 at 21:05
-
Please see my updated question. The table shrank to 2.3MB after the VACUUM operation. – R. Gosman, Feb 9, 2022 at 4:14
-
You have mentioned that the "autovacuum vacuum" is running frequently. However, note that plain VACUUM does not shrink the data physically; it only removes dead tuples so that the freed space can be reused by new data. VACUUM FULL does the shrinking. – Sajith P Shetty, Feb 10, 2022 at 9:35