PostgreSQL 14.1. We read all data from a simple table when our application starts. This used to return the first row very quickly, but now - for no reason I can think of - we have a very noticeable delay before the first row is returned. We noticed this because original application uses Npgsql, which has a default timeout of 20 seconds, and we got timeouts all the time on a large table.
(削除) I can reproduce this using psql however, with a query as simple as
psql -d mydb -c "select * from event_1"
With "EXPLAIN (ANALYZE, BUFFERS)" with track_io_timing = on:
Seq Scan on event_1 (cost=0.00..318003.14 rows=17601214 width=37) (actual time=0.242..2479.949 rows=17698953 loops=1)
Buffers: shared read=141991
I/O Timings: read=1396.307
Planning:
Buffers: shared hit=84 read=10
I/O Timings: read=0.686
Planning Time: 1.507 ms
Execution Time: 3022.822 ms
(削除ここまで)
With 17 million rows the delay is about 25 seconds and with >400M rows it's 3.5 minutes. I'm sure this did not happen before, because we read that 400M row table with the default timeout of 20 seconds in Npgsql many times.
How do I figure out why this is happening now?
I did make a schema change to the DB (re-ordering columns) before this happened, but with that change reverted (DB re-created with the old schema) the problem still occurs. It occurs on two separate DB servers, one of which is doing nothing else and has plenty of RAM.
1 Answer 1
OK, so it turns out that the "repro" with pgsql was a red herring, as jjanes said, and psql just seems to always buffer some results before returning them.
When reading with Npgsql I only got this to happen with a few tables and only when ordering by the primary key (as the real application does). Turns out it was not using the index in some cases:
explain select * from behavior.event_1 order by event_id; -- Table where first row is returned quickly
Index Scan using event_60_pkey on event_60 (cost=0.43..273053.87 rows=7960896 width=39)
explain select * from behavior.event_59 order by event_id; -- Table where first row takes 3.5 minutes
Gather Merge (cost=40610567.77..85716663.39 rows=386596684 width=59)
Workers Planned: 2
-> Sort (cost=40609567.74..41092813.60 rows=193298342 width=59)
Sort Key: event_id
-> Parallel Seq Scan on event_59 (cost=0.00..6738103.42 rows=193298342 width=59)
Reindexing did not help this and clustering on the PK didn't, either. The fix was to run ANALYZE
.
track_io_timing = on
on the database server, then runEXPLAIN (ANALYZE, BUFFERS) SELECT ...
for your query. If you need help interpreting the result, edit the question and add the result there.NpgsqlCommand.ExecuteReader()
is not supposed to do that. I've edited in the output ofEXPLAIN (ANALYZE, BUFFERS)
with track_io_timing now.