PostgreSQL suddenly started delaying before the first row from a select

Question 1

PostgreSQL 14.1. We read all data from a simple table when our application starts. This used to return the first row very quickly, but now - for no reason I can think of - we have a very noticeable delay before the first row is returned. We noticed this because original application uses Npgsql, which has a default timeout of 20 seconds, and we got timeouts all the time on a large table.

~~(削除) I can reproduce this using psql however, with a query as simple as psql -d mydb -c "select * from event_1"~~

With "EXPLAIN (ANALYZE, BUFFERS)" with track_io_timing = on:

 Seq Scan on event_1 (cost=0.00..318003.14 rows=17601214 width=37) (actual time=0.242..2479.949 rows=17698953 loops=1)
 Buffers: shared read=141991
 I/O Timings: read=1396.307
 Planning:
 Buffers: shared hit=84 read=10
 I/O Timings: read=0.686
 Planning Time: 1.507 ms
 Execution Time: 3022.822 ms

(削除ここまで)

With 17 million rows the delay is about 25 seconds and with >400M rows it's 3.5 minutes. I'm sure this did not happen before, because we read that 400M row table with the default timeout of 20 seconds in Npgsql many times.

How do I figure out why this is happening now?

I did make a schema change to the DB (re-ordering columns) before this happened, but with that change reverted (DB re-created with the old schema) the problem still occurs. It occurs on two separate DB servers, one of which is doing nothing else and has plenty of RAM.

Question 2

Most client libraries will by default read all the rows up front before returning control to the user. There are ways to make it stream the data instead, but you have to go out of your way to do so, and the methods depend on the client. Maybe you were somehow doing that streaming before but then accidentally lost it. That would be a client specific question, so you would have tag it as such, and provide more info than you have.

Question 3

Your 'reproduction' with psql doesn't mean anything, if you can only reproduce the after state and not the before state.

Question 4

To analyze the cause of the bad performance, set track_io_timing = on on the database server, then run EXPLAIN (ANALYZE, BUFFERS) SELECT ... for your query. If you need help interpreting the result, edit the question and add the result there.

Question 5

It's definitely not reading all rows up-front, because for the table of >400M rows that takes much longer than 3.5 minutes (and more RAM than I have) and I'm pretty sure NpgsqlCommand.ExecuteReader() is not supposed to do that. I've edited in the output of EXPLAIN (ANALYZE, BUFFERS) with track_io_timing now.

Question 6

@jjanes was right - when I tried it on the second DB server with Npgsql I didn't get the same delay as with psql, so I should not have tried to "repro" with that. I'm investigating further with Npgsql...

Question 7

OK, so it turns out that the "repro" with pgsql was a red herring, as jjanes said, and psql just seems to always buffer some results before returning them.

When reading with Npgsql I only got this to happen with a few tables and only when ordering by the primary key (as the real application does). Turns out it was not using the index in some cases:

explain select * from behavior.event_1 order by event_id; -- Table where first row is returned quickly
Index Scan using event_60_pkey on event_60 (cost=0.43..273053.87 rows=7960896 width=39)

explain select * from behavior.event_59 order by event_id; -- Table where first row takes 3.5 minutes
Gather Merge (cost=40610567.77..85716663.39 rows=386596684 width=59)
 Workers Planned: 2
 -> Sort (cost=40609567.74..41092813.60 rows=193298342 width=59)
 Sort Key: event_id
 -> Parallel Seq Scan on event_59 (cost=0.00..6738103.42 rows=193298342 width=59)

Reindexing did not help this and clustering on the PK didn't, either. The fix was to run ANALYZE.

EM0 EM0 2504 silver badges15 bronze badges · Answer 1 · 2022-10-06 14:30:16Z

OK, so it turns out that the "repro" with pgsql was a red herring, as jjanes said, and psql just seems to always buffer some results before returning them.

When reading with Npgsql I only got this to happen with a few tables and only when ordering by the primary key (as the real application does). Turns out it was not using the index in some cases:

explain select * from behavior.event_1 order by event_id; -- Table where first row is returned quickly
Index Scan using event_60_pkey on event_60 (cost=0.43..273053.87 rows=7960896 width=39)

explain select * from behavior.event_59 order by event_id; -- Table where first row takes 3.5 minutes
Gather Merge (cost=40610567.77..85716663.39 rows=386596684 width=59)
 Workers Planned: 2
 -> Sort (cost=40609567.74..41092813.60 rows=193298342 width=59)
 Sort Key: event_id
 -> Parallel Seq Scan on event_59 (cost=0.00..6738103.42 rows=193298342 width=59)

Reindexing did not help this and clustering on the PK didn't, either. The fix was to run ANALYZE.

Stack Exchange Network

PostgreSQL suddenly started delaying before the first row from a select

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

PostgreSQL suddenly started delaying before the first row from a select

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions