
I approach you all humbly as one who is NOT a DBA, and I'm sure that my question is fraught with conceptual shortcomings and "it depends on" land mines. I'm also pretty sure that all of you who choose to answer are going to want a lot more in the way of specifics than I can currently deliver.

That said, I'm curious about the following scenario in general:

  • Say that I have two non-trivial queries.
  • Query 1 requires 2 minutes to complete on average.
  • Query 2 requires 5 minutes to complete on average.

If I run them serially, one right after the other, I'm expecting it will require 7 minutes to complete on average. Is this reasonable?

More than that, however, what if I run the two queries concurrently? Two separate connections at the same time.

  • Under what conditions would I expect to see a speedup? (Total time < 7 minutes)
  • Under what conditions would I expect to see a slowdown? (Total time > 7 minutes)
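As a toy illustration of the best case (not a database benchmark), two tasks that do nothing but wait on I/O can overlap almost completely when run concurrently. The `sleep` calls below stand in for hypothetical queries; the durations are scaled down from minutes to fractions of a second:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query(seconds):
    """Stand-in for an I/O-bound query: the thread just waits."""
    time.sleep(seconds)
    return seconds

durations = [0.2, 0.5]  # stand-ins for the 2-minute and 5-minute queries

# Serial: total time is roughly the sum of the two durations.
start = time.perf_counter()
for d in durations:
    fake_query(d)
serial = time.perf_counter() - start

# Concurrent: pure waits overlap, so total time is roughly the max duration.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(fake_query, durations))
concurrent = time.perf_counter() - start

print(f"serial ~{serial:.2f}s, concurrent ~{concurrent:.2f}s")
```

Real queries spend time on CPU and contend for shared I/O bandwidth, so the overlap is rarely this clean.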

Now, if I had 1,000 non-trivial queries running concurrently, I have a hunch that it would result in an overall slowdown. In that case, where would the bottleneck likely be? Processor? RAM? Drives?

Again, I know it's probably impossible to answer the question precisely without knowing specifics (which I don't have). I'm looking for some general guidelines to think about when asking the following questions:

  • Under what circumstances do concurrent queries result in an overall speedup?
  • Under what circumstances do concurrent queries result in an overall slowdown?
asked Jul 24, 2014 at 21:10

1 Answer


If I run them serially, one right after the other, I'm expecting it will require 7 minutes to complete on average. Is this reasonable?

If they use unrelated data sets, then yes.

If they share a data set, and the cache is cold for the first query and the query is mostly I/O bound, then the second one might complete in moments. You need to consider caching effects when dealing with performance analysis and query timing.
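As a loose analogy (an in-process memoization cache standing in for the database's buffer cache), the first read pays the full I/O cost while a repeat read of the same data returns almost instantly:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def read_dataset(name):
    """Simulate a cold read that hits 'disk'; cached results return instantly."""
    time.sleep(0.3)  # stand-in for slow I/O on a cold cache
    return f"rows of {name}"

start = time.perf_counter()
read_dataset("big_table")   # cold: pays the full simulated I/O cost
cold = time.perf_counter() - start

start = time.perf_counter()
read_dataset("big_table")   # warm: served from cache
warm = time.perf_counter() - start

print(f"cold ~{cold:.2f}s, warm ~{warm:.4f}s")
```

This is why benchmarking the same query twice in a row on a real database often gives wildly different numbers: the second run is measuring the cache, not the disk.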

More than that, however, what if I run the two queries concurrently? Two separate connections at the same time.

"It depends".

If they were both using sequential scans of the same table then in PostgreSQL it'd be a huge performance win because of its support for synchronized sequential scans.

If they shared the same indexes then they'd likely benefit from each others' reads into cache.

If they're independent and touch different data then they might compete for I/O bandwidth, in which case they might take the same amount of time as running sequentially. If the I/O subsystem benefits from concurrency (higher net throughput with more clients) then the total time might be less. If the I/O subsystem handles concurrency poorly then they might take longer than running them sequentially. Or they might not be I/O bound at all, in which case if there's a free CPU for each they could well execute as if the other wasn't running at all.

It depends a great deal on the hardware and system configuration, the data set, and on the queries themselves.

Now, if I had 1,000 non-trivial queries running concurrently, I have a hunch that it would result in an overall slowdown. In that case, where would the bottleneck likely be? Processor? RAM? Drives?

Yes, that'd very likely slow things down for a number of reasons.

  • PostgreSQL's own overheads in inter-process coordination, transaction and lock management, buffer management, etc. This can be quite a big cost, and PostgreSQL isn't really designed for high client counts - it works better if you queue work.

  • Competition for working memory, cache, etc.

  • OS scheduling overhead as it juggles 1,000 competing processes all wanting time slices. This is pretty minor these days, since modern OSes have fast schedulers.

  • I/O thrashing. Most I/O systems have a peak performance client count. Sometimes it's 1, i.e. it's best with only one client, but it's often higher. Sometimes performance decreases again above the threshold. Sometimes it just reaches a plateau.
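The "queue work" advice above is usually implemented with a connection pooler (PgBouncer is a common choice). The idea can be sketched in-process: many requests are queued, but only a small, fixed number run at once, keeping concurrency at whatever level the system handles well:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8   # small fixed pool: stand-in for a pooler's connection limit
active = 0        # tasks currently running
peak = 0          # highest concurrency observed
lock = threading.Lock()

def pooled_query(i):
    """Stand-in query that records how many tasks run at once."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # simulated work
    with lock:
        active -= 1
    return i

# 1,000 requests are submitted, but at most MAX_WORKERS run concurrently;
# the rest wait in the pool's queue instead of all hammering the server.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    results = list(pool.map(pooled_query, range(1000)))

print(f"completed={len(results)}, peak concurrency={peak}")
```

All 1,000 requests still complete; they just don't all contend for CPU, locks, and I/O at the same instant, which is typically faster overall than 1,000 truly simultaneous backends.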

answered Jul 25, 2014 at 1:21
  • This is exactly the kind of explanation I was looking for. Clear, succinct, informative. Thanks! Commented Jul 25, 2014 at 14:38
  • Hi @Craig Ringer, what if I run 1,000 queries concurrently on a single table (200 million rows)? Will Postgres handle them nicely? Do synchronized sequential scans help? Commented Oct 16, 2017 at 17:17
  • @RahulGautam New question with details please, with a link back to this one. Commented Oct 17, 2017 at 0:44
  • @CraigRinger added. Please check dba.stackexchange.com/questions/188649/… Commented Oct 17, 2017 at 11:32
  • @RahulGautam Your link is dead. I wonder if you could provide an update on what happened? It is a very interesting topic. Commented Feb 18, 2020 at 17:03
