
I approach you all humbly as one who is NOT a DBA, and I'm sure that my question is fraught with conceptual shortcomings and "it depends on" land mines. I'm also pretty sure that all of you who choose to answer are going to want a lot more in the way of specifics than I can currently deliver.

That said, I'm curious about the following scenario in general:

  • Say that I have two non-trivial queries.
  • Query 1 requires 2 minutes to complete on average.
  • Query 2 requires 5 minutes to complete on average.

If I run them serially, one right after the other, I'm expecting it will require 7 minutes to complete on average. Is this reasonable?

More than that, however, what if I run the two queries concurrently? Two separate connections at the same time.

  • Under what conditions would I expect to see a speedup? (Total time < 7 minutes)
  • Under what conditions would I expect to see a slowdown? (Total time > 7 minutes)
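As a toy illustration of the best case (not a database benchmark), two tasks that do nothing but wait on I/O can overlap almost completely when run concurrently. The `sleep` calls below stand in for hypothetical queries; the durations are scaled down from minutes to fractions of a second:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query(seconds):
    """Stand-in for an I/O-bound query: the thread just waits."""
    time.sleep(seconds)
    return seconds

durations = [0.2, 0.5]  # stand-ins for the 2-minute and 5-minute queries

# Serial: total time is roughly the sum of the two durations.
start = time.perf_counter()
for d in durations:
    fake_query(d)
serial = time.perf_counter() - start

# Concurrent: pure waits overlap, so total time is roughly the max duration.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(fake_query, durations))
concurrent = time.perf_counter() - start

print(f"serial ~{serial:.2f}s, concurrent ~{concurrent:.2f}s")
```

Real queries spend time on CPU and contend for shared I/O bandwidth, so the overlap is rarely this clean.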

Now, if I had 1,000 non-trivial queries running concurrently, I have a hunch that it would result in an overall slowdown. In that case, where would the bottleneck likely be? Processor? RAM? Drives?

Again, I know it's probably impossible to answer the question precisely without knowing specifics (which I don't have). I'm looking for some general guidelines to think about when asking the following questions:

  • Under what circumstances do concurrent queries result in an overall speedup?
  • Under what circumstances do concurrent queries result in an overall slowdown?
asked Jul 24, 2014 at 21:10

1 Answer


If I run them serially, one right after the other, I'm expecting it will require 7 minutes to complete on average. Is this reasonable?

If they use unrelated data sets, then yes.

If they share a data set, and the cache is cold for the first query and the query is mostly I/O bound, then the second one might complete in moments. You need to consider caching effects when dealing with performance analysis and query timing.
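As a loose analogy (an in-process memoization cache standing in for the database's buffer cache), the first read pays the full I/O cost while a repeat read of the same data returns almost instantly:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def read_dataset(name):
    """Simulate a cold read that hits 'disk'; cached results return instantly."""
    time.sleep(0.3)  # stand-in for slow I/O on a cold cache
    return f"rows of {name}"

start = time.perf_counter()
read_dataset("big_table")   # cold: pays the full simulated I/O cost
cold = time.perf_counter() - start

start = time.perf_counter()
read_dataset("big_table")   # warm: served from cache
warm = time.perf_counter() - start

print(f"cold ~{cold:.2f}s, warm ~{warm:.4f}s")
```

This is why benchmarking the same query twice in a row on a real database often gives wildly different numbers: the second run is measuring the cache, not the disk.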

More than that, however, what if I run the two queries concurrently? Two separate connections at the same time.

"It depends".

If they were both using sequential scans of the same table then in PostgreSQL it'd be a huge performance win because of its support for synchronized sequential scans.

If they shared the same indexes then they'd likely benefit from each others' reads into cache.

If they're independent and touch different data then they might compete for I/O bandwidth, in which case they might take the same amount of time as running sequentially. If the I/O subsystem benefits from concurrency (higher net throughput with more clients) then the total time might be less. If the I/O subsystem handles concurrency poorly then they might take longer than running them sequentially. Or they might not be I/O bound at all, in which case if there's a free CPU for each they could well execute as if the other wasn't running at all.

It depends a great deal on the hardware and system configuration, the data set, and on the queries themselves.

Now, if I had 1,000 non-trivial queries running concurrently, I have a hunch that it would result in an overall slowdown. In that case, where would the bottleneck likely be? Processor? RAM? Drives?

Yes, that'd very likely slow things down for a number of reasons.

  • PostgreSQL's own overheads in inter-process coordination, transaction and lock management, buffer management, etc. This can be quite a big cost, and PostgreSQL isn't really designed for high client counts - it works better if you queue work.

  • Competition for working memory, cache, etc.

  • OS scheduling overhead as it juggles 1,000 competing processes all wanting time slices. This is pretty minor these days, since modern OSes have fast schedulers.

  • I/O thrashing. Most I/O systems have a peak performance client count. Sometimes it's 1, i.e. it's best with only one client, but it's often higher. Sometimes performance decreases again above the threshold. Sometimes it just reaches a plateau.
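The "queue work" advice above is usually implemented with a connection pooler (PgBouncer is a common choice). The idea can be sketched in-process: many requests are queued, but only a small, fixed number run at once, keeping concurrency at whatever level the system handles well:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8   # small fixed pool: stand-in for a pooler's connection limit
active = 0        # tasks currently running
peak = 0          # highest concurrency observed
lock = threading.Lock()

def pooled_query(i):
    """Stand-in query that records how many tasks run at once."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # simulated work
    with lock:
        active -= 1
    return i

# 1,000 requests are submitted, but at most MAX_WORKERS run concurrently;
# the rest wait in the pool's queue instead of all hammering the server.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    results = list(pool.map(pooled_query, range(1000)))

print(f"completed={len(results)}, peak concurrency={peak}")
```

All 1,000 requests still complete; they just don't all contend for CPU, locks, and I/O at the same instant, which is typically faster overall than 1,000 truly simultaneous backends.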

answered Jul 25, 2014 at 1:21
  • This is exactly the kind of explanation I was looking for. Clear, succinct, informative. Thanks! Commented Jul 25, 2014 at 14:38
  • Hi @Craig Ringer, what if I run 1,000 queries concurrently on a single table (200 million rows)? Will Postgres handle them nicely? Do synchronized sequential scans help? Commented Oct 16, 2017 at 17:17
  • @RahulGautam New question with details please, with a link back to this one. Commented Oct 17, 2017 at 0:44
  • @CraigRinger added. Please check dba.stackexchange.com/questions/188649/… Commented Oct 17, 2017 at 11:32
  • @RahulGautam Your link is dead. I wonder if you could provide an update on what happened? It is a very interesting topic. Commented Feb 18, 2020 at 17:03
