I have a production application that uses PostgreSQL. A common pattern we use is to:
A) Formulate the query needed to select the data for an arbitrary set of filters that the user picks in the UX layer. These queries are usually very complex and heavily optimized to be fast. For the sake of this example, let's call it SELECT * FROM real_query WHERE complicated = true
B) The application then takes this core query and dispatches two separate queries based on it. The first simply appends a reasonable LIMIT/OFFSET to fetch only one page of records. The second wraps the original query in SELECT count(*) FROM ( <<ORIGINAL QUERY>>)t
to get the total count of records (without the LIMIT/OFFSET), which the paginated UX needs.
With that out of the way, here is my real question. The original query is very fast for its complexity: roughly 400ms. However, as soon as we wrap it with the count operation, it slows down to 20 seconds. It's as if the subquery count version ignores all the optimization and indexes we used to make the core query fast.
So why is SELECT * FROM real_query WHERE complicated = true
fast, but SELECT count(*) FROM (SELECT * FROM real_query WHERE complicated = true)t
so slow?
What can I do to make the counting query faster?
1 Answer
You can confirm this by comparing the execution plans of the two queries (EXPLAIN ANALYZE), but this is a fairly common problem with a very predictable cause.
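As a rough sketch of how the two plans could be captured from the application side, the helpers below rebuild the two dispatched queries from the question and prefix each with EXPLAIN (ANALYZE, BUFFERS). The function names and the page size of 50 are my own placeholders, and the core query is just the stand-in from the question. Keep in mind that EXPLAIN ANALYZE actually executes the statement, so the count variant will take its full run time.

```python
# Placeholder core query taken from the question; the real one is far more complex.
CORE_QUERY = "SELECT * FROM real_query WHERE complicated = true"


def page_query(core: str, limit: int, offset: int) -> str:
    """Append LIMIT/OFFSET to the core query, as the application does for one page."""
    return f"{core} LIMIT {int(limit)} OFFSET {int(offset)}"


def count_query(core: str) -> str:
    """Wrap the core query in a count(*) subquery, as the application does for the total."""
    return f"SELECT count(*) FROM ({core}) t"


def explain(sql: str) -> str:
    """Prefix a statement so PostgreSQL runs it and reports the plan it actually used."""
    return f"EXPLAIN (ANALYZE, BUFFERS) {sql}"


if __name__ == "__main__":
    # Run both statements against the database and compare the plans side by side.
    print(explain(page_query(CORE_QUERY, 50, 0)))
    print(explain(count_query(CORE_QUERY)))
```

If the paginated plan stops early while the count plan visits every matching row, that is the difference described below.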
Your paginated query only needs to do enough work to return the first set of rows. Your count query needs to count every row that matches your filters. If some of your filters require table row access (that is, they cannot be answered from an index alone), then you are now visiting every row that matches the filter rather than just enough rows to populate a page. This can be much slower.
If you know your indexes and are comfortable making predictions about your data, you can run a cheaper query that counts using only the indexed columns in your filters, and then scale that count down according to how selective you expect the other filters to be.
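The scaling step is just arithmetic, and a minimal sketch might look like this. The function name and the pass rate are hypothetical: the pass rate is your own prediction of what fraction of rows survives the non-indexed filters, not something PostgreSQL reports for you.

```python
def scaled_count_estimate(indexed_count: int, expected_pass_rate: float) -> int:
    """Estimate the total from a cheap count over indexed columns only.

    indexed_count: result of a count(*) that filters only on indexed columns.
    expected_pass_rate: assumed fraction (0.0 to 1.0) of those rows that
        would also pass the remaining, non-indexed filters.
    """
    if indexed_count < 0:
        raise ValueError("indexed_count must not be negative")
    if not 0.0 <= expected_pass_rate <= 1.0:
        raise ValueError("expected_pass_rate must be between 0 and 1")
    # The result is only as good as the assumed pass rate.
    return round(indexed_count * expected_pass_rate)


if __name__ == "__main__":
    # Example: 120,000 rows match the indexed filters, and roughly 40% of
    # them are expected to pass the rest.
    print(scaled_count_estimate(120_000, 0.4))
```

An estimate like this is best shown to users as approximate ("about 48,000 results") rather than as an exact total.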
Otherwise, decide whether you really need the full row count. Do people care whether there are 45,121 rows, or do they just care that there’s another page of results?
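If knowing that another page exists is enough, one common way to get that without any count query is to ask for one row more than the page size and check whether the extra row came back. The sketch below assumes the core query already has a stable ORDER BY; the helper names are made up for illustration.

```python
from typing import Any, List, Sequence, Tuple


def page_query_with_lookahead(core: str, page_size: int, offset: int) -> str:
    """Build the page query, fetching one extra row to detect a following page."""
    return f"{core} LIMIT {int(page_size) + 1} OFFSET {int(offset)}"


def split_page(rows: Sequence[Any], page_size: int) -> Tuple[List[Any], bool]:
    """Return the rows to display and whether at least one more row exists."""
    has_next = len(rows) > page_size
    # Drop the lookahead row so the caller only sees a full page at most.
    return list(rows[:page_size]), has_next


if __name__ == "__main__":
    core = "SELECT * FROM real_query WHERE complicated = true"
    print(page_query_with_lookahead(core, 50, 0))
    # Pretend the database returned 51 rows for a page size of 50.
    page, has_next = split_page(list(range(51)), 50)
    print(len(page), has_next)
```

This keeps the early-exit behaviour of the paginated query, because the database still only has to produce a page's worth of rows plus one.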