I'm having a little problem at work, the ORM is generating a very strange query and I need to optimize its result a little. I wrote a more readable version of the query and the result is similar, but the execution time is still the same.
Original Query
Explain Verbose
This is the current formatted query, it returns the correct value but takes about 13 seconds to execute.
select count(*)
from reservation as r
JOIN companyclient as cc ON r.companyclientid = cc.companyclientid
JOIN reservationitem as ri ON r.reservationid = ri.reservationid
LEFT JOIN billingaccount as bi
ON (r.reservationid = bi.reservationid AND cc.companyclientid = bi.companyclientid AND
bi.propertyid = '84'
AND NOT bi.isdeleted
AND bi.groupkey = bi.billingaccountid
AND bi.billingaccounttypeid = '3'
AND bi.reservationid IS NOT NULL
AND bi.statusid = '1')
WHERE r.propertyid = '84'
AND NOT r.isdeleted
AND r.companyclientid is not null
AND ri.tenantid = '025aa64f-67fb-4c23-b975-2b0fc3f5d65a'
AND NOT ri.isdeleted
AND ri.reservationitemstatusid NOT IN (6, 3, 7, 8);
This is the version I wrote in an attempt to optimize (from 13 seconds to 5), avoiding the conditions inside the left join, but the result is different from the original query. The first query returns 29490 and the second query returns 29397.
select count(*)
from reservation as r
JOIN companyclient as cc ON r.companyclientid = cc.companyclientid
JOIN reservationitem as ri ON r.reservationid = ri.reservationid
LEFT JOIN billingaccount as bi
ON (r.reservationid = bi.reservationid AND cc.companyclientid = bi.companyclientid)
WHERE r.propertyid = '84'
AND NOT r.isdeleted
AND r.companyclientid is not null
AND ri.tenantid = '025aa64f-67fb-4c23-b975-2b0fc3f5d65a'
AND NOT ri.isdeleted
AND ri.reservationitemstatusid NOT IN (6, 3, 7, 8)
AND (bi is null or bi.propertyid = '84'
AND NOT bi.isdeleted
AND bi.groupkey = bi.billingaccountid
AND bi.billingaccounttypeid = '3'
AND bi.reservationid IS NOT NULL
AND bi.statusid = '1')
My question is, how can I optimize the first query, I've tried some methods but I haven't had much success. I understand that the count is linear and its time is based on the size of the query return, but I imagine it's too slow for a 30K rows query.
In this specific case, I need the total of items to calculate the number of pages in the limit offset pagination.
Thanks in advance to everyone who had the patience to read, I accept any help.
Size of tables used in the example:
- reservation: 288549 rows
- companyclient: 50614 rows
- reservationitem: 387820 rows
- billingaccount: 772521 rows
2 Answers 2
Some thoughts about the query. If we had the tables structure and indexes, it would be more helpful (and the advice maybe different):
The 2nd query is not equivalent to the 1st so it's not unusual that you get different results. So let's stick to the first query:
select count(*) from reservation as r JOIN companyclient as cc ON r.companyclientid = cc.companyclientid JOIN reservationitem as ri ON r.reservationid = ri.reservationid LEFT JOIN billingaccount as bi ON (r.reservationid = bi.reservationid AND cc.companyclientid = bi.companyclientid AND bi.propertyid = '84' AND NOT bi.isdeleted AND bi.groupkey = bi.billingaccountid AND bi.billingaccounttypeid = '3' AND bi.reservationid IS NOT NULL AND bi.statusid = '1') WHERE r.propertyid = '84' AND NOT r.isdeleted AND r.companyclientid is not null AND ri.tenantid = '025aa64f-67fb-4c23-b975-2b0fc3f5d65a' AND NOT ri.isdeleted AND ri.reservationitemstatusid NOT IN (6, 3, 7, 8);
The conditions
r.companyclientid is not null
andbi.reservationid IS NOT NULL
are redundant since these columns already participate in joining conditions and any NULL value will not pass the joining conditions. So these two can be safely removed (but still check that you get the same results).The
cc.companyclientid = bi.companyclientid
condition (inside the subquery) can be replaced withr.companyclientid = bi.companyclientid
since you have the equality condition in the outer query:r.companyclientid = cc.companyclientid
. This leaves nocc
column in the subquery and all the subquery's conditions involve only 2 tables,bi
andr
and it might help Postgres' planner to get a more simple plan (especially if there are suitable indexes onbillingaccount
andreservation
.If and only if the
reservation. companyclientid
is not nullable and there is aFOREIGN KEY
constraint thatREFERENCES companyclient (companyclientid)
, then we can completely remove theJOIN companyclient as cc ON r.companyclientid = cc.companyclientid
since for any row inreservation
, the foreign guarantees that there is a single row incompanyclient
and the count would not be affected.use
count(1)
instead ofcount(*)
. This is just a minor improvement in some versions.
If all the above are applied, the query would become
(and don't forget to compare results with the original correct query after every modification!):
select count(1)
from reservation as r
JOIN reservationitem as ri ON r.reservationid = ri.reservationid
LEFT JOIN billingaccount as bi
ON (r.reservationid = bi.reservationid
AND r.companyclientid = bi.companyclientid
AND bi.propertyid = '84'
AND NOT bi.isdeleted
AND bi.groupkey = bi.billingaccountid
AND bi.billingaccounttypeid = '3'
AND bi.statusid = '1')
WHERE r.propertyid = '84'
AND NOT r.isdeleted
AND ri.tenantid = '025aa64f-67fb-4c23-b975-2b0fc3f5d65a'
AND NOT ri.isdeleted
AND ri.reservationitemstatusid NOT IN (6, 3, 7, 8);
Now let's suggest some indexes, specifically for this query and one for each table:
CREATE INDEX suggested_index_a ON
reservation
(propertyid, reservationid, companyclientid)
WHERE (NOT isdeleted) ;
CREATE INDEX suggested_index_b ON
reservationitem
(tenantid, reservationid)
INCLUDE (reservationitemstatusid)
WHERE (NOT isdeleted) ;
CREATE INDEX suggested_index_c ON
billingaccount
(propertyid, billingaccounttypeid, statusid,
reservationid, companyclientid)
WHERE (groupkey = billingaccountid
AND NOT isdeleted) ;
Please try with the above indexes added. Have in mind that any index consumes disk space (and memory space when used) so it is a trade off.
-
Thanks for your amazing answer. The result was slightly larger than expected, it returned 29968 rows. I will check all the points you mentioned.GabrielNexT– GabrielNexT2022年09月21日 15:53:18 +00:00Commented Sep 21, 2022 at 15:53
-
The result returned different because the join with companyclient is necessary, but your query is around 60% faster, now I will check the indexes.GabrielNexT– GabrielNexT2022年09月21日 16:38:42 +00:00Commented Sep 21, 2022 at 16:38
-
so there is no FK :(ypercubeᵀᴹ– ypercubeᵀᴹ2022年09月21日 17:22:20 +00:00Commented Sep 21, 2022 at 17:22
-
Using the indices, the query runs in 1.1 seconds! Thanks!GabrielNexT– GabrielNexT2022年09月22日 16:25:53 +00:00Commented Sep 22, 2022 at 16:25
While that surely is a oddly written query, using the plan we can see there is a missing index on billingaccount, without having to understand what the design of the overall query is.
You shouldn't need to scan and reject 7649 rows just to return 84, so some of the columns in the filter should be part of the index. We don't know how selective each part of the filter is, but based on the column names and constants, I am guessing you need a multicolumn index over both companyclientid and propertyid. (And should probably have billingaccounttypeid and statusid in it as well, it shouldn't hurt much and might help)
-
Ok, thanks for the reply! I'll have a look if any more indexes can be added.GabrielNexT– GabrielNexT2022年09月21日 15:48:11 +00:00Commented Sep 21, 2022 at 15:48
CREATE TABLE
statements for all 3 tables, including indexes - or the output of\d reservation
from psql (again for all 3 tables).