How to avoid conditions in left join and optimize query?

Question 1

I'm having a little problem at work, the ORM is generating a very strange query and I need to optimize its result a little. I wrote a more readable version of the query and the result is similar, but the execution time is still the same.

Original Query
Explain Verbose

This is the current formatted query, it returns the correct value but takes about 13 seconds to execute.

select count(*)
from reservation as r
 JOIN companyclient as cc ON r.companyclientid = cc.companyclientid
 JOIN reservationitem as ri ON r.reservationid = ri.reservationid
 LEFT JOIN billingaccount as bi
 ON (r.reservationid = bi.reservationid AND cc.companyclientid = bi.companyclientid AND
 bi.propertyid = '84'
 AND NOT bi.isdeleted
 AND bi.groupkey = bi.billingaccountid
 AND bi.billingaccounttypeid = '3'
 AND bi.reservationid IS NOT NULL
 AND bi.statusid = '1')
WHERE r.propertyid = '84'
 AND NOT r.isdeleted
 AND r.companyclientid is not null
 AND ri.tenantid = '025aa64f-67fb-4c23-b975-2b0fc3f5d65a'
 AND NOT ri.isdeleted
 AND ri.reservationitemstatusid NOT IN (6, 3, 7, 8);

Explain Verbose

This is the version I wrote in an attempt to optimize (from 13 seconds to 5), avoiding the conditions inside the left join, but the result is different from the original query. The first query returns 29490 and the second query returns 29397.

select count(*)
from reservation as r
 JOIN companyclient as cc ON r.companyclientid = cc.companyclientid
 JOIN reservationitem as ri ON r.reservationid = ri.reservationid
 LEFT JOIN billingaccount as bi
 ON (r.reservationid = bi.reservationid AND cc.companyclientid = bi.companyclientid)
WHERE r.propertyid = '84'
 AND NOT r.isdeleted
 AND r.companyclientid is not null
 AND ri.tenantid = '025aa64f-67fb-4c23-b975-2b0fc3f5d65a'
 AND NOT ri.isdeleted
 AND ri.reservationitemstatusid NOT IN (6, 3, 7, 8)
 AND (bi is null or bi.propertyid = '84'
 AND NOT bi.isdeleted
 AND bi.groupkey = bi.billingaccountid
 AND bi.billingaccounttypeid = '3'
 AND bi.reservationid IS NOT NULL
 AND bi.statusid = '1')

Explain Verbose

My question is, how can I optimize the first query, I've tried some methods but I haven't had much success. I understand that the count is linear and its time is based on the size of the query return, but I imagine it's too slow for a 30K rows query.

In this specific case, I need the total of items to calculate the number of pages in the limit offset pagination.

Thanks in advance to everyone who had the patience to read, I accept any help.

Size of tables used in the example:

reservation: 288549 rows
companyclient: 50614 rows
reservationitem: 387820 rows
billingaccount: 772521 rows

Question 2

Occasionally, I've sped things up by grouping the id's in an inner query and then doing the rest of the where clauses etc. in the 'outer query' so it is the same overall query. If indexed properly, this can dramatically improve speed, but it doesn't always work.

Question 3

I suggest you add the CREATE TABLE statements for all 3 tables, including indexes - or the output of \d reservation from psql (again for all 3 tables).

Question 4

Some thoughts about the query. If we had the tables structure and indexes, it would be more helpful (and the advice maybe different):

The 2nd query is not equivalent to the 1st so it's not unusual that you get different results. So let's stick to the first query:

 select count(*)
 from reservation as r
 JOIN companyclient as cc ON r.companyclientid = cc.companyclientid
 JOIN reservationitem as ri ON r.reservationid = ri.reservationid
 LEFT JOIN billingaccount as bi
 ON (r.reservationid = bi.reservationid
 AND cc.companyclientid = bi.companyclientid
 AND bi.propertyid = '84'
 AND NOT bi.isdeleted
 AND bi.groupkey = bi.billingaccountid
 AND bi.billingaccounttypeid = '3'
 AND bi.reservationid IS NOT NULL
 AND bi.statusid = '1')
 WHERE r.propertyid = '84'
 AND NOT r.isdeleted
 AND r.companyclientid is not null
 AND ri.tenantid = '025aa64f-67fb-4c23-b975-2b0fc3f5d65a'
 AND NOT ri.isdeleted
 AND ri.reservationitemstatusid NOT IN (6, 3, 7, 8);

The conditions r.companyclientid is not null and bi.reservationid IS NOT NULL are redundant since these columns already participate in joining conditions and any NULL value will not pass the joining conditions. So these two can be safely removed (but still check that you get the same results).
The cc.companyclientid = bi.companyclientid condition (inside the subquery) can be replaced with r.companyclientid = bi.companyclientid since you have the equality condition in the outer query: r.companyclientid = cc.companyclientid. This leaves no cc column in the subquery and all the subquery's conditions involve only 2 tables, bi and r and it might help Postgres' planner to get a more simple plan (especially if there are suitable indexes on billingaccount and reservation.
If and only if the reservation. companyclientid is not nullable and there is a FOREIGN KEY constraint that REFERENCES companyclient (companyclientid), then we can completely remove the JOIN companyclient as cc ON r.companyclientid = cc.companyclientid since for any row in reservation, the foreign guarantees that there is a single row in companyclient and the count would not be affected.
use count(1) instead of count(*). This is just a minor improvement in some versions.

If all the above are applied, the query would become
(and don't forget to compare results with the original correct query after every modification!):

 select count(1)
 from reservation as r
 JOIN reservationitem as ri ON r.reservationid = ri.reservationid
 LEFT JOIN billingaccount as bi
 ON (r.reservationid = bi.reservationid
 AND r.companyclientid = bi.companyclientid
 AND bi.propertyid = '84'
 AND NOT bi.isdeleted
 AND bi.groupkey = bi.billingaccountid
 AND bi.billingaccounttypeid = '3'
 AND bi.statusid = '1')
 WHERE r.propertyid = '84'
 AND NOT r.isdeleted
 AND ri.tenantid = '025aa64f-67fb-4c23-b975-2b0fc3f5d65a'
 AND NOT ri.isdeleted
 AND ri.reservationitemstatusid NOT IN (6, 3, 7, 8);

Now let's suggest some indexes, specifically for this query and one for each table:

CREATE INDEX suggested_index_a ON
reservation
 (propertyid, reservationid, companyclientid)
WHERE (NOT isdeleted) ;
CREATE INDEX suggested_index_b ON
reservationitem
 (tenantid, reservationid) 
INCLUDE (reservationitemstatusid)
WHERE (NOT isdeleted) ;
CREATE INDEX suggested_index_c ON
billingaccount
 (propertyid, billingaccounttypeid, statusid,
 reservationid, companyclientid)
WHERE (groupkey = billingaccountid 
 AND NOT isdeleted) ;

Please try with the above indexes added. Have in mind that any index consumes disk space (and memory space when used) so it is a trade off.

Question 5

Thanks for your amazing answer. The result was slightly larger than expected, it returned 29968 rows. I will check all the points you mentioned.

Question 6

The result returned different because the join with companyclient is necessary, but your query is around 60% faster, now I will check the indexes.

Question 7

so there is no FK :(

Question 8

Using the indices, the query runs in 1.1 seconds! Thanks!

Question 9

While that surely is a oddly written query, using the plan we can see there is a missing index on billingaccount, without having to understand what the design of the overall query is.

You shouldn't need to scan and reject 7649 rows just to return 84, so some of the columns in the filter should be part of the index. We don't know how selective each part of the filter is, but based on the column names and constants, I am guessing you need a multicolumn index over both companyclientid and propertyid. (And should probably have billingaccounttypeid and statusid in it as well, it shouldn't hurt much and might help)

Question 10

Ok, thanks for the reply! I'll have a look if any more indexes can be added.

GabrielNexT – GabrielNexT · Accepted Answer · 2022-09-20 22:18:32Z

Some thoughts about the query. If we had the tables structure and indexes, it would be more helpful (and the advice maybe different):

The 2nd query is not equivalent to the 1st so it's not unusual that you get different results. So let's stick to the first query:

 select count(*)
 from reservation as r
 JOIN companyclient as cc ON r.companyclientid = cc.companyclientid
 JOIN reservationitem as ri ON r.reservationid = ri.reservationid
 LEFT JOIN billingaccount as bi
 ON (r.reservationid = bi.reservationid
 AND cc.companyclientid = bi.companyclientid
 AND bi.propertyid = '84'
 AND NOT bi.isdeleted
 AND bi.groupkey = bi.billingaccountid
 AND bi.billingaccounttypeid = '3'
 AND bi.reservationid IS NOT NULL
 AND bi.statusid = '1')
 WHERE r.propertyid = '84'
 AND NOT r.isdeleted
 AND r.companyclientid is not null
 AND ri.tenantid = '025aa64f-67fb-4c23-b975-2b0fc3f5d65a'
 AND NOT ri.isdeleted
 AND ri.reservationitemstatusid NOT IN (6, 3, 7, 8);

The conditions r.companyclientid is not null and bi.reservationid IS NOT NULL are redundant since these columns already participate in joining conditions and any NULL value will not pass the joining conditions. So these two can be safely removed (but still check that you get the same results).
The cc.companyclientid = bi.companyclientid condition (inside the subquery) can be replaced with r.companyclientid = bi.companyclientid since you have the equality condition in the outer query: r.companyclientid = cc.companyclientid. This leaves no cc column in the subquery and all the subquery's conditions involve only 2 tables, bi and r and it might help Postgres' planner to get a more simple plan (especially if there are suitable indexes on billingaccount and reservation.
If and only if the reservation. companyclientid is not nullable and there is a FOREIGN KEY constraint that REFERENCES companyclient (companyclientid), then we can completely remove the JOIN companyclient as cc ON r.companyclientid = cc.companyclientid since for any row in reservation, the foreign guarantees that there is a single row in companyclient and the count would not be affected.
use count(1) instead of count(*). This is just a minor improvement in some versions.

If all the above are applied, the query would become
(and don't forget to compare results with the original correct query after every modification!):

 select count(1)
 from reservation as r
 JOIN reservationitem as ri ON r.reservationid = ri.reservationid
 LEFT JOIN billingaccount as bi
 ON (r.reservationid = bi.reservationid
 AND r.companyclientid = bi.companyclientid
 AND bi.propertyid = '84'
 AND NOT bi.isdeleted
 AND bi.groupkey = bi.billingaccountid
 AND bi.billingaccounttypeid = '3'
 AND bi.statusid = '1')
 WHERE r.propertyid = '84'
 AND NOT r.isdeleted
 AND ri.tenantid = '025aa64f-67fb-4c23-b975-2b0fc3f5d65a'
 AND NOT ri.isdeleted
 AND ri.reservationitemstatusid NOT IN (6, 3, 7, 8);

Now let's suggest some indexes, specifically for this query and one for each table:

CREATE INDEX suggested_index_a ON
reservation
 (propertyid, reservationid, companyclientid)
WHERE (NOT isdeleted) ;
CREATE INDEX suggested_index_b ON
reservationitem
 (tenantid, reservationid) 
INCLUDE (reservationitemstatusid)
WHERE (NOT isdeleted) ;
CREATE INDEX suggested_index_c ON
billingaccount
 (propertyid, billingaccounttypeid, statusid,
 reservationid, companyclientid)
WHERE (groupkey = billingaccountid 
 AND NOT isdeleted) ;

Please try with the above indexes added. Have in mind that any index consumes disk space (and memory space when used) so it is a trade off.

Thanks for your amazing answer. The result was slightly larger than expected, it returned 29968 rows. I will check all the points you mentioned.
The result returned different because the join with companyclient is necessary, but your query is around 60% faster, now I will check the indexes.

Stack Exchange Network

How to avoid conditions in left join and optimize query?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to avoid conditions in left join and optimize query?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions