How to optimize a query with timestamp that using OR clause?

Question 1

I have the follow table:

create table myTable
(
 id serial not null
 rowtype integer not null
 starttime timestamp not null,
 endtime timestamp,
 clientid integer not null
 deletedat timestamp default '1980-01-01 00:00:00'::timestamp without time zone not null,
 constraint myTable_pkey primary key,
 constraint myTable_rowtype_fkey references rowtype (number),
 constraint myTable_clientid_fkey references client,
 constraint myTable_rowtype_starttime_endtime_clientid_deletedat_key unique (rowtype, starttime, endtime, clientid, deletedat)
);

And I need to run the following query:

select myTable.id as id, client.id as clientid, myTable.rowtype as rowtype, myTable.starttime as starttime, myTable.endtime as endtime
from myTable
 inner join client on myTable.clientid = client.id 
where myTable.rowtype in (1, 2, 3, 4)
 and ((myTable.starttime>'2021-03-04 19:33:26+00' and myTable.starttime<'2021-03-05 00:40:28+00')
 or (myTable.endtime>'2021-03-04 19:33:26+00' and myTable.endtime<'2021-03-05 00:40:28+00'))
 and myTable.deletedat<'2000-01-01 00:00:00'
 and client.deletedat<'2000-01-01 00:00:00';

Explain Analyse:

Nested Loop (cost=0.84..15782.99 rows=1 width=28) (actual time=246.485..557.682 rows=2 loops=1)
 -> Index Scan using client_unique_otherFk on client (cost=0.28..8.30 rows=1 width=4) (actual time=0.047..0.052 rows=1 loops=1)
 Index Cond: ((otherFk = 10) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
 -> Index Scan using myTable_rowtype_starttime_endtime_clientid_deletedat_key on myTable (cost=0.56..15774.63 rows=6 width=28) (actual time=246.460..557.643 rows=2 loops=1)
 Index Cond: ((rowtype = ANY ('{1,2,3,4}'::integer[])) AND (clientid = client.id) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
 Filter: (((starttime > '2021-03-04 19:33:26'::timestamp without time zone) AND (starttime < '2021-03-05 00:40:28'::timestamp without time zone)) OR ((endtime > '2021-03-04 19:33:26'::timestamp without time zone) AND (endtime < '2021-03-05 00:40:28'::timestamp without time zone)))
 Rows Removed by Filter: 9423

Through explain analyze I saw that the clause for myTable.starttime and myTable.endtime do not use myTable_rowtype_starttime_endtime_clientid_deletedat_key index. I think it is OR clause, right?

My first question is, have one method that I can query use index using OR clause in this case?

However, I tested query using only starttime or only endtime in Where clause and remove OR. In two cases both used index myTable_rowtype_starttime_endtime_clientid_deletedat_key, but when I execute only with endtime the execution time of query it's much slower than I use only starttime, even the two using the same index.

Query only use endtime example:

select myTable.id as id, client.id as clientid, myTable.rowtype as rowtype, myTable.starttime as starttime, myTable.endtime as endtime
from myTable
 inner join client on myTable.clientid = client.id 
where myTable.rowtype in (1, 2, 3, 4)
 and (myTable.endtime>'2021-03-04 19:33:26+00' and myTable.endtime<'2021-03-05 00:40:28+00')
 and myTable.deletedat<'2000-01-01 00:00:00'
 and client.deletedat<'2000-01-01 00:00:00';

Explain Analyse:

Nested Loop (cost=0.84..15232.04 rows=1 width=28) (actual time=276.628..568.299 rows=1 loops=1)
 -> Index Scan using client_unique_otherFk on client (cost=0.28..8.30 rows=1 width=4) (actual time=0.047..0.052 rows=1 loops=1)
 Index Cond: ((otherFk = 10) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
 -> Index Scan using myTable_rowtype_starttime_endtime_clientid_deletedat_key on myTable (cost=0.56..15223.71 rows=3 width=28) (actual time=276.573..568.233 rows=1 loops=1)
 Index Cond: ((rowtype = ANY ('{1,2,3,4}'::integer[])) AND (endtime > '2021-03-04 19:33:26'::timestamp without time zone) AND (endtime < '2021-03-05 00:40:28'::timestamp without time zone) AND (clientid = client.id) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
Planning time: 0.453 ms
Execution time: 568.409 ms

Using only starttime instead endtime, the execution time is on average 0.449 ms.

My second question is about this time difference for starttime and endtime fields, both using the same index. Does this time difference have any relation with the endtime isn't not null but starttime is, or not? And if not, what could it be?
What could I do, to optimization this query? (using starttime and endtime with OR like the first example query).

Question 2

Usually it is most efficient to avoid the dreaded OR and replace it with a UNION:

 SELECT myTable.id, client.id, ...
 FROM myTable
 JOIN client ON myTable.clientid = client.id 
 WHERE myTable.rowtype in (1, 2, 3, 4)
 AND myTable.starttime > '2021-03-04 19:33:26+00'
 AND myTable.starttime < '2021-03-05 00:40:28+00'
 AND myTable.deletedat < '2000-01-01 00:00:00'
 AND client.deletedat < '2000-01-01 00:00:00';
UNION
 SELECT myTable.id, client.id, ...
 FROM myTable
 JOIN client ON myTable.clientid = client.id 
 WHERE myTable.rowtype in (1, 2, 3, 4)
 AND myTable.endtime > '2021-03-04 19:33:26+00'
 AND myTable.endtime < '2021-03-05 00:40:28+00'
 AND myTable.deletedat < '2000-01-01 00:00:00'
 AND client.deletedat < '2000-01-01 00:00:00';

This will produce the same result if you have both primary keys in the SELECT list. If you can live with duplicates, you could use the more efficient UNION ALL.

I am not sure which conditions are the selective ones, but you could start with two indexes ON mytable (starttime) and ON mytable (endtime).

Question 3

Oh, nice article, I didn't know about it. It's make sense create new two indexes only on starttime and on endtime if I already have the index myTable_rowtype_starttime_endtime_clientid_deletedat_key? Use index only on starttime or endtime is more efficient?

Question 4

Yes. An index on two columns is quite different from two indexes, one on each column.

Question 5

I create two index with (rowType, starttime, deletedat) and (rowType, endtime, deletedat) and deleted the existing one, and absurdly improve query performance. But I didn't see any significant improvement in using UNION instead OR after I change the indexes in this case.

Question 6

Then you nay be seeing a BitmapOr, which is how PostgreSQL can handle OR.

Question 7

Index:

(rowtype, starttime, endtime, clientid, deletedat)

Index usage:

Index Cond: ((rowtype = ANY ('{1,2,3,4}'::integer[])) AND (clientid = client.id) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))

Notice how the index condition skips over starttime (as it not usable, being specified in the OR condition). So while it is comparing clientid and deletedat from the index against specified values, it just using them as an "in-index filter", it can't jump to a specific part of the index where it knows the qualifying values will be. This is quite inefficient, at compared to being to jump to a specific part of the index.

So one thing you could try is simply re-writing the column order in the index.

(rowtype, clientid, deletedat, starttime, endtime)

Using starttime in the OR makes it not useful for use in the index. And having it early in the index makes the rest of the columns, occurring after it in the index, also not useful (well, not nearly as useful as they could be). So that is a double whammy.

Question 8

Thanks jjanes for contribuition. Your aswner it was important for my understanding solution. But I have the correct answer for Laurenz for the article explain ugly OR use cases and hint for separate on two indexes.

Laurenz Albe Laurenz Albe 61.9k4 gold badges57 silver badges93 bronze badges · Accepted Answer · 2021-03-23 21:18:39Z

Usually it is most efficient to avoid the dreaded OR and replace it with a UNION:

 SELECT myTable.id, client.id, ...
 FROM myTable
 JOIN client ON myTable.clientid = client.id 
 WHERE myTable.rowtype in (1, 2, 3, 4)
 AND myTable.starttime > '2021-03-04 19:33:26+00'
 AND myTable.starttime < '2021-03-05 00:40:28+00'
 AND myTable.deletedat < '2000-01-01 00:00:00'
 AND client.deletedat < '2000-01-01 00:00:00';
UNION
 SELECT myTable.id, client.id, ...
 FROM myTable
 JOIN client ON myTable.clientid = client.id 
 WHERE myTable.rowtype in (1, 2, 3, 4)
 AND myTable.endtime > '2021-03-04 19:33:26+00'
 AND myTable.endtime < '2021-03-05 00:40:28+00'
 AND myTable.deletedat < '2000-01-01 00:00:00'
 AND client.deletedat < '2000-01-01 00:00:00';

This will produce the same result if you have both primary keys in the SELECT list. If you can live with duplicates, you could use the more efficient UNION ALL.

I am not sure which conditions are the selective ones, but you could start with two indexes ON mytable (starttime) and ON mytable (endtime).

Oh, nice article, I didn't know about it. It's make sense create new two indexes only on starttime and on endtime if I already have the index myTable_rowtype_starttime_endtime_clientid_deletedat_key? Use index only on starttime or endtime is more efficient?
Yes. An index on two columns is quite different from two indexes, one on each column.
I create two index with (rowType, starttime, deletedat) and (rowType, endtime, deletedat) and deleted the existing one, and absurdly improve query performance. But I didn't see any significant improvement in using UNION instead OR after I change the indexes in this case.
Then you nay be seeing a BitmapOr, which is how PostgreSQL can handle OR.

Stack Exchange Network

How to optimize a query with timestamp that using OR clause?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to optimize a query with timestamp that using OR clause?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions