I have the follow table:
create table myTable
(
id serial not null
rowtype integer not null
starttime timestamp not null,
endtime timestamp,
clientid integer not null
deletedat timestamp default '1980-01-01 00:00:00'::timestamp without time zone not null,
constraint myTable_pkey primary key,
constraint myTable_rowtype_fkey references rowtype (number),
constraint myTable_clientid_fkey references client,
constraint myTable_rowtype_starttime_endtime_clientid_deletedat_key unique (rowtype, starttime, endtime, clientid, deletedat)
);
And I need to run the following query:
select myTable.id as id, client.id as clientid, myTable.rowtype as rowtype, myTable.starttime as starttime, myTable.endtime as endtime
from myTable
inner join client on myTable.clientid = client.id
where myTable.rowtype in (1, 2, 3, 4)
and ((myTable.starttime>'2021-03-04 19:33:26+00' and myTable.starttime<'2021-03-05 00:40:28+00')
or (myTable.endtime>'2021-03-04 19:33:26+00' and myTable.endtime<'2021-03-05 00:40:28+00'))
and myTable.deletedat<'2000-01-01 00:00:00'
and client.deletedat<'2000-01-01 00:00:00';
Explain Analyse:
Nested Loop (cost=0.84..15782.99 rows=1 width=28) (actual time=246.485..557.682 rows=2 loops=1)
-> Index Scan using client_unique_otherFk on client (cost=0.28..8.30 rows=1 width=4) (actual time=0.047..0.052 rows=1 loops=1)
Index Cond: ((otherFk = 10) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
-> Index Scan using myTable_rowtype_starttime_endtime_clientid_deletedat_key on myTable (cost=0.56..15774.63 rows=6 width=28) (actual time=246.460..557.643 rows=2 loops=1)
Index Cond: ((rowtype = ANY ('{1,2,3,4}'::integer[])) AND (clientid = client.id) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
Filter: (((starttime > '2021-03-04 19:33:26'::timestamp without time zone) AND (starttime < '2021-03-05 00:40:28'::timestamp without time zone)) OR ((endtime > '2021-03-04 19:33:26'::timestamp without time zone) AND (endtime < '2021-03-05 00:40:28'::timestamp without time zone)))
Rows Removed by Filter: 9423
Through explain analyze I saw that the clause for myTable.starttime
and myTable.endtime
do not use myTable_rowtype_starttime_endtime_clientid_deletedat_key
index. I think it is OR clause, right?
- My first question is, have one method that I can query use index using OR clause in this case?
However, I tested query using only starttime
or only endtime
in Where clause and remove OR. In two cases both used index myTable_rowtype_starttime_endtime_clientid_deletedat_key
, but when I execute only with endtime
the execution time of query it's much slower than I use only starttime, even the two using the same index.
Query only use endtime example:
select myTable.id as id, client.id as clientid, myTable.rowtype as rowtype, myTable.starttime as starttime, myTable.endtime as endtime
from myTable
inner join client on myTable.clientid = client.id
where myTable.rowtype in (1, 2, 3, 4)
and (myTable.endtime>'2021-03-04 19:33:26+00' and myTable.endtime<'2021-03-05 00:40:28+00')
and myTable.deletedat<'2000-01-01 00:00:00'
and client.deletedat<'2000-01-01 00:00:00';
Explain Analyse:
Nested Loop (cost=0.84..15232.04 rows=1 width=28) (actual time=276.628..568.299 rows=1 loops=1)
-> Index Scan using client_unique_otherFk on client (cost=0.28..8.30 rows=1 width=4) (actual time=0.047..0.052 rows=1 loops=1)
Index Cond: ((otherFk = 10) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
-> Index Scan using myTable_rowtype_starttime_endtime_clientid_deletedat_key on myTable (cost=0.56..15223.71 rows=3 width=28) (actual time=276.573..568.233 rows=1 loops=1)
Index Cond: ((rowtype = ANY ('{1,2,3,4}'::integer[])) AND (endtime > '2021-03-04 19:33:26'::timestamp without time zone) AND (endtime < '2021-03-05 00:40:28'::timestamp without time zone) AND (clientid = client.id) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
Planning time: 0.453 ms
Execution time: 568.409 ms
Using only starttime instead endtime, the execution time is on average 0.449 ms.
My second question is about this time difference for
starttime
andendtime
fields, both using the same index. Does this time difference have any relation with the endtime isn't not null but starttime is, or not? And if not, what could it be?What could I do, to optimization this query? (using starttime and endtime with OR like the first example query).
2 Answers 2
Usually it is most efficient to avoid the dreaded OR
and replace it with a UNION
:
SELECT myTable.id, client.id, ...
FROM myTable
JOIN client ON myTable.clientid = client.id
WHERE myTable.rowtype in (1, 2, 3, 4)
AND myTable.starttime > '2021-03-04 19:33:26+00'
AND myTable.starttime < '2021-03-05 00:40:28+00'
AND myTable.deletedat < '2000-01-01 00:00:00'
AND client.deletedat < '2000-01-01 00:00:00';
UNION
SELECT myTable.id, client.id, ...
FROM myTable
JOIN client ON myTable.clientid = client.id
WHERE myTable.rowtype in (1, 2, 3, 4)
AND myTable.endtime > '2021-03-04 19:33:26+00'
AND myTable.endtime < '2021-03-05 00:40:28+00'
AND myTable.deletedat < '2000-01-01 00:00:00'
AND client.deletedat < '2000-01-01 00:00:00';
This will produce the same result if you have both primary keys in the SELECT
list. If you can live with duplicates, you could use the more efficient UNION ALL
.
I am not sure which conditions are the selective ones, but you could start with two indexes ON mytable (starttime)
and ON mytable (endtime)
.
-
Oh, nice article, I didn't know about it. It's make sense create new two indexes only on starttime and on endtime if I already have the index
myTable_rowtype_starttime_endtime_clientid_deletedat_key
? Use index only on starttime or endtime is more efficient?Vinicius Gabriel– Vinicius Gabriel2021年03月24日 01:06:43 +00:00Commented Mar 24, 2021 at 1:06 -
2Yes. An index on two columns is quite different from two indexes, one on each column.Laurenz Albe– Laurenz Albe2021年03月24日 04:14:44 +00:00Commented Mar 24, 2021 at 4:14
-
I create two index with
(rowType, starttime, deletedat)
and(rowType, endtime, deletedat)
and deleted the existing one, and absurdly improve query performance. But I didn't see any significant improvement in using UNION instead OR after I change the indexes in this case.Vinicius Gabriel– Vinicius Gabriel2021年03月24日 20:17:03 +00:00Commented Mar 24, 2021 at 20:17 -
Then you nay be seeing a
BitmapOr
, which is how PostgreSQL can handleOR
.Laurenz Albe– Laurenz Albe2021年03月24日 21:00:25 +00:00Commented Mar 24, 2021 at 21:00
Index:
(rowtype, starttime, endtime, clientid, deletedat)
Index usage:
Index Cond: ((rowtype = ANY ('{1,2,3,4}'::integer[])) AND (clientid = client.id) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
Notice how the index condition skips over starttime (as it not usable, being specified in the OR condition). So while it is comparing clientid and deletedat from the index against specified values, it just using them as an "in-index filter", it can't jump to a specific part of the index where it knows the qualifying values will be. This is quite inefficient, at compared to being to jump to a specific part of the index.
So one thing you could try is simply re-writing the column order in the index.
(rowtype, clientid, deletedat, starttime, endtime)
Using starttime in the OR makes it not useful for use in the index. And having it early in the index makes the rest of the columns, occurring after it in the index, also not useful (well, not nearly as useful as they could be). So that is a double whammy.
-
Thanks jjanes for contribuition. Your aswner it was important for my understanding solution. But I have the correct answer for Laurenz for the article explain ugly OR use cases and hint for separate on two indexes.Vinicius Gabriel– Vinicius Gabriel2021年03月24日 20:18:51 +00:00Commented Mar 24, 2021 at 20:18
Explore related questions
See similar questions with these tags.