Optimizing PostgreSQL where clause with OR

Question 1

I have a query that uses an or in the where clause. It is slow. It seems that re-writing the query using a union could get the same results much quicker. But I'm in an environment where we generate SQL for different technologies. The or clause works well in Oracle, for example. I'd like to consider other options besides union that might be less radical if there are any good alternatives.

Here's the minimal setup to make the requirement clear.

create table my_table as 
select 
 'EXISTING_RECORDS' as s
 , generate_series(10001,30000) as id
 , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as name
 , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as other_name
;
insert into my_table 
select 
 'NEW_RECORDS' as s
 , generate_series(1001,3000) as id
 , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as name
 , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as other_name
;

That create and insert gives us 20k EXISTING_RECORDS and 2k NEW_RECORDS. The name fields are 5 random characters, so we'll end up with a handful of matches when matching based on the following query.

select t1.s, t1.id, t1.name, t1.other_name, t2.name, t2.other_name, t2.s, t2.id
from
 my_table t1
 , my_table t2
where
 t1.s = 'NEW_RECORDS'
 and ( 
 t1.name = t2.other_name 
 or 
 t2.name = t1.other_name
 )
;

If I reduce the where clause to just t1.name = t2.other_name or the reverse, then it executes in about 7 ms in my environment. But with the or as shown it takes over 11500 ms.

I posted the explain plan in case that's helpful.

Is there any reasonable pattern I should consider in order to get a good execution plan besides converting this to two separate queries that I union together? For example: I'm happy to add indexes. But as far as I can tell, that doesn't provide any benefit.

Question 2

Creating two indexes: create index on my_table (name); create index on my_table (other_name); on my Postgres 11 installation, changed this plan (8 seconds) to this plan (45ms)

Question 3

@a_horse_with_no_name I think you should post this as an answer, as he did invite new indexes. It would be nice if it were faster with no added indexes, but the index is better than nothing

Question 4

Holy cow. With the problem as stated, the indexes are as effective as @a_horse_with_no_name suggests. For me it reduces from 11,500ms to 19ms. I'm sure I tried this before. (Didn't I??) I'll go confirm the effectiveness of this change in my full dev environment.

Question 5

For what it is worth, there was a proposal to execute some OR as if they were UNIONs, postgresql.org/message-id/[email protected], but it never made it into the released code, and I just tested it and it does not solve your particular problem anyway. I'm not sure why it doesn't.

Question 6

Thank you. I saw similar posts. That's what gave me confidence that rewriting the query as unions would likely work. But in my case at least, adding indexes is far easier. It's surprising to me that this example isn't solved by the proposed update.

Question 7

Solution:

create index on my_table (name); 
create index on my_table (other_name);

Credit: @a_horse_with_no_name

Comment: I am befuddled that I didn't find this myself. I'm sure I tested [multiple] variations of adding indexes, and I don't know how I missed the obvious one. But I'm so happy that I'm giggling, and I'm grateful for the additional eyes on the problem to find this simple workaround.

Question 8

No, you have to rewrite the query if you want it to perform well in PostgreSQL.

mdahlman mdahlman 3573 silver badges14 bronze badges · Answer 1 · 2019-05-30 16:25:14Z

Solution:

create index on my_table (name); 
create index on my_table (other_name);

Credit: @a_horse_with_no_name

Comment: I am befuddled that I didn't find this myself. I'm sure I tested [multiple] variations of adding indexes, and I don't know how I missed the obvious one. But I'm so happy that I'm giggling, and I'm grateful for the additional eyes on the problem to find this simple workaround.

Laurenz Albe Laurenz Albe 61.9k4 gold badges57 silver badges93 bronze badges · Answer 2 · 2019-05-29 23:31:55Z

0

No, you have to rewrite the query if you want it to perform well in PostgreSQL.

Share

Improve this answer

answered May 29, 2019 at 23:31

Laurenz Albe's user avatar

Laurenz Albe Laurenz Albe

61.9k4 gold badges57 silver badges93 bronze badges

Add a comment |

Stack Exchange Network

Optimizing PostgreSQL where clause with OR

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Optimizing PostgreSQL where clause with OR

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions