3

I have a query that uses an or in the where clause. It is slow. It seems that re-writing the query using a union could get the same results much quicker. But I'm in an environment where we generate SQL for different technologies. The or clause works well in Oracle, for example. I'd like to consider other options besides union that might be less radical if there are any good alternatives.

Here's the minimal setup to make the requirement clear.

create table my_table as 
select 
 'EXISTING_RECORDS' as s
 , generate_series(10001,30000) as id
 , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as name
 , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as other_name
;
insert into my_table 
select 
 'NEW_RECORDS' as s
 , generate_series(1001,3000) as id
 , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as name
 , chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) || chr( ascii('a') + trunc(random() * 26)::int ) as other_name
;

That create and insert gives us 20k EXISTING_RECORDS and 2k NEW_RECORDS. The name fields are 5 random characters, so we'll end up with a handful of matches when matching based on the following query.

select t1.s, t1.id, t1.name, t1.other_name, t2.name, t2.other_name, t2.s, t2.id
from
 my_table t1
 , my_table t2
where
 t1.s = 'NEW_RECORDS'
 and ( 
 t1.name = t2.other_name 
 or 
 t2.name = t1.other_name
 )
;

If I reduce the where clause to just t1.name = t2.other_name or the reverse, then it executes in about 7 ms in my environment. But with the or as shown it takes over 11500 ms.

I posted the explain plan in case that's helpful.

Is there any reasonable pattern I should consider in order to get a good execution plan besides converting this to two separate queries that I union together? For example: I'm happy to add indexes. But as far as I can tell, that doesn't provide any benefit.

asked May 29, 2019 at 22:54
5
  • 2
    Creating two indexes: create index on my_table (name); create index on my_table (other_name); on my Postgres 11 installation, changed this plan (8 seconds) to this plan (45ms) Commented May 30, 2019 at 5:34
  • @a_horse_with_no_name I think you should post this as an answer, as he did invite new indexes. It would be nice if it were faster with no added indexes, but the index is better than nothing Commented May 30, 2019 at 15:27
  • Holy cow. With the problem as stated, the indexes are as effective as @a_horse_with_no_name suggests. For me it reduces from 11,500ms to 19ms. I'm sure I tried this before. (Didn't I??) I'll go confirm the effectiveness of this change in my full dev environment. Commented May 30, 2019 at 16:08
  • For what it is worth, there was a proposal to execute some OR as if they were UNIONs, postgresql.org/message-id/[email protected], but it never made it into the released code, and I just tested it and it does not solve your particular problem anyway. I'm not sure why it doesn't. Commented May 30, 2019 at 16:32
  • Thank you. I saw similar posts. That's what gave me confidence that rewriting the query as unions would likely work. But in my case at least, adding indexes is far easier. It's surprising to me that this example isn't solved by the proposed update. Commented May 30, 2019 at 16:36

2 Answers 2

1

Solution:

create index on my_table (name); 
create index on my_table (other_name);

Credit: @a_horse_with_no_name

Comment: I am befuddled that I didn't find this myself. I'm sure I tested [multiple] variations of adding indexes, and I don't know how I missed the obvious one. But I'm so happy that I'm giggling, and I'm grateful for the additional eyes on the problem to find this simple workaround.

answered May 30, 2019 at 16:25
0

No, you have to rewrite the query if you want it to perform well in PostgreSQL.

answered May 29, 2019 at 23:31

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.