I have a table with emails:
email
- id : numeric, primary key
- id_in_target : text, the ID as stored in Google/MS, indexed
- in_reply_to : nullable, text, references id_in_target in case of a reply, indexed
- ts : timestamp, email's timestamp
... some other columns
Given a list of email IDs, I'm trying to fetch all replies or source emails related to the emails in that list, so the email table is joined with itself. The query has the following form:
select reply.id, extract(epoch from (source.ts - reply.ts))
from email source
join email reply on source.id_in_target = reply.in_reply_to
where source.id in (ids) or reply.id in (ids)
The problem is with the OR condition on the primary key. If I only select the source or the reply the optimizer uses the primary key. However, with the OR condition, the planner chooses to scan the entire table. I know I can "duplicate" the queries with union, but I just don't understand why it chooses the suboptimal plan when there's clearly a primary key condition.
1 Answer
That is because of the OR. PostgreSQL cannot automatically rewrite the query to a UNION of two queries, because it cannot prove that the result would be the same: the query with the OR could return two identical result rows, the UNION query cannot.
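For illustration, a manual rewrite along the lines mentioned in the question might look like the sketch below. It keeps the (ids) placeholder from the question and, as an assumption not present in the original query, adds source.id to the select list: with both primary keys in the output every joined row is distinct, so UNION only removes the pairs that both branches find.

```sql
-- Sketch only: (ids) is the same placeholder list used in the question.
-- source.id is added here (an assumption, not in the original query)
-- so that UNION's duplicate removal cannot drop legitimately distinct rows.
select source.id as source_id,
       reply.id  as reply_id,
       extract(epoch from (source.ts - reply.ts)) as diff_seconds
from email source
join email reply on source.id_in_target = reply.in_reply_to
where source.id in (ids)
union
select source.id,
       reply.id,
       extract(epoch from (source.ts - reply.ts))
from email source
join email reply on source.id_in_target = reply.in_reply_to
where reply.id in (ids);
```

Each branch filters on a single primary key condition, which is the case where the question reports that the primary key index is used; checking the plan with EXPLAIN on real data is still advisable.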