PostgreSQL indices on multi fields queries

Question 1

I have the following table:

id message_read notification_sent send_date text version recipient sender 
-- ------------ ----------------- ------------------- --------------------------------------------------- ------- --------- ------ 
1 true false 2015年02月27日 00:00:00 Bonjour. Je suis disponible pour la garde partagée. 0 2 3 
3 false false 2015年03月01日 00:00:00 Je suis disponible pour vous rencontrer dès demain. 0 2 3 
12 false false 2015年02月27日 00:00:00 Hello dude 0 2 1

The query below is the type of queries that typically get run on this table:

select * from public.message
where (recipient = 2 and sender = 3) or (sender = 2 and recipient = 3);

I am trying to create efficient indices on this table and bearing in mind the kind of queries run on it.

Here are my indices:

CREATE INDEX recipient_idx ON message(recipient);
CREATE INDEX sender_idx ON message(sender);

and here is the output of a explain analyze on the query above:

QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------- 
Bitmap Heap Scan on message (cost=8.57..13.40 rows=1 width=49) (actual time=0.072..0.075 rows=3 loops=1) 
 Recheck Cond: ((sender = 3) OR (recipient = 3)) 
 Filter: (((recipient = 2) AND (sender = 3)) OR ((sender = 2) AND (recipient = 3))) 
 Heap Blocks: exact=1 
 -> BitmapOr (cost=8.57..8.57 rows=2 width=0) (actual time=0.050..0.050 rows=0 loops=1) 
 -> Bitmap Index Scan on sender_idx (cost=0.00..4.28 rows=1 width=0) (actual time=0.037..0.037 rows=2 loops=1) 
 Index Cond: (sender = 3) 
 -> Bitmap Index Scan on recipient_idx (cost=0.00..4.28 rows=1 width=0) (actual time=0.009..0.009 rows=1 loops=1) 
 Index Cond: (recipient = 3) 
Planning time: 0.468 ms 
Execution time: 0.188 ms

Can anyone please tell me what I am getting wrong? Why isn't my query using a btree index and a bitmap one instead? How can I improve my indices?

edit:

Here is what I've tried:

CREATE INDEX recipient_sender_idx ON message(recipient, sender);

Running the explain analyze now yields:

QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------- 
Bitmap Heap Scan on message (cost=8.57..13.40 rows=1 width=49) (actual time=0.035..0.038 rows=3 loops=1) 
 Recheck Cond: ((sender = 3) OR (recipient = 3)) 
 Filter: (((recipient = 2) AND (sender = 3)) OR ((sender = 2) AND (recipient = 3))) 
 Heap Blocks: exact=1 
 -> BitmapOr (cost=8.57..8.57 rows=2 width=0) (actual time=0.023..0.023 rows=0 loops=1) 
 -> Bitmap Index Scan on sender_idx (cost=0.00..4.28 rows=1 width=0) (actual time=0.015..0.015 rows=2 loops=1) 
 Index Cond: (sender = 3) 
 -> Bitmap Index Scan on recipient_idx (cost=0.00..4.28 rows=1 width=0) (actual time=0.007..0.007 rows=1 loops=1) 
 Index Cond: (recipient = 3) 
Planning time: 0.183 ms 
Execution time: 0.070 ms

edit 2:

explain analyze
select * from public.message
where recipient = 2 and sender = 3 
UNION ALL
select * from public.message
where sender = 2 and recipient = 3

yields:

QUERY PLAN 
--------------------------------------------------------------------------------------------------------------------------------------------- 
Append (cost=0.29..16.63 rows=2 width=49) (actual time=0.031..0.043 rows=3 loops=1) 
 -> Index Scan using recipient_sender_idx on message (cost=0.29..8.31 rows=1 width=49) (actual time=0.030..0.031 rows=2 loops=1) 
 Index Cond: ((recipient = 2) AND (sender = 3)) 
 -> Index Scan using recipient_sender_idx on message message_1 (cost=0.29..8.31 rows=1 width=49) (actual time=0.009..0.010 rows=1 loops=1) 
 Index Cond: ((recipient = 3) AND (sender = 2)) 
Planning time: 0.389 ms 
Execution time: 0.075 ms

but it is not feasible to modify my SQL query.

Question 2

It looks completely reasonable to me, given the existing indexes - the information needed for finding the requested rows is split into two indexes. Try to create a new one on (recipient, sender) and check if you get a result you expect.

Question 3

Thanks dezso. Just one question though, do I need two indices as follows: one(recipient, sender) and two(sender, recipient) - in different order or just the one you advised?

Question 4

I have tried as advised, I still don't get any mention of the recipient_sender_idx being used. Do I need to remove the two first indices?

Question 5

Have you ANALYZEd the table? How many rows do you have in the table and how many match this condition?

Question 6

I have 1K rows in the message table and 3 rows meet the condition. I am not sure about the ANALYZE. Can you point me in the right direction as to how to use ANALYZE?

Question 7

I think the reason it's picking the single indexes and ORing them is that your data is toy-sized; witness:

 -> BitmapOr (rows=0 loops=1) 
 -> Bitmap Index Scan on sender_idx (rows=2 loops=1) 
 Index Cond: (sender = 3) 
 -> Bitmap Index Scan on recipient_idx (rows=1 loops=1) 
 Index Cond: (recipient = 3)

and

 scanned 101 of 101 pages, containing 9642 live rows

The table is pretty tiny and the indexes are highly selective, so it's probably faster to scan the smaller indexes then filter the results. There's no point using a composite index when the singular indexes find just a couple of rows anyway.

If the data was bigger and/or there were more rows for a given sender/recipient, it should ideally use two independent index scans on a composite index, then merging them. I don't think the planner is smart enough to realise that this can be executed as two queries and UNIONed,though, so the best you're going to get is probably two bitmap index scans on the composite index, then a BitmapOr.

Try:

select * from public.message
where recipient = 2 and sender = 3 
UNION ALL
select * from public.message
sender = 2 and recipient = 3;

to see if the plan picked is closer to what you want. Even if it produces the plan you want, with two IndexScan nodes on the composite index then an Append node for the union, it might actually be slower.

Question 8

Can you advise how many rows is not "tiny"? 10K? 100K?

Question 9

@balteo I think that proves the point. The performance is nearly the same; there's no real benefit, and the current plan is fine. It's not just about table size, it's also about selectivity: how many rows have the same value. If 10% of rows had recipient=1, this plan would be quite bad. If 0.05% of rows have recipient=1, this plan is quite good. PostgreSQL adjusts its plan choices based on statistics about the table for this reason. So it's not simply that 10k rows isn't much for a narrow table on a modern system, it's also that the indexes look highly selective.

Question 10

OK. Thanks a lot Graig. I am now going to carry on my exploration of Postgresql indices and explain plans...

Question 11

@balteo You'll find explain.depesz.com good for more complex plans. Read up on the statistics based planner, statistics targets, index selectivity, etc.

Craig Ringer Craig Ringer 57.9k6 gold badges162 silver badges193 bronze badges · Accepted Answer · 2015-09-14 12:19:16Z

I think the reason it's picking the single indexes and ORing them is that your data is toy-sized; witness:

 -> BitmapOr (rows=0 loops=1) 
 -> Bitmap Index Scan on sender_idx (rows=2 loops=1) 
 Index Cond: (sender = 3) 
 -> Bitmap Index Scan on recipient_idx (rows=1 loops=1) 
 Index Cond: (recipient = 3)

and

 scanned 101 of 101 pages, containing 9642 live rows

The table is pretty tiny and the indexes are highly selective, so it's probably faster to scan the smaller indexes then filter the results. There's no point using a composite index when the singular indexes find just a couple of rows anyway.

If the data was bigger and/or there were more rows for a given sender/recipient, it should ideally use two independent index scans on a composite index, then merging them. I don't think the planner is smart enough to realise that this can be executed as two queries and UNIONed,though, so the best you're going to get is probably two bitmap index scans on the composite index, then a BitmapOr.

Try:

select * from public.message
where recipient = 2 and sender = 3 
UNION ALL
select * from public.message
sender = 2 and recipient = 3;

to see if the plan picked is closer to what you want. Even if it produces the plan you want, with two IndexScan nodes on the composite index then an Append node for the union, it might actually be slower.

@balteo I think that proves the point. The performance is nearly the same; there's no real benefit, and the current plan is fine. It's not just about table size, it's also about selectivity: how many rows have the same value. If 10% of rows had recipient=1, this plan would be quite bad. If 0.05% of rows have recipient=1, this plan is quite good. PostgreSQL adjusts its plan choices based on statistics about the table for this reason. So it's not simply that 10k rows isn't much for a narrow table on a modern system, it's also that the indexes look highly selective.
OK. Thanks a lot Graig. I am now going to carry on my exploration of Postgresql indices and explain plans...
@balteo You'll find explain.depesz.com good for more complex plans. Read up on the statistics based planner, statistics targets, index selectivity, etc.

Stack Exchange Network

PostgreSQL indices on multi fields queries

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

PostgreSQL indices on multi fields queries

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions