How does PostgreSQL use my composite index without the middle column in the query?

Question 1

I have the following table:

CREATE TABLE transactions
(
 id NUMERIC(20, 0) NOT NULL DEFAULT NEXTVAL('transactions_sequence') PRIMARY KEY,
 transaction_date TIMESTAMP DEFAULT NULL NULL,
 transaction_type VARCHAR(255) DEFAULT NULL NULL,
 merchant_id VARCHAR(255) DEFAULT NULL NULL,
 -- Some more columns here
);

and the following index:

CREATE INDEX transactions_merchant_id_idx ON transactions (merchant_id, transaction_type, transaction_date DESC, id) WHERE merchant_id IS NOT NULL;

I have the following queries:

SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
 AND transaction_type = 'a'
 AND transaction_date >= '2025-01-01'
 AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100

This query works just fine and I get an index only scan:

Limit (cost=0.29..7.47 rows=1 width=13) (actual time=1.119..1.120 rows=0 loops=1)
 -> Index Scan using transactions_transaction_type_idx on transactions (cost=0.29..7.47 rows=1 width=13) (actual time=1.118..1.118 rows=0 loops=1)
 Index Cond: (((transaction_type)::text = 'a'::text) AND (transaction_date >= '2025-01-01 00:00:00'::timestamp without time zone) AND (transaction_date < '2025-03-28 00:00:00'::timestamp without time zone))
 Filter: ((merchant_id)::text = 'some_merchant_id'::text)
Planning Time: 0.311 ms
Execution Time: 1.139 ms

However, when I need results that are independent of transaction_type, with:

SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
 AND transaction_date >= '2025-01-01'
 AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100

I still get the index only scan:

Limit (cost=38.08..38.19 rows=44 width=13) (actual time=0.108..0.115 rows=47 loops=1)
 -> Sort (cost=38.08..38.19 rows=44 width=13) (actual time=0.107..0.110 rows=47 loops=1)
 Sort Key: transaction_date DESC
 Sort Method: quicksort Memory: 27kB
 -> Index Only Scan using transactions_merchant_id_idx on transactions (cost=0.29..36.88 rows=44 width=13) (actual time=0.029..0.093 rows=47 loops=1)
 Index Cond: ((merchant_id = 'some_merchant_id'::text) AND (transaction_date >= '2025-01-01 00:00:00'::timestamp without time zone) AND (transaction_date < '2025-03-28 00:00:00'::timestamp without time zone))
 Heap Fetches: 0
Planning Time: 0.228 ms
Execution Time: 0.161 ms

I do have a list of all the potential transaction_type values so I initially thought that this would be better:

SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
 AND transaction_type IN ('a', 'b', 'c', ...) -- all the potential values here
 AND transaction_date >= '2025-01-01'
 AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100

but instead, depending on the number of values in the IN clause, I might get an additional filter in the query plan:

Limit (cost=38.29..38.40 rows=43 width=13) (actual time=0.110..0.118 rows=47 loops=1)
 -> Sort (cost=38.29..38.40 rows=43 width=13) (actual time=0.109..0.112 rows=47 loops=1)
 Sort Key: transaction_date DESC
 Sort Method: quicksort Memory: 27kB
 -> Index Only Scan using transactions_merchant_id_idx on transactions (cost=0.31..37.13 rows=43 width=13) (actual time=0.030..0.097 rows=47 loops=1)
 Index Cond: ((merchant_id = 'some_merchant_id'::text) AND (transaction_date >= '2025-01-01 00:00:00'::timestamp without time zone) AND (transaction_date < '2025-03-28 00:00:00'::timestamp without time zone))
 Filter: ((transaction_type)::text = ANY ('{a,b,c,d,e,f}'::text[]))
 Heap Fetches: 0
Planning Time: 0.340 ms
Execution Time: 0.142 ms

So even if I skip the middle transaction_type column, my index is used. But which query am I better off with: the one with IN on transaction_type listing all the potential values, or the one without the filter at all? And how is my index still used without a condition on transaction_type?

Update:

An additional concern: should I use the index

CREATE INDEX transactions_merchant_id_idx ON transactions (merchant_id, transaction_type, transaction_date DESC, id) WHERE merchant_id IS NOT NULL;

or an index:

CREATE INDEX transactions_merchant_id_idx ON transactions (merchant_id, transaction_date DESC, id) WHERE merchant_id IS NOT NULL;

given that 50% of the time I won't have a restriction on transaction_type (i.e. no AND transaction_type IN ('a', 'b', 'c') clause).

The remaining 50% of the time, I am trying to eliminate half of my transaction_types, which sometimes yields an extra Filter: ((transaction_type)::text = ANY ('{a,b,c}'::text[])) condition, even though the index contains transaction_type.

So even with the index containing transaction_type I might get the extra Filter when the AND transaction_type IN (...) is not very restrictive.

So which index is better? The one containing transaction_type or not?

Laurenz Albe · Accepted Answer · 2025-03-27 13:09:35Z

If you narrow down the transaction_type with the IN condition, PostgreSQL will have to sort fewer rows, and the query will be faster.

The index is still used even if transaction_type is not restricted, because PostgreSQL deems an index-only scan on the index cheaper than a sequential table scan.
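One way to look at the cost comparison behind that choice is to temporarily discourage index access inside a transaction and see which plan PostgreSQL falls back to. This is only a sketch: it reuses the table and the literal values from the question, and the enable_* settings are planner switches meant for experimentation, not for production use.

```sql
BEGIN;

-- Discourage index-based plans for this transaction only;
-- SET LOCAL is undone automatically at ROLLBACK or COMMIT.
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;

-- Plain EXPLAIN prints the estimated costs without executing the query.
EXPLAIN
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
  AND transaction_date >= '2025-01-01'
  AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100;

ROLLBACK;
```

If the estimated total cost of the fallback plan is higher than that of the index-only scan plan shown in the question, that explains why the planner picks the index even though transaction_type is not restricted.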

It is difficult to say which index would be best overall, because you have to consider all queries from your workload, the rate at which queries and data modifications occur and outside constraints like user expectations. Ignorant of all that, I would suggest an index like

CREATE INDEX ON transactions (merchant_id, transaction_date);

because I assume that transaction_type is not very selective. If it is selective, I'd probably add that column at the end. I wouldn't include any other columns unless an index-only scan is badly needed for performance reasons. In that case, I would add these columns in an INCLUDE clause.
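The variants described above could look roughly like this. It is a sketch under assumptions: INCLUDE requires PostgreSQL 11 or later, id is assumed to be the only extra column the query needs, and the index names are made up for illustration. In practice you would pick one of these rather than create all three.

```sql
-- Key columns only: the equality column first, then the range/sort column.
CREATE INDEX transactions_merchant_date_idx
    ON transactions (merchant_id, transaction_date);

-- Variant if an index-only scan is really needed: id is stored as a
-- non-key payload column instead of being part of the index key.
CREATE INDEX transactions_merchant_date_incl_idx
    ON transactions (merchant_id, transaction_date)
    INCLUDE (id);

-- Variant if transaction_type turns out to be selective: it goes last,
-- so entries for one merchant_id are still ordered by transaction_date.
CREATE INDEX transactions_merchant_date_type_idx
    ON transactions (merchant_id, transaction_date, transaction_type);
```

The DESC from the question's index is left out here because a B-tree index can be scanned backwards, so ORDER BY transaction_date DESC for a single merchant_id can still be served without a separate sort step.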

If I cannot narrow down transaction_type, am I better off skipping the filter altogether (i.e. no transaction_type clause)? Or would it be the same?

If you add a WHERE condition that does not restrict the result set, that is, name all available transaction types in the IN list, the query will become slower because of the additional redundant checks. Not much slower, but measurably slower.

Would I be better off with an index on (merchant_id, transaction_type, transaction_date, id) or on (merchant_id, transaction_date, id), given that when I have the index with transaction_type I mostly get an extra filter step in the execution plan? I have ~10 transaction_types and 50% of the time I need all of them (i.e. no AND transaction_type IN ('a', 'b') clause). The remaining 50% of the time I need half of the ~10 types, which sometimes yields a Filter step and sometimes not.

I've updated my question with some explanation of my extra concern; any further explanation would be greatly appreciated.
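The cost of a redundant IN list, as discussed in the comments above, can be measured on your own data by timing the two forms side by side. A sketch, assuming the six placeholder types 'a' to 'f' from the plan in the question stand for every existing transaction_type:

```sql
-- Form 1: no condition on transaction_type at all.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
  AND transaction_date >= '2025-01-01'
  AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100;

-- Form 2: an IN list naming every type (placeholder values),
-- which returns the same rows but adds a check per row.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
  AND transaction_type IN ('a', 'b', 'c', 'd', 'e', 'f')
  AND transaction_date >= '2025-01-01'
  AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100;
```

Because the expected difference is small, it is worth running each statement several times and comparing the typical execution times rather than a single run.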
