I have the following table:
CREATE TABLE transactions
(
id NUMERIC(20, 0) NOT NULL DEFAULT NEXTVAL('transactions_sequence') PRIMARY KEY,
transaction_date TIMESTAMP DEFAULT NULL NULL,
transaction_type VARCHAR(255) DEFAULT NULL NULL,
merchant_id VARCHAR(255) DEFAULT NULL NULL,
-- Some more columns here
);
and the following index:
CREATE INDEX transactions_merchant_id_idx ON transactions (merchant_id, transaction_type, transaction_date DESC, id) WHERE merchant_id IS NOT NULL;
I have the following queries:
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
AND transaction_type = 'a'
AND transaction_date >= '2025-01-01'
AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100
This query works just fine and I get an index only scan:
Limit (cost=0.29..7.47 rows=1 width=13) (actual time=1.119..1.120 rows=0 loops=1)
-> Index Scan using transactions_transaction_type_idx on transactions (cost=0.29..7.47 rows=1 width=13) (actual time=1.118..1.118 rows=0 loops=1)
Index Cond: (((transaction_type)::text = 'a'::text) AND (transaction_date >= '2025-01-01 00:00:00'::timestamp without time zone) AND (transaction_date < '2025-03-28 00:00:00'::timestamp without time zone))
Filter: ((merchant_id)::text = 'some_merchant_id'::text)
Planning Time: 0.311 ms
Execution Time: 1.139 ms
However, when I need results independent of transaction_type, with:
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
AND transaction_date >= '2025-01-01'
AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100
I still get the index only scan:
Limit (cost=38.08..38.19 rows=44 width=13) (actual time=0.108..0.115 rows=47 loops=1)
-> Sort (cost=38.08..38.19 rows=44 width=13) (actual time=0.107..0.110 rows=47 loops=1)
Sort Key: transaction_date DESC
Sort Method: quicksort Memory: 27kB
-> Index Only Scan using transactions_merchant_id_idx on transactions (cost=0.29..36.88 rows=44 width=13) (actual time=0.029..0.093 rows=47 loops=1)
Index Cond: ((merchant_id = 'some_merchant_id'::text) AND (transaction_date >= '2025-01-01 00:00:00'::timestamp without time zone) AND (transaction_date < '2025-03-28 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning Time: 0.228 ms
Execution Time: 0.161 ms
I do have a list of all the potential transaction_type values so I initially thought that this would be better:
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
AND transaction_type IN ('a', 'b', 'c', ...) -- all the potential values here
AND transaction_date >= '2025-01-01'
AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100
but instead, depending on the number of values in the IN clause, I might get an additional filter in the query plan:
Limit (cost=38.29..38.40 rows=43 width=13) (actual time=0.110..0.118 rows=47 loops=1)
-> Sort (cost=38.29..38.40 rows=43 width=13) (actual time=0.109..0.112 rows=47 loops=1)
Sort Key: transaction_date DESC
Sort Method: quicksort Memory: 27kB
-> Index Only Scan using transactions_merchant_id_idx on transactions (cost=0.31..37.13 rows=43 width=13) (actual time=0.030..0.097 rows=47 loops=1)
Index Cond: ((merchant_id = 'some_merchant_id'::text) AND (transaction_date >= '2025-01-01 00:00:00'::timestamp without time zone) AND (transaction_date < '2025-03-28 00:00:00'::timestamp without time zone))
Filter: ((transaction_type)::text = ANY ('{a,b,c,d,e,f}'::text[]))
Heap Fetches: 0
Planning Time: 0.340 ms
Execution Time: 0.142 ms
So even if I skip the middle transaction_type column, my index gets used. But which query am I better off with: the one with IN on transaction_type listing all the potential values, or the one without the filter at all? And how is my index still used without a filter on transaction_type?
Update:
An additional concern: should I use this index:
CREATE INDEX transactions_merchant_id_idx ON transactions (merchant_id, transaction_type, transaction_date DESC, id) WHERE merchant_id IS NOT NULL;
or this index:
CREATE INDEX transactions_merchant_id_idx ON transactions (merchant_id, transaction_date DESC, id) WHERE merchant_id IS NOT NULL;
given that 50% of the time I won't have a restriction on transaction_type (i.e. no AND transaction_type IN ('a', 'b', 'c') clause)?
For the remaining 50%, I am trying to eliminate half of my transaction_types, which sometimes yields an extra Filter: ((transaction_type)::text = ANY ('{a,b,c}'::text[])) condition even though the index contains transaction_type.
So even with the index containing transaction_type, I might get the extra Filter when the AND transaction_type IN (...) is not very restrictive.
So which index is better? The one containing transaction_type or the one without it?
1 Answer
If you narrow down the transaction_type with the IN condition, the query will have to sort fewer rows and will be faster.
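The index is still used even if transaction_type is not restricted, because PostgreSQL deems an index-only scan on the index cheaper than a sequential table scan. The equality condition on the leading column merchant_id bounds the part of the index that has to be scanned; the transaction_date condition is then checked against the index entries, and since those entries are ordered by transaction_type before transaction_date, the result still needs the separate Sort step you see in the plan.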
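If you want to see that cost comparison for yourself, one option is to discourage index-based plans for a single transaction and look at the estimate the planner produces for the alternative. This is only a sketch for experimenting on a test system, reusing the query from the question; the enable_* settings penalize those plan types rather than forbid them, and the numbers you get depend entirely on your data and statistics:

```sql
-- Experiment only: make index-based plans unattractive inside one
-- transaction so the cost estimate of the remaining plan becomes visible.
BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_bitmapscan = off;

-- Plain EXPLAIN shows estimated costs without running the query.
EXPLAIN
SELECT id, transaction_date
FROM transactions
WHERE merchant_id = 'some_merchant_id'
  AND transaction_date >= '2025-01-01'
  AND transaction_date < '2025-03-28'
ORDER BY transaction_date DESC
LIMIT 100;

-- SET LOCAL only lasts until the end of the transaction.
ROLLBACK;
```

Comparing the top-level cost of this plan with the one from the index-only scan plan in the question should show why the planner prefers the index.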
It is difficult to say which index would be best overall, because you have to consider all queries from your workload, the rate at which queries and data modifications occur and outside constraints like user expectations. Ignorant of all that, I would suggest an index like
CREATE INDEX ON transactions (merchant_id, transaction_date);
because I assume that transaction_type is not very selective. If it is selective, I'd probably add that column at the end. I wouldn't include any other columns unless an index-only scan is badly needed for performance reasons. In that case, I would add these columns in an INCLUDE clause.
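As a rough sketch of those two variants (which one fits, if either, depends on the workload; the INCLUDE clause assumes PostgreSQL 11 or later, and the choice of id as the included column is taken from the queries in the question):

```sql
-- Variant for the case that transaction_type turns out to be selective:
-- append it as the last key column.
CREATE INDEX ON transactions (merchant_id, transaction_date, transaction_type);

-- Variant for the case that an index-only scan is really needed:
-- carry id as a non-key payload column instead of a key column.
CREATE INDEX ON transactions (merchant_id, transaction_date)
    INCLUDE (id);
```

Keep in mind that an index-only scan only avoids heap fetches for pages marked all-visible in the visibility map, so the table has to be vacuumed often enough for the second variant to pay off.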
Comments:
If I cannot narrow down transaction_type, am I better off skipping the filter altogether (i.e. no transaction_type clause)? Or would it be the same? – Hasan Can Saral, 2025-03-27 13:10:56 +00:00
If you add a WHERE condition that does not restrict the result set, that is, name all available transaction types in the IN list, the query will become slower because of the additional redundant checks. Not much slower, but measurably slower. – Laurenz Albe, 2025-03-27 14:09:15 +00:00
Would I be better off with an index (merchant_id, transaction_type, transaction_date, id) or (merchant_id, transaction_date, id), given that when I have the index with transaction_type I mostly get an extra filter step in the execution plan? I have ~10 transaction_types, and 50% of the time I need all of them (i.e. no AND transaction_type IN ('a', 'b')). The remaining 50%, I need half of the ~10 types, which sometimes yields a Filter step and sometimes does not. – Hasan Can Saral, 2025-04-02 09:32:10 +00:00
I've updated my question with some explanation of my extra concern; any further explanation would be greatly appreciated. – Hasan Can Saral, 2025-04-02 09:43:27 +00:00