I have the following query:
SELECT "account_transactions"."subsidiary_id"
FROM "account_transactions"
WHERE (transaction_type = 'loan_payment'
AND amount > 0
AND status = 'approved')
AND (DATE(transacted_at) >= '2020-08-01'
AND DATE(transacted_at) <= '2020-08-31'
AND subsidiary_type = 'Loan')
ORDER BY transacted_at ASC
I identified it as a slow query, so I'm trying to optimize it by adding an index. I tried several variations, such as a partial index on (transaction_type, amount, status) WHERE amount > 0 AND status = 'approved' AND transaction_type = 'loan_payment', or an index on transaction_type only. But when I run EXPLAIN ANALYZE, it still shows a Seq Scan, meaning the indexes aren't being used.
Is this a PostgreSQL thing or am I doing indexing wrong?
One of the reasons could be that your index is not selective enough. What percentage of rows have transaction_type = 'loan_payment', amount > 0, ...? – Gerard H. Pille, Oct 15, 2020 at 9:21
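One way to answer the selectivity question is to count how many rows pass each filter. The following is a minimal sketch that assumes only the table and column names from the question; the output column aliases are made up for illustration.

```sql
-- Count rows matching each condition from the question's WHERE clause.
-- Aliases (total_rows, pct_all_filters, ...) are illustrative only.
SELECT
    count(*)                                                    AS total_rows,
    count(*) FILTER (WHERE transaction_type = 'loan_payment')   AS loan_payment_rows,
    count(*) FILTER (WHERE status = 'approved')                 AS approved_rows,
    count(*) FILTER (WHERE subsidiary_type = 'Loan')            AS loan_subsidiary_rows,
    count(*) FILTER (WHERE amount > 0)                          AS positive_amount_rows,
    -- Share of the table that passes all non-date filters together.
    round(100.0 * count(*) FILTER (WHERE transaction_type = 'loan_payment'
                                     AND status = 'approved'
                                     AND subsidiary_type = 'Loan'
                                     AND amount > 0)
          / NULLIF(count(*), 0), 2)                             AS pct_all_filters
FROM account_transactions;
```

If pct_all_filters comes back as a large share of the table, a sequential scan is likely the cheaper plan for those conditions alone, and the date range is where an index would have to do the work.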
2 Answers
Your index is probably not used because it does not match the most selective conditions. If too many rows are returned from the index scan, a sequential scan is more efficient.
Try this index:
CREATE INDEX ON account_transactions (transaction_type, subsidiary_type, status, transacted_at);
If the condition on any of the leading three columns is not selective, it is better to omit that column from the index. The important part is that transacted_at is last, because it is not used in a condition with = and is used in ORDER BY.
You can add WHERE conditions to the index like you did in your question.
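As a sketch of that last suggestion, the constant conditions can move into the index predicate so that only transacted_at remains as an index column. The index name below is hypothetical, and whether this beats the four-column index depends on data that the question does not show.

```sql
-- Partial index: only rows matching the constant filters are indexed,
-- and they are stored in transacted_at order.
-- The index name is hypothetical.
CREATE INDEX account_transactions_loan_payment_idx
    ON account_transactions (transacted_at)
    WHERE transaction_type = 'loan_payment'
      AND subsidiary_type = 'Loan'
      AND status = 'approved'
      AND amount > 0;

-- Refresh planner statistics before re-checking the plan.
ANALYZE account_transactions;
```

The planner considers a partial index only when the query's WHERE clause implies the index predicate, so the query has to keep the same four conditions.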
It's difficult to optimize your query without knowing the data (volume, structure), but since I'm familiar with the fixed income industry, where this kind of query is used, I think the problem is with this part:
AND (DATE(transacted_at) >= '2020-08-01'
AND DATE(transacted_at) <= '2020-08-31'
These values should be used in the searchable part of the index; in other words, the index should start with these values. Since you are converting them to the DATE datatype, you need to do two things:
- Create a custom function marked IMMUTABLE, since the built-in function to_date is not.
CREATE FUNCTION custom_dt(text) RETURNS date AS $$ select to_date($1, 'YYYY-MM-DD');$$ LANGUAGE sql immutable;
- Build a function-based index.
CREATE INDEX account_transactions_idx1 ON account_transactions ( custom_dt(transacted_at) );
VACUUM ANALYZE account_transactions;
Now change the date conditions in your query to:
AND custom_dt(transacted_at) >= '2020-08-01'
AND custom_dt(transacted_at) <= '2020-08-31'
If transacted_at has many distinct values, the index will be used and performance will be better.
But this approach looks more like a workaround for poor table and data design. In a properly designed table, the values in the transacted_at column would be stored with the DATE datatype, and you would never need the function or the function-based index. Also, I do not recommend including the amount column in the index column list unless at least 90% of its values are zero, NULL, or negative.
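For the properly typed case, here is a sketch of the query with the function call removed from the column. It assumes transacted_at is stored as date or timestamp, which the question does not confirm. A half-open range covers the same rows as the two DATE(...) comparisons and lets an ordinary b-tree index on transacted_at serve the range condition.

```sql
-- Assumes transacted_at is a date or timestamp column (not confirmed in the question).
-- ">= first day" and "< first day of next month" match all of August 2020,
-- including timestamps late on 2020-08-31.
SELECT subsidiary_id
FROM account_transactions
WHERE transaction_type = 'loan_payment'
  AND amount > 0
  AND status = 'approved'
  AND subsidiary_type = 'Loan'
  AND transacted_at >= DATE '2020-08-01'
  AND transacted_at <  DATE '2020-09-01'
ORDER BY transacted_at ASC;
```

Running EXPLAIN ANALYZE on this form should show whether the planner now picks an index scan over the Seq Scan.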