I have a simple one-to-many relationship: accounts and events, where an account may have many events.
accounts
- id: uuid
- ...more irrelevant fields
events
- id: serial
- type: enum ['activated', 'deactivated', 'removed', ...more]
- account_id: uuid
- created_at: creation date of the row
events table index:
create index "IDX_d82e2f903c778153e7781f455e" on bank_account_events (type, bank_account_id);
I have a simple query which determines when the last event of a type occurred for a specific account:
explain select created_at
from bank_account_events
where bank_account_id = 'b12edcab-9ac5-4a09-a84c-475c8a73c964'
and type in ('activated', 'deactivated', 'removed')
order by id desc
limit 1;
The query plan shows that the correct index is being used:
Limit (cost=339.51..339.51 rows=1 width=16)
  -> Sort (cost=339.51..339.74 rows=94 width=16)
       Sort Key: id DESC
       -> Index Scan using "IDX_d82e2f903c778153e7781f455e" on bank_account_events (cost=0.56..339.04 rows=94 width=16)
            Index Cond: ((type = ANY ('{activated,deactivated,removed}'::bank_account_events_type_enum[])) AND (bank_account_id = 'b12edcab-9ac5-4a09-a84c-475c8a73c964'::uuid))
However, when I "wrap" this query and use it as a subquery the index isn't being used:
explain select ba.id,
(
select created_at
from bank_account_events
where bank_account_id = ba.id
and type in ('activated', 'deactivated', 'removed')
order by id desc
limit 1
) status_date
from bank_accounts ba
where ba.id = 'b12edcab-9ac5-4a09-a84c-475c8a73c964';
The result of the query above is:
Index Only Scan using "PK_5a7a02c20412299d198e097a8fe" on bank_accounts ba (cost=0.27..8.43 rows=1 width=24)
  Index Cond: (id = 'b12edcab-9ac5-4a09-a84c-475c8a73c964'::uuid)
  SubPlan 1
    -> Limit (cost=0.43..4.14 rows=1 width=16)
         -> Index Scan Backward using "PK_dddc8f2295ddc2561044644a05a" on bank_account_events (cost=0.43..264940.10 rows=71513 width=16)
              Filter: ((bank_account_id = ba.id) AND (type = ANY ('{activated,deactivated,removed}'::bank_account_events_type_enum[])))
As you can see, it's now using the PK of the table (PK_dddc8f2295ddc2561044644a05a), which is just the serial id, and then filters the results. Since the table has a few million rows, this is painfully slow.
Why isn't Postgres using the index in this scenario? The only change was that instead of referencing the id directly, we refer to it as ba.id.
2 Answers
In the first case, the planner can look up the statistics to see how many rows it expects to have bank_account_id 'b12edcab-9ac5-4a09-a84c-475c8a73c964'. After combining that with the type in ... condition, it estimates 94 rows. In the second case it cannot do that, because the specific value is not known at the time the query is planned. Instead, it estimates 71513 rows matching the given type in ... condition and some unknown-at-the-time value of bank_account_id. With that many expected matches, walking the primary key backward and stopping at the first hit looks cheap (the Limit node is costed at only 4.14).
So apparently your bank accounts have vastly inhomogeneous numbers of events; and the planner doesn't know which type of account you are querying when using it as a subquery.
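To see where such estimates come from, one option is to look at the column statistics the planner keeps. This is only a sketch: it assumes the table is named bank_account_events as in the question and that it has been analyzed recently. pg_stats is a standard PostgreSQL view; what values it returns for this table is not known from the question.

```sql
-- Statistics used when the account id is unknown at plan time.
-- n_distinct drives the "rows per unknown account" estimate;
-- most_common_vals / most_common_freqs show how skewed the column is.
select attname, n_distinct, most_common_vals, most_common_freqs
from pg_stats
where tablename = 'bank_account_events'
  and attname in ('bank_account_id', 'type');

-- Assumption: if n_distinct for bank_account_id looks far too low,
-- a larger sample may improve it. The target of 1000 is illustrative.
alter table bank_account_events
    alter column bank_account_id set statistics 1000;
analyze bank_account_events;
```

Whether better statistics alone change the plan depends on how skewed the data really is, so it is worth re-running explain on the subquery form afterwards.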
- That's bad. I would expect the database to re-evaluate its plan at runtime. Is there anything I can do besides splitting these into 2 queries? (kfirba, May 13, 2021 at 16:05)
- Your first formulation is faster without being split into 2 queries, so use that :) If you have some other query in mind that can't be reformulated in the same way, you should edit your question to include it. (jjanes, May 13, 2021 at 20:52)
- Hey, the query in the question is just a very simplified one. The production query also joins some tables and includes a few more subqueries. (kfirba, May 14, 2021 at 5:53)
Change the order of the columns in the index so that the B-tree search starts with bank_account_id and then filters by type. Within each matching type, the index entries are stored in id order. By including the created_at column last, the index holds all the data the query needs, so the query can be answered from the index without accessing the table.
CREATE INDEX "nc_bank_account_events_bank_account_id"
ON bank_account_events (bank_account_id, type, id, created_at);
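A simple way to check whether the reordered index is picked up is to re-run the subquery form from the question under explain. The sketch below reuses the question's query unchanged; the analyze and buffers options are standard explain options, and the resulting plan is not known in advance, since the planner may still prefer another path.

```sql
-- Re-check the correlated subquery after creating the new index.
-- "analyze" executes the query, so timings and row counts are real.
explain (analyze, buffers)
select ba.id,
       (
           select created_at
           from bank_account_events
           where bank_account_id = ba.id
             and type in ('activated', 'deactivated', 'removed')
           order by id desc
           limit 1
       ) status_date
from bank_accounts ba
where ba.id = 'b12edcab-9ac5-4a09-a84c-475c8a73c964';
```

If the SubPlan now shows a scan on "nc_bank_account_events_bank_account_id" with an Index Cond on bank_account_id instead of a Filter, the index is doing the lookup.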
max(id) across the entire table. Did you mean to also scope it to a bank account? max(id) could give us an ID that does not exist for the current bank_account_id, thus yielding 0 results.