Index isn't used when value is a result of a subquery or from a wrapper query

Question 1

I have a simple one-to-many relationship: accounts and events where an account may have many events.

accounts

id: uuid
...more irrelevant fields

events

id: serial
type: enum ['activated', 'deactivated', 'removed', ...more]
account_id: uuid
created_at: creation date of the row

events table index:

create index "IDX_d82e2f903c778153e7781f455e" on bank_account_events (type, bank_account_id);

I have a simple query which determines when the last event of a type occurred for a specific account:

explain select created_at
from bank_account_events
where bank_account_id = 'b12edcab-9ac5-4a09-a84c-475c8a73c964'
 and type in ('activated', 'deactivated', 'removed')
order by id desc
limit 1;

The query plan shows that the correct index is being used:

Limit  (cost=339.51..339.51 rows=1 width=16)
  ->  Sort  (cost=339.51..339.74 rows=94 width=16)
        Sort Key: id DESC
        ->  Index Scan using "IDX_d82e2f903c778153e7781f455e" on bank_account_events  (cost=0.56..339.04 rows=94 width=16)
              Index Cond: ((type = ANY ('{activated,deactivated,removed}'::bank_account_events_type_enum[])) AND (bank_account_id = 'b12edcab-9ac5-4a09-a84c-475c8a73c964'::uuid))

However, when I "wrap" this query and use it as a subquery the index isn't being used:

explain select ba.id,
       (
           select created_at
           from bank_account_events
           where bank_account_id = ba.id
             and type in ('activated', 'deactivated', 'removed')
           order by id desc
           limit 1
       ) status_date
from bank_accounts ba
where ba.id = 'b12edcab-9ac5-4a09-a84c-475c8a73c964';

The result of the query above is:

Index Only Scan using "PK_5a7a02c20412299d198e097a8fe" on bank_accounts ba  (cost=0.27..8.43 rows=1 width=24)
  Index Cond: (id = 'b12edcab-9ac5-4a09-a84c-475c8a73c964'::uuid)
  SubPlan 1
    ->  Limit  (cost=0.43..4.14 rows=1 width=16)
          ->  Index Scan Backward using "PK_dddc8f2295ddc2561044644a05a" on bank_account_events  (cost=0.43..264940.10 rows=71513 width=16)
                Filter: ((bank_account_id = ba.id) AND (type = ANY ('{activated,deactivated,removed}'::bank_account_events_type_enum[])))

As you can see, it's now using the table's PK (PK_dddc8f2295ddc2561044644a05a), which is just the serial id, and scanning it backward. Each row it reads is then filtered by bank_account_id and type. Since the table has a few million rows, this is painfully slow.

Why isn't Postgres using the index in this scenario? The only change was that instead of referencing the id directly, we refer to it as ba.id.

Question 2

@Vérace Hey! I haven't run it yet, but you query for max(id) across the entire table. Did you mean to also scope it to a bank account?

Question 3

@Vérace It is, but if I understand correctly, you're trying to remove the "order by" and replace it with another subquery. Selecting just any max(id) could give us an ID that does not exist for the current bank_account_id, thus yielding 0 results.


jjanes · Answer 1 · 2021-05-13 14:59:07Z

In the first case, the planner can look up in the statistics how many rows it expects to have bank_account_id 'b12edcab-9ac5-4a09-a84c-475c8a73c964'. After combining that with the type in ... condition, it estimates 94 rows. In the second case it cannot do that, because the specific value is not known at the time the query is planned. Instead, it estimates 71513 rows for the given type in ... condition combined with some unknown-at-the-time value of bank_account_id.

So apparently your bank accounts have vastly inhomogeneous numbers of events; and the planner doesn't know which type of account you are querying when using it as a subquery.
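
One way to look at this yourself, as a rough sketch rather than a definitive diagnosis: inspect the statistics the planner has for bank_account_id, then plan the standalone query with the account id as a parameter and force a generic plan, so the value is unknown at planning time much like ba.id is in the subquery. The statement name last_event is made up for illustration, plan_cache_mode assumes PostgreSQL 12 or later, and the generic plan is not guaranteed to match the SubPlan shown in the question.

```sql
-- Sketch only: table, column and enum names are taken from the question;
-- the prepared statement name "last_event" is hypothetical.

-- 1. What the planner knows about how events are distributed over accounts.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'bank_account_events'
  AND attname = 'bank_account_id';

-- 2. Plan the standalone query without a known account id.
PREPARE last_event(uuid) AS
SELECT created_at
FROM bank_account_events
WHERE bank_account_id = $1
  AND type IN ('activated', 'deactivated', 'removed')
ORDER BY id DESC
LIMIT 1;

SET plan_cache_mode = force_generic_plan;  -- assumes PostgreSQL 12 or later
EXPLAIN EXECUTE last_event('b12edcab-9ac5-4a09-a84c-475c8a73c964');

-- 3. Clean up the session.
RESET plan_cache_mode;
DEALLOCATE last_event;
```

If the generic plan also walks the primary key backward, that supports the explanation that the row estimate for an unknown bank_account_id is what changes the plan.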

That's bad. I would expect the database to re-evaluate its plan at runtime. Is there anything I can do besides splitting these into 2 queries?

Your first formulation is faster without being split into 2 queries, so use that :) If you have some other query in mind that can't be reformulated in the same way, you should edit your question to include it.

Hey, the query in the question is just a very simplified one. The production query also joins some tables and includes a few more subqueries.

Fareed Stevenson · Answer 2 · 2021-05-16 23:22:02Z

Change the order of the columns in the index to support the B-tree search for the bank_account_id and filtering by type. Once filtered, all records are read ordered by id. By including the created_at column last, you have all the data the query needs in the index and it won't access the table at all.

CREATE INDEX "nc_bank_account_events_bank_account_id"
ON bank_account_events (bank_account_id, type, id, created_at);
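
A possible way to try this on a large, busy table, sketched under a few assumptions: CONCURRENTLY is added here to avoid blocking writes while the index builds (it is not part of the statement above and cannot run inside a transaction block), and the VACUUM step is included because index-only scans depend on the visibility map being reasonably current. Whether the planner actually picks the new index for the wrapped query is something to confirm from the plan, not a given.

```sql
-- Sketch only: same index definition as above, built without blocking writes.
-- Must be run outside a transaction block.
CREATE INDEX CONCURRENTLY "nc_bank_account_events_bank_account_id"
ON bank_account_events (bank_account_id, type, id, created_at);

-- Refresh planner statistics and the visibility map.
VACUUM (ANALYZE) bank_account_events;

-- Re-check the wrapped query from the question.
-- Note: EXPLAIN ANALYZE really executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ba.id,
       (
           SELECT created_at
           FROM bank_account_events
           WHERE bank_account_id = ba.id
             AND type IN ('activated', 'deactivated', 'removed')
           ORDER BY id DESC
           LIMIT 1
       ) status_date
FROM bank_accounts ba
WHERE ba.id = 'b12edcab-9ac5-4a09-a84c-475c8a73c964';
```

In the resulting plan, look at which index the SubPlan uses and at the actual time and buffer counts, and compare them with the backward primary-key scan from the question.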
