
I'm trying to optimize a rather slow query running on AWS RDS Aurora PostgreSQL v16.1. The following query takes approx. 25 seconds:

SELECT t."Key", t."ActiveFrom", t."Content", t."Created", t."Postcode", t."HouseKeys", t."InactiveFrom", t."LastUpdated"
FROM "Tours" AS t
WHERE t."HouseKeys" && '{12345678,09876543,<<900 more unique keys>>}'
AND t."Postcode" = ANY ('{863482,248624,047628}')
AND t."ActiveFrom" <= '2024-09-30 09:20:23.503431+00' AND (t."InactiveFrom" IS NULL OR t."InactiveFrom" > '2024-09-30 09:20:23.503431+00')

This query is generated by Entity Framework Core v8 (Microsoft's ORM), which the corresponding business application code uses. I captured this slow query by enabling PostgreSQL's auto_explain feature via the following DB config parameters:

  • shared_preload_libraries = auto_explain,pg_stat_statements
  • auto_explain.log_analyze = ON
  • auto_explain.log_min_duration = 15000 (in ms, i.e. 15 s)
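Once the library is loaded, the effective settings can be verified from a session; a quick sanity check (assuming the parameters above have been applied) is:

```sql
-- Verify that auto_explain is loaded and configured as expected.
SHOW auto_explain.log_min_duration;
SELECT name, setting
FROM pg_settings
WHERE name LIKE 'auto_explain%';
```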

This gave me the following log message:

2024-09-30 09:20:48 UTC:10.118.38.109(55654):MyUser@MyDatabase:[3823]:LOG: duration: 25309.354 ms plan:
Query Text: SELECT t."Key", t."ActiveFrom", t."Content", t."Created", t."Postcode", t."HouseKeys", t."InactiveFrom", t."LastUpdated"
FROM "Tours" AS t
WHERE t."HouseKeys" && 1ドル AND t."Postcode" = ANY (2ドル) AND t."ActiveFrom" <= 3ドル AND (t."InactiveFrom" IS NULL OR t."InactiveFrom" > 3ドル)
Query Parameters: 1ドル = '{12345678,09876543,<<900 more unique keys>>}', 2ドル = '{863482,248624,047628}', 3ドル = '2024-09-30 09:20:23.503431+00'
Bitmap Heap Scan on "Tours" t (cost=21.19..1111.50 rows=807 width=848) (actual time=1365.628..24999.948 rows=8 loops=1)
Recheck Cond: (("Postcode")::text = ANY ('{863482,248624,047628}'::text[]))
Filter: (("HouseKeys" && '{12345678,09876543,<<900 more unique keys>>}'::text[]) AND ("ActiveFrom" <= '2024-09-30 09:20:23.503431+00'::timestamp with time zone) AND (("InactiveFrom" IS NULL) OR ("InactiveFrom" > '2024-09-30 09:20:23.503431+00'::timestamp with time zone)))
Rows Removed by Filter: 1077
Heap Blocks: exact=373
-> Bitmap Index Scan on "IX_Tours_Postcode" (cost=0.00..20.99 rows=1085 width=0) (actual time=0.068..0.069 rows=1085 loops=1)
Index Cond: (("Postcode")::text = ANY ('{863482,248624,047628}'::text[]))

After finding this answer, I ensured that JIT is disabled:

# SHOW jit
off

This is the DDL of the table, which contains only 5,000 rows:

CREATE TABLE IF NOT EXISTS public."Tours"
(
 "Key" character varying(50) COLLATE pg_catalog."default" NOT NULL,
 "Content" jsonb NOT NULL,
 "LastUpdated" timestamp with time zone NOT NULL,
 "Created" timestamp with time zone NOT NULL,
 "HouseKeys" text[] COLLATE pg_catalog."default" NOT NULL,
 "ActiveFrom" timestamp with time zone NOT NULL DEFAULT '-infinity'::timestamp with time zone,
 "InactiveFrom" timestamp with time zone,
 "Postcode" character varying(6) COLLATE pg_catalog."default" NOT NULL DEFAULT ''::text,
 CONSTRAINT "PK_Tours" PRIMARY KEY ("Key", "ActiveFrom")
);
CREATE INDEX IF NOT EXISTS "IX_Tours_Postcode"
 ON public."Tours" USING btree
 ("Postcode" COLLATE pg_catalog."default" ASC NULLS LAST)
 TABLESPACE pg_default;
CREATE INDEX IF NOT EXISTS "IX_Tours_HouseKeys"
 ON public."Tours" USING gin
 ("HouseKeys" COLLATE pg_catalog."default")
 TABLESPACE pg_default;

Can anyone please help me with tuning this query? I wonder whether adding a combined index for all queried fields (HouseKeys, Postcode, ActiveFrom, InactiveFrom) would be a good idea. If so, which index type would be a good choice?
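For illustration, the combined index I have in mind would be something like the following. This is just an untested candidate; since "HouseKeys" is a text[], a single index covering it together with the scalar columns would have to be GIN, which requires the btree_gin extension for the scalar types:

```sql
-- Candidate combined index (untested); btree_gin lets GIN index the
-- scalar columns alongside the text[] column.
CREATE EXTENSION IF NOT EXISTS btree_gin;
CREATE INDEX IF NOT EXISTS "IX_Tours_Combined"
 ON public."Tours" USING gin
 ("HouseKeys", "Postcode", "ActiveFrom", "InactiveFrom");
```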


Update 01.10.2024 - I: After setting the DB config parameter auto_explain.log_buffers to ON, the log output looks like this:

2024-10-01 07:04:18 UTC:10.118.38.132(60402):MyUser@MyDatabase:[983]:LOG: duration: 47855.615 ms plan:
Query Text: SELECT t."Key", t."ActiveFrom", t."Postcode", t."HouseKeys", t."InactiveFrom", t."LastUpdated"
FROM "Tours" AS t
WHERE t."HouseKeys" && 1ドル AND t."Postcode" = ANY (2ドル) AND t."ActiveFrom" <= 3ドル AND (t."InactiveFrom" IS NULL OR t."InactiveFrom" > 3ドル)
Query Parameters: 1ドル = '{12345678,09876543,<<900 more unique keys>>}', 2ドル = '{863482,248624,047628}', 3ドル = '2024-10-01 07:03:30.609959+00'
Bitmap Heap Scan on "Tours" t (cost=21.19..1111.50 rows=807 width=848) (actual time=47855.610..47855.612 rows=0 loops=1)
Recheck Cond: (("Postcode")::text = ANY ('{863482,248624,047628}'::text[]))
Filter: (("HouseKeys" && '{12345678,09876543,<<900 more unique keys>>}'::text[]) AND ("ActiveFrom" <= '2024-10-01 07:03:30.609959+00'::timestamp with time zone) AND (("InactiveFrom" IS NULL) OR ("InactiveFrom" > '2024-10-01 07:03:30.609959+00'::timestamp with time zone)))
Rows Removed by Filter: 1085
Heap Blocks: exact=373
Buffers: shared hit=1206
-> Bitmap Index Scan on "IX_Tours_Postcode" (cost=0.00..20.99 rows=1085 width=0) (actual time=0.067..0.067 rows=1085 loops=1)
Index Cond: (("Postcode")::text = ANY ('{863482,248624,047628}'::text[]))
Buffers: shared hit=6

Update 01.10.2024 - II:

Furthermore, I observed that when the query runs "in isolation" (i.e. with no load on the database from other queries), the execution time drops to 2 s. However, in the situations where slow queries occur, 20 of these queries run in parallel.
So I made the following small test on my local machine:

  1. Run PostgreSQL in Docker
  2. Backup DB and restore it locally
  3. Run docker stats for the PostgreSQL container
  4. Run the query

I noticed that the CPU usage went through the roof when using the && operator to check whether the arrays overlap. I also see a very high CPU usage in the AWS RDS Aurora PostgreSQL metrics during the execution time.

So I wonder whether I'm actually overloading the database by using the && operator, and whether there are any alternatives 🤔

asked Sep 30, 2024 at 11:37
  • It would help to set auto_explain.log_buffers = on. Your proposed index is worth a try, the only type which makes sense for it is btree. Commented Sep 30, 2024 at 18:21
  • I will enable auto_explain.log_buffers, do another test, and get back to you Commented Oct 1, 2024 at 6:01
  • @jjanes: please see the Update 01.10.2024 section in the question Commented Oct 1, 2024 at 7:16
  • Hmm. I was thinking it was IO, but obviously not. How long is the typical HouseKeys array? What version of PostgreSQL is this when you reproduce the issue locally, still 16.1? (Also, you shouldn't be using 16.1 anymore--is that the newest bugfix Aurora offers?) Commented Oct 1, 2024 at 18:37

1 Answer


When evaluated in a Filter condition, the cost of && is proportional to the product of the lengths of its two input arrays in the failing case; on success it stops as soon as the first match is found. (When one of the inputs is constant, as it is here, the executor could in theory build that array into a hash table, but that optimization is not implemented.)

If both input arrays are large, this is going to be slow. But the planner is not very smart about this: it simply assumes the cost of && is always the same, ignoring what it knows about the input sizes (if it knows anything about them to start with).

If it instead used the GIN index on "HouseKeys", it would not need to actually execute the && operator at all, avoiding that high cost. Instead of O(N*M), the lookup would be more like O(N log M). This does depend on work_mem being high enough to avoid rechecks, but with only a few thousand rows that should not be a problem.
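One way to see this (a sketch, with the house-key array shortened to the two values shown in the question) is to run the overlap predicate on its own; with only the && condition present, the plan should show a Bitmap Index Scan on the GIN index instead of a Filter:

```sql
-- Sketch: with only the overlap predicate, the GIN index should be chosen.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t."Key"
FROM "Tours" AS t
WHERE t."HouseKeys" && '{12345678,09876543}'::text[];
```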

The problem is that since the planner doesn't know && is slow, it doesn't know that it should use this index to avoid that slowness. Instead it uses the other index, probably because it thinks that one is more selective. You could drop the other index, or, if it is needed, rewrite the query so the planner considers that index unusable by applying a no-op operation to the indexed column:

AND t."Postcode"||'' = ANY ('{863482,248624,047628}')

In my hands, this just switched the plan to a seqscan rather than to the other index, so I had to resort to more tricks to get it to use the right index. I don't know whether that will be necessary for you.
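Applied to the query from the question (house-key array shortened for readability), the rewritten statement would look roughly like this:

```sql
-- Rewrite sketch: the "|| ''" no-op hides IX_Tours_Postcode from the planner.
SELECT t."Key", t."ActiveFrom", t."Content", t."Created",
       t."Postcode", t."HouseKeys", t."InactiveFrom", t."LastUpdated"
FROM "Tours" AS t
WHERE t."HouseKeys" && '{12345678,09876543}'::text[]
  AND t."Postcode" || '' = ANY ('{863482,248624,047628}')
  AND t."ActiveFrom" <= '2024-09-30 09:20:23.503431+00'
  AND (t."InactiveFrom" IS NULL
       OR t."InactiveFrom" > '2024-09-30 09:20:23.503431+00');
```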

If you are willing to use the pg_hint_plan extension (which is available on Aurora), then you could use the hint /*+ BitmapScan(t IX_Tours_HouseKeys) */ to force the issue. But I don't know if there is a way to get the ORM to inject this hint.
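Assuming pg_hint_plan is installed and loaded, the hint goes into a comment at the very start of the statement (again with the array shortened), for example:

```sql
/*+ BitmapScan(t IX_Tours_HouseKeys) */
SELECT t."Key", t."ActiveFrom", t."Content", t."Created",
       t."Postcode", t."HouseKeys", t."InactiveFrom", t."LastUpdated"
FROM "Tours" AS t
WHERE t."HouseKeys" && '{12345678,09876543}'::text[]
  AND t."Postcode" = ANY ('{863482,248624,047628}')
  AND t."ActiveFrom" <= '2024-09-30 09:20:23.503431+00'
  AND (t."InactiveFrom" IS NULL
       OR t."InactiveFrom" > '2024-09-30 09:20:23.503431+00');
```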

answered Oct 1, 2024 at 21:54
