improve query performance in postgres

Question 1

i have a table addressfabric_v2_2021_12

CREATE TABLE IF NOT EXISTS public.addressfabric_v2_2021_12(
country character varying(10) COLLATE pg_catalog."default",
areaname1 character varying(10) COLLATE pg_catalog."default",
areaname2 character varying(100) COLLATE pg_catalog."default",
areaname3 character varying(100) COLLATE pg_catalog."default",
areaname4 character varying(100) COLLATE pg_catalog."default",
postcode character varying(20) COLLATE pg_catalog."default",
pbkey character varying(50) COLLATE pg_catalog."default",
prop_type character varying(10) COLLATE pg_catalog."default",
postcode_ext character varying(10) COLLATE pg_catalog."default",
add_number character varying(20) COLLATE pg_catalog."default",
streetname character varying(60) COLLATE pg_catalog."default",
unit_type character varying(20) COLLATE pg_catalog."default",
unit_num character varying(50) COLLATE pg_catalog."default",
type character varying(3) COLLATE pg_catalog."default",
lat character varying(20) COLLATE pg_catalog."default",
lon character varying(20) COLLATE pg_catalog."default",
parent character varying(50) COLLATE pg_catalog."default",
geoid character varying(50) COLLATE pg_catalog."default",
geometry geometry,
CONSTRAINT enforce_addfb_srid_geometry_2021_12 CHECK (st_srid(geometry) = 4326))

I am having performance issue in the below query

 select count(*) from addressfabric_v2_2021_12 as addr, (
SELECT ST_SubDivide('POLYGON((-74.4234467 40.7541435,-74.3904877 40.7489001,-74.380188 40.7587776,-74.3891144 40.7748904,-74.4179535 40.7790479,-74.4316864 40.7665746,-74.4241333 40.7566983,-74.4234467 40.7541435))'::geometry,20)) as
f(geom) where st_intersects(ST_SetSRID(f.geom,4326),addr.geometry);

After this i created the following index

 CREATE INDEX IF NOT EXISTS addressfabric_v2_2021_12_geometry
ON public.addressfabric_v2_2021_12 USING gist
(geometry) TABLESPACE pg_default;

The performance was increased but still not satisfactory

Aggregate (cost=591866320.40..591866320.41 rows=1 width=8) (actual 
 time=11.812..11.813 rows=1 loops=1)->Nested Loop (cost=0.55..591346132.52 rows=208075152 width=0) (actual 
 time=0.219..11.476 rows=7072 loops=1)
 ->ProjectSet (cost=0.00..30.02 rows=1000 width=32) (actual 
 time=0.010..0.012 rows=1 loops=1)
 -> Result (cost=0.00..0.01 rows=1 width=0) (actual 
 time=0.001..0.001 rows=1 loops=1)
 -> Index Scan using addressfabric_v2_2021_12_geomtry on 
 addressfabric_v2_2021_12 addr (cost=0.55..591138.01 rows=20808 
 width=32) (actual time=0.207..10.893 rows=7072 loops=1)
 Index Cond: (geometry && st_setsrid((st_subdivide('01030000000100000008000000580630C0199B52C0D87E32C687604440603F1FC0FD9852C04EBC5EF5DB5F44403A950C00559852C0FC77D89F1F614440C5991540E79852C0FFFDCE9B2F634440EB6F09C0BF9A52C03D4272D7B76344403188FEBFA09B52C0DE78D21D1F6244404029FFFF249B52C02AB8697DDB604440580630C0199B52C0D87E32C687604440'::geometry, 20, '-1'::double precision)), 4326))
 Filter: st_intersects(st_setsrid((st_subdivide('01030000000100000008000000580630C0199B52C0D87E32C687604440603F1FC0FD9852C04EBC5EF5DB5F44403A950C00559852C0FC77D89F1F614440C5991540E79852C0FFFDCE9B2F634440EB6F09C0BF9A52C03D4272D7B76344403188FEBFA09B52C0DE78D21D1F6244404029FFFF249B52C02AB8697DDB604440580630C0199B52C0D87E32C687604440'::geometry, 20, '-1'::double precision)), 4326), geometry)
 Rows Removed by Filter: 1987
Planning Time: 0.685 ms
Execution Time: 11.900 ms

Any help will be appreciated.

EDIT

on rewriting the query using join , queryplan is including a function scan also

 select count(*) from addressfabric_v2_2022_1 as addr join ST_SubDivide('POLYGON((-74.4234467 40.7541435,-74.3904877 40.7489001,-74.380188 40.7587776,-74.3891144 40.7748904,-74.4179535 40.7790479,-74.4316864 40.7665746,-74.4241333 40.7566983,-74.4234467 40.7541435))'::geometry,20)as f(geom) on st_intersects(ST_SetSRID(f.geom,4326),addr.geometry);

QUERY PLAN

Aggregate (cost=543972801.24..543972801.25 rows=1 width=8) (actual time=11.208..11.209 rows=1 loops=1)->Nested 
 Loop(cost=25.55..543493555.50 rows=191698296 width=0) 
(actual time=0.114..10.856 rows=7073 loops=1)
->Function Scan on st_subdivide f (cost=25.00..35.00 rows=1000 
 width=32) (actual time=0.010..0.011 rows=1 loops=1)
-> Index Scan using addressfabric_v2_2022_1_geometry on 
addressfabric_v2_2022_1 addr(cost=0.55..543301.82 rows=19170 
width=32) (actual time=0.101..10.220 rows=7073 loops=1)
Index Cond: (geometry && st_setsrid(f.geom, 4326))
Filter: st_intersects(st_setsrid(f.geom, 4326), geometry)
Rows Removed by Filter: 1987
Planning Time: 0.092 ms
Execution Time: 11.240 ms

Is there any way to create index instead of function scan in new query plan?

update query plan after parallel setting and large dataset

Aggregate (cost=10539438036.82..10539438036.83 rows=1 width=8) (actual time=740.148..740.149 rows=1 loops=1)
 -> Nested Loop (cost=10000000025.55..10538917273.50 rows=208305328 width=0) (actual time=0.173..717.950 rows=429963 loops=1)
 -> Function Scan on st_subdivide f (cost=25.00..35.00 rows=1000 width=32) (actual time=0.008..0.010 rows=1 loops=1)
 -> Index Scan using addressfabric_v2_2022_1_geometry on addressfabric_v2_2022_1 addr (cost=0.55..538708.93 rows=20831 width=32) (actual time=0.162..679.372 rows=429963 loops=1)
 Index Cond: (geometry && st_setsrid(f.geom, 4326))
 Filter: st_intersects(st_setsrid(f.geom, 4326), geometry)
 Rows Removed by Filter: 32726
Planning Time: 0.108 ms
Execution Time: 740.184 ms

Question 2

The JSON execution plan isn't really meant to be read by humans. It's better to post the plan generated using explain (analyze, buffers format text)

Question 3

I'm afraid the only reason for the performance improvement after the index creation, was an amount of data that was kept in memory, so didn't need disk access. I can't see how, given a polygon, possible intersecting geometries could be generated to look up using the index.

Question 4

@GerardH.Pille Making that faster is exactly what GiST indexes and postgis extension is for.

Question 5

@jjanes Of course, but if you have nothing to search for, what use is an index?

Question 6

@a_horse_with_no_name updated the query plan

Question 7

The issue was resolved by changing the following parameters

work_mem changed from 4mb to 50 mb
random_Page_cost changed to 1 from 4
max_wal_size reduced to 128 mb from 2 gb
cpu_index_tuple_cost to 0.003

Also changed the following as suggested by @Gerad H.pille

 max_parallel__Workers_per_Gather 
 max_parallel_maintainence_workers

Question 8

BTW, have a look if " select count(*) from addressfabric_v2_2021_12 as addr, ( SELECT ST_SetSRID(ST_SubDivide('POLYGON((-74.4234467 40.7541435,-74.3904877 40.7489001,-74.380188 40.7587776,-74.3891144 40.7748904,-74.4179535 40.7790479,-74.4316864 40.7665746,-74.4241333 40.7566983,-74.4234467 40.7541435))'::geometry,20),4326)) as f(geom) where st_intersects(f.geom,addr.geometry);" isn't somewhat faster.

Question 9

The only way that may get you below half a second, seems to be to throw more CPU at it. Either you get a faster system, or you use more of what you've got, ie. parallelism. With PostgreSQL 13 you should be safe.

Before your query, execute:

SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 4;

See: parallel postgis

Question 10

I did not see any significant improvement after this, is there any way by making any change in the index or query?

Question 11

Can we change the filter condition Filter: st_intersects(st_setsrid((st_subdivide('01030000000100000008000000580630C0199B52C0D87E32C687604440603F1FC0FD9852C04EBC5EF5DB5F44403A950C00559852C0FC77D89F1F614440C5991540E79852C0FFFDCE9B2F634440EB6F09C0BF9A52C03D4272D7B76344403188FEBFA09B52C0DE78D21D1F6244404029FFFF249B52C02AB8697DDB604440580630C0199B52C0D87E32C687604440'::geometry, 20, '-1'::double precision)), 4326), geometry)

Question 12

What is the query plan with parallel workers? Can you provide info on HW and OS?

Question 13

I am connecting aws rds from local system, config of local system is windows 32 gb ram,I rewrite the query using join and it is better now, i am wondering is there any way to create a index scan instead of funciton scan , i have updated the question with the new findings

Question 14

If I had known the cloud was involved I wouldn't have bothered. If the polygon is variable there is no way to create an index scan.

sunny sunny 315 bronze badges · Answer 1 · 2022-02-24 11:58:22Z

1

The issue was resolved by changing the following parameters

work_mem changed from 4mb to 50 mb
random_Page_cost changed to 1 from 4
max_wal_size reduced to 128 mb from 2 gb
cpu_index_tuple_cost to 0.003

Also changed the following as suggested by @Gerad H.pille

 max_parallel__Workers_per_Gather 
 max_parallel_maintainence_workers

Share

Improve this answer

edited Feb 24, 2022 at 12:31

answered Feb 24, 2022 at 11:58

sunny's user avatar

sunny sunny

315 bronze badges

1

BTW, have a look if " select count(*) from addressfabric_v2_2021_12 as addr, ( SELECT ST_SetSRID(ST_SubDivide('POLYGON((-74.4234467 40.7541435,-74.3904877 40.7489001,-74.380188 40.7587776,-74.3891144 40.7748904,-74.4179535 40.7790479,-74.4316864 40.7665746,-74.4241333 40.7566983,-74.4234467 40.7541435))'::geometry,20),4326)) as f(geom) where st_intersects(f.geom,addr.geometry);" isn't somewhat faster.

Gerard H. Pille
– Gerard H. Pille

2022年02月24日 21:22:31 +00:00
Commented Feb 24, 2022 at 21:22

Add a comment |

score 0 · Answer 2 · 2022-02-02 10:23:24Z

0

The only way that may get you below half a second, seems to be to throw more CPU at it. Either you get a faster system, or you use more of what you've got, ie. parallelism. With PostgreSQL 13 you should be safe.

Before your query, execute:

SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 4;

See: parallel postgis

Share

Improve this answer

answered Feb 2, 2022 at 10:23

Gerard H. Pille's user avatar

Gerard H. Pille Gerard H. Pille

3,2851 gold badge10 silver badges13 bronze badges

20

I did not see any significant improvement after this, is there any way by making any change in the index or query?

sunny
– sunny

2022年02月03日 06:49:22 +00:00
Commented Feb 3, 2022 at 6:49
Can we change the filter condition Filter: st_intersects(st_setsrid((st_subdivide('01030000000100000008000000580630C0199B52C0D87E32C687604440603F1FC0FD9852C04EBC5EF5DB5F44403A950C00559852C0FC77D89F1F614440C5991540E79852C0FFFDCE9B2F634440EB6F09C0BF9A52C03D4272D7B76344403188FEBFA09B52C0DE78D21D1F6244404029FFFF249B52C02AB8697DDB604440580630C0199B52C0D87E32C687604440'::geometry, 20, '-1'::double precision)), 4326), geometry)

sunny
– sunny

2022年02月03日 07:22:02 +00:00
Commented Feb 3, 2022 at 7:22
What is the query plan with parallel workers? Can you provide info on HW and OS?

Gerard H. Pille
– Gerard H. Pille

2022年02月03日 07:38:44 +00:00
Commented Feb 3, 2022 at 7:38
I am connecting aws rds from local system, config of local system is windows 32 gb ram,I rewrite the query using join and it is better now, i am wondering is there any way to create a index scan instead of funciton scan , i have updated the question with the new findings

sunny
– sunny

2022年02月07日 08:54:43 +00:00
Commented Feb 7, 2022 at 8:54
If I had known the cloud was involved I wouldn't have bothered. If the polygon is variable there is no way to create an index scan.

Gerard H. Pille
– Gerard H. Pille

2022年02月08日 13:40:00 +00:00
Commented Feb 8, 2022 at 13:40

| Show 15 more comments

Stack Exchange Network

improve query performance in postgres

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

improve query performance in postgres

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions