i have a table addressfabric_v2_2021_12
CREATE TABLE IF NOT EXISTS public.addressfabric_v2_2021_12(
country character varying(10) COLLATE pg_catalog."default",
areaname1 character varying(10) COLLATE pg_catalog."default",
areaname2 character varying(100) COLLATE pg_catalog."default",
areaname3 character varying(100) COLLATE pg_catalog."default",
areaname4 character varying(100) COLLATE pg_catalog."default",
postcode character varying(20) COLLATE pg_catalog."default",
pbkey character varying(50) COLLATE pg_catalog."default",
prop_type character varying(10) COLLATE pg_catalog."default",
postcode_ext character varying(10) COLLATE pg_catalog."default",
add_number character varying(20) COLLATE pg_catalog."default",
streetname character varying(60) COLLATE pg_catalog."default",
unit_type character varying(20) COLLATE pg_catalog."default",
unit_num character varying(50) COLLATE pg_catalog."default",
type character varying(3) COLLATE pg_catalog."default",
lat character varying(20) COLLATE pg_catalog."default",
lon character varying(20) COLLATE pg_catalog."default",
parent character varying(50) COLLATE pg_catalog."default",
geoid character varying(50) COLLATE pg_catalog."default",
geometry geometry,
CONSTRAINT enforce_addfb_srid_geometry_2021_12 CHECK (st_srid(geometry) = 4326))
I am having performance issue in the below query
select count(*) from addressfabric_v2_2021_12 as addr, (
SELECT ST_SubDivide('POLYGON((-74.4234467 40.7541435,-74.3904877 40.7489001,-74.380188 40.7587776,-74.3891144 40.7748904,-74.4179535 40.7790479,-74.4316864 40.7665746,-74.4241333 40.7566983,-74.4234467 40.7541435))'::geometry,20)) as
f(geom) where st_intersects(ST_SetSRID(f.geom,4326),addr.geometry);
After this i created the following index
CREATE INDEX IF NOT EXISTS addressfabric_v2_2021_12_geometry
ON public.addressfabric_v2_2021_12 USING gist
(geometry) TABLESPACE pg_default;
The performance was increased but still not satisfactory
Aggregate (cost=591866320.40..591866320.41 rows=1 width=8) (actual
time=11.812..11.813 rows=1 loops=1)->Nested Loop (cost=0.55..591346132.52 rows=208075152 width=0) (actual
time=0.219..11.476 rows=7072 loops=1)
->ProjectSet (cost=0.00..30.02 rows=1000 width=32) (actual
time=0.010..0.012 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.001..0.001 rows=1 loops=1)
-> Index Scan using addressfabric_v2_2021_12_geomtry on
addressfabric_v2_2021_12 addr (cost=0.55..591138.01 rows=20808
width=32) (actual time=0.207..10.893 rows=7072 loops=1)
Index Cond: (geometry && st_setsrid((st_subdivide('01030000000100000008000000580630C0199B52C0D87E32C687604440603F1FC0FD9852C04EBC5EF5DB5F44403A950C00559852C0FC77D89F1F614440C5991540E79852C0FFFDCE9B2F634440EB6F09C0BF9A52C03D4272D7B76344403188FEBFA09B52C0DE78D21D1F6244404029FFFF249B52C02AB8697DDB604440580630C0199B52C0D87E32C687604440'::geometry, 20, '-1'::double precision)), 4326))
Filter: st_intersects(st_setsrid((st_subdivide('01030000000100000008000000580630C0199B52C0D87E32C687604440603F1FC0FD9852C04EBC5EF5DB5F44403A950C00559852C0FC77D89F1F614440C5991540E79852C0FFFDCE9B2F634440EB6F09C0BF9A52C03D4272D7B76344403188FEBFA09B52C0DE78D21D1F6244404029FFFF249B52C02AB8697DDB604440580630C0199B52C0D87E32C687604440'::geometry, 20, '-1'::double precision)), 4326), geometry)
Rows Removed by Filter: 1987
Planning Time: 0.685 ms
Execution Time: 11.900 ms
Any help will be appreciated.
EDIT
on rewriting the query using join , queryplan is including a function scan also
select count(*) from addressfabric_v2_2022_1 as addr join ST_SubDivide('POLYGON((-74.4234467 40.7541435,-74.3904877 40.7489001,-74.380188 40.7587776,-74.3891144 40.7748904,-74.4179535 40.7790479,-74.4316864 40.7665746,-74.4241333 40.7566983,-74.4234467 40.7541435))'::geometry,20)as f(geom) on st_intersects(ST_SetSRID(f.geom,4326),addr.geometry);
QUERY PLAN
Aggregate (cost=543972801.24..543972801.25 rows=1 width=8) (actual time=11.208..11.209 rows=1 loops=1)->Nested
Loop(cost=25.55..543493555.50 rows=191698296 width=0)
(actual time=0.114..10.856 rows=7073 loops=1)
->Function Scan on st_subdivide f (cost=25.00..35.00 rows=1000
width=32) (actual time=0.010..0.011 rows=1 loops=1)
-> Index Scan using addressfabric_v2_2022_1_geometry on
addressfabric_v2_2022_1 addr(cost=0.55..543301.82 rows=19170
width=32) (actual time=0.101..10.220 rows=7073 loops=1)
Index Cond: (geometry && st_setsrid(f.geom, 4326))
Filter: st_intersects(st_setsrid(f.geom, 4326), geometry)
Rows Removed by Filter: 1987
Planning Time: 0.092 ms
Execution Time: 11.240 ms
Is there any way to create index instead of function scan in new query plan?
update query plan after parallel setting and large dataset
Aggregate (cost=10539438036.82..10539438036.83 rows=1 width=8) (actual time=740.148..740.149 rows=1 loops=1)
-> Nested Loop (cost=10000000025.55..10538917273.50 rows=208305328 width=0) (actual time=0.173..717.950 rows=429963 loops=1)
-> Function Scan on st_subdivide f (cost=25.00..35.00 rows=1000 width=32) (actual time=0.008..0.010 rows=1 loops=1)
-> Index Scan using addressfabric_v2_2022_1_geometry on addressfabric_v2_2022_1 addr (cost=0.55..538708.93 rows=20831 width=32) (actual time=0.162..679.372 rows=429963 loops=1)
Index Cond: (geometry && st_setsrid(f.geom, 4326))
Filter: st_intersects(st_setsrid(f.geom, 4326), geometry)
Rows Removed by Filter: 32726
Planning Time: 0.108 ms
Execution Time: 740.184 ms
2 Answers 2
The issue was resolved by changing the following parameters
work_mem
changed from 4mb to 50 mbrandom_Page_cost
changed to 1 from 4max_wal_size
reduced to 128 mb from 2 gbcpu_index_tuple_cost
to 0.003
Also changed the following as suggested by @Gerad H.pille
max_parallel__Workers_per_Gather
max_parallel_maintainence_workers
-
BTW, have a look if " select count(*) from addressfabric_v2_2021_12 as addr, ( SELECT ST_SetSRID(ST_SubDivide('POLYGON((-74.4234467 40.7541435,-74.3904877 40.7489001,-74.380188 40.7587776,-74.3891144 40.7748904,-74.4179535 40.7790479,-74.4316864 40.7665746,-74.4241333 40.7566983,-74.4234467 40.7541435))'::geometry,20),4326)) as f(geom) where st_intersects(f.geom,addr.geometry);" isn't somewhat faster.Gerard H. Pille– Gerard H. Pille2022年02月24日 21:22:31 +00:00Commented Feb 24, 2022 at 21:22
The only way that may get you below half a second, seems to be to throw more CPU at it. Either you get a faster system, or you use more of what you've got, ie. parallelism. With PostgreSQL 13 you should be safe.
Before your query, execute:
SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 4;
See: parallel postgis
-
I did not see any significant improvement after this, is there any way by making any change in the index or query?sunny– sunny2022年02月03日 06:49:22 +00:00Commented Feb 3, 2022 at 6:49
-
Can we change the filter condition Filter: st_intersects(st_setsrid((st_subdivide('01030000000100000008000000580630C0199B52C0D87E32C687604440603F1FC0FD9852C04EBC5EF5DB5F44403A950C00559852C0FC77D89F1F614440C5991540E79852C0FFFDCE9B2F634440EB6F09C0BF9A52C03D4272D7B76344403188FEBFA09B52C0DE78D21D1F6244404029FFFF249B52C02AB8697DDB604440580630C0199B52C0D87E32C687604440'::geometry, 20, '-1'::double precision)), 4326), geometry)sunny– sunny2022年02月03日 07:22:02 +00:00Commented Feb 3, 2022 at 7:22
-
What is the query plan with parallel workers? Can you provide info on HW and OS?Gerard H. Pille– Gerard H. Pille2022年02月03日 07:38:44 +00:00Commented Feb 3, 2022 at 7:38
-
I am connecting aws rds from local system, config of local system is windows 32 gb ram,I rewrite the query using join and it is better now, i am wondering is there any way to create a index scan instead of funciton scan , i have updated the question with the new findingssunny– sunny2022年02月07日 08:54:43 +00:00Commented Feb 7, 2022 at 8:54
-
If I had known the cloud was involved I wouldn't have bothered. If the polygon is variable there is no way to create an index scan.Gerard H. Pille– Gerard H. Pille2022年02月08日 13:40:00 +00:00Commented Feb 8, 2022 at 13:40
Explore related questions
See similar questions with these tags.
explain (analyze, buffers format text)