I have a GeoDataFrame with 8000 rows, produced by a spatial join (intersects) of two GeoDataFrames: one contains 8000 points and the other 12 polygons. The code I use for the join is below; it gives the desired result.
I then use dissolve() to group the rows from the sjoin by polygon name and sum them. This also gives the required result, but the run time is extremely long, and I need to repeat this process for several other intersections. Is there a way to speed it up?
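# spatial join: one row per point/polygon intersection
df_join = df_points.sjoin(
    df_polygons[['name', 'geometry']],
    how='right',
    predicate='intersects',
)

# dissolve: merge rows (and geometries) per polygon name, summing the numeric columns
df_dissolve = df_join.dissolve(
    by='name',
    aggfunc='sum',
    as_index=False,
)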
How long does it take? Do you have many multipolygons / multipoints? – Bera, commented Apr 1, 2023 at 14:01
It takes 10 minutes to run the code. Yes, there are a number of multipolygons. – Ben Watson, commented Apr 4, 2023 at 7:47
Can you share your data? – Bera, commented Apr 20, 2023 at 17:51
1 Answer
You could try geofileops. This library tries to speed up processing of large geodata files by using all available CPUs. For dissolve, it uses geopandas under the hood and applies some extra optimizations to speed up processing. 8000 rows is not really large, but given that it currently takes 10 minutes, it may still be significantly faster.
Disclaimer: I'm the developer of geofileops.
Because geofileops mainly (or only) supports GeoPackage file input, you will first have to save the result of the join to a .gpkg file; then you can run geofileops.dissolve on that file.
Some untested sample code:
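import geofileops as gfo

# spatial join, as in the question
df_join = df_points.sjoin(
    df_polygons[['name', 'geometry']],
    how='right',
    predicate='intersects',
)

# geofileops works on files, so write the join result to a GeoPackage first
gfo.to_file(df_join, "join_output.gpkg")

# dissolve by polygon name, summing one attribute column
gfo.dissolve(
    input_path="join_output.gpkg",
    output_path="dissolve_output.gpkg",
    explodecollections=False,
    groupby_columns=["name"],
    agg_columns={
        "columns": [
            # replace "column_to_sum" with the column you want to sum
            {"column": "column_to_sum", "agg": "sum", "as": "sum"},
        ]
    },
)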
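A different, geopandas-only option may also be worth trying before adding a dependency. With how='right', every matched point produces a row that carries a full copy of its polygon's geometry, so dissolve() likely spends most of its time unioning many identical (multi)polygons. If all you need is the summed attributes per polygon, you can sum them with a plain pandas groupby and attach the polygon geometry afterwards, dissolving only the small 12-row polygon layer. The sketch below is untested on the real data: the toy polygons, points and the numeric column `value` are made up for illustration, and it assumes a geopandas version that supports `GeoDataFrame.sjoin(..., predicate=...)`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data standing in for df_points / df_polygons:
# two square polygons and four points with a numeric column "value".
df_polygons = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:3857",
)
df_points = gpd.GeoDataFrame(
    {"value": [1, 2, 3, 4]},
    geometry=[Point(1, 1), Point(2, 2), Point(15, 5), Point(16, 6)],
    crs="EPSG:3857",
)

# Same join as in the question: one row per (point, polygon) match,
# each row carrying a copy of the polygon geometry.
df_join = df_points.sjoin(
    df_polygons[["name", "geometry"]], how="right", predicate="intersects"
)

# Sum the attribute column(s) per polygon name; no geometry is unioned here.
sums = df_join.groupby("name", as_index=False)[["value"]].sum()

# Dissolve only the small polygon layer (this also handles a name that is
# split over several polygons), then attach the sums to it.
polygons_by_name = df_polygons[["name", "geometry"]].dissolve(by="name", as_index=False)
df_result = polygons_by_name.merge(sums, on="name", how="left")

print(df_result[["name", "value"]])
```

The design choice is to keep geometry out of the aggregation entirely: the only union performed is over the polygon layer, whose size does not grow with the number of points.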