I am wondering whether it is possible, in one pass, to use GeoPandas' dissolve to merge together only the records matching a query.
Specifically, I have many small areas that should be merged into the larger area with the same ID. It does not matter whether they are adjacent, or whether more than one small area has the same ID. What matters to me is that I can aggregate them, taking advantage of dissolve's aggfunc parameter to sum the values in another numeric column of the dataset.
I thought of extracting the IDs of these small areas with a simple query like df.loc[df.geometry.area < 0.1]['ID'] and somehow feeding this list of IDs to dissolve, but I guess I cannot do that, or at least I don't know how.
A very ugly image to clarify the point.
-
I can't, because other areas that are not small might have the same ID as well, and I need to preserve those duplicates (they represent different types of coverage but have the same ID). I just need to get rid of the small areas because they are "artifacts" of an intersect operation and do not make much sense. – umbe1987, Feb 26, 2021 at 9:06
-
As I explained in the question, I would like to extract the IDs of very small areas (e.g. <0.1m), and I would like to dissolve only these IDs. – umbe1987, Feb 26, 2021 at 9:09
2 Answers
I sincerely thank @BERA and @MikeHoney for taking the time to answer. I was looking for a sort of one-line solution, but apparently dissolve can't be used with a filter.
I accepted @MikeHoney's answer because that is basically what I had to do eventually, but I'd like to show reproducible code that achieves what I needed, so that hopefully anybody can benefit from it. I tried to solve the problem as concisely as possible (3 lines of code).
from shapely.geometry import Polygon
import geopandas as gpd
import pandas as pd
import random

# Build six squares along a diagonal; IDs 1, 2 and 3 each occur twice
lat = list(range(6))
lon = lat
fid = list(range(1, 4))*2
gdf = gpd.GeoDataFrame()
gdf['ID'] = fid
gdf['lat'] = lat
gdf['lon'] = lon
geoms = []
random.seed(12)
for index, row in gdf.iterrows():
    # side length of the square (random float between 0.1 and 1.0)
    dim = round(random.uniform(0.1, 1.0), 10)
    ln = row.lon
    lt = row.lat
    geom = Polygon([(ln, lt), ((ln + dim), lt), ((ln + dim), (lt - dim)), (ln, (lt - dim))])
    geoms.append(geom)
gdf['geometry'] = geoms
gdf['area'] = gdf.geometry.area
columns = ['ID', 'area', 'geometry']
gdf = gdf[columns]
gdf.sort_values(by='ID', inplace=True)
# THESE THREE LINES DID THE TRICK!
small_area_ids = gdf.loc[gdf.geometry.area < 0.1]['ID']
dissolved = gdf.loc[gdf['ID'].isin(small_area_ids)].dissolve('ID', aggfunc='sum').reset_index()
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
gdf = pd.concat([gdf.loc[~gdf['ID'].isin(small_area_ids), :], dissolved])
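Because the sample above relies on random side lengths, it can be hard to see at a glance what the last three lines do. Here is a minimal sketch of the same idea on hand-made data. The helper name dissolve_ids_with_small_areas, the value column and the polygons are all made up for illustration. Note that every record sharing an ID with a small area gets dissolved, large ones included.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box


def dissolve_ids_with_small_areas(gdf, threshold, id_col="ID"):
    """Dissolve every record whose ID also occurs on a polygon smaller than threshold.

    Hypothetical helper: it only wraps the three-line approach in a function.
    """
    small_ids = gdf.loc[gdf.geometry.area < threshold, id_col]
    to_dissolve = gdf[id_col].isin(small_ids)
    dissolved = gdf.loc[to_dissolve].dissolve(by=id_col, aggfunc="sum").reset_index()
    # Untouched records first, then one dissolved record per affected ID
    return pd.concat([gdf.loc[~to_dissolve], dissolved], ignore_index=True)


# Made-up data: ID 1 has a sliver (area 0.04), ID 2 has two large squares
demo = gpd.GeoDataFrame(
    {"ID": [1, 1, 2, 2], "value": [10, 1, 5, 7]},
    geometry=[box(0, 0, 1, 1), box(3, 0, 3.2, 0.2), box(0, 5, 1, 6), box(3, 5, 4, 6)],
)
result = dissolve_ids_with_small_areas(demo, threshold=0.1)
```

With this data, the two ID 2 records stay separate because neither is small, while the two ID 1 records collapse into a single multipart record whose value is the sum of both.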
-
Nice work - glad you could sort this. My style is to use step-by-step gdf / df objects, so I can examine them when debugging. But I know that's not to everyone's taste. – Mike Honey, Mar 3, 2021 at 0:51
A few steps seem necessary:
- Make a dataframe of just the small areas, e.g. using numpy where => df_small_areas
- pandas group by ID => df_small_IDs
- pandas merge df_small_IDs with the original dataframe on ID, using inner join => df_areas_to_dissolve
- pandas merge df_small_IDs with the original dataframe on ID, using left excluding join => df_areas_to_not_dissolve
- geopandas.dissolve df_areas_to_dissolve => df_dissolved
- pandas concat df_dissolved with df_areas_to_not_dissolve
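The steps above could be sketched roughly as follows. This is only one possible reading of the list: the sample polygons and the value column are invented, a plain boolean mask stands in for numpy where, and merge's indicator=True is my assumption for how to do the "left excluding join".

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Made-up data: IDs 1 and 3 each have one sliver plus one large square, ID 2 has no sliver
df = gpd.GeoDataFrame(
    {"ID": [1, 1, 2, 3, 3], "value": [10, 1, 5, 2, 3]},
    geometry=[
        box(0, 0, 1, 1), box(2, 0, 2.1, 0.1),
        box(0, 2, 1, 3),
        box(0, 4, 0.2, 4.2), box(2, 4, 3, 5),
    ],
)

# 1. Just the small areas (boolean mask used here instead of numpy where)
df_small_areas = df[df.geometry.area < 0.1]

# 2. Group by ID: one row per ID that has at least one small area
df_small_IDs = df_small_areas.groupby("ID").size().reset_index(name="n_small")[["ID"]]

# 3. Inner join: every record (small or not) whose ID appears in df_small_IDs
df_areas_to_dissolve = df.merge(df_small_IDs, on="ID", how="inner")

# 4. Left excluding join: records whose ID does not appear in df_small_IDs
left = df.merge(df_small_IDs, on="ID", how="left", indicator=True)
df_areas_to_not_dissolve = left[left["_merge"] == "left_only"].drop(columns="_merge")

# 5. Dissolve by ID, summing the numeric column
df_dissolved = df_areas_to_dissolve.dissolve(by="ID", aggfunc="sum").reset_index()

# 6. Put the untouched and the dissolved records back together
result = pd.concat([df_areas_to_not_dissolve, df_dissolved], ignore_index=True)
```

Each intermediate object can be inspected on its own, which is the main appeal of this longer form when debugging.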