GeoPandas dissolve using an attribute filter

Question 1

I am wondering if it is possible in one pass to use GeoPandas' dissolve to merge together only the records matching a query.

Specifically, I have many small areas that should be merged into the larger area with the same ID. It does not matter whether they are adjacent or not, or whether there is more than one small area with the same ID. What matters to me is that I can aggregate them, taking advantage of dissolve's aggfunc parameter to sum up the values in another numeric column of the dataset.

I thought to extract the IDs of these small areas by a simple query like df.loc[df.geometry.area < 0.1]['ID'] and to somehow feed this list of IDs to dissolve, but I guess I cannot do that, or at least I don't know how.

A very ugly image to clarify the point.

[Image: before merging]

[Image: after merging]

Question 2

I can't, because other areas that are not small might have the same ID as well, and I need to preserve those duplicates (they represent different types of coverage but have the same ID). I just need to get rid of the small areas because they are "artifacts" of an intersect operation, and they do not make much sense.

Question 3

As I explained in the question, I would like to extract the IDs of very small areas (e.g. <0.1m), and I would like to dissolve only these IDs.

Question 6

A few steps seem necessary:

1. Make a dataframe of just the small areas, e.g. using numpy where => df_small_areas
2. pandas group by ID => df_small_IDs
3. pandas merge the original dataframe with df_small_IDs on ID, using an inner join => df_areas_to_dissolve
4. pandas merge the original dataframe with df_small_IDs on ID, using a left excluding join (rows of the original with no match) => df_areas_to_not_dissolve
5. geopandas.dissolve df_areas_to_dissolve
6. pandas concat the dissolved df_areas_to_dissolve with df_areas_to_not_dissolve
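
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy data, the `value` column, the 0.1 threshold and the `df_dissolved` / `marked` names are made up for the example, and the "left excluding join" is approximated with pandas' `merge(..., indicator=True)`. A plain boolean filter stands in for numpy where in step 1.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical toy data: IDs 1 and 3 each have one large and one tiny square.
gdf = gpd.GeoDataFrame(
    {"ID": [1, 1, 2, 3, 3], "value": [10, 1, 20, 30, 3]},
    geometry=[
        box(0, 0, 1, 1),      # large, ID 1
        box(1, 0, 1.2, 0.2),  # small (area 0.04), ID 1, touches the large one
        box(3, 0, 4, 1),      # large, ID 2
        box(6, 0, 7, 1),      # large, ID 3
        box(9, 0, 9.1, 0.1),  # small (area 0.01), ID 3, not adjacent
    ],
)

# 1. Rows whose geometry is below the (assumed) area threshold
df_small_areas = gdf[gdf.geometry.area < 0.1]

# 2. One row per ID that owns at least one small area
df_small_IDs = df_small_areas[["ID"]].drop_duplicates()

# 3. Inner join: every row (small or large) that shares one of those IDs
df_areas_to_dissolve = gdf.merge(df_small_IDs, on="ID", how="inner")

# 4. Left excluding join: rows of the original with no match in df_small_IDs
marked = gdf.merge(df_small_IDs, on="ID", how="left", indicator=True)
df_areas_to_not_dissolve = marked[marked["_merge"] == "left_only"].drop(columns="_merge")

# 5. Dissolve the selected rows by ID, summing the numeric column
df_dissolved = df_areas_to_dissolve.dissolve(by="ID", aggfunc="sum").reset_index()

# 6. Put the untouched rows and the dissolved rows back together
result = pd.concat([df_areas_to_not_dissolve, df_dissolved], ignore_index=True)
print(result[["ID", "value"]])
```

Keeping each intermediate frame under its own name makes it easy to inspect what was selected at every step, at the cost of a few more lines than a mask-based version.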

umbe1987 · Accepted Answer · 2021-03-02 10:05:52Z

I sincerely thank @BERA and @MikeHoney for taking the time to answer. I was looking for a sort of one-line solution, but apparently dissolve can't be used with a filter.

I accepted @MikeHoney's answer because that is basically what I had to do eventually, but I'd like to show reproducible code that achieves what I needed, so that hopefully anybody can benefit from it. I tried to solve the problem as concisely as possible (3 lines of code).

import random

import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# build six squares with IDs 1, 2, 3, 1, 2, 3 and a random side length
lat = list(range(6))
lon = lat
fid = list(range(1, 4)) * 2
gdf = gpd.GeoDataFrame()
gdf['ID'] = fid
gdf['lat'] = lat
gdf['lon'] = lon
geoms = []
random.seed(12)
for index, row in gdf.iterrows():
    # define the length of the side of the square (random float between 0.1 and 1.0)
    dim = round(random.uniform(0.1, 1.0), 10)
    ln = row.lon
    lt = row.lat
    geom = Polygon([(ln, lt), (ln + dim, lt), (ln + dim, lt - dim), (ln, lt - dim)])
    geoms.append(geom)
gdf['geometry'] = geoms
gdf['area'] = gdf.geometry.area
columns = ['ID', 'area', 'geometry']
gdf = gdf[columns]
gdf.sort_values(by='ID', inplace=True)
# THESE THREE LINES DID THE TRICK!
small_area_ids = gdf.loc[gdf.geometry.area < 0.1, 'ID']
dissolved = gdf.loc[gdf['ID'].isin(small_area_ids)].dissolve('ID', aggfunc='sum').reset_index()
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
gdf = pd.concat([gdf.loc[~gdf['ID'].isin(small_area_ids), :], dissolved])
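
For reuse, the same three-line idea could be wrapped in a small helper. The sketch below is only an illustration: the function name `dissolve_small_area_ids`, its parameters and the toy `pop` column are hypothetical, not part of GeoPandas. It uses `pd.concat` because `DataFrame.append` was removed in pandas 2.0, and it returns the input unchanged when no polygon falls below the threshold.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box


def dissolve_small_area_ids(gdf, id_col="ID", max_area=0.1, aggfunc="sum"):
    """Dissolve all rows whose ID also appears on a polygon smaller than max_area.

    Hypothetical helper: rows with other IDs are passed through untouched.
    """
    small_ids = gdf.loc[gdf.geometry.area < max_area, id_col].unique()
    mask = gdf[id_col].isin(small_ids)
    if not mask.any():
        # nothing to dissolve, hand back a copy of the input
        return gdf.copy()
    dissolved = gdf[mask].dissolve(by=id_col, aggfunc=aggfunc).reset_index()
    return pd.concat([gdf[~mask], dissolved], ignore_index=True)


# Toy data (assumed): ID 1 has a large square plus a tiny, non-adjacent one.
demo = gpd.GeoDataFrame(
    {"ID": [1, 1, 2], "pop": [100, 5, 50]},
    geometry=[box(0, 0, 1, 1), box(2, 2, 2.2, 2.2), box(5, 5, 6, 6)],
)
out = dissolve_small_area_ids(demo)
print(out[["ID", "pop"]])
```

Note that with `aggfunc='sum'` any precomputed area column is summed as well, which matches the true area only when the dissolved parts do not overlap; recomputing it from `out.geometry.area` afterwards is the safer option.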

Nice work - glad you could sort this. My style is to use step-by-step gdf / df objects, so I can examine them when debugging. But I know that's not to everyone's taste.
