Is there a way to easily count overlapping polygons using Geopandas, the same way as the ArcGIS Pro Count Overlapping Features works?
So far my approach was to do union overlay and then dissolve with aggfunc='count'
but for some reason the results I get are not correct.
I have 3 overlapping polygons in a single geodataframe:
enter image description here enter image description here
Then I do the overlay:
union = gpd.overlay(demo_pg, demo_pg, how='union')
As a result I get only 9 polygons, although I should get 10 (this is what union
in QGIS or ArcGIS would return):
Is there anything wrong with my approach? What is the best way to count overlapping polygons in a single geodataframe?
EDIT: Full code is below. It returns 9 polygons. Based on my understanding on union
/intersect
operations, it should result in 10 polygons. The intersection of 3 polygons is counted only twice, not three times... The union operation in QGIS for the same set of polygons results in 10 polygons.
import pandas as pd
import matplotlib as plt
import geopandas as gpd
from shapely import wkt
data = {'name': ['polygon A', 'polygon B', 'polygon C'],
'id': [1, 2, 3],
'geom': ['MULTIPOLYGON (((36.00000 11.00000, 36.00000 12.00000, 37.00000 12.00000, 37.00000 11.00000, 36.00000 11.00000)))', 'MULTIPOLYGON (((36.50000 11.50000, 37.50000 11.50000, 37.50000 11.00000, 36.50000 11.00000, 36.50000 11.50000)))', 'MULTIPOLYGON (((36.61799 10.80580, 36.61570 11.19321, 36.86327 11.29637, 37.34925 10.91813, 37.00540 10.71182, 36.61799 10.80580)))']
}
df = pd.DataFrame (data, columns = ['name','id','geom'])
df['geom'] = df['geom'].apply(wkt.loads)
gdf = gpd.GeoDataFrame(df, geometry='geom')
demo_pg.plot(alpha=0.5, column='id')
union = gpd.overlay(demo_pg, demo_pg, how='union')
len(union)
-
Could you, please, provide your sample code so that others can use it to play around?StefanBrand_EOX– StefanBrand_EOX2021年02月19日 08:40:50 +00:00Commented Feb 19, 2021 at 8:40
-
1@Stefan - I've included a full code in the main postjan_b– jan_b2021年02月20日 19:39:52 +00:00Commented Feb 20, 2021 at 19:39
6 Answers 6
It took quite a bit of head scratching, but I finally got there! Here is how you can do it in GeoPandas
.
def count_overlapping_features(in_gdf):
# Get the name of the column containing the geometries
geom_col = in_gdf.geometry.name
# Setting up a single piece that will be split later
input_parts = [in_gdf.unary_union.buffer(0)]
# Finding all the "cutting" boundaries. Note: if the input GDF has
# MultiPolygons, it will treat each of the geometry's parts as individual
# pieces.
cutting_boundaries = []
for i, row in in_gdf.iterrows():
this_row_geom = row[geom_col]
this_row_boundary = this_row_geom.boundary
if this_row_boundary.type[:len('multi')].lower() == 'multi':
cutting_boundaries = cutting_boundaries + list(this_row_boundary.geoms)
else:
cutting_boundaries.append(this_row_boundary)
# Split the big input geometry using each and every cutting boundary
for boundary in cutting_boundaries:
splitting_results = []
for j,part in enumerate(input_parts):
new_parts = list(shapely.ops.split(part, boundary).geoms)
splitting_results = splitting_results + new_parts
input_parts = splitting_results
# After generating all of the split pieces, create a new GeoDataFrame
new_gdf = gpd.GeoDataFrame({'id':range(len(splitting_results)),
geom_col:splitting_results,
},
crs=in_gdf.crs,
geometry=geom_col)
# Find the new centroids.
new_gdf['geom_centroid'] = new_gdf.centroid
# Starting the count at zero
new_gdf['count_intersections'] = 0
# For each of the `new_gdf`'s rows, find how many overlapping features
# there are from the input GDF.
for i,row in new_gdf.iterrows():
new_gdf.loc[i,'count_intersections'] = in_gdf.intersects(row['geom_centroid']).astype(int).sum()
pass
# Dropping the column containing the centroids
new_gdf = new_gdf.drop(columns=['geom_centroid'])[['id','count_intersections',geom_col]]
return new_gdf
This function takes in a GeoDataFrame and spits out a GeoDataFrame that contains all of the "unique" pieces of the original input. It also has a column called "count_intersections" that indicates how many of the features of the original input they intersect with.
When I apply the function to your input data, this is what I get:
# Applying the function to the input data
out_df = count_overlapping_features(gdf)
out_df.plot(alpha=0.5, column='count_intersections')
output generated by function above inspecting the new GeoDataFrame
Caveat
This function works just fine with input GeoDataFrames
that contain POLYGON
features. However, if you have MULTIPOLYGON
features, things might get weird. My code can handle a few corner cases, but there are definitely A LOT more that I didn't foresee. So investigate the results VERY CLOSELY if you're using this function on MULTIPOLYGON
features.
-
1side question, what programme are you using to inspect the
GeoDataframe
?sutan– sutan2022年04月06日 14:03:45 +00:00Commented Apr 6, 2022 at 14:03 -
I'm using Spyder IDE. In it, there's a "variable explorer" section. You can double click on DataFrames and GeoDataFrames and they pop out in a new window as the one in the last snapshot in my post. If you're using Anaconda to manage your Python libraries, you can install spyder by simply typing
conda install spyder
in your terminal =)Felipe D.– Felipe D.2022年04月06日 14:55:43 +00:00Commented Apr 6, 2022 at 14:55
Figured out how to do this efficiently and without a ton of looping:
def count_overlapping_features(gdf):
bounds = gdf.geometry.convex_hull.exterior.buffer(1e6).unary_union
new_polys = list(shapely.ops.polygonize(bounds))
#Removing the full merged polygon (always index 0)
new_polys.remove(new_polys[0])
new_gdf = gpd.GeoDataFrame(geometry=new_polys)
new_gdf['overlaps'] = sum([new_gdf.geometry.intersects(poly) for poly in gdf.geometry.values])
return new_gdf
This won't work with Multipolygons that aren't valid polygons.
-
.buffer(1e6).unary_union
why are you buffering this?sutan– sutan2022年05月26日 15:16:57 +00:00Commented May 26, 2022 at 15:16
A better solution based on Parker Devine's answer
def count_overlapping_features(gdf):
#generating all of the split pieces
import shapely
bounds = gdf.geometry.exterior.unary_union
new_polys = list(shapely.ops.polygonize(bounds))
new_gdf = gpd.GeoDataFrame(geometry=new_polys)
new_gdf['id'] = range(len(new_gdf))
#count overlapping by sjoin between pieces representative point (point inside polygon) and the input gdf
new_gdf_centroid = new_gdf.copy()
new_gdf_centroid['geometry'] = new_gdf.geometry.representative_point()
overlapcount = gpd.sjoin(new_gdf_centroid,gdf)
overlapcount = overlapcount.groupby(['id'])['index_right'].count().rename('count').reset_index()
out_gdf = pd.merge(new_gdf,overlapcount)
return out_gdf
-
1Change
new_gdf = gpd.GeoDataFrame(geometry=new_polys)
tonew_gdf = gpd.GeoDataFrame(geometry=new_polys, crs=gdf.crs)
to preserve projection and avoid warnings about crs mismatch during the join/mergeOlsgaard– Olsgaard2022年06月16日 06:46:07 +00:00Commented Jun 16, 2022 at 6:46
I think ni101 is right, I made it more concise with some comments. The idea:
- create a unary union out of the
gdf
exterior - polygonise the unioned exterior
- for each polygon, count how many intersection it has on the initial
gdf
def count_overlap(gdf):
# assuming gdf is polygon and not multipolygon
# dissolve using unary union, then polygonize the exterior of the dissolved features
import geopandas as gpd
from shapely.ops import polygonize
# create polygon array
exterior_geom = list(polygonize(gdf.exterior.unary_union))
# create gdf out of the dissolved features
gdf_exterior = gpd.GeoDataFrame({'id':range(0, len(exterior_geom))}, geometry=exterior_geom, crs=gdf.crs).explode().reset_index(drop=True)
gdf_exterior['id'] = gdf_exterior.index
# count the intersection of the polygonised unary_union, and the initial gdf
# the problem with intersection is that it counts if it touches
gdf_exterior['intersection'] = [len(gdf[gdf.geometry.intersects(feature)]) for feature in gdf_exterior['geometry']]
gdf_exterior['touching'] = [len(gdf[gdf.geometry.touches(feature)]) for feature in gdf_exterior['geometry']]
# so the real count must substract polygons that touches. it's cumbersome but, oh well.
gdf_exterior['count'] = gdf_exterior['intersection'] - gdf_exterior['touching']
return gdf_exterior
-
When I apply this to the data from the OP, it says the output dataframe is empty.user3433489– user34334892025年03月13日 18:53:45 +00:00Commented Mar 13 at 18:53
You can:
Unary union all polygon boundaries
Polygonize and create a new df
Intersect with input df
Groupby geometry area, centroid coordinates
import geopandas as gpd
import shapely
df = gpd.read_file(r"/home/bera/Desktop/GIStest/overlapping_polygons.shp")
union = shapely.ops.unary_union(df.boundary) #Create one multiline of all polygon boundaries
#Polygonize it and create a dataframe
polygonized = list(shapely.ops.polygonize(union))
df2 = gpd.GeoDataFrame(geometry=polygonized, crs=df.crs)
#Intersect the input df with the polygonized df.
#Duplicate geometries till be created Where multiple polygons overlap
intersected = gpd.overlay(df1=df, df2=df2, how="intersection", keep_geom_type=True)
def area_and_centroid_coords(g):
"""returns a string of polygon area and centroid coordinates"""
#For example: 2954918.3 761397.9 7093049.7
area = round(g.area, 1)
centroid = g.centroid
x = round(centroid.x, 1)
y = round(centroid.y, 1)
value = f"{area} {x} {y}"
return value
#Create a unique string to groupby
intersected["geometry_distinction"] = intersected.apply(
lambda x: area_and_centroid_coords(x.geometry), axis=1)
intersected["n_overlaps"] = intersected.groupby("geometry_distinction",
as_index=False)["geometry_distinction"].transform("count")
intersected.plot(column="n_overlaps", categorical=True, legend=True, cmap="rainbow", figsize=(15, 15))
Want to note that the solutions above won't work for a gdf with multiple isolated clusters of polygons. I've adapted Parker Devine's code to do so, by removing not only the first "full merged polygon" but identifying subsequent bunches' full merged polygons by noticing that they're disjoint from the prior one.
def count_overlap(gdf):
import shapely
bounds = gdf.geometry.convex_hull.exterior.buffer(SMALL_NUMBER).unary_union
new_polys = list(shapely.ops.polygonize(bounds))
# Removing the full merged polygons (first is always index 0,
# subsequent will be the first of their own 'bunches' identified as disjoint from other 'bunches')
bad_poly_idx = [0]
while new_polys[max(bad_poly_idx)].disjoint(new_polys[-1]):
for idx in range(max(bad_poly_idx), len(new_polys)):
if new_polys[max(bad_poly_idx)].disjoint(new_polys[idx]):
bad_poly_idx += [idx]
break
new_polys = [new_polys[i].buffer(-SMALL_NUMBER) for i in range(len(new_polys)) if i not in bad_poly_idx]
# count layers
new_gdf = gpd.GeoDataFrame(geometry=new_polys)
new_gdf['layers'] = sum([new_gdf.geometry.intersects(poly) for poly in gdf.geometry.buffer(SMALL_NUMBER).values])
new_gdf['piece'] = new_gdf.index
return new_gdf
Explore related questions
See similar questions with these tags.