Count overlapping features using Geopandas

Question 1

Is there a way to easily count overlapping polygons using Geopandas, the same way as the ArcGIS Pro Count Overlapping Features works?

enter image description here

So far my approach was to do union overlay and then dissolve with aggfunc='count' but for some reason the results I get are not correct.

I have 3 overlapping polygons in a single geodataframe:

enter image description here enter image description here

Then I do the overlay:

union = gpd.overlay(demo_pg, demo_pg, how='union')

As a result I get only 9 polygons, although I should get 10 (this is what union in QGIS or ArcGIS would return):

enter image description here

Is there anything wrong with my approach? What is the best way to count overlapping polygons in a single geodataframe?

EDIT: Full code is below. It returns 9 polygons. Based on my understanding on union/intersect operations, it should result in 10 polygons. The intersection of 3 polygons is counted only twice, not three times... The union operation in QGIS for the same set of polygons results in 10 polygons.

import pandas as pd
import matplotlib as plt
import geopandas as gpd
from shapely import wkt
data = {'name': ['polygon A', 'polygon B', 'polygon C'],
 'id': [1, 2, 3],
 'geom': ['MULTIPOLYGON (((36.00000 11.00000, 36.00000 12.00000, 37.00000 12.00000, 37.00000 11.00000, 36.00000 11.00000)))', 'MULTIPOLYGON (((36.50000 11.50000, 37.50000 11.50000, 37.50000 11.00000, 36.50000 11.00000, 36.50000 11.50000)))', 'MULTIPOLYGON (((36.61799 10.80580, 36.61570 11.19321, 36.86327 11.29637, 37.34925 10.91813, 37.00540 10.71182, 36.61799 10.80580)))']
 }
df = pd.DataFrame (data, columns = ['name','id','geom'])
df['geom'] = df['geom'].apply(wkt.loads)
gdf = gpd.GeoDataFrame(df, geometry='geom')
demo_pg.plot(alpha=0.5, column='id')
union = gpd.overlay(demo_pg, demo_pg, how='union')
len(union)

Question 2

Could you, please, provide your sample code so that others can use it to play around?

Question 3

@Stefan - I've included a full code in the main post

Question 4

It took quite a bit of head scratching, but I finally got there! Here is how you can do it in GeoPandas.

def count_overlapping_features(in_gdf):
 # Get the name of the column containing the geometries
 geom_col = in_gdf.geometry.name
 
 # Setting up a single piece that will be split later
 input_parts = [in_gdf.unary_union.buffer(0)]
 
 # Finding all the "cutting" boundaries. Note: if the input GDF has 
 # MultiPolygons, it will treat each of the geometry's parts as individual
 # pieces.
 cutting_boundaries = []
 for i, row in in_gdf.iterrows():
 this_row_geom = row[geom_col]
 this_row_boundary = this_row_geom.boundary
 if this_row_boundary.type[:len('multi')].lower() == 'multi':
 cutting_boundaries = cutting_boundaries + list(this_row_boundary.geoms)
 else:
 cutting_boundaries.append(this_row_boundary)
 
 
 # Split the big input geometry using each and every cutting boundary
 for boundary in cutting_boundaries:
 splitting_results = []
 for j,part in enumerate(input_parts):
 new_parts = list(shapely.ops.split(part, boundary).geoms)
 splitting_results = splitting_results + new_parts
 input_parts = splitting_results
 
 # After generating all of the split pieces, create a new GeoDataFrame
 new_gdf = gpd.GeoDataFrame({'id':range(len(splitting_results)),
 geom_col:splitting_results,
 },
 crs=in_gdf.crs,
 geometry=geom_col)
 
 # Find the new centroids.
 new_gdf['geom_centroid'] = new_gdf.centroid
 
 # Starting the count at zero
 new_gdf['count_intersections'] = 0
 
 # For each of the `new_gdf`'s rows, find how many overlapping features 
 # there are from the input GDF.
 for i,row in new_gdf.iterrows():
 new_gdf.loc[i,'count_intersections'] = in_gdf.intersects(row['geom_centroid']).astype(int).sum()
 pass
 
 # Dropping the column containing the centroids
 new_gdf = new_gdf.drop(columns=['geom_centroid'])[['id','count_intersections',geom_col]]
 
 return new_gdf

This function takes in a GeoDataFrame and spits out a GeoDataFrame that contains all of the "unique" pieces of the original input. It also has a column called "count_intersections" that indicates how many of the features of the original input they intersect with.

When I apply the function to your input data, this is what I get:

# Applying the function to the input data
out_df = count_overlapping_features(gdf)
out_df.plot(alpha=0.5, column='count_intersections')

output generated by function above inspecting the new GeoDataFrame

Caveat

This function works just fine with input GeoDataFrames that contain POLYGON features. However, if you have MULTIPOLYGON features, things might get weird. My code can handle a few corner cases, but there are definitely A LOT more that I didn't foresee. So investigate the results VERY CLOSELY if you're using this function on MULTIPOLYGON features.

Question 5

side question, what programme are you using to inspect the GeoDataframe?

Question 6

I'm using Spyder IDE. In it, there's a "variable explorer" section. You can double click on DataFrames and GeoDataFrames and they pop out in a new window as the one in the last snapshot in my post. If you're using Anaconda to manage your Python libraries, you can install spyder by simply typing conda install spyder in your terminal =)

Question 7

Figured out how to do this efficiently and without a ton of looping:

def count_overlapping_features(gdf):
 bounds = gdf.geometry.convex_hull.exterior.buffer(1e6).unary_union
 new_polys = list(shapely.ops.polygonize(bounds))
 #Removing the full merged polygon (always index 0)
 new_polys.remove(new_polys[0])
 new_gdf = gpd.GeoDataFrame(geometry=new_polys)
 new_gdf['overlaps'] = sum([new_gdf.geometry.intersects(poly) for poly in gdf.geometry.values])
 return new_gdf

This won't work with Multipolygons that aren't valid polygons.

enter image description here

Question 8

.buffer(1e6).unary_union why are you buffering this?

Question 9

A better solution based on Parker Devine's answer

def count_overlapping_features(gdf):
 #generating all of the split pieces
 import shapely
 bounds = gdf.geometry.exterior.unary_union
 new_polys = list(shapely.ops.polygonize(bounds))
 new_gdf = gpd.GeoDataFrame(geometry=new_polys)
 new_gdf['id'] = range(len(new_gdf))
 #count overlapping by sjoin between pieces representative point (point inside polygon) and the input gdf 
 new_gdf_centroid = new_gdf.copy()
 new_gdf_centroid['geometry'] = new_gdf.geometry.representative_point()
 overlapcount = gpd.sjoin(new_gdf_centroid,gdf)
 overlapcount = overlapcount.groupby(['id'])['index_right'].count().rename('count').reset_index()
 out_gdf = pd.merge(new_gdf,overlapcount)
 return out_gdf

Question 10

Change new_gdf = gpd.GeoDataFrame(geometry=new_polys) to new_gdf = gpd.GeoDataFrame(geometry=new_polys, crs=gdf.crs) to preserve projection and avoid warnings about crs mismatch during the join/merge

Question 11

I think ni101 is right, I made it more concise with some comments. The idea:

create a unary union out of the gdf exterior
polygonise the unioned exterior
for each polygon, count how many intersection it has on the initial gdf

def count_overlap(gdf):
 # assuming gdf is polygon and not multipolygon
 # dissolve using unary union, then polygonize the exterior of the dissolved features
 import geopandas as gpd
 from shapely.ops import polygonize
 
 # create polygon array
 exterior_geom = list(polygonize(gdf.exterior.unary_union))
 # create gdf out of the dissolved features
 gdf_exterior = gpd.GeoDataFrame({'id':range(0, len(exterior_geom))}, geometry=exterior_geom, crs=gdf.crs).explode().reset_index(drop=True)
 gdf_exterior['id'] = gdf_exterior.index
 # count the intersection of the polygonised unary_union, and the initial gdf
 # the problem with intersection is that it counts if it touches
 gdf_exterior['intersection'] = [len(gdf[gdf.geometry.intersects(feature)]) for feature in gdf_exterior['geometry']]
 gdf_exterior['touching'] = [len(gdf[gdf.geometry.touches(feature)]) for feature in gdf_exterior['geometry']]
 # so the real count must substract polygons that touches. it's cumbersome but, oh well.
 gdf_exterior['count'] = gdf_exterior['intersection'] - gdf_exterior['touching']
 
 return gdf_exterior

Question 12

When I apply this to the data from the OP, it says the output dataframe is empty.

Question 13

You can:

Unary union all polygon boundaries

Polygonize and create a new df

Intersect with input df

Groupby geometry area, centroid coordinates

import geopandas as gpd
import shapely
df = gpd.read_file(r"/home/bera/Desktop/GIStest/overlapping_polygons.shp")
union = shapely.ops.unary_union(df.boundary) #Create one multiline of all polygon boundaries
#Polygonize it and create a dataframe
polygonized = list(shapely.ops.polygonize(union))
df2 = gpd.GeoDataFrame(geometry=polygonized, crs=df.crs)
#Intersect the input df with the polygonized df. 
#Duplicate geometries till be created Where multiple polygons overlap
intersected = gpd.overlay(df1=df, df2=df2, how="intersection", keep_geom_type=True)
def area_and_centroid_coords(g):
 """returns a string of polygon area and centroid coordinates"""
 #For example: 2954918.3 761397.9 7093049.7
 area = round(g.area, 1)
 centroid = g.centroid
 x = round(centroid.x, 1)
 y = round(centroid.y, 1)
 value = f"{area} {x} {y}"
 return value
#Create a unique string to groupby
intersected["geometry_distinction"] = intersected.apply(
 lambda x: area_and_centroid_coords(x.geometry), axis=1)
intersected["n_overlaps"] = intersected.groupby("geometry_distinction", 
 as_index=False)["geometry_distinction"].transform("count")
intersected.plot(column="n_overlaps", categorical=True, legend=True, cmap="rainbow", figsize=(15, 15))

enter image description here

Question 14

Want to note that the solutions above won't work for a gdf with multiple isolated clusters of polygons. I've adapted Parker Devine's code to do so, by removing not only the first "full merged polygon" but identifying subsequent bunches' full merged polygons by noticing that they're disjoint from the prior one.

def count_overlap(gdf):
 import shapely
 bounds = gdf.geometry.convex_hull.exterior.buffer(SMALL_NUMBER).unary_union
 new_polys = list(shapely.ops.polygonize(bounds))
 # Removing the full merged polygons (first is always index 0, 
 # subsequent will be the first of their own 'bunches' identified as disjoint from other 'bunches')
 bad_poly_idx = [0]
 while new_polys[max(bad_poly_idx)].disjoint(new_polys[-1]):
 for idx in range(max(bad_poly_idx), len(new_polys)):
 if new_polys[max(bad_poly_idx)].disjoint(new_polys[idx]):
 bad_poly_idx += [idx]
 break
 new_polys = [new_polys[i].buffer(-SMALL_NUMBER) for i in range(len(new_polys)) if i not in bad_poly_idx]
 # count layers
 new_gdf = gpd.GeoDataFrame(geometry=new_polys)
 new_gdf['layers'] = sum([new_gdf.geometry.intersects(poly) for poly in gdf.geometry.buffer(SMALL_NUMBER).values])
 new_gdf['piece'] = new_gdf.index
 return new_gdf

Felipe D. Felipe D. 2,8594 gold badges20 silver badges39 bronze badges · Accepted Answer · 2021-11-15 21:04:27Z

It took quite a bit of head scratching, but I finally got there! Here is how you can do it in GeoPandas.

def count_overlapping_features(in_gdf):
 # Get the name of the column containing the geometries
 geom_col = in_gdf.geometry.name
 
 # Setting up a single piece that will be split later
 input_parts = [in_gdf.unary_union.buffer(0)]
 
 # Finding all the "cutting" boundaries. Note: if the input GDF has 
 # MultiPolygons, it will treat each of the geometry's parts as individual
 # pieces.
 cutting_boundaries = []
 for i, row in in_gdf.iterrows():
 this_row_geom = row[geom_col]
 this_row_boundary = this_row_geom.boundary
 if this_row_boundary.type[:len('multi')].lower() == 'multi':
 cutting_boundaries = cutting_boundaries + list(this_row_boundary.geoms)
 else:
 cutting_boundaries.append(this_row_boundary)
 
 
 # Split the big input geometry using each and every cutting boundary
 for boundary in cutting_boundaries:
 splitting_results = []
 for j,part in enumerate(input_parts):
 new_parts = list(shapely.ops.split(part, boundary).geoms)
 splitting_results = splitting_results + new_parts
 input_parts = splitting_results
 
 # After generating all of the split pieces, create a new GeoDataFrame
 new_gdf = gpd.GeoDataFrame({'id':range(len(splitting_results)),
 geom_col:splitting_results,
 },
 crs=in_gdf.crs,
 geometry=geom_col)
 
 # Find the new centroids.
 new_gdf['geom_centroid'] = new_gdf.centroid
 
 # Starting the count at zero
 new_gdf['count_intersections'] = 0
 
 # For each of the `new_gdf`'s rows, find how many overlapping features 
 # there are from the input GDF.
 for i,row in new_gdf.iterrows():
 new_gdf.loc[i,'count_intersections'] = in_gdf.intersects(row['geom_centroid']).astype(int).sum()
 pass
 
 # Dropping the column containing the centroids
 new_gdf = new_gdf.drop(columns=['geom_centroid'])[['id','count_intersections',geom_col]]
 
 return new_gdf

This function takes in a GeoDataFrame and spits out a GeoDataFrame that contains all of the "unique" pieces of the original input. It also has a column called "count_intersections" that indicates how many of the features of the original input they intersect with.

When I apply the function to your input data, this is what I get:

# Applying the function to the input data
out_df = count_overlapping_features(gdf)
out_df.plot(alpha=0.5, column='count_intersections')

output generated by function above inspecting the new GeoDataFrame

Caveat

This function works just fine with input GeoDataFrames that contain POLYGON features. However, if you have MULTIPOLYGON features, things might get weird. My code can handle a few corner cases, but there are definitely A LOT more that I didn't foresee. So investigate the results VERY CLOSELY if you're using this function on MULTIPOLYGON features.

side question, what programme are you using to inspect the GeoDataframe?
I'm using Spyder IDE. In it, there's a "variable explorer" section. You can double click on DataFrames and GeoDataFrames and they pop out in a new window as the one in the last snapshot in my post. If you're using Anaconda to manage your Python libraries, you can install spyder by simply typing conda install spyder in your terminal =)

Stack Exchange Network

Count overlapping features using Geopandas

6 Answers 6

Caveat

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Count overlapping features using Geopandas

6 Answers 6

Caveat

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions