I have a shapefile with many large polygons and would like to process each polygon individually, because spatial operations on the entire dataset use more memory than I have. For instance, I want to iterate over the shapefile, buffer each polygon, calculate zonal statistics, and store the results in a single GeoDataFrame.
Can we iterate through the geopandas dataframe to buffer each polygon separately?
My initial code doesn't appear to update the geodataframe's area after buffering.
import geopandas as gpd
#import rasterio
fp = r"E:\Polygon_Features.shp"
data = gpd.read_file(fp)
print(data.area)
data_buffer = data.copy()
for index, row in data_buffer.iterrows():
    row['geometry'] = row['geometry'].buffer(500)
print(data_buffer.area)
1 Answer
Look at the answer to Updating value in iterrow for pandas:
The rows you get back from iterrows are copies that are no longer connected to the original data frame, so edits don't change your dataframe. Thankfully, because each item you get back from iterrows contains the current index, you can use that to access and edit the relevant row of the dataframe
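As a minimal sketch of what that quote describes, the snippet below first edits the row copy (no effect) and then writes back through the index with `.at`. The two circles and the `name` column are made up for illustration and are not from the question's shapefile.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two small illustrative polygons (hypothetical data for this sketch)
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(0, 0).buffer(1), Point(10, 0).buffer(1)],
)
area_before = gdf.area.copy()

# Editing the row only changes a copy, so gdf keeps its original geometries
for index, row in gdf.iterrows():
    row["geometry"] = row["geometry"].buffer(5)
unchanged = gdf.area.equals(area_before)
print(unchanged)

# Writing back through the index updates the GeoDataFrame itself
for index, row in gdf.iterrows():
    gdf.at[index, "geometry"] = row["geometry"].buffer(5)
print(gdf.area)
```

The second loop is the pattern to use if you really do need to handle one polygon at a time.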
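A simpler way is to buffer the whole geometry column in one vectorized call, with no loop (the distance is in the units of the layer's CRS):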
df.geometry = df.geometry.buffer(100)
For zonal statistics try rasterstats:
from rasterstats import zonal_stats
import geopandas as gpd
import pandas as pd
dem = '/folder/dem.tif'
polygons = '/folder/polygons.shp'
df = gpd.read_file(polygons)
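# Buffer each polygon by 1000 CRS units and compute zonal statistics (one dict per polygon)
stats = pd.DataFrame(zonal_stats(vectors=df['geometry'].buffer(1000), raster=dem))
# Append the statistics columns to the attributes (assumes df has the default 0..n-1 index)
df = pd.concat([df, stats], axis=1)
# Display the table without the geometry column (drop returns a new object; df is unchanged)
df.drop('geometry', axis=1)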
   id sometext  count         max       mean        min
0   4   sdaasd   2548  114.134956  93.990887  77.059998
1   5  dffggfd   2455  114.134956  89.212946  77.059998
2   2  jhhgjgh   2414  110.125275  84.960471  76.599998
3   3    nbmnb   2321  108.325272  82.390840  76.599998
4   1   ytuyut   2275  104.621407  81.760467  76.599998
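If buffering every polygon in one go is still too much for memory, one option is to work through the GeoDataFrame a slice at a time and concatenate the pieces at the end. The following is only a sketch: `process_in_chunks`, `stats_func`, `chunk_size` and the stand-in `buffered_area` function are names invented here, and whether chunking actually lowers peak memory depends on your data.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point


def process_in_chunks(gdf, distance, stats_func, chunk_size=100):
    """Buffer and summarise gdf one slice at a time, then combine the pieces."""
    parts = []
    for start in range(0, len(gdf), chunk_size):
        chunk = gdf.iloc[start:start + chunk_size]
        buffered = chunk.geometry.buffer(distance)
        # stats_func must return one dict per geometry, in the same order
        stats = pd.DataFrame(stats_func(buffered), index=chunk.index)
        parts.append(chunk.join(stats))
    return pd.concat(parts)


# Stand-in statistic so the sketch runs without a raster (hypothetical helper)
def buffered_area(geoms):
    return [{"buffered_area": geom.area} for geom in geoms]


# Made-up polygons for illustration
points = gpd.GeoDataFrame(
    {"id": [4, 5, 2, 3, 1]},
    geometry=[Point(x, 0).buffer(1) for x in range(0, 50, 10)],
)
result = process_in_chunks(points, 5, buffered_area, chunk_size=2)
print(result[["id", "buffered_area"]])
```

With rasterstats, you could pass something like `lambda geoms: zonal_stats(vectors=geoms, raster=dem)` as `stats_func`, since `zonal_stats` returns a list with one dict per feature.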