Rasterize and sum polygons in Python

Question 1

I have the following shapefile of NYC with all 5 Boroughs as separate polygons, called Boroughs.shp. Data Source (NYC OpenData): https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm

I assign a separate made-up value, test_value, to each polygon.

I want to take Boroughs.shp and rasterize each polygon (row) as its own raster layer, so that each new raster contains a single, isolated rasterized polygon. In other words, I want one raster per borough: the first raster would be just Manhattan rasterized, as if it were the only borough in NYC, the second would be just the Bronx, and so on for all of the boroughs. In each of the 5 rasters produced, I want test_value burned into the pixels.

I accomplish this with the following code using the geocube package:

import geopandas as gpd
from geocube.api.core import make_geocube

polygons = gpd.read_file('Boroughs_Test/Boroughs.shp')
polygon_IDs = polygons['ID'].tolist()

# Rasterize each borough on its own and write one GeoTIFF per polygon ID
for i in polygon_IDs:
    x = polygons.loc[polygons['ID'] == i]
    vector_fn = x
    out_grid = make_geocube(
        vector_data=vector_fn,
        measurements=["test_value"],
        resolution=(-25, 25),
        fill=-9999,
    )
    out_grid["test_value"].rio.to_raster(str(i) + "_Output_Raster.tif")

This works fine: I now have 5 uniquely numbered rasters in my directory, one per borough polygon, each with test_value burned in. What I want to do now, and what I am having trouble with, is summing these rasters into a single output raster.

I have tried summing with the following code suggested in this post: Summing all rasters in folder using Python

Here is the code:

import rasterio

# all_files: list of paths to the rasters to sum (built elsewhere, not shown)

# Create an initial array
with rasterio.open(all_files[0]) as src:
    result_array = src.read()
    result_profile = src.profile

# Add on the rest one at a time
for f in all_files[1:]:
    with rasterio.open(f) as src:
        # Only sum the arrays if the profiles match.
        assert result_profile == src.profile, 'stopping, file {} and {} do not have matching profiles'.format(all_files[0], f)
        result_array = result_array + src.read()

with rasterio.open('Result_raster_NYC.tif', 'w', **result_profile) as dst:
    dst.write(result_array, indexes=[1])

and I get the following error traceback:

---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-6-8cd83849a0a8> in <module>()
 10 with rasterio.open(f) as src:
 11 # Only sum the arrays if the profiles match.
---> 12 assert result_profile == src.profile, 'stopping, file {} and {} do not have matching profiles'.format(all_files[0], f)
 13 result_array = result_array + src.read()
 14 
AssertionError: stopping, file Boroughs_Test\1_Output_Raster.tif and Boroughs_Test\2_Output_Raster.tif do not have matching profiles

For some reason my rasters do not seem to line up, which I cannot figure out, because they were all created from the same polygons shapefile. How can I fix my raster summing code so that I can properly sum these raster files? I am open to other summing approaches as well.
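To see why the assertion fails, it can help to list which profile entries actually differ. The helper below is only a sketch: diff_profiles is a hypothetical name, and the two dictionaries are made-up stand-ins for the src.profile objects of two opened rasters.

```python
def diff_profiles(profile_a, profile_b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    keys = set(profile_a) | set(profile_b)
    return {
        key: (profile_a.get(key), profile_b.get(key))
        for key in sorted(keys)
        if profile_a.get(key) != profile_b.get(key)
    }


# Stand-in dictionaries with made-up values; with real files these would be
# the src.profile objects of two opened rasters.
first = {"driver": "GTiff", "dtype": "float64", "width": 400, "height": 900}
second = {"driver": "GTiff", "dtype": "float64", "width": 700, "height": 650}

for key, (a, b) in diff_profiles(first, second).items():
    print(f"{key}: {a} != {b}")
```

If width, height or transform show up in the output, the rasters cover different grids and cannot be added cell by cell as they are.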

Question 2

But if the polygons don't overlap, what's the point of summing the rasterised versions (since there's no overlap)?

Question 3

Have you seen this example: corteva.github.io/geocube/stable/examples/zonal_statistics.html ?

Question 4

@alphabetasoup - While yes, you are correct, summing my rasters in this example, with my single value pixels, and no data regions, would ultimately lead to a raster of NYC that is back where I started, e.g. the values remain the same. However, I am using this as an example so that I can have a script where I can insert polygons that do overlap. As far as raster summing, I will have other raster files that will actually have overlapping pixel values that need to be summed. I felt that this example would be the simplest way to get a working script going.
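One thing to watch once overlapping rasters are summed: with a fill value of -9999, a plain + adds the fill values themselves (for example 5 + -9999). A rough NumPy sketch of a nodata-aware sum follows. It assumes the arrays already share the same shape, and sum_ignoring_nodata is a hypothetical helper, not part of rasterio or geocube.

```python
import numpy as np

NODATA = -9999  # the fill value used when the rasters were written


def sum_ignoring_nodata(arrays, nodata=NODATA):
    """Sum same-shaped arrays, treating nodata cells as contributing nothing.

    Cells that are nodata in every input stay nodata in the result.
    """
    stack = np.stack(arrays)
    valid = stack != nodata
    total = np.where(valid, stack, 0).sum(axis=0)
    return np.where(valid.any(axis=0), total, nodata)


# Toy 2x2 rasters with made-up values
a = np.array([[5, NODATA], [NODATA, NODATA]])
b = np.array([[3, 7], [NODATA, NODATA]])
print(sum_ignoring_nodata([a, b]))
```

Using 0 as the fill value avoids the masking step, at the cost of no longer distinguishing "no polygon here" from a genuine value of 0.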

Question 5

@snowman2 - Yes I saw that example, and I will surely be looking back to it, since I do need to use zonal statistics for my project. Does geocube have a way that I could set all of my produced rasters to the same extent? I want to set all of my individual polygon rasters to the same extent/bounding box as the initial input NYC shapefile, with all polygons intact. The extent of the shapefile is: 563069.6710568518610671,4483098.0464451070874929 : 609762.2528290073387325,4529951.9860328584909439, but I do not know how to enter this into geocube. Would this be the "geom" parameter?

Question 6

Sum operations on rasters require that the input data share many properties, including having the same CRS, same resolution, and (crucially) the same extent. Because the rasters you mean to sum are covering each of the five boroughs, they inherently have different extents.
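A toy NumPy illustration of the extent problem, not tied to any GIS library: two arrays of different shapes cannot be added directly, but they can once each is placed onto a grid covering the common extent. The shapes, values and offsets are made up, and paste_onto_grid is a hypothetical helper.

```python
import numpy as np


def paste_onto_grid(grid_shape, tile, row_off, col_off, fill=0):
    """Place a small array into a larger fill-valued array at the given offset."""
    grid = np.full(grid_shape, fill, dtype=tile.dtype)
    rows, cols = tile.shape
    grid[row_off:row_off + rows, col_off:col_off + cols] = tile
    return grid


# Two toy "borough" rasters with different shapes, so they cannot be added directly
borough_a = np.full((2, 3), 10)
borough_b = np.full((3, 2), 20)

# Offsets are made up; for real rasters they follow from each raster's origin
# relative to the common extent, divided by the pixel size.
common_shape = (4, 4)
total = (paste_onto_grid(common_shape, borough_a, 0, 0)
         + paste_onto_grid(common_shape, borough_b, 1, 2))
print(total)
```

The single cell where the two toy rasters overlap ends up holding the sum of both values, which is the behaviour wanted for overlapping polygons.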

Question 7

Good point, yes I see that mismatched extents would be an issue, and why of course, differently shaped polygons would have different extents. My goal would be to have all of the rasters have the same extent, which would be the extent of the NYC shapefile, with all borough polygons intact. Do you know how I can set geocube to make all rasters produced share the same extent? I looked at the properties of my NYC shapefile and see that the extent is 563069.6710568518610671,4483098.0464451070874929 : 609762.2528290073387325,4529951.9860328584909439, but I do not know where to enter this.

Question 8

make_geocube supports a geom parameter:

geom (str, optional) – A GeoJSON string for the bounding box of the data used to construct the grid. It defaults to EPSG:4326 if a CRS is not provided.

Example of adding CRS:

{"type": "Polygon", "crs": {"properties": {"name": "EPSG:3857"}}}

The geom parameter is not very flexible: it can't take a geometry object directly, nor geometry.__geo_interface__, etc. It requires a single feature's geometry (not a FeatureCollection), passed as a string rather than as a GeoJSON dict.

import json

import geopandas as gpd
from geocube.api.core import make_geocube
from shapely.geometry import box

polygons = gpd.read_file('/path/to/input.shp')
polygon_IDs = polygons['ID'].tolist()

# Make a GeoJSON string of the bounding box feature
bbox = gpd.GeoSeries(box(*polygons.total_bounds), crs=polygons.crs)
geom = bbox.__geo_interface__["features"][0]["geometry"]
# Add CRS
geom["crs"] = {"properties": {"name": f"EPSG:{polygons.crs.to_epsg()}"}}

cubes = []
for i in polygon_IDs:
    x = polygons.loc[polygons['ID'] == i]
    vector_fn = x
    out_grid = make_geocube(
        vector_data=vector_fn,
        measurements=["test_value"],
        resolution=(-25, 25),
        fill=0,
        geom=json.dumps(geom)
    )
    cubes.append(out_grid)

# Every cube shares the bounding-box grid, so they can be added cell by cell
out_grid = sum(cubes)
out_grid["test_value"].rio.to_raster("/path/to/Output_Raster.tif", dtype=out_grid["test_value"].dtype)

Note I had to add dtype=out_grid["test_value"].dtype in rio.to_raster to avoid a TypeError crash, not sure why.
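If only the extent numbers are known, as quoted in the question, the same kind of geom string could also be assembled by hand with the standard library. This is a sketch only: bbox_geojson is a hypothetical helper, and the EPSG code in the usage lines is a placeholder assumption that should be replaced with the shapefile's actual CRS.

```python
import json


def bbox_geojson(minx, miny, maxx, maxy, epsg):
    """Return a GeoJSON Polygon string for a bounding box, tagged with a CRS."""
    ring = [
        [minx, miny],
        [maxx, miny],
        [maxx, maxy],
        [minx, maxy],
        [minx, miny],  # close the ring
    ]
    geom = {
        "type": "Polygon",
        "coordinates": [ring],
        "crs": {"properties": {"name": f"EPSG:{epsg}"}},
    }
    return json.dumps(geom)


# Extent quoted in the question; the EPSG code is a placeholder assumption
geom_str = bbox_geojson(
    563069.6710568518610671, 4483098.0464451070874929,
    609762.2528290073387325, 4529951.9860328584909439,
    epsg=32618,
)
print(geom_str)
```

The resulting string would then be passed as geom=geom_str in place of json.dumps(geom).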

Question 9

Does this perform any attribute summation?

Question 10

Nope, question is not actually about summing, it's about getting the input rasters into the same extent so they can be summed and the OP already has code to sum.

Question 11

@user2856 - This looks very promising. Though I tried it out and received this error: RasterioIOError: Attempt to create new tiff file '/tmp/Output_Raster.tif' failed: No error, and I am not sure what the error actually is if it says "No error". I used the same shapefile from before. And yes, my question here was about getting the rasters into the same extent so they can be summed. But once that works, I will try the newly produced same-extent rasters with my summing code to see if they can be successfully summed.

Question 12

Just change your output path to a location you can write to. Lesson: don't just copy/paste/run code from StackExchange!

Question 13

@LostinSpatialAnalysis did this work?

Question 14

I've made some changes to the accepted answer in order to consume less memory, since it broke for me when I tried to run it at 0.1° resolution:

import json

import geopandas as gpd
from geocube.api.core import make_geocube
from shapely.geometry import box

polygons = gpd.read_file('Shapefile.zip')
polygon_IDs = polygons['ID'].tolist()
print(len(polygon_IDs))  # how many polygons will be rasterized

# Make a GeoJSON string of the bounding box feature
bbox = gpd.GeoSeries(box(*polygons.total_bounds), crs=polygons.crs)
geom = bbox.__geo_interface__["features"][0]["geometry"]
# Add CRS (hard-coded for this dataset)
geom["crs"] = {"properties": {"name": "EPSG:4326"}}

out_grid = None
for i in polygon_IDs:
    cubes = []
    print(i)
    x = polygons.loc[polygons['ID'] == i]
    vector_fn = x
    partial_out_grid = make_geocube(
        vector_data=vector_fn,
        measurements=["feature_name"],
        resolution=(-0.1, 0.1),
        fill=0,
        geom=json.dumps(geom)
    )
    cubes.append(partial_out_grid)
    # Keep only the running total in memory instead of every cube
    if out_grid is not None:
        cubes.append(out_grid)
    out_grid = sum(cubes)

out_grid["feature_name"].rio.to_raster("Output_Raster_0_1.tif", dtype=out_grid["feature_name"].dtype)
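The memory saving comes from keeping only a running total rather than a list of every cube. Stripped of the geocube specifics, the pattern looks roughly like this. NumPy arrays stand in for the cubes here, and running_sum and toy_grids are hypothetical names.

```python
import numpy as np


def running_sum(grids):
    """Add grids one at a time, holding only the running total in memory."""
    total = None
    for grid in grids:
        total = grid if total is None else total + grid
    return total


def toy_grids(n, shape=(2, 2)):
    """Yield n small constant arrays lazily, standing in for per-polygon cubes."""
    for value in range(1, n + 1):
        yield np.full(shape, value)


result = running_sum(toy_grids(4))
print(result)
```

Because toy_grids is a generator, each grid is created only when it is needed and can be garbage-collected once it has been added to the total.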

Kartograaf · 3,199 reputation · 9 silver badges · 24 bronze badges · Accepted Answer · 2021-12-21 23:04:47Z


LostinSpatialAnalysis · comment on Kartograaf's answer · 2021-12-22 18:48:05 +00:00
