I have the following shapefile of NYC with all 5 Boroughs as separate polygons, called Boroughs.shp
. Data Source (NYC OpenData): https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm
I assign a separate made-up value, test_value
to each polygon.
I am trying to take shapefile Boroughs.shp and rasterize each polygon (row) as its own raster layer, where each new raster would each contain a separate and isolated rasterized polygon. I want to create a raster for each borough. This means that the first raster would just be Manhattan rasterized, as if it were the only borough in NYC, and the second raster would just be the Bronx rasterized, as if it were the only borough in NYC, and so on for all of the Boroughs. For each of the 5 rasters produced, I want test_value
burned into the pixels.
I accomplish this with the following code using the geocube
package:
polygons = gpd.read_file('Boroughs_Test/Boroughs.shp')
polygon_IDs = polygons['ID'].tolist()
for i in polygon_IDs:
x = polygons.loc[polygons['ID'] == i]
vector_fn = x
out_grid = make_geocube(
vector_data=vector_fn,
measurements=["test_value"],
resolution=(-25, 25),
fill=-9999,
)
out_grid["test_value"].rio.to_raster(str(i) + "_Output_Raster.tif")
This works fine, and I have unique, numbered, rasters produced for each borough polygon with the test_value
value burned in. Now I have 5 new rasters in my directory folder. What I want to do now, which is what I am having trouble with, is simply summing these rasters into a single summed output raster.
I have tried summing with the following code suggested in this post: Summing all rasters in folder using Python
Here is the code:
# Create an initial array
with rasterio.open(all_files[0]) as src:
result_array = src.read()
result_profile = src.profile
# Add on the rest one at a time
for f in all_files[1:]:
with rasterio.open(f) as src:
# Only sum the arrays if the profiles match.
assert result_profile == src.profile, 'stopping, file {} and {} do not have matching profiles'.format(all_files[0], f)
result_array = result_array + src.read()
with rasterio.open('Result_raster_NYC.tif', 'w', **result_profile) as dst:
dst.write(result_array, indexes=[1])
and I get the following error traceback:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-6-8cd83849a0a8> in <module>()
10 with rasterio.open(f) as src:
11 # Only sum the arrays if the profiles match.
---> 12 assert result_profile == src.profile, 'stopping, file {} and {} do not have matching profiles'.format(all_files[0], f)
13 result_array = result_array + src.read()
14
AssertionError: stopping, file Boroughs_Test1円_Output_Raster.tif and Boroughs_Test2円_Output_Raster.tif do not have matching profiles
For some reason it seems like my rasters are just not lining up, which I cannot figure out, because they were all created from the same polgons shapefile. How can I fix my raster summing code so that I can properly sum these raster files? I am open to separate summing approaches beyond this as well.
3 Answers 3
Sum operations on rasters require that the input data share many properties, including having the same CRS, same resolution, and (crucially) the same extent. Because the rasters you mean to sum are covering each of the five boroughs, they inherently have different extents.
-
Good point, yes I see that mismatched extents would be an issue, and why of course, differently shaped polygons would have different extents. My goal would be to have all of the rasters have the same extent, which would be the extent of the NYC shapefile, with all borough polygons intact. Do you know how I can set geocube to make all rasters produced share the same extent? I looked at the properties of my NYC shapefile and see that the extent is
563069.6710568518610671,4483098.0464451070874929 : 609762.2528290073387325,4529951.9860328584909439
, but I do not know where to enter this.LostinSpatialAnalysis– LostinSpatialAnalysis2021年12月22日 18:48:05 +00:00Commented Dec 22, 2021 at 18:48
The make_geocube
supports a geom
parameter:
geom (str, optional) – A GeoJSON string for the bounding box of the data used to construct the grid. It defaults to EPSG:4326 if a CRS is not provided.
Example of adding CRS:
{"type": "Polygon", "crs": {"properties": {"name": "EPSG:3857"}}}
The geom
param is not very smart, it can't use geometry directly, nor geometry.__geointerface__
etc. and requires a single feature, not a featurecollection as a string, not a GeoJSON dict.
import geopandas as gpd
from geocube.api.core import make_geocube
from shapely.geometry import box
import json
polygons = gpd.read_file('/path/to/input.shp')
polygon_IDs = polygons['ID'].tolist()
# Make a GeoJSON string of the bounding box feature
bbox = gpd.GeoSeries(box(*polygons.total_bounds), crs=polygons.crs)
geom = bbox.__geo_interface__["features"][0]["geometry"]
# Add CRS
geom["crs"] = {"properties": {"name": f"EPSG:{polygons.crs.to_epsg()}"}}
cubes = []
for i in polygon_IDs:
x = polygons.loc[polygons['ID'] == i]
vector_fn = x
out_grid = make_geocube(
vector_data=vector_fn,
measurements=["test_value"],
resolution=(-25, 25),
fill=0,
geom=json.dumps(geom)
)
cubes.append(out_grid)
out_grid = sum(cubes)
out_grid["test_value"].rio.to_raster(f"/path/to/Output_Raster.tif", dtype=out_grid["test_value"].dtype)
Note I had to add dtype=out_grid["test_value"].dtype
in rio.to_raster
to avoid a TypeError
crash, not sure why.
-
Does this perform any attribute summation?alphabetasoup– alphabetasoup2021年12月23日 00:42:26 +00:00Commented Dec 23, 2021 at 0:42
-
Nope, question is not actually about summing, it's about getting the input rasters into the same extent so they can be summed and the OP already has code to sum.user2856– user28562021年12月23日 01:33:00 +00:00Commented Dec 23, 2021 at 1:33
-
@user2856 - This looks very promising. Though I tried it out and received this error:
RasterioIOError: Attempt to create new tiff file '/tmp/Output_Raster.tif' failed: No error
, and I am not sure what the error actually is if it says "No error". I used the same shapefile from before. And yes, my question here was about getting the rasters into the same extent so they can be summed. But once that works, I will try the newly produced same-extent rasters with my summing code to see if they can be successfully summed.LostinSpatialAnalysis– LostinSpatialAnalysis2021年12月23日 20:57:17 +00:00Commented Dec 23, 2021 at 20:57 -
Just change your output path to a location you can write to. Lesson: don't just copy/paste/run code from StackExchange!user2856– user28562021年12月23日 22:17:15 +00:00Commented Dec 23, 2021 at 22:17
-
@LostinSpatialAnalysis did this work?user2856– user28562021年12月29日 23:56:13 +00:00Commented Dec 29, 2021 at 23:56
I've made some changes to the accepted answer in order to consume less memory since it broke for me when trying to run it at 0.1o resolution
import geopandas as gpd
from geocube.api.core import make_geocube
from shapely.geometry import box
import json
polygons = gpd.read_file('Shapefile.zip')
polygon_IDs = polygons['ID'].tolist()
len(polygon_IDs)
# Make a GeoJSON string of the bounding box feature
bbox = gpd.GeoSeries(box(*polygons.total_bounds), crs=polygons.crs)
geom = bbox.__geo_interface__["features"][0]["geometry"]
# Add CRS
geom["crs"] = {"properties": {"name": f"EPSG:4326"}}
out_grid = None
for i in polygon_IDs:
cubes = []
print(i)
x = polygons.loc[polygons['ID'] == i]
vector_fn = x
partial_out_grid = make_geocube(
vector_data=vector_fn,
measurements=["feature_name"],
resolution=(-0.1, 0.1),
fill=0,
geom=json.dumps(geom)
)
cubes.append(partial_out_grid)
if out_grid is not None:
cubes.append(out_grid)
out_grid = sum(cubes)
out_grid["feature_name"].rio.to_raster(f"Output_Raster_0_1.tif", dtype=out_grid["feature_name"].dtype)
Explore related questions
See similar questions with these tags.
563069.6710568518610671,4483098.0464451070874929 : 609762.2528290073387325,4529951.9860328584909439
, but I do not know how to enter this into geocube. Would this be the "geom" parameter?