4

I have a source raster data set (tif file) and a number of polygon features. For each one of those polygons I need to:

a) 'clip' the raster.
b) change values of those clipped raster cells (with the average value).
c) write new values back to the source raster.

I've managed to do a and b, which means I now have a numpy.ndarray variable where all cells have been populated with the average value. How can I write this back to the source raster?

I believe I'd be able to write a script which could do such a thing. I have arrays dimensions (width and height) and top left cell's position, so I guess it'd just be a matter of operating with these arrays. However, I was wondering whether there's any existing function that does that already.

My code:

import os, sys, datetime, time
import geopandas as gpd
import gdal
import rasterio
from rasterio.mask import mask
from fiona.crs import from_epsg
import numpy as np
import pycrs
alr_path = Path to tif file
gdb_vml_polygs = Path to file gdb
fc_Building_footprints = r'Buildings_Footprints'
alr = rasterio.open(alr_path)
vml_polygs = gpd.read_file(gdb_vml_polygs, driver='FileGDB', layer=fc_Building_footprints)
for index, row in vml_polygs.iterrows():
 #row[3] contains the geometry
 window, out_transform = mask(alr, row[3], all_touched=True, crop=True)
 if np.all([window < 0]):
 avg = -1
 else:
 avg = window[window != -1].mean().item()
 window_avg = (np.where(window!=-1, int(round(avg,0)), window))

In the script window_avg is a numpy array containing the new values. How can I write these values back into the source raster?

asked Jul 18, 2019 at 13:37
3
  • 1
    gis.stackexchange.com/a/327467/86131 Commented Jul 18, 2019 at 13:49
  • I can see WriteArray (to be used with gdal.band type of data) does what I'm looking for. Any idea whether there's anything similar for raster data opened with 'rasterio'? The reason I'm asking is because the processing done so far is with rasterio (like the use of 'mask' for example). If I open the raster with gdal instead I guess I would need to find a 'gdal' way to do the mask... Commented Jul 18, 2019 at 14:42
  • I haven't used rasterio but as far as I know, it runs on top of gdal. That means that there is a chance their .write() method accepts an offset as well. You could take a look at the documentation or tweak around with the .write() method to see if its possible. Commented Jul 18, 2019 at 14:53

3 Answers 3

4

I would recommend using rioxarray.

import numpy
import rioxarray
from shapely.geometry import mapping
xds = rioxarray.open_rasterio("NI_RIVER_ALR_1000.tif").sel(band=1).squeeze()
gdb_vml_polygs = 'RSA_VML_BUILDINGS_NI.gdb'
fc_vml_polygs = 'VML_Buildings_NI'
vml_polygs = gpd.read_file(gdb_vml_polygs, driver='FileGDB', layer=fc_vml_polygs)
out_xds = xds.copy()
for geom in vml_polygs.geometry.apply(mapping):
 clipped = xds.rio.clip([geom], vml_polygs.crs, drop=False)
 out_xds = out_xds.where(numpy.isnan(clipped), clipped.mean())
out_xds.rio.to_raster("out_averages.tif")

I tested it on this input raster:

enter image description here

And with four regions to average over, this was the result:

enter image description here

answered Jul 26, 2019 at 3:52
3

With rasterio

filled=False gives back masked array which can be utilised to replace with mean value.

Make sure to set crop=False and set the indexes to the band you want to replace it with.

import geopandas as gpd
import rasterio.mask
import rasterio as rio
gdf = gpd.read_file('input_shp.shp')
with rio.open("input.tif", "r") as src:
 band_num = 1
 src_image = src.read(band_num, out_dtype=rio.float64)
 # for each feature geom, replace with mean
 for geom in gdf.geometry:
 out_image, out_transform = rio.mask.mask(
 src, [geom], crop=False, filled=False, indexes=band_num
 )
 src_image[~out_image.mask] = src_image[~out_image.mask].mean()
 
 # save tif
 profile = src.profile
 profile.update(dtype=rio.float64, count=1, compress="lzw")
 with rasterio.open("output.tif", "w", **profile) as dst:
 dst.write(src_image, 1)
answered Apr 13, 2022 at 7:19
0

I recently undertook a very similar task and found that while the rasterio.mask.mask for-loop method worked, it was incredibly slow. Instead I suggest using rasterstats to populate a list of replacement values for each geometry which you want to burn into the raster, and then using rasterio.features.rasterize to burn the values in. This led to a 1000 improvement in performance for my application.

import geopandas as gpd
import rasterio
import rasterstats
raster = `` # this is a string filepath to a raster
output_raster = `` # this is a string filepath to the output raster
geoms = gpd.read_file('') # Insert a filepath to the vector file you want to use
values = rasterstats.zonal_stats( geoms, dem, stats = 'min' ) # Choose whatever stats you want to use
geoms['replacement_value'] = [i['min'] for i in values] # Replace 'min' as needed 
geoms_zip = zip( geoms.geometry.values, geoms.replacement_value.value) # Create an iterable for rasterio.features.rasterize shape variable
with rasterio.open(raster, "r") as src:
 band_num = 1
 src_image = src.read( band_num )
 raster_replace = rasterio.features.rasterize( geoms_zip, out = src_image,
 transform = src.transform,
 all_touched = True )
 # save tif
 profile = src.profile
 profile.update( dtype= src.meta['dtype'], count = 1, compress = "lzw" )
 with rasterio.open( output_raster, "w", **profile ) as dst:
 dst.write( raster_replace, 1 )
 
 
answered May 9 at 16:34

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.