I have two raster files as numpy arrays. I'm combining the rasters into one file. The rasters overlap on several pixel but have different values for the the overlapping pixels (one raster has nodata value for pixels, the other has the actual elevation data point). How do I combine the two arrays such that I only take the actual elevation data point instead of the nodatavalue?
file one: [-999,-999,-999,-999]
file two: [7,4,3,2]
desired output for pixels after combination: [7,4,3,2]
My code that I've been using produces this image. https://i.sstatic.net/ejyX1.jpg
from osgeo import gdal
from gdalconst import GA_ReadOnly
f1 = r'C:\Users\User\Data\NHDPlusCA\NHDPlus18\NEDSnapshot\Ned18a\elev_cm'
f2 = r'C:\Users\User\Data\NHDPlusCA\NHDPlus18\NEDSnapshot\Ned18b\elev_cm'
xmins = []
ymins = []
xmaxs = []
ymaxs = []
files = [f1, f2]
for f in files:
raster = gdal.Open(f, GA_ReadOnly)
projection = raster.GetProjection()
cols = raster.RasterXSize
rows = raster.RasterYSize
f_xmin, f_pwidth, f_xskew, f_ymax, f_yskew, f_pheight = raster.GetGeoTransform()
xmins.append(f_xmin)
xmaxs.append(f_xmin + (f_pwidth*cols))
ymaxs.append(f_ymax)
ymins.append(f_ymax + (f_pheight*rows))
del raster
#extents
xmax = max(xmaxs)
ymax = max(ymaxs)
xmin = min(xmins)
ymin = min(ymins)
newwidth = round((xmax - (xmin))/f_pwidth)
newheight = round((ymax -(ymin))/abs(f_pheight))
print('xmins', xmins)
print('ymins', ymins)
print('xmaxs', xmaxs)
print('ymaxs', ymaxs)
print('largest x', xmax)
print('smallest x', xmin)
print('largest y', ymax)
print('smallest y', ymin)
print('new raster width', newwidth)
print('new raster height', newheight)
driver = gdal.GetDriverByName('GTiff')
new_ds = driver.Create(
r'C:\Users\User\Desktop\out.tif',
newwidth, # xsize
newheight, # ysize
1, # number of bands
gdal.GDT_Int32, # The datatype of the pixel values
options=[ # Format-specific creation options.
'TILED=YES',
'BIGTIFF=IF_SAFER',
'BLOCKXSIZE=256', # must be a power of 2
'BLOCKYSIZE=256',# also power of 2, need not match BLOCKXSIZE
'COMPRESS=LZW'
])
# fill the new raster with nodata values
outband = new_ds.GetRasterBand(1)
outband.SetNoDataValue(-2147483647)
outband.Fill(-2147483647)
geotransform = [xmin, f_pwidth, 0, ymax, 0, f_pheight]
new_ds.SetGeoTransform(geotransform)
new_ds.SetProjection(projection)
for f in files:
raster = gdal.Open(f, GA_ReadOnly)
f_xmin, f_pwidth, f_xskew, f_ymax, f_yskew, f_pheight = raster.GetGeoTransform() #geotransform for the old files being iterated over
cols = raster.RasterXSize
rows = raster.RasterYSize
xoffset = int((f_xmin - xmin) / f_pwidth)
yoffset = int((f_ymax - ymax) / f_pheight)
band = raster.GetRasterBand(1)
data = band.ReadAsArray(0,0,cols,rows)
outband.WriteArray(data,xoffset,yoffset)
outband.FlushCache()
# When all references to new_ds are unset, the dataset is closed and flushed to disk
new_ds = None
-
1You could try numpy.maximum docs.scipy.org/doc/numpy/reference/generated/numpy.maximum.html because your NoData value is a very small negative number.Michael Stimson– Michael Stimson2017年07月26日 21:37:32 +00:00Commented Jul 26, 2017 at 21:37
-
The problem with this method is that the arrays are not necessarily the same shape. Seems odd there isn't a simple way to compare georeferenced pixels and if one file doesn't have the coordinate pair just take that pixel value.Mitchell Sawtelle– Mitchell Sawtelle2017年07月27日 00:43:08 +00:00Commented Jul 27, 2017 at 0:43
-
You need to find the overlap area and read only that as the overlap area is all you need to apply the operator to. Extents that are not common can be read directly... if necessary create a new raster dataset of the mosaic. or you could use the Mosaic to New Raster tool resources.arcgis.com/en/help/main/10.2/index.html#//… to create the mosaic, which will ignore NoData values, and then read the mosaic.Michael Stimson– Michael Stimson2017年07月27日 00:53:08 +00:00Commented Jul 27, 2017 at 0:53
-
This is a great little scriptShawn– Shawn2018年12月26日 08:13:52 +00:00Commented Dec 26, 2018 at 8:13
1 Answer 1
Because you are limited to writing contiguous blocks of data to GDAL rasters, the best way around this is to create the output array first, then write the array to the output raster. Using this method, you can use numpy's boolean indexing to dictate what is written. Your last for
loop would work as follows using this logic:
import numpy
no_data = -2147483647
output_array = numpy.full((newheight, newwidth), no_data, 'int32')
for f in files:
raster = gdal.Open(f, GA_ReadOnly)
f_xmin, f_pwidth, f_xskew, f_ymax, f_yskew, f_pheight = raster.GetGeoTransform()
cols = raster.RasterXSize
rows = raster.RasterYSize
xoffset = int((f_xmin - xmin) / f_pwidth)
yoffset = int((f_ymax - ymax) / f_pheight) # Assumes grids align
band = raster.GetRasterBand(1)
data = band.ReadAsArray()
# Create a boolean array that allows data where they exist
mask = (output_array == no_data) & (data != band.GetNoDataValue()) # Note: this assumes you prefer data in order based on the files list
# Allocate data to output array
output_array[yoffset:yoffset + rows, xoffset:xoffset + cols][mask] = data[mask]
# Write data to the output raster
outband.WriteArray(output_array)
outband.FlushCache()
However, you will need to fully load the raster in to memory prior to flushing it to the disk. If memory-management is an issue, you can either use numpy.memmap, or read the overlapping block from the output raster for each iteration, which would look like this:
for f in files:
raster = gdal.Open(f, GA_ReadOnly)
f_xmin, f_pwidth, f_xskew, f_ymax, f_yskew, f_pheight = raster.GetGeoTransform() #geotransform for the old files being iterated over
cols = raster.RasterXSize
rows = raster.RasterYSize
xoffset = int((f_xmin - xmin) / f_pwidth)
yoffset = int((f_ymax - ymax) / f_pheight)
band = raster.GetRasterBand(1)
data = band.ReadAsArray(0, 0, cols, rows)
data_update = outband.ReadAsArray(xoffset, yoffset, cols, rows)
mask = (data_update == no_data) & (data != band.GetNoDataValue())
data_update[mask] = data[mask]
outband.WriteArray(data_update, xoffset, yoffset)
outband.FlushCache()
One last general note of caution: your method assumes that raster grids and data types align.