If you have an array with x, y and z dimensions, how can I create an array of x, y dimensions that holds a count of how many z values at each location (x,y) are noData?
I would prefer to use numpy and python for this task, but could also use arcpy (I am working with raster data) or R.
2 Answers 2
Assuming that you allready read your raster data into a numpy array called raster_stack
and stacked all rasters along the z-axis.
import numpy as np
no_data = -32768
np.sum(raster_stack == no_data, axis=0)
This will result in a 2-dimensional array containing the count of no_data
values for each x,y location. It will also be as fast as you can get with Python since all the looping is handled by numpy
in fast C functions, instead of slow Python loops.
How does it work:
raster_stack == no_data
creates abool
array with the same dimensions as yourraster_stack
which containsTrue
for allno_data
observations.In numpy you can treat True/False like 1/0, meaning that
True + True
will equal2
.np.sum
sums up all True observations along the z-axis (axis=0
) and returns the result as a flattened 2D-array.
To support my performance claim, let's compare this with the numpy method in @JamesSLC answer.
# create a test dataset, where the first and last 2 rasters contain no_data values
raster_stack = np.arange(200000).reshape(20, 100, 100)
raster_stack[raster_stack < 20000] = -32768
raster_stack[raster_stack > 180000] = -32768
# sum of no_data values without loops
def no_data(array, no_data):
return np.sum(array == no_data, axis=0)
# sum of no_data values with masked_array and loops
def masked_no_data(array, no_data):
out_raster_data = np.zeros((array.shape[1], array.shape[2]), np.int)
bands = range(array.shape[0])
bands_list = []
for i in bands:
temp_array = np.array(raster_stack[i,:,:])
masked_array = np.ma.masked_values(temp_array, no_data)
bands_list.append(masked_array.mask)
for i in xrange(0, len(bands)):
out_raster_data += bands_list[i]
return out_raster_data
.
# compare that both functions produce the same result
np.array_equal(no_data(raster_stack, -32768), masked_no_data(raster_stack, -32768))
>>True
.
%timeit no_data(raster_stack, -32768)
>>1000 loops, best of 3: 324 μs per loop
.
%timeit masked_no_data(raster_stack, -32768)
>>1000 loops, best of 3: 998 μs per loop
Using numpys internal functions without the masked array is roughly 3 times faster. It should be noted however that the vast majority of executing the real task will be spent reading the data into numpy arrays in the first place.
-
How would this compare/is it faster than the numpy native functions method I proposed above?GeoSharp– GeoSharp2015年10月08日 11:27:30 +00:00Commented Oct 8, 2015 at 11:27
-
1@JamesSLC see my edit. The proposed function is shorter and faster. I originally wanted to post this as a comment on your answer but onfortunately it was too long for that. Just my 2 cents on how to improve the numpy part of your answer.Kersten– Kersten2015年10月08日 12:54:27 +00:00Commented Oct 8, 2015 at 12:54
Update - Added additional method to compare with.
Using a mixture of arcpy
and numpy
the methods below should do the trick:
import arcpy, numpy
##Get Rasters bands as well as raster height, width, and no data value
in_path = r'' #enter the fullpath to your raster stack here
arcpy.env.workspace = in_path
bands = arcpy.ListRasters()
in_ras = arcpy.Raster(in_path)
raster_rows = in_ras.width
raster_columns = in_ras.height
no_data = in_ras.noDataValue # this property can be tricky so you may need to set the value manually
#create an empty numpy array of zeros the same size as input raster to store the results
out_raster_data = numpy.zeros((raster_columns, raster_rows), numpy.int)
From here I will present two different ways you can go.
The first method (shown below) relies on native numpy functions. It uses native methods from numpy to mask data based on the provided no data value. This method is much faster:
bands_list = []
for i in bands:
temp_array = arcpy.RasterToNumPyArray(i).astype(numpy.float32)
masked_array = numpy.ma.masked_values(temp_array, no_data)
bands_list.append(masked_array.mask) #use the mask from masked_array that is an array of True/False values depending on if the value in the array was no_data or not
for i in xrange(0, len(bands)):
out_raster_data += bands_list[i] #perform addition on the Boolean mask arrays (1 for False (no_data), 0 for True (data))
The second method is a looping method which loops over the pixels/array elements individually. This method is much slower.
#Create a list that houses each raster array
bands_list = []
for i in bands:
bands_list.append(arcpy.RasterToNumPyArray(i).astype(numpy.float32))
#loop over all columns, rows, then bands
#loop over bands last to get a value from each band and increase the output pixel value if it is equal to no_data
for y in xrange(0, raster_columns):
for x in xrange(0, raster_rows):
out_pixel_data = 0
for j in range(0,len(bands)):
sample_value = bands_list[j][y,x]
if sample_value == no_data:
out_pixel_data += 1
out_raster_data[y,x] = out_pixel_data
To save the output in raster format for both methods, you can finish with this:
#convert numpy array to raster and save the results
result_raster = arcpy.NumPyArrayToRaster(out_raster_data)
result_raster.save(r'') #enter output raster path here
If you are working with a spatial reference you will need to project the resulting raster either when converting to raster or after the fact.
numpy.ndarray
or do you first need to read it from disk? How is your data structured?