Total count of values at each pixel in a raster stack/ 3d array

Question 1

If you have an array with x, y and z dimensions, how can I create an array of x, y dimensions that holds a count of how many z values at each location (x,y) are noData?

I would prefer to use numpy and python for this task, but could also use arcpy (I am working with raster data) or R.

Question 2

Was that code helpful for you?

Question 3

@JamesSLC, , thank you both of your answers look very helpful. I will do some testing over the next few days and respond as to which one was preferable for my work. I would prefer to use solutions without the use of arcpy, as I don't have a license on my home computer, but your solution will work in the office!

Question 4

Is the array already a numpy.ndarray or do you first need to read it from disk? How is your data structured?

Question 5

Accepting the answer is not the only way to show appreciation for someone's time ;-) see site help center here.

Question 6

@Kersten I the raster stack is already a numpy.ndarray. I am also working on incorporating cloud and data quality masked, so may use the masked array format

Question 7

Assuming that you allready read your raster data into a numpy array called raster_stack and stacked all rasters along the z-axis.

import numpy as np
no_data = -32768
np.sum(raster_stack == no_data, axis=0)

This will result in a 2-dimensional array containing the count of no_data values for each x,y location. It will also be as fast as you can get with Python since all the looping is handled by numpy in fast C functions, instead of slow Python loops.

How does it work:

raster_stack == no_data creates a bool array with the same dimensions as your raster_stack which contains True for all no_data observations.
In numpy you can treat True/False like 1/0, meaning that True + True will equal 2.
np.sum sums up all True observations along the z-axis (axis=0) and returns the result as a flattened 2D-array.

To support my performance claim, let's compare this with the numpy method in @JamesSLC answer.

# create a test dataset, where the first and last 2 rasters contain no_data values
raster_stack = np.arange(200000).reshape(20, 100, 100)
raster_stack[raster_stack < 20000] = -32768
raster_stack[raster_stack > 180000] = -32768
# sum of no_data values without loops
def no_data(array, no_data):
 return np.sum(array == no_data, axis=0)
# sum of no_data values with masked_array and loops
def masked_no_data(array, no_data):
 out_raster_data = np.zeros((array.shape[1], array.shape[2]), np.int) 
 bands = range(array.shape[0])
 bands_list = []
 for i in bands:
 temp_array = np.array(raster_stack[i,:,:])
 masked_array = np.ma.masked_values(temp_array, no_data)
 bands_list.append(masked_array.mask)
 for i in xrange(0, len(bands)):
 out_raster_data += bands_list[i]
 return out_raster_data

.

# compare that both functions produce the same result
np.array_equal(no_data(raster_stack, -32768), masked_no_data(raster_stack, -32768))
>>True

.

%timeit no_data(raster_stack, -32768)
>>1000 loops, best of 3: 324 μs per loop

.

%timeit masked_no_data(raster_stack, -32768)
>>1000 loops, best of 3: 998 μs per loop

Using numpys internal functions without the masked array is roughly 3 times faster. It should be noted however that the vast majority of executing the real task will be spent reading the data into numpy arrays in the first place.

Question 8

How would this compare/is it faster than the numpy native functions method I proposed above?

Question 9

@JamesSLC see my edit. The proposed function is shorter and faster. I originally wanted to post this as a comment on your answer but onfortunately it was too long for that. Just my 2 cents on how to improve the numpy part of your answer.

Question 10

Update - Added additional method to compare with.

Using a mixture of arcpy and numpy the methods below should do the trick:

import arcpy, numpy
##Get Rasters bands as well as raster height, width, and no data value
in_path = r'' #enter the fullpath to your raster stack here
arcpy.env.workspace = in_path
bands = arcpy.ListRasters()
in_ras = arcpy.Raster(in_path)
raster_rows = in_ras.width
raster_columns = in_ras.height
no_data = in_ras.noDataValue # this property can be tricky so you may need to set the value manually
#create an empty numpy array of zeros the same size as input raster to store the results
out_raster_data = numpy.zeros((raster_columns, raster_rows), numpy.int)

From here I will present two different ways you can go.

The first method (shown below) relies on native numpy functions. It uses native methods from numpy to mask data based on the provided no data value. This method is much faster:

bands_list = []
for i in bands:
 temp_array = arcpy.RasterToNumPyArray(i).astype(numpy.float32)
 masked_array = numpy.ma.masked_values(temp_array, no_data)
 bands_list.append(masked_array.mask) #use the mask from masked_array that is an array of True/False values depending on if the value in the array was no_data or not
for i in xrange(0, len(bands)):
 out_raster_data += bands_list[i] #perform addition on the Boolean mask arrays (1 for False (no_data), 0 for True (data))

The second method is a looping method which loops over the pixels/array elements individually. This method is much slower.

#Create a list that houses each raster array
bands_list = []
for i in bands:
 bands_list.append(arcpy.RasterToNumPyArray(i).astype(numpy.float32))
#loop over all columns, rows, then bands
#loop over bands last to get a value from each band and increase the output pixel value if it is equal to no_data
for y in xrange(0, raster_columns):
 for x in xrange(0, raster_rows):
 out_pixel_data = 0
 for j in range(0,len(bands)):
 sample_value = bands_list[j][y,x]
 if sample_value == no_data:
 out_pixel_data += 1
 out_raster_data[y,x] = out_pixel_data

To save the output in raster format for both methods, you can finish with this:

#convert numpy array to raster and save the results
result_raster = arcpy.NumPyArrayToRaster(out_raster_data)
result_raster.save(r'') #enter output raster path here

If you are working with a spatial reference you will need to project the resulting raster either when converting to raster or after the fact.

Kersten Kersten 10k3 gold badges39 silver badges59 bronze badges · Accepted Answer · 2015-10-08 10:57:23Z

Assuming that you allready read your raster data into a numpy array called raster_stack and stacked all rasters along the z-axis.

import numpy as np
no_data = -32768
np.sum(raster_stack == no_data, axis=0)

This will result in a 2-dimensional array containing the count of no_data values for each x,y location. It will also be as fast as you can get with Python since all the looping is handled by numpy in fast C functions, instead of slow Python loops.

How does it work:

raster_stack == no_data creates a bool array with the same dimensions as your raster_stack which contains True for all no_data observations.
In numpy you can treat True/False like 1/0, meaning that True + True will equal 2.
np.sum sums up all True observations along the z-axis (axis=0) and returns the result as a flattened 2D-array.

To support my performance claim, let's compare this with the numpy method in @JamesSLC answer.

# create a test dataset, where the first and last 2 rasters contain no_data values
raster_stack = np.arange(200000).reshape(20, 100, 100)
raster_stack[raster_stack < 20000] = -32768
raster_stack[raster_stack > 180000] = -32768
# sum of no_data values without loops
def no_data(array, no_data):
 return np.sum(array == no_data, axis=0)
# sum of no_data values with masked_array and loops
def masked_no_data(array, no_data):
 out_raster_data = np.zeros((array.shape[1], array.shape[2]), np.int) 
 bands = range(array.shape[0])
 bands_list = []
 for i in bands:
 temp_array = np.array(raster_stack[i,:,:])
 masked_array = np.ma.masked_values(temp_array, no_data)
 bands_list.append(masked_array.mask)
 for i in xrange(0, len(bands)):
 out_raster_data += bands_list[i]
 return out_raster_data

.

# compare that both functions produce the same result
np.array_equal(no_data(raster_stack, -32768), masked_no_data(raster_stack, -32768))
>>True

.

%timeit no_data(raster_stack, -32768)
>>1000 loops, best of 3: 324 μs per loop

.

%timeit masked_no_data(raster_stack, -32768)
>>1000 loops, best of 3: 998 μs per loop

Using numpys internal functions without the masked array is roughly 3 times faster. It should be noted however that the vast majority of executing the real task will be spent reading the data into numpy arrays in the first place.

How would this compare/is it faster than the numpy native functions method I proposed above?
@JamesSLC see my edit. The proposed function is shorter and faster. I originally wanted to post this as a comment on your answer but onfortunately it was too long for that. Just my 2 cents on how to improve the numpy part of your answer.

Stack Exchange Network

Total count of values at each pixel in a raster stack/ 3d array

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Total count of values at each pixel in a raster stack/ 3d array

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions