adding rasters with gdal/numpy---incorrect logic or bug?

Question 1

I'm trying to add up several rasters to get total number of 'crop hits' over time. I have to do this process through several iterations because once script processes one tile, but each tile is made up of 4 quarter tiles, and there are 5 iterations (5 kernels) of each quarter tiles. Then on top of that, there are two different time periods that will be outputs (epoch1 = 2007-2010 and epoch2 = 2011-2014).

For each iteration (aka kernel) of every quarter tile, there is one raster per year. Each of these rasters has three values: 0 for no crop, 200 for no data (also no crop), and yr_value (an integer representing the two digit year), which indicates crop. My eventual goal is to get a per-pixel proportion raster, where:

proportion = # of crop hits/# of total hits

...but for now I'm having trouble getting the two individual pieces (# of crop hits which I'm calling "crop" and # of total hits which I'm calling "data")

Here is the relevant code:

import os
import math
import glob
import gdal
import numpy as np
import sys
tile = str(sys.argv[1])
gdal.UseExceptions() # enable exceptions to report errors
drvtif = gdal.GetDriverByName("GTiff")

.... define functions, set up variables/directories, get quarterTile and kernel lists....

for quarterTile in quarterTiles[0:1]: # TEST with the first quarterTile in list
 for kernel in kernels[0:1]: # TEST with the first kernel
 input_dir = os.path.join(indir, kernel) # where the individual year rasters live
 for e in range(0,2): # epoch 1 or 2
 epoch = str(e+1) # because 0 based index
 epoch_years = epoch_years_list[e] # epoch_years_list is a list of 2 lists with the years in each epoch. epoch1=2007-2010 and epoch2=2011-2014
 yr_cnt = 0 # to count which year we are on (only adds if raster for the year exists)
 for yr in epoch_years: # loop thru the years for the time period (epoch) we are on
 yr_value = int(yr[2:4]) # the value of yes Crop is = the 2 digit year
 # look for the matching raster:
 yr_raster = os.path.join(input_dir, '%s_%s_%s-max.tif' % (quarterTile, yr, kernel)) # this is what the year raster is named, if it exists
 if not os.path.isfile(yr_raster): # if the year doesn't exist
 continue # non-existent data for a year represents a bunch of 0's to be added to both crop hits and data hits, so we can just skip it
 # if it does exist...
 yr_cnt += 1 # add to year count so we know if we are on the first year or not 
 (yr_arr, dt, gt, proj, ncols, nrows) = read_tif_asArray(yr_raster) # read_tif_asArray is self-defined function that uses gdal API to read a tif into a numpy array
 print np.unique(yr_arr) # 1!!! this should yield 0, 200, and whatever yr_value is, e.g. for 2007 yr_value would be 7
 # copy the yr_arr to crop and data arrays, then mask yes/no 
 crop_arr = yr_arr # the crop array for the particular year
 data_arr = yr_arr
 yr_arr = None
 del yr_arr # no longer need this
 # mask out crop array. In this case, 200 and 0 are not crop, yr_value is crop
 # 200 --> 0, yr_val --> 1, 0 stays 0
 crop_arr[crop_arr == 200] = 0 # no data vals turn into 0 (not crop)
 crop_arr[crop_arr == yr_value] = 1 # crop vals (year val) turn into 1 (crop)
 print np.unique(crop_arr) # 2!!! should be [0 1]
 # mask out data array: 
 # in this case, 0's and yr_value count as data hits (1)
 # 200 is the only value that represents no data (0)
 # method 1:
 ##data_arr[data_arr < 200] = 1 # if yr array is not 200 (not NoData), then there is data whether it be crop (yr_val) or no crop (0)
 ##data_arr[data_arr == 200] = 0 # if yr array is 200 (NoData), then there is no data
 # also tried doing the previous block this way:
 data_arr[data_arr == yr_value] = 1 # if yr array is yr val, it is data
 data_arr[data_arr == 0] = 1 # if array is 0, it is data, just not crop 
 data_arr[data_arr == 200] = 0 # if yr array is 200 (NoData), then there is no data 
 print np.unique(data_arr) # 3!!! should be [0 1]
 if yr_cnt == 1: # if we are in the first available year, we need to set the final sum arrays = crop and data array for the year
 crop_sum_arr = crop_arr
 data_sum_arr = data_arr
 else: # if it's not the first available year, add to the crop and data sum arrays
 crop_sum_arr += crop_arr
 data_sum_arr += data_arr 
 crop_arr = None
 data_arr = None
 del crop_arr, data_arr # can erase both arrays from memory after each year
 print np.unique(crop_sum_arr) # 4!!! should be in range 0-4 (4 years max in one epoch)
 print np.unique(data_sum_arr) # 5!!! should also be in range 0-4 
 # write the two sum arrays as epoch1 output of quarterTile, kernel
 outdir = '/att/gpfsfs/userfs01/ppl/mwooten3/IDS/CropMaps_redo/test_segEpoch/testout/' # TEST output directory
 write_array_asTif(crop_sum_arr, outdir, 'cropsum-%s-%s-epoch%s' % (quarterTile, kernel, epoch), gt, proj, ncols, nrows, dt, 200)
 write_array_asTif(data_sum_arr, outdir, 'datasum-%s-%s-epoch%s' % (quarterTile, kernel, epoch), gt, proj, ncols, nrows, dt, 200)
 # then erase them
 crop_sum_arr = None
 data_sum_arr = None
 del crop_sum_arr, data_sum_arr

I've commented out the code with notes. Please note that the comments were made for the code in general, but I am running, right now, the code with a quarter tile/kernel/epoch that only has two years of data available and that is what I address below. The areas that I've numbered like 1!!! or 2!!! are where the problems become obvious. Below is what the code prints out for the areas I've numbered:

-For 1, the unique values of the year array are correct. [0 8 200] for 2008, for example

-For 2 and 3: the unique values of the year arrays after they've been copied to crop array and data array and masked like described at the beginning, are right for 2 and wrong for 3. For both years in the test quarterTile/kernel/epoch, unique values of the crop array (2) were correct: [0 1], but the unique values of the data array (3) were wrong: [1]. They should have been [0, 1] as I know for a fact there are areas with NoData which should be, after masking, a value of 0.

-For 4 and 5, the unique values of the crop and data sum arrays are wrong. According to the output only value represented throughout the entire array for both arrays is [3], which makes no sense because a) there were only 2 years (so a maximum sum of 2) and b) I know for a fact there are pixels where the raster had no crop for both years (which should give us a value of 0) AND there are pixels where one year had crop, but the other didn't, which should give me a value of 1... and same with the data sum raster.

These numpy unique values are further confirmed by the rasters that I wrote from the arrays. In both the final crop and data sum rasters, pixels that are supposed to be 0, 1, or 2 are all 3, which shouldn't even be a value at all.

Can someone help me figure out what's wrong with my logic here?

To expand a little further, the code then moves on to the second epoch. The output values for both crop and data sum arrays for this second epoch, which had 4 years of data available, was just [7] when it should have been [0 1 2 3 4]. So something is wrong with my logic or the way my loops are set up.

Maybe I'm failing to clear out the arrays and everything is adding up weirdly? I have no idea.

Question 2

The problem was with the way I was copying the year array to crop and data arrays:

crop_arr = yr_arr # the crop array for the particular year
data_arr = yr_arr

needs to be:

crop_arr = np.copy(yr_arr) # the crop array for the particular year
data_arr = np.copy(yr_arr)

user20408 user20408 1,5551 gold badge23 silver badges43 bronze badges · Accepted Answer · 2016-09-19 13:53:41Z

The problem was with the way I was copying the year array to crop and data arrays:

crop_arr = yr_arr # the crop array for the particular year
data_arr = yr_arr

needs to be:

crop_arr = np.copy(yr_arr) # the crop array for the particular year
data_arr = np.copy(yr_arr)

Stack Exchange Network

adding rasters with gdal/numpy---incorrect logic or bug?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

adding rasters with gdal/numpy---incorrect logic or bug?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions