0

I'm trying to add up several rasters to get total number of 'crop hits' over time. I have to do this process through several iterations because once script processes one tile, but each tile is made up of 4 quarter tiles, and there are 5 iterations (5 kernels) of each quarter tiles. Then on top of that, there are two different time periods that will be outputs (epoch1 = 2007-2010 and epoch2 = 2011-2014).

For each iteration (aka kernel) of every quarter tile, there is one raster per year. Each of these rasters has three values: 0 for no crop, 200 for no data (also no crop), and yr_value (an integer representing the two digit year), which indicates crop. My eventual goal is to get a per-pixel proportion raster, where:

proportion = # of crop hits/# of total hits

...but for now I'm having trouble getting the two individual pieces (# of crop hits which I'm calling "crop" and # of total hits which I'm calling "data")

Here is the relevant code:

import os
import math
import glob
import gdal
import numpy as np
import sys
tile = str(sys.argv[1])
gdal.UseExceptions() # enable exceptions to report errors
drvtif = gdal.GetDriverByName("GTiff")

.... define functions, set up variables/directories, get quarterTile and kernel lists....

for quarterTile in quarterTiles[0:1]: # TEST with the first quarterTile in list
 for kernel in kernels[0:1]: # TEST with the first kernel
 input_dir = os.path.join(indir, kernel) # where the individual year rasters live
 for e in range(0,2): # epoch 1 or 2
 epoch = str(e+1) # because 0 based index
 epoch_years = epoch_years_list[e] # epoch_years_list is a list of 2 lists with the years in each epoch. epoch1=2007-2010 and epoch2=2011-2014
 yr_cnt = 0 # to count which year we are on (only adds if raster for the year exists)
 for yr in epoch_years: # loop thru the years for the time period (epoch) we are on
 yr_value = int(yr[2:4]) # the value of yes Crop is = the 2 digit year
 # look for the matching raster:
 yr_raster = os.path.join(input_dir, '%s_%s_%s-max.tif' % (quarterTile, yr, kernel)) # this is what the year raster is named, if it exists
 if not os.path.isfile(yr_raster): # if the year doesn't exist
 continue # non-existent data for a year represents a bunch of 0's to be added to both crop hits and data hits, so we can just skip it
 # if it does exist...
 yr_cnt += 1 # add to year count so we know if we are on the first year or not 
 (yr_arr, dt, gt, proj, ncols, nrows) = read_tif_asArray(yr_raster) # read_tif_asArray is self-defined function that uses gdal API to read a tif into a numpy array
 print np.unique(yr_arr) # 1!!! this should yield 0, 200, and whatever yr_value is, e.g. for 2007 yr_value would be 7
 # copy the yr_arr to crop and data arrays, then mask yes/no 
 crop_arr = yr_arr # the crop array for the particular year
 data_arr = yr_arr
 yr_arr = None
 del yr_arr # no longer need this
 # mask out crop array. In this case, 200 and 0 are not crop, yr_value is crop
 # 200 --> 0, yr_val --> 1, 0 stays 0
 crop_arr[crop_arr == 200] = 0 # no data vals turn into 0 (not crop)
 crop_arr[crop_arr == yr_value] = 1 # crop vals (year val) turn into 1 (crop)
 print np.unique(crop_arr) # 2!!! should be [0 1]
 # mask out data array: 
 # in this case, 0's and yr_value count as data hits (1)
 # 200 is the only value that represents no data (0)
 # method 1:
 ##data_arr[data_arr < 200] = 1 # if yr array is not 200 (not NoData), then there is data whether it be crop (yr_val) or no crop (0)
 ##data_arr[data_arr == 200] = 0 # if yr array is 200 (NoData), then there is no data
 # also tried doing the previous block this way:
 data_arr[data_arr == yr_value] = 1 # if yr array is yr val, it is data
 data_arr[data_arr == 0] = 1 # if array is 0, it is data, just not crop 
 data_arr[data_arr == 200] = 0 # if yr array is 200 (NoData), then there is no data 
 print np.unique(data_arr) # 3!!! should be [0 1]
 if yr_cnt == 1: # if we are in the first available year, we need to set the final sum arrays = crop and data array for the year
 crop_sum_arr = crop_arr
 data_sum_arr = data_arr
 else: # if it's not the first available year, add to the crop and data sum arrays
 crop_sum_arr += crop_arr
 data_sum_arr += data_arr 
 crop_arr = None
 data_arr = None
 del crop_arr, data_arr # can erase both arrays from memory after each year
 print np.unique(crop_sum_arr) # 4!!! should be in range 0-4 (4 years max in one epoch)
 print np.unique(data_sum_arr) # 5!!! should also be in range 0-4 
 # write the two sum arrays as epoch1 output of quarterTile, kernel
 outdir = '/att/gpfsfs/userfs01/ppl/mwooten3/IDS/CropMaps_redo/test_segEpoch/testout/' # TEST output directory
 write_array_asTif(crop_sum_arr, outdir, 'cropsum-%s-%s-epoch%s' % (quarterTile, kernel, epoch), gt, proj, ncols, nrows, dt, 200)
 write_array_asTif(data_sum_arr, outdir, 'datasum-%s-%s-epoch%s' % (quarterTile, kernel, epoch), gt, proj, ncols, nrows, dt, 200)
 # then erase them
 crop_sum_arr = None
 data_sum_arr = None
 del crop_sum_arr, data_sum_arr

I've commented out the code with notes. Please note that the comments were made for the code in general, but I am running, right now, the code with a quarter tile/kernel/epoch that only has two years of data available and that is what I address below. The areas that I've numbered like 1!!! or 2!!! are where the problems become obvious. Below is what the code prints out for the areas I've numbered:

-For 1, the unique values of the year array are correct. [0 8 200] for 2008, for example

-For 2 and 3: the unique values of the year arrays after they've been copied to crop array and data array and masked like described at the beginning, are right for 2 and wrong for 3. For both years in the test quarterTile/kernel/epoch, unique values of the crop array (2) were correct: [0 1], but the unique values of the data array (3) were wrong: [1]. They should have been [0, 1] as I know for a fact there are areas with NoData which should be, after masking, a value of 0.

-For 4 and 5, the unique values of the crop and data sum arrays are wrong. According to the output only value represented throughout the entire array for both arrays is [3], which makes no sense because a) there were only 2 years (so a maximum sum of 2) and b) I know for a fact there are pixels where the raster had no crop for both years (which should give us a value of 0) AND there are pixels where one year had crop, but the other didn't, which should give me a value of 1... and same with the data sum raster.

These numpy unique values are further confirmed by the rasters that I wrote from the arrays. In both the final crop and data sum rasters, pixels that are supposed to be 0, 1, or 2 are all 3, which shouldn't even be a value at all.

Can someone help me figure out what's wrong with my logic here?

To expand a little further, the code then moves on to the second epoch. The output values for both crop and data sum arrays for this second epoch, which had 4 years of data available, was just [7] when it should have been [0 1 2 3 4]. So something is wrong with my logic or the way my loops are set up.

Maybe I'm failing to clear out the arrays and everything is adding up weirdly? I have no idea.

PolyGeo
65.5k29 gold badges115 silver badges350 bronze badges
asked Sep 14, 2016 at 16:43

1 Answer 1

1

The problem was with the way I was copying the year array to crop and data arrays:

crop_arr = yr_arr # the crop array for the particular year
data_arr = yr_arr

needs to be:

crop_arr = np.copy(yr_arr) # the crop array for the particular year
data_arr = np.copy(yr_arr)
answered Sep 19, 2016 at 13:53

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.