I have a .tif that I read into an array (call it tifArray), and I would like to classify the array based on a set of conditions:

  • Where 1200 <= tifArray <= 4000, outputArray = 1
  • Where tifArray < 1200, outputArray = 2
  • Where tifArray > 4000, outputArray = 3

I've tried both creating a new array (preferred) and replacing the values in place (see below), but regardless I get a MemoryError. I've tried on machines that have 10 and 8 GB of free RAM. I've also tried using the np.where function and just plain indexing (below). I have no clue why I'm getting a memory error, and I don't think I should be running into that problem.

Some information about tifArray:

tifArray.shape = (55500, 55500)
tifArray.dtype = uint16

And here is one method I tried:

threshold_low = 1200
threshold_high = 4000
tifArray[(tifArray >= threshold_low) & (tifArray <= threshold_high)] = 1
tifArray[(tifArray < threshold_low)] = 2 
tifArray[(tifArray > threshold_high)] = 3 

The error:

File "./classify_segmented_fromAmplitude.py", line 90, in process_tile
 tifArray[(tifArray >= threshold_low) & (tifArray <= threshold_high)] = 1
MemoryError

When I comment out the first condition:

tifArray[(tifArray >= threshold_low) & (tifArray <= threshold_high)] = 1

and just run the following two lines, I get no MemoryError. So obviously the problem is with the multiple conditions, but I don't know how to get around it. Any ideas?

PolyGeo
asked Apr 26, 2016 at 18:28
  • Try computing the two comparisons separately first (abovelow = tifArray >= threshold_low, belowhigh = tifArray <= threshold_high), then tifArray[belowhigh & abovelow] = 1. Commented Apr 26, 2016 at 20:05

1 Answer

This comes down to efficient chaining of boolean operations. Each comparison (e.g. tifArray > threshold) yields a temporary boolean array with the same dimensions as tifArray, which also consumes memory. NumPy stores each bool in one byte, so such a mask needs half the memory of a uint16 array.

An array of type uint16 and shape (55500, 55500) takes up ~6 GB of memory. So on a 10 GB machine you can get away with one comparison at a time (6 GB tifArray + 3 GB intermediate bool array).
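The figures above can be checked with a few lines of arithmetic. This is only a sketch of the estimate, using decimal gigabytes (1 GB = 1e9 bytes); it computes sizes from the shape and dtypes without allocating the arrays.

```python
import numpy as np

shape = (55500, 55500)
n_pixels = shape[0] * shape[1]

# Bytes per element for the image and for a boolean mask.
bytes_uint16 = np.dtype(np.uint16).itemsize
bytes_bool = np.dtype(np.bool_).itemsize

image_gb = n_pixels * bytes_uint16 / 1e9
mask_gb = n_pixels * bytes_bool / 1e9

print(f"uint16 image:     {image_gb:.2f} GB")
print(f"one boolean mask: {mask_gb:.2f} GB")
```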

The comparison (tifArray >= threshold_low) & (tifArray <= threshold_high) yields three temporary arrays (one per comparison, plus one for the result of &), which together are 1.5 times the size of tifArray, more than your machines can handle.

In addition, the first assignment writes a lot of 1 entries, which are then below the lower threshold, so the next line sets all of them to 2. Even if your code ran, the result would be wrong.
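The ordering problem is easy to reproduce on a tiny array. In this sketch, sample, lo, hi and wrong are made-up names for illustration; the three assignments follow the same order as in the question.

```python
import numpy as np

lo, hi = 1200, 4000
sample = np.array([500, 1200, 2500, 4000, 5000], dtype=np.uint16)

# Same order as in the question: the in-range class is written first.
wrong = sample.copy()
wrong[(wrong >= lo) & (wrong <= hi)] = 1
wrong[wrong < lo] = 2   # the freshly written 1s are also < lo
wrong[wrong > hi] = 3

print(wrong)
```

The three middle values should end up as class 1, but they come out as 2 because the second assignment overwrites them.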

One way to solve all this is to test the lower threshold first, then the upper one, and finally set everything above 3 to 1, since the only remaining values that are not 2 or 3 lie between the lower and upper thresholds.

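thrs_low = 1200
thrs_high = 4000
tifArray[tifArray < thrs_low] = 2   # below the lower threshold
tifArray[tifArray > thrs_high] = 3  # above the upper threshold
# Everything still greater than 3 was in [thrs_low, thrs_high].
tifArray[tifArray > 3] = 1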
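If a separate output array is preferred, as mentioned in the question, one possible variation (not what the lines above do) is to classify in row blocks, so that each temporary mask covers only a few rows. This is a minimal sketch: classify_blockwise and rows_per_block are hypothetical names, and the uint8 output still needs about 3 GB on top of tifArray at full size, so whether it fits on a given machine is an assumption.

```python
import numpy as np

def classify_blockwise(arr, low, high, rows_per_block=1024):
    """Return a uint8 array of class codes (1, 2, 3) for arr."""
    out = np.empty(arr.shape, dtype=np.uint8)
    for start in range(0, arr.shape[0], rows_per_block):
        block = arr[start:start + rows_per_block]       # view, no copy
        out_block = out[start:start + rows_per_block]   # view into out
        out_block[...] = 1                # default: low <= value <= high
        out_block[block < low] = 2        # mask only spans this block
        out_block[block > high] = 3
    return out

# Small demonstration array standing in for tifArray.
demo = np.array([[500, 1200], [4000, 5000], [2500, 0]], dtype=np.uint16)
classes = classify_blockwise(demo, 1200, 4000, rows_per_block=2)
print(classes)
```

Because the class codes are written to a different array, the input is never modified and the order of the assignments no longer matters.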
answered Apr 27, 2016 at 8:51
  • I ended up switching to a machine with 20 GB of RAM, and that did the trick. But this is also a good workaround, so I'll accept it. Thanks! Commented Apr 27, 2016 at 17:55
