Vectorizing a pixel-averaging operation in Numpy

Question 1

I am reading a file that contains segments (irregular parcels of the image) and trying to average each segment so that it has a single pixel value. This is the code I use:

import numpy as np

band = band[:, :, 0]  # take the first band of the image
for i in range(numSegments):  # for every segment
    tx = band[segments == i]  # select all the pixels in the segment
    avg = np.average(tx)  # average the values
    band[segments == i] = avg  # write the average back into the image

I am omitting some transformation steps and the code that prints the running time from this snippet.

This takes quite a long time to run: nearly 1000 seconds for a single band. Is there a way to vectorize this operation to make it faster?

Data:
Segments2009: an image containing all the segments.
This is what the segments look like:

[Image: the segment map]

Bands: 3000x3000 pixels, float32 values.

Full context:

import os
import time

import numpy as np

import utilities  # the asker's own helper module (readBandList, readConfigImSizeBand)

workFolder = '/home/shaunak/Work/ChangeDet_2016/SLIC/003_lee_m0_alpha'
bandlist = os.path.join(workFolder, 'bandlist.txt')
configfile = os.path.join(workFolder, 'config.txt')
segmentfile = os.path.join(workFolder, 'Segments2009')
#%% Load the bands -- can refer to subfolders in the bandlist
files = utilities.readBandList(bandlist)
destinations = []
for f in files:
    root, ext = os.path.splitext(f)  # split off the extension, then append "_SP" before it
    destinations.append(root + "_SP" + ext)
(lines, samples, bands) = utilities.readConfigImSizeBand(configfile)
#%% Superpixel file
segments = np.fromfile(segmentfile, dtype='float32')
# The labels are stored as float32; convert them to integers for comparisons and indexing
segments = np.reshape(segments, (lines, samples)).astype(np.int64)
numSegments = int(np.max(segments)) + 1  # labels run from 0 to max, inclusive
#%% simple avg
for idx, f in enumerate(files):
    band = np.fromfile(f, dtype='float32').reshape((lines, samples))
    start = time.time()
    for i in range(numSegments):
        tx = band[segments == i]
        avg = np.average(tx)
        band[segments == i] = avg
    band.tofile(destinations[idx])

I write the averaged values back into the original array. This is not necessary, and it is not the most expensive part, but it helps me visualize the results, so I kept it in. I also tried the following approach:

# Here band is the full stack of shape (lines, samples, bands)
avgOut = np.zeros((numSegments, bands), dtype='float32')
#avgOutJoined = np.zeros((lines, samples, bands), dtype='float32')
for i in range(numSegments):
    tx = band[segments == i]  # shape (pixels in segment i, bands)
    avgOut[i, :] = np.average(tx, axis=0)
    # avgOutJoined[segments == i, :] = np.average(tx, axis=0)
avgOut.tofile(outputSeperated)  # tofile is an ndarray method, not a numpy function
#avgOutJoined.tofile(outputJoined)

Skipping the write-back did not save much time, so I kept it in.
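For the multi-band variant above, the per-segment loop can be replaced by an unbuffered scatter-add. The sketch below is a minimal illustration, not the asker's code: it assumes the labels are non-negative integers in 0..numSegments-1 (a float32 label map would need .astype(int) first) and that the stack has shape (lines, samples, bands). segment_means_per_band is a hypothetical helper name, not part of NumPy or SciPy. np.add.at has historically been slower than np.bincount, but it handles all bands in one call.

```python
import numpy as np

def segment_means_per_band(stack, segments, num_segments):
    """Mean of every band over every segment, without looping over segments.

    stack: (lines, samples, bands) array; segments: (lines, samples) integer
    labels, assumed to lie in 0..num_segments-1. Returns a (num_segments, bands)
    float32 array, the same layout as avgOut above.
    """
    labels = segments.ravel()                      # (pixels,)
    pixels = stack.reshape(-1, stack.shape[-1])    # (pixels, bands)
    sums = np.zeros((num_segments, pixels.shape[1]), dtype=np.float64)
    np.add.at(sums, labels, pixels)                # unbuffered, so repeated labels accumulate
    counts = np.bincount(labels, minlength=num_segments)
    means = sums / np.maximum(counts, 1)[:, None]  # empty labels get 0 instead of 0/0
    return means.astype(np.float32)

# Tiny made-up example: a 2x3 image with 2 bands and 2 segments
stack = np.arange(12, dtype=np.float32).reshape(2, 3, 2)
segments = np.array([[0, 0, 1],
                     [1, 1, 0]])
print(segment_means_per_band(stack, segments, 2))  # segment 0 -> [4, 5], segment 1 -> [6, 7]
```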

Question 2

You might be able to get what you are looking for by replacing the whole snippet with np.apply_along_axis(np.average, 3, band). But is this truly your whole code? I mean, there is a lack of context here. How can you be sure this is your bottleneck?

Question 3

@MathiasEttinger This is the whole code. The only other line I omitted was tx = tx + 8.7 (adding a constant). I will look into apply_along_axis, but I need to filter by the segment map (segments), so I don't think it is the solution. I will check it out anyway.

Question 4

@MathiasEttinger No, that doesn't work :( I basically need to mask the array before averaging.

Question 5

Anyway, it would help to get a more accurate description of the data, such as the code that reads the file and an excerpt of said file.

Question 6

Under the covers, apply_along_axis uses Python iteration. It just makes iterating over multiple axes (all but the chosen one) easier.
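To make this concrete, here is a small sketch on a made-up 2x3x4 array: apply_along_axis returns exactly what an explicit double loop over the other two axes returns, while a.mean(axis=2) performs the reduction in a single vectorized call. It also suggests why apply_along_axis cannot help with the segments: it only walks over regular 1-D slices along an axis, whereas each segment is an irregular mask with its own pixel count.

```python
import numpy as np

a = np.arange(24, dtype=np.float64).reshape(2, 3, 4)

# apply_along_axis calls np.average once per 1-D slice along axis 2 (6 calls here)
via_apply = np.apply_along_axis(np.average, 2, a)

# The same computation written as the explicit Python loop it stands for
via_loop = np.empty(a.shape[:2])
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        via_loop[i, j] = np.average(a[i, j, :])

# A genuinely vectorized reduction: one call, no Python-level loop over slices
via_reduce = a.mean(axis=2)

print(np.array_equal(via_apply, via_loop), np.allclose(via_apply, via_reduce))  # True True
```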

Question 7

The function scipy.ndimage.measurements.mean takes a labelled array and computes the mean of the values at each label. So instead of:

for i in range(numSegments):
    tx = band[segments == i]
    avg = np.average(tx)
    band[segments == i] = avg

you can write:

import scipy.ndimage

segment_mean = scipy.ndimage.measurements.mean(band, segments, range(numSegments))
band = segment_mean[segments]  # segments must hold integer labels to be used as indices

Though I should add that the last operation here (the reconstruction of band) seems quite wasteful to me: all the information you need is already in the array segments and the array segment_mean (which has just one value per segment). Why do you then need to reconstruct the full array, filling each segment with its mean value? Could you not refactor the subsequent processing to use segments and segment_mean directly?

Update: you clarified the question to explain that writing the mean values back to band was just for visualization and is not an essential part of your application. In that case, you just need the one call to scipy.ndimage.measurements.mean.
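A small self-contained check of the answer's two lines on a toy 2x3 image (values and labels made up for illustration). It calls scipy.ndimage.mean, which is the same function under the name current SciPy releases use; the scipy.ndimage.measurements namespace is deprecated there. Because the result is indexed with segments, the label map has to hold integers; a float32 map read with np.fromfile would need .astype(int) first.

```python
import numpy as np
import scipy.ndimage

band = np.array([[1.0, 2.0, 10.0],
                 [3.0, 20.0, 30.0]], dtype=np.float32)
# Integer label map: segment 0 covers the three pixels on the left, segment 1 the other three
segments = np.array([[0, 0, 1],
                     [0, 1, 1]])
num_segments = 2

# One call computes the mean of band for every label listed in index
segment_mean = np.asarray(
    scipy.ndimage.mean(band, labels=segments, index=np.arange(num_segments)))
print(segment_mean)  # means 2.0 and 20.0

# Optional: paint each segment with its mean (only needed for visualization)
painted = segment_mean[segments]
```

If the painted image is only for viewing, the one-value-per-segment array segment_mean together with segments already carries all the information.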

Question 8

Yes. I added my full code to also show what else I tried. Even without the reconstruction it was quite time-consuming. I will give what you wrote a try.

Question 9

Down from hundreds of seconds to a couple, thanks! I would love to explore how this function achieves such a speed-up.

Question 10

@shaunakde: Works for me, but obviously I have a different segments array. What is the error?

Question 11

Never mind. It was my stupidity. Thanks a lot for everything!
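On the question of how a labelled mean can be so fast: it boils down to a grouped sum divided by a grouped count, and NumPy can compute each of those in a single pass with np.bincount. The sketch below is a NumPy-only equivalent, not SciPy's actual source (as far as I know, SciPy's labelled statistics use a similar bincount-based approach, but details may differ between versions). label_mean is a hypothetical name, and the labels are assumed to be non-negative integers.

```python
import numpy as np

def label_mean(values, labels, num_labels):
    """Mean of values for each integer label 0..num_labels-1, using only NumPy.

    Two bincount passes over the image in total, instead of two full-image
    comparisons (segments == i) per segment as in the original loop.
    """
    flat_labels = labels.ravel()
    sums = np.bincount(flat_labels, weights=values.ravel(), minlength=num_labels)  # per-label sum
    counts = np.bincount(flat_labels, minlength=num_labels)                        # per-label pixel count
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # NaN for labels that have no pixels

band = np.array([[1.0, 2.0, 10.0],
                 [3.0, 20.0, 30.0]], dtype=np.float32)
segments = np.array([[0, 0, 1],
                     [0, 1, 1]])
print(label_mean(band, segments, 2))  # means 2.0 and 20.0
```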

Question 12

Try replacing

 band[segments==i] = avg

With

tx[:] = avg

This eliminates a redundant masking operation.

Does the shape of tx differ with i? In other words, tell us something about segments==i. Can you precalculate it for all i, so you don't have to repeat it for each band file? Maybe apply np.where, so you have indices rather than a boolean mask?

Question 13

Each segment has a different number of pixels, so this solution will not work.
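The precalculation idea still works when every segment has a different number of pixels: sort the flattened label map once, and each segment becomes a contiguous run of pixel indices whose start positions can be stored; np.add.reduceat then sums every run in one call. This is a minimal sketch assuming integer labels; precompute_groups and average_segments are hypothetical helper names. The grouping is computed once per label map and reused for every band file. Note also that band[segments == i] returns a copy (boolean indexing always does), so assigning into tx alone does not change band; the sketch writes the means back through the stored indices instead.

```python
import numpy as np

def precompute_groups(segments):
    """Group flat pixel indices by label once; reusable for every band file.

    Returns (order, starts, labels_present): pixel indices sorted by label,
    the position where each label's run begins, and the label of each run.
    Segments may contain any number of pixels.
    """
    flat = segments.ravel()
    order = np.argsort(flat, kind="stable")  # pixel indices grouped by label
    sorted_labels = flat[order]
    starts = np.flatnonzero(np.r_[True, sorted_labels[1:] != sorted_labels[:-1]])
    return order, starts, sorted_labels[starts]

def average_segments(band, groups):
    """Return a copy of band with every segment replaced by its mean."""
    order, starts, _ = groups
    values = band.ravel()[order]                        # pixel values grouped by segment
    sums = np.add.reduceat(values, starts, dtype=np.float64)
    counts = np.diff(np.r_[starts, values.size])        # pixels per segment (varies)
    means = sums / counts
    out = np.empty(band.size, dtype=band.dtype)
    out[order] = np.repeat(means, counts)               # scatter each mean back to its pixels
    return out.reshape(band.shape)

band = np.array([[1.0, 2.0, 10.0],
                 [3.0, 20.0, 30.0]], dtype=np.float32)
segments = np.array([[0, 0, 1],
                     [0, 1, 1]])
groups = precompute_groups(segments)  # once per label map
print(average_segments(band, groups))  # segment 0 -> 2.0, segment 1 -> 20.0
```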

Accepted answer (Question 7 above) by Gareth Rees · 2016-10-28 10:10:33Z
