I am reading from a file containing some segments (irregular parcels of the image) and trying to average the entire segment to have one pixel value. This is the code I use:
band = band[:,:,0]  # take the first band of the image
for i in range(numSegments):  # for every segment
    tx = band[segments==i]  # select all the pixels in the segment
    avg = np.average(tx)  # average the values
    band[segments==i] = avg  # write the average back into the image
I am omitting some transformation steps and the code for printing the running time from the snippet.
This takes quite some time to run, even for one band: almost 1000 seconds. Is there a way to vectorize this operation to make it faster?
Data:
Segment2009: an image of all the segments in the image.
This is what the segments look like:
Bands: 3000x3000 pixels, float32 values.
Full context:
import os
import time
import numpy as np

workFolder = '/home/shaunak/Work/ChangeDet_2016/SLIC/003_lee_m0_alpha'
bandlist=os.path.join(workFolder,'bandlist.txt')
configfile = os.path.join(workFolder,'config.txt')
segmentfile = os.path.join(workFolder,'Segments2009')
#%% Load the bands -- can refer to subfolders in the bandlist
files = utilities.readBandList(bandlist)
destinations = []
for f in files:
    destinations.append(f.split('.')[0]+"_SP."+f.split('.')[1])
(lines,samples,bands) = utilities.readConfigImSizeBand(configfile)
#%% Superpixel file
segments = np.fromfile(segmentfile,dtype='float32')
segments = np.reshape(segments,(lines,samples))
numSegments = int(np.max(segments))
#%% simple avg
for idx,f in enumerate(files):
    band = np.fromfile(f,dtype='float32').reshape((lines,samples))
    start = time.time()
    for i in range(numSegments):
        tx = band[segments==i]
        avg = np.average(tx)
        band[segments==i] = avg
    band.tofile(destinations[idx])
I am writing the values back to the original after averaging. It is not necessary, also not the most expensive part -- and helps me visualize the results better so I kept it in. I used the following approach also:
avgOut = np.zeros((numSegments,bands),dtype='float32')
#avgOutJoined = np.zeros((lines,samples,bands),dtype='float32')
for i in range(numSegments):
    tx = band[segments==i]
    avgOut[i,:] = np.average(tx,axis=0)
    # avgOutJoined[segments==i,:] = np.average(tx,axis=0)
avgOut.tofile(outputSeperated)  # tofile is an array method; np.tofile does not exist
#avgOutJoined.tofile(outputJoined)
Since not writing the averaged result back did not save much time, I kept it in.
2 Answers
The function scipy.ndimage.measurements.mean takes a labelled array and computes the mean of the values at each label. So instead of:
for i in range(numSegments):
    tx = band[segments==i]
    avg = np.average(tx)
    band[segments==i] = avg
you can write:
segment_mean = scipy.ndimage.measurements.mean(band, segments, range(numSegments))
band = segment_mean[segments]
Though I should add that the last operation here (the reconstruction of band) seems quite wasteful to me: all the information you need is already in the array segments and the array segment_mean (which has just one value per segment). Why do you then need to reconstruct the full array, filling each segment with its mean value? Could you not refactor the subsequent processing to use segments and segment_mean directly?
Update: you clarified the question to explain that writing the mean values back to band was just for visualization and is not an essential part of your application. In that case, you just need the one call to scipy.ndimage.measurements.mean.
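To make the call above concrete, here is a minimal sketch on a toy array. The data values are made up for illustration. It assumes the labels run from 0 up to the maximum label and are converted to integers first, since the question stores them as float32 and an index array must be integer. It also uses the scipy.ndimage.mean spelling of the same function, which is the name exposed by recent SciPy releases.

```python
import numpy as np
from scipy import ndimage

# Toy stand-ins for the real data (assumed values, for illustration only).
band = np.array([[1.0, 3.0, 10.0],
                 [5.0, 7.0, 20.0]], dtype=np.float32)
# Labels stored as float32, as in the question's segment file.
segments = np.array([[0, 0, 1],
                     [2, 2, 1]], dtype=np.float32)

# Integer labels are needed for the index lookup at the end.
labels = segments.astype(np.intp)
num_labels = labels.max() + 1  # include the highest label

# One mean per label, computed in a single call.
segment_mean = ndimage.mean(band, labels=labels, index=np.arange(num_labels))

# Optional: paint each segment with its mean, for visualization only.
smoothed = segment_mean[labels]
```

Note that passing range(numSegments) with numSegments = int(np.max(segments)) would leave out the highest label, which is why the sketch uses max + 1.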
Comments:
Yes. I added my full code to also show what else I tried. Even doing it without the reconstruction was quite time consuming. Will give what you wrote a try. – shaunakde, Oct 28, 2016 at 14:46
Down from hundreds of seconds to a couple, thanks! I would love to explore how this function works to speed this up. – shaunakde, Oct 28, 2016 at 15:52
@shaunakde: Works for me, but obviously I have a different segments array. What is the error? – Gareth Rees, Oct 28, 2016 at 16:03
Never mind. It was my stupidity. Thanks a lot for everything! – shaunakde, Oct 28, 2016 at 16:11
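On the question of how a per-label mean can be fast: the loop in the question scans the whole 3000x3000 label image once per segment, whereas a vectorized version can visit each pixel a constant number of times. The following is a sketch of one such approach using np.bincount. It is not necessarily how SciPy implements its function, and the helper name segment_means is hypothetical.

```python
import numpy as np

def segment_means(band, labels):
    """Mean of `band` within each non-negative integer label, without a Python loop."""
    flat_labels = labels.ravel()
    # Sum of pixel values per label, and number of pixels per label.
    sums = np.bincount(flat_labels, weights=band.ravel())
    counts = np.bincount(flat_labels)
    # Labels that never occur get NaN instead of a division-by-zero warning.
    return np.divide(sums, counts,
                     out=np.full(sums.shape, np.nan),
                     where=counts > 0)

# Toy data (assumed values, for illustration only).
band = np.array([[1.0, 3.0, 10.0],
                 [5.0, 7.0, 20.0]], dtype=np.float32)
labels = np.array([[0, 0, 1],
                   [3, 3, 1]])  # label 2 is unused on purpose
means = segment_means(band, labels)
```

The result has one entry per label from 0 to the maximum label, so means[labels] would rebuild a full-size image if one is wanted for display.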
Try replacing
band[segments==i] = avg
with
tx[:] = avg
eliminating a redundant masking operation.
Does the shape of tx differ with i? In other words, tell us something about segments==i. Can you precalculate it for all i, so you don't have to repeat it for each band file? Maybe apply where, so that you work with indices rather than a boolean mask?
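The idea of precomputing the pixel selection once and reusing it for every band can be sketched as follows. This is an illustration under assumptions, not the answerer's code: it groups pixels with a single argsort instead of calling where once per segment, and the function names are hypothetical. Because indexing with a boolean mask or an index array yields a copy rather than a view, the sketch writes its results through an explicit index assignment.

```python
import numpy as np

def group_pixels_by_segment(labels):
    """Precompute, once, the flat pixel order that groups equal labels together."""
    flat = labels.ravel()
    order = np.argsort(flat, kind="stable")
    # starts[k] is the offset in `order` where the k-th distinct label begins.
    _, starts = np.unique(flat[order], return_index=True)
    return order, starts

def average_segments(band, order, starts):
    """Return a copy of `band` with every segment replaced by its mean."""
    sorted_vals = band.ravel()[order]
    sums = np.add.reduceat(sorted_vals, starts, dtype=np.float64)
    counts = np.diff(np.append(starts, order.size))
    out = np.empty(band.size, dtype=band.dtype)
    out[order] = np.repeat(sums / counts, counts)
    return out.reshape(band.shape)

# Toy data (assumed values, for illustration only).
labels = np.array([[0, 0, 1],
                   [3, 3, 1]])
band_a = np.array([[1.0, 3.0, 10.0],
                   [5.0, 7.0, 20.0]], dtype=np.float32)
band_b = band_a * 2

order, starts = group_pixels_by_segment(labels)  # done once
smoothed_a = average_segments(band_a, order, starts)  # reused per band
smoothed_b = average_segments(band_b, order, starts)
```

The sort costs O(n log n) once; each band then needs only a gather, a segmented sum, and a scatter.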
Comments:
Each segment has a different number of pixels, so this solution will not work. – shaunakde, Oct 28, 2016 at 14:53
np.apply_along_axis(np.average, 3, band). But is it truly your whole code? I mean, there is a lack of context here. How can you be sure this is your bottleneck?
apply_along_axis, under the covers, uses Python iteration. It just makes iteration over multiple axes (all but the chosen one) easier.