Here is a function that I've only slightly modified from its original context, found here.
Before mentioning anything else, it should be noted that I'm desperately trying to optimize this code for speed. It presently takes about 5.25 seconds to execute and it appears as though the bottleneck is happening in the for
-loop.
In a nutshell, this function expects the user to have SimpleCV
installed and expects, at a minimum, to be passed a SimpleCV.Image
instance.
Does anybody have some clever approach for speeding things up? Ideally I'd be able to run this on a real-time webcam feed at 30 frames-per-second, but I'm not getting my hopes up.
from itertools import product
from math import floor, pi
import numpy as np
import cv2 # opencv 2
def findHOGFeatures(img, n_divs=6, n_bins=6):
"""
**SUMMARY**
Get HOG(Histogram of Oriented Gradients) features from the image.
**PARAMETERS**
* *img* - SimpleCV.Image instance
* *n_divs* - the number of divisions(cells).
* *n_divs* - the number of orientation bins.
**RETURNS**
Returns the HOG vector in a numpy array
"""
# Size of HOG vector
n_HOG = n_divs * n_divs * n_bins
# Initialize output HOG vector
# HOG = [0.0]*n_HOG
HOG = np.zeros((n_HOG, 1))
# Apply sobel on image to find x and y orientations of the image
Icv = img.getNumpyCv2()
Ix = cv2.Sobel(Icv, ddepth=cv.CV_32F, dx=1, dy=0, ksize=3)
Iy = cv2.Sobel(Icv, ddepth=cv.CV_32F, dx=0, dy=1, ksize=3)
Ix = Ix.transpose(1, 0, 2)
Iy = Iy.transpose(1, 0, 2)
cellx = img.width / n_divs # width of each cell(division)
celly = img.height / n_divs # height of each cell(division)
#Area of image
img_area = img.height * img.width
#Range of each bin
BIN_RANGE = (2 * pi) / n_bins
# m = 0
angles = np.arctan2(Iy, Ix)
magnit = ((Ix ** 2) + (Iy ** 2)) ** 0.5
it = product(xrange(n_divs), xrange(n_divs), xrange(cellx), xrange(celly))
for m, n, i, j in it:
# grad value
grad = magnit[m * cellx + i, n * celly + j][0]
# normalized grad value
norm_grad = grad / img_area
# Orientation Angle
angle = angles[m*cellx + i, n*celly+j][0]
# (-pi,pi) to (0, 2*pi)
if angle < 0:
angle += 2 * pi
nth_bin = floor(float(angle/BIN_RANGE))
HOG[((m * n_divs + n) * n_bins + int(nth_bin))] += norm_grad
return HOG.transpose()
1 Answer 1
As you indicated in the question, you need to vectorize the for
loop:
it = product(xrange(n_divs), xrange(n_divs), xrange(cellx), xrange(celly))
for m, n, i, j in it:
# grad value
grad = magnit[m * cellx + i, n * celly + j][0]
# normalized grad value
norm_grad = grad / img_area
# Orientation Angle
angle = angles[m*cellx + i, n*celly+j][0]
# (-pi,pi) to (0, 2*pi)
if angle < 0:
angle += 2 * pi
nth_bin = floor(float(angle/BIN_RANGE))
HOG[((m * n_divs + n) * n_bins + int(nth_bin))] += norm_grad
If you look at what this is doing, you're effectively labelling every pixel in the magnit
array with a number below n_HOG
, and then summing the normalized values for the pixels with each label.
Operations on labelled regions of images are jobs for the scipy.ndimage.measurements
module. Here we can use scipy.ndimage.measurements.sum
:
bins = (angles[...,0] % (2 * pi) / BIN_RANGE).astype(int)
x, y = np.mgrid[:width, :height]
x = x * n_divs // width
y = y * n_divs // height
labels = (x * n_divs + y) * n_bins + bins
index = np.arange(n_HOG)
HOG = scipy.ndimage.measurements.sum(magnit[...,0], labels, index)
return HOG / img_area
Notes:
I've used
% (2 * pi)
to get the angles in the range [0, 2π). An alternative that's more like your code would beangles[angles < 0] += 2 * pi
but using the modulus is shorter and, I think, clearer.I postponed the division by
img_area
until after the summation, because it looks to me as though in the common casen_HOG
is much less thanimg_area
and so it's cheaper to do the division later when there are fewer items. (This means that the results differ very slightly from your code, so bear that in mind when checking.)I measure the vectorized version as being about 60 times faster than your
for
loop, but it's still not going to be fast enough to run at 30 fps!I've written
angles[...,0]
andmagnit[...,0]
here in order to drop the third axis. But I think it would make more sense if you dropped this axis earlier, before computingangles
andmagnit
, by writingIx = Ix[...,0]
or justIx = Ix.reshape((height, width))
if you know that the last axis has length 1.
Update
Based on comments, it looks as if you are using Python 2.7, where the division operator /
takes the floor of the result if both arguments are integers. So I've changed the code above to use:
x = x * n_divs // width
y = y * n_divs // height
which is portable between Python 2 and Python 3, and simpler than my first attempt:
x = (x / width * n_divs).astype(int)
y = (y / height * n_divs).astype(int)
-
\$\begingroup\$ Fantastic! Thank you so much! I'll give this a whirl later on today and come back with questions if I have any. \$\endgroup\$Louis Thibault– Louis Thibault2014年02月27日 07:00:40 +00:00Commented Feb 27, 2014 at 7:00
-
\$\begingroup\$ I've tried running your example and I'm getting zero values for all but the first 6 cells of the output array. Would you mind checking my implementation? I'm not quite sure what the problem could be. The vectorized approach is implemented in the function named
findHOGFeaturesVect
: paste.ubuntu.com/7004183 \$\endgroup\$Louis Thibault– Louis Thibault2014年02月27日 09:44:29 +00:00Commented Feb 27, 2014 at 9:44 -
\$\begingroup\$ Are you using Python 2? If so, see the revised answer. \$\endgroup\$Gareth Rees– Gareth Rees2014年02月27日 10:52:15 +00:00Commented Feb 27, 2014 at 10:52
-
\$\begingroup\$ @blz can you copy the final working optimized code as an answer here? \$\endgroup\$waspinator– waspinator2014年08月26日 19:10:32 +00:00Commented Aug 26, 2014 at 19:10
Explore related questions
See similar questions with these tags.
import
statements so that this is runnable, please? \$\endgroup\$SimpleCV
and create anImage
instance to pass to the function. I suggest doing so as follows:img = SimpleCV.Image('lenna')
. Also note, that doingfrom SimpleCV import *
will handle all imports correctly. \$\endgroup\$%prun
magic -- I should have mentioned that. The reason I point directly to the for-loop iteration is because the function responsible for the most cumulative execution time ismath.floor
, which is called exactly once per loop, and the number of calls to said function matches the number of loop iterations. I'll try GarethRees' vectorization approach and post a more detailed profiler output if needed. \$\endgroup\$