I'm trying to produce a 3D point cloud from a depth image and some camera intrinsics. The image is 640x480, and is a NumPy array of bytes. The output is a (rows * columns) x 3 array of points.
I've gotten the function to work perfectly, but it's way too slow! (It takes about 2 seconds per image to process.) I'm wondering if there are any optimizations I can make before giving up and writing a C module.
def create_point_cloud(self, depth_image):
    shape = depth_image.shape;
    rows = shape[0];
    cols = shape[1];
    points = np.zeros((rows * cols, 3), np.float32);
    bytes_to_units = (1.0 / 256.0);
    # Linear iterator for convenience
    i = 0
    # For each pixel in the image...
    for r in xrange(0, rows):
        for c in xrange(0, cols):
            # Get the depth in bytes
            depth = depth_image[r, c, 0];
            # If the depth is 0x0 or 0xFF, it's invalid.
            # By convention it should be replaced by a NaN depth.
            if(depth > 0 and depth < 255):
                # The true depth of the pixel in units
                z = depth * bytes_to_units;
                # Get the x, y, z coordinates in units of the pixel
                points[i, 0] = (c - self.cx) / self.fx * z;
                points[i, 1] = (r - self.cy) / self.fy * z;
                points[i, 2] = z
            else:
                # Invalid points have a NaN depth
                points[i, 2] = np.nan;
            i = i + 1
    return points
2 Answers
1. Review
1. No docstring. What does this function do? What parameters does it take? What shape must the depth_image parameter be? What does it return?
2. In Python, there is no need for a semi-colon at the end of a statement (unless another statement follows on the same line) and it is best to omit it.
3. It seems that depth_image is required to have three dimensions, but only coordinate 0 is used on the third dimension. It would be simpler if the function took a two-dimensional depth image; a caller with a three-dimensional image could pass depth_image[..., 0].
4. The result is returned as a linear array rather than in the natural shape as a rows ×ばつ cols ×ばつ 3 array. This loses information about the shape of the image that might be needed by some callers. There's a comment that says the linear layout is "for convenience", but if so, the caller can easily call numpy.reshape (see the sketch after this list).
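For example, a caller holding the original three-dimensional byte image could adapt like this (a sketch; camera is a hypothetical stand-in for the object carrying the intrinsics, and point_cloud is the revised function in §3 below):

# Select the depth plane before the call...
cloud = camera.point_cloud(depth_image[..., 0])
# ...and recover the flat (rows * cols, 3) layout only if it is really needed.
flat = cloud.reshape(-1, 3)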
2. Vectorize
As always with Numpy performance problems, an important step is to scrutinize all the loops that run in the Python interpreter (the for and while loops) to see if they can be vectorized. A loop running inside Numpy is usually hundreds of times faster than the same loop running in native Python.
In this case it should be straightforward: instead of looping over pixels and doing some operations on each pixel, apply those operations to the whole image and let Numpy worry about looping over the pixels.
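As a toy illustration of the difference (a hypothetical example, not from the post), compare an interpreted loop with its vectorized equivalent:

import numpy as np

a = np.arange(1000000, dtype=np.float64)

# Interpreted loop: Python executes one iteration per element.
total = 0.0
for value in a:
    total += value * value

# Vectorized: one expression, and the loop runs inside NumPy's compiled code.
total = (a * a).sum()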
We only need the 0 plane on the third axis, as discussed in §1.3 above:
depth = depth_image[..., 0]
Let's make a mask array of valid pixels:
valid = (depth > 0) & (depth < 255)
The z-coordinates of the result can be computed by using numpy.where to distinguish between valid and invalid pixels:
z = np.where(valid, depth / 256.0, np.nan)
For the other coordinates of the result, we need row and column coordinates for each pixel, which can be generated using numpy.meshgrid:
rows, cols = depth_image.shape
c, r = np.meshgrid(np.arange(cols), np.arange(rows), sparse=True)
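With sparse=True, meshgrid returns broadcastable arrays of shape (1, cols) and (rows, 1) instead of two full rows ×ばつ cols grids, which saves memory. A quick check on a tiny example:

c, r = np.meshgrid(np.arange(3), np.arange(2), sparse=True)
# c has shape (1, 3): [[0, 1, 2]]
# r has shape (2, 1): [[0], [1]]
# Arithmetic such as z * (c - cx) broadcasts them back to full (2, 3) arrays.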
Then we have:
x = np.where(valid, z * (c - self.cx) / self.fx, 0)
y = np.where(valid, z * (r - self.cy) / self.fy, 0)
The coordinate arrays can be stacked using numpy.dstack to produce the result as an array of shape (rows, cols, 3):
return np.dstack((x, y, z))
If the caller really needs a linear array, they can flatten it using numpy.reshape.
3. Revised code
Putting this together:
def point_cloud(self, depth):
    """Transform a depth image into a point cloud with one point for each
    pixel in the image, using the camera transform for a camera
    centred at cx, cy with field of view fx, fy.

    depth is a 2-D ndarray with shape (rows, cols) containing
    depths from 1 to 254 inclusive. The result is a 3-D array with
    shape (rows, cols, 3). Pixels with invalid depth in the input have
    NaN for the z-coordinate in the result.

    """
    rows, cols = depth.shape
    c, r = np.meshgrid(np.arange(cols), np.arange(rows), sparse=True)
    valid = (depth > 0) & (depth < 255)
    z = np.where(valid, depth / 256.0, np.nan)
    x = np.where(valid, z * (c - self.cx) / self.fx, 0)
    y = np.where(valid, z * (r - self.cy) / self.fy, 0)
    return np.dstack((x, y, z))
I find that this is about fifty times as fast as the original code.
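To try it out, something like this works (a sketch; Camera and its intrinsic values are hypothetical stand-ins for the asker's class, and point_cloud is assumed to be defined at module level as above):

import numpy as np

class Camera:
    # Hypothetical intrinsics for a 640x480 image.
    cx, cy = 320.0, 240.0
    fx, fy = 500.0, 500.0
    point_cloud = point_cloud  # the revised function above

depth = np.random.randint(0, 256, (480, 640)).astype(np.uint8)
cloud = Camera().point_cloud(depth)
print(cloud.shape)                    # (480, 640, 3)
print(np.isnan(cloud[..., 2]).sum())  # count of invalid pixels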
Comment (IntegrateThis, Sep 11, 2020): Is there a resource showing how the math in your code is derived, i.e. the general math of deriving a point cloud from a depth image given the camera position and field of view? Also, are the values cx, cy the position of the camera in world space, or the position of the centre of the image being focused on in world space?
Maybe you can decorate your function the way it is with Numba and get it compiled just-in-time. Here is a blog post that compares Numba's performance with other techniques: https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/
Note that Numba is sometimes faster than vectorized code if you are doing plain math operations. I recommend using the Anaconda Python distribution; trying to install Numba yourself can be a pain (at least it was for me the last time I tried).
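A minimal sketch of what that might look like, assuming the function is moved to module level (Numba compiles free functions more readily than methods) and the intrinsics are passed in as arguments:

import numpy as np
from numba import njit  # assumes Numba is installed

@njit
def point_cloud_jit(depth, cx, cy, fx, fy):
    # Same per-pixel loop as the original question, compiled just-in-time.
    rows, cols = depth.shape
    points = np.zeros((rows, cols, 3), np.float32)
    for r in range(rows):
        for c in range(cols):
            d = depth[r, c]
            if 0 < d < 255:
                z = d / 256.0
                points[r, c, 0] = (c - cx) / fx * z
                points[r, c, 1] = (r - cy) / fy * z
                points[r, c, 2] = z
            else:
                points[r, c, 2] = np.nan
    return points

The first call pays the compilation cost; later calls run the loop at compiled speed.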