I'm trying to produce a 3D point cloud from a depth image and some camera intrinsics. The image is 640x480, and is a NumPy array of bytes. The output is a (rows * columns) x 3 array of points.
I've gotten the function to work perfectly, but it's way too slow! (It takes about 2 seconds per image to process.) I'm wondering if there are any optimizations I can make before giving up and writing a C module.
def create_point_cloud(self, depth_image):
    shape = depth_image.shape;
    rows = shape[0];
    cols = shape[1];
    points = np.zeros((rows * cols, 3), np.float32);
    bytes_to_units = (1.0 / 256.0);
    # Linear iterator for convenience
    i = 0
    # For each pixel in the image...
    for r in xrange(0, rows):
        for c in xrange(0, cols):
            # Get the depth in bytes
            depth = depth_image[r, c, 0];
            # If the depth is 0x0 or 0xFF, it's invalid.
            # By convention it should be replaced by a NaN depth.
            if(depth > 0 and depth < 255):
                # The true depth of the pixel in units
                z = depth * bytes_to_units;
                # Get the x, y, z coordinates in units of the pixel
                points[i, 0] = (c - self.cx) / self.fx * z;
                points[i, 1] = (r - self.cy) / self.fy * z;
                points[i, 2] = z
            else:
                # Invalid points have a NaN depth
                points[i, 2] = np.nan;
            i = i + 1
    return points
2 Answers
1. Review
1. No docstring. What does this function do? What parameters does it take? What shape must the depth_image parameter be? What does it return?
2. In Python, there is no need for a semi-colon at the end of a statement (unless another statement follows on the same line) and it is best to omit it.
3. It seems that depth_image is required to have three dimensions, but only coordinate 0 is used on the third dimension. It would be simpler if the function took a two-dimensional depth image; a caller with a three-dimensional image could pass depth_image[..., 0].
4. The result is returned as a linear array rather than in the natural shape as a rows ×ばつ cols ×ばつ 3 array. This loses information about the shape of the image that might be needed by some callers. There's a comment that says the linear layout is "for convenience", but if so, the caller can easily call numpy.reshape (see the sketch after this list).
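For example, a caller holding the original three-dimensional byte image could adapt like this (a sketch; camera is a hypothetical stand-in for the object carrying the intrinsics, and point_cloud is the revised function in §3 below):

# Select the depth plane before the call...
cloud = camera.point_cloud(depth_image[..., 0])
# ...and recover the flat (rows * cols, 3) layout only if it is really needed.
flat = cloud.reshape(-1, 3)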
2. Vectorize
As always with Numpy performance problems, an important step is to scrutinize all the loops that run in the Python interpreter (the for and while loops) to see if they can be vectorized. A loop running inside Numpy is usually hundreds of times faster than the same loop running in native Python.
In this case it should be straightforward: instead of looping over pixels and doing some operations on each pixel, apply those operations to the whole image and let Numpy worry about looping over the pixels.
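As a toy illustration of the difference (a hypothetical example, not from the post), compare an interpreted loop with its vectorized equivalent:

import numpy as np

a = np.arange(1000000, dtype=np.float64)

# Interpreted loop: Python executes one iteration per element.
total = 0.0
for value in a:
    total += value * value

# Vectorized: one expression, and the loop runs inside NumPy's compiled code.
total = (a * a).sum()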
We only need the 0 plane on the third axis, as discussed in §1.3 above:
depth = depth_image[..., 0]
Let's make a mask array of valid pixels:
valid = (depth > 0) & (depth < 255)
The z-coordinates of the result can be computed by using numpy.where to distinguish between valid and invalid pixels:
z = np.where(valid, depth / 256.0, np.nan)
For the other coordinates of the result, we need row and column coordinates for each pixel, which can be generated using numpy.meshgrid:
rows, cols = depth_image.shape
c, r = np.meshgrid(np.arange(cols), np.arange(rows), sparse=True)
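With sparse=True, meshgrid returns broadcastable arrays of shape (1, cols) and (rows, 1) instead of two full rows ×ばつ cols grids, which saves memory. A quick check on a tiny example:

c, r = np.meshgrid(np.arange(3), np.arange(2), sparse=True)
# c has shape (1, 3): [[0, 1, 2]]
# r has shape (2, 1): [[0], [1]]
# Arithmetic such as z * (c - cx) broadcasts them back to full (2, 3) arrays.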
Then we have:
x = np.where(valid, z * (c - self.cx) / self.fx, 0)
y = np.where(valid, z * (r - self.cy) / self.fy, 0)
The coordinate arrays can be stacked using numpy.dstack to produce the result as an array of shape (rows, cols, 3):
return np.dstack((x, y, z))
If the caller really needs a linear array, they can flatten it using numpy.reshape.
3. Revised code
Putting this together:
def point_cloud(self, depth):
    """Transform a depth image into a point cloud with one point for each
    pixel in the image, using the camera transform for a camera
    centred at cx, cy with field of view fx, fy.

    depth is a 2-D ndarray with shape (rows, cols) containing
    depths from 1 to 254 inclusive. The result is a 3-D array with
    shape (rows, cols, 3). Pixels with invalid depth in the input have
    NaN for the z-coordinate in the result.

    """
    rows, cols = depth.shape
    c, r = np.meshgrid(np.arange(cols), np.arange(rows), sparse=True)
    valid = (depth > 0) & (depth < 255)
    z = np.where(valid, depth / 256.0, np.nan)
    x = np.where(valid, z * (c - self.cx) / self.fx, 0)
    y = np.where(valid, z * (r - self.cy) / self.fy, 0)
    return np.dstack((x, y, z))
I find that this is about fifty times as fast as the original code.
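To try it out, something like this works (a sketch; Camera and its intrinsic values are hypothetical stand-ins for the asker's class, and point_cloud is assumed to be defined at module level as above):

import numpy as np

class Camera:
    # Hypothetical intrinsics for a 640x480 image.
    cx, cy = 320.0, 240.0
    fx, fy = 500.0, 500.0
    point_cloud = point_cloud  # the revised function above

depth = np.random.randint(0, 256, (480, 640)).astype(np.uint8)
cloud = Camera().point_cloud(depth)
print(cloud.shape)                    # (480, 640, 3)
print(np.isnan(cloud[..., 2]).sum())  # count of invalid pixels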
Comment (IntegrateThis, Sep 11, 2020): Is there a resource showing how the math in your code is derived, i.e. the general math of deriving a point cloud from a depth image given the camera position and field of view? Also, are the values cx, cy the position of the camera in world space, or the position of the centre of the image being focused on in world space?
Maybe you can decorate your function the way it is with Numba and get it compiled just-in-time. Here is a blog post that compares Numba's performance with other techniques: https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/
Note that Numba is sometimes faster than vectorized code if you are doing plain math operations. I recommend using the Anaconda Python distribution; trying to install Numba yourself can be a pain (at least it was for me the last time I tried).
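A minimal sketch of what that might look like, assuming the function is moved to module level (Numba compiles free functions more readily than methods) and the intrinsics are passed in as arguments:

import numpy as np
from numba import njit  # assumes Numba is installed

@njit
def point_cloud_jit(depth, cx, cy, fx, fy):
    # Same per-pixel loop as the original question, compiled just-in-time.
    rows, cols = depth.shape
    points = np.zeros((rows, cols, 3), np.float32)
    for r in range(rows):
        for c in range(cols):
            d = depth[r, c]
            if 0 < d < 255:
                z = d / 256.0
                points[r, c, 0] = (c - cx) / fx * z
                points[r, c, 1] = (r - cy) / fy * z
                points[r, c, 2] = z
            else:
                points[r, c, 2] = np.nan
    return points

The first call pays the compilation cost; later calls run the loop at compiled speed.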