Problem statement: Assume a high-resolution (> 3000 x 3000) image is given as input. Each pixel can be classified into one of three categories: text, background, or drawing. There is a library function which takes a pixel and returns its category. Write a function which takes a high-resolution image as input and returns another image of the same resolution where all text pixels are red, background pixels are green and drawing pixels are blue.
Approach: I have currently coded a brute-force solution, where I iterate over each pixel in two nested for loops, invoke the library method to find its category, and set the colour of the pixel accordingly. Functionally it works fine, but it is painfully slow.
Review ask: How can I improve performance? How can I vectorize this operation? I am currently using OpenCV but can switch to any other library to get a performance gain.
def generate_image_label(input_image_path, output_image_path):
    try:
        print("Processing image " + input_image_path)
        image = cv2.imread(input_image_path, cv2.IMREAD_UNCHANGED)
        image_width, image_height, image_channels = image.shape
        c_b, c_g, c_r, c_a = cv2.split(image)
        for i in range(image_width):
            for j in range(image_height):
                drawing_pixel = is_drawing_pixel(image, j, i)  # is_drawing_pixel comes from some other module
                text_pixel = is_text_pixel(image, j, i)  # is_text_pixel comes from some other module
                if c_a[i][j] != 0 and drawing_pixel:
                    c_b[i][j] = 255
                    c_g[i][j] = 0
                    c_r[i][j] = 0
                    c_a[i][j] = 255
                elif c_a[i][j] != 0 and text_pixel:
                    c_b[i][j] = 0
                    c_g[i][j] = 0
                    c_r[i][j] = 255
                    c_a[i][j] = 255
                else:
                    c_b[i][j] = 0
                    c_g[i][j] = 255
                    c_r[i][j] = 0
                    c_a[i][j] = 255
        img_label = cv2.merge((c_b, c_g, c_r, c_a))
        cv2.imwrite(os.path.join(output_image_path, os.path.basename(input_image_path)), img_label)
        return (True, input_image_path)
    except:
        return (False, input_image_path)
1 Answer
To vectorize code like this we need to know what is_drawing_pixel and is_text_pixel do. Can these be called with many pixels as input? If they need to be called for a single pixel, then there is no way to vectorize this, because you must call the two functions for each pixel. The only obvious speed gain is to not call is_text_pixel if is_drawing_pixel returned true. If you time these two functions, and determine one is faster than the other, then you can run the faster function first, and avoid calling the slower one if you don't need to.
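If the classifiers could be called on the whole image at once, the colouring step itself becomes a few array operations. The following is only a sketch under that assumption: label_from_masks is a made-up name, and the two boolean masks stand in for the output of hypothetical whole-image versions of is_drawing_pixel and is_text_pixel, which the question does not show.

```python
import numpy as np

def label_from_masks(image, drawing_mask, text_mask):
    """Build a BGRA label image from per-pixel boolean masks.

    drawing_mask and text_mask are assumed to be (height, width) boolean
    arrays produced by hypothetical vectorized classifiers.
    """
    opaque = image[:, :, 3] != 0            # transparent pixels stay green
    out = np.empty(image.shape, dtype=np.uint8)
    out[:] = (0, 255, 0, 255)               # default: green (BGRA order)
    # Drawing takes priority over text, as in the original if/elif chain.
    out[opaque & text_mask & ~drawing_mask] = (0, 0, 255, 255)   # red
    out[opaque & drawing_mask] = (255, 0, 0, 255)                # blue
    return out

# Tiny stand-in data instead of real classifier output.
image = np.zeros((2, 3, 4), dtype=np.uint8)
image[:, :, 3] = 255
image[0, 0, 3] = 0                          # one transparent pixel
drawing = np.array([[True, True, False], [False, False, False]])
text = np.array([[False, True, True], [False, False, True]])
labels = label_from_masks(image, drawing, text)
```

Whether this helps at all depends on the classifiers: if they only accept one pixel at a time, computing the masks is still the slow part.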
Because you always set transparent pixels to green, don't call your pixel classification functions for transparent pixels.
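One way to do this, sketched here with a hypothetical helper named opaque_coordinates, is to let NumPy find the non-transparent pixels first and loop only over those. It assumes the image already has an alpha channel at index 3.

```python
import numpy as np

def opaque_coordinates(image):
    """Return (row, column) pairs for every pixel with non-zero alpha."""
    rows, cols = np.nonzero(image[:, :, 3])
    # Convert to plain Python ints so the pairs can be passed on directly.
    return zip(rows.tolist(), cols.tolist())

# Small stand-in image: only two pixels are not fully transparent.
image = np.zeros((2, 2, 4), dtype=np.uint8)
image[0, 1, 3] = 255
image[1, 0, 3] = 7
coords = list(opaque_coordinates(image))
```

The per-pixel classification calls would then go inside a loop over these pairs, so fully transparent regions cost almost nothing.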
You should avoid using cv2.split; it is not at all necessary, and just complicates your code. If image_channels == 4, then you can do image[i][j] = [255,0,0,255], or equivalently image[i,j,:] = [255,0,0,255]. I personally prefer the second form, I find it more intuitive. I have the idea that it's also more efficient but I don't know for sure. If we don't create the copies to modify using cv2.split, we should create an output image to modify in the loop: I'm pretty sure your pixel classification functions read at least a neighborhood of the pixel they are classifying, so we don't want to modify it.
If you initialize out to [0,255,0,255], then you can additionally skip the last else statement.
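As a small illustration of both points, here is a sketch that creates a green output image in one call and then writes single pixels with the two indexing forms. The 4 x 5 size is arbitrary. Note the uint8 dtype: 255 does not fit in a signed 8-bit integer.

```python
import numpy as np

height, width = 4, 5                        # arbitrary stand-in size

# Every pixel starts as opaque green in BGRA order.
out = np.full((height, width, 4), (0, 255, 0, 255), dtype=np.uint8)

out[1, 2, :] = (255, 0, 0, 255)             # single indexing operation: blue
out[3][0] = (0, 0, 255, 255)                # chained indexing, same effect: red
```

Both assignments write into the same array; the chained form just builds an intermediate view of row 3 first.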
It is bad practice to catch exceptions and return an error status. You should use the exception system for error handling. If your function encounters an error, it should raise an exception. It is the calling function that should catch the exception, if it needs to, and attempt to recover from the error. Thus, you should just not catch the exceptions at all.
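To make that concrete, here is a rough sketch of what the calling side could look like. The names process_all and fake_process are invented for illustration; in real use the callable would be generate_image_label with the output path bound.

```python
def process_all(paths, process_one):
    """Run process_one on each path, collecting failures at the call site."""
    succeeded, failed = [], []
    for path in paths:
        try:
            process_one(path)
        except (OSError, RuntimeError) as error:
            # The caller decides what to do: here we record it and move on.
            failed.append((path, str(error)))
        else:
            succeeded.append(path)
    return succeeded, failed

def fake_process(path):
    """Stand-in for the real function: fails on anything ending in .txt."""
    if path.endswith(".txt"):
        raise RuntimeError("Could not load image")

succeeded, failed = process_all(["a.png", "notes.txt", "b.png"], fake_process)
```

The processing function stays simple, and the error message travels with the exception instead of being reduced to a boolean.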
cv2.imread will return None if it fails to read the image. You should always test for this case and handle the error appropriately. If OpenCV did error handling properly like Python expects, it would raise an exception and you wouldn't have to worry about it. But because it returns an error status instead, you always have to check the error status and handle it if necessary. It is best to raise an exception when cv2.imread returns None.
The code ends up being something like this (obviously not tested, as I don't have access to the pixel testing functions):
import os

import cv2
import numpy as np

# is_drawing_pixel and is_text_pixel come from the asker's own module.

def generate_image_label(input_image_path, output_image_path):
    print("Processing image " + input_image_path)
    image = cv2.imread(input_image_path, cv2.IMREAD_UNCHANGED)
    if image is None:  # cv2.imread signals failure by returning None
        raise RuntimeError('Could not load image')
    if image.ndim != 3:
        raise RuntimeError('Cannot process gray-scale images')
    if image.shape[2] == 3:
        # Add an alpha channel if we don't have one
        image = np.pad(image, ((0,0),(0,0),(0,1)), constant_values=255)
    assert image.shape[2] == 4
    out = np.zeros(image.shape, dtype=np.uint8)  # unsigned, so that 255 fits
    out[:,:,1] = 255 # default color is green
    out[:,:,3] = 255 # all pixels have alpha = 255
    image_height, image_width = image.shape[:2]  # NumPy shape is (rows, columns, channels)
    for i in range(image_height):
        for j in range(image_width):
            if image[i,j,3] != 0:
                if is_drawing_pixel(image, j, i):
                    out[i,j,:] = [255,0,0,255]  # blue in BGRA order
                elif is_text_pixel(image, j, i):
                    out[i,j,:] = [0,0,255,255]  # red in BGRA order
    cv2.imwrite(os.path.join(output_image_path, os.path.basename(input_image_path)), out)