Update image pixels based on different criteria

Question 1

Problem statement: Assume a high-resolution (> 3000 x 3000) image is given as input. Each pixel can be classified into one of three categories: text, background, or drawing. There is a library function that takes a pixel and returns its category. Write a function that takes a high-resolution image as input and returns another image of the same resolution in which all text pixels are red, background pixels are green, and drawing pixels are blue.

Approach: I have coded a brute-force solution: I iterate over every pixel in two nested for loops, invoke the library method to get its category, and set the colour of the pixel accordingly. Functionally it works fine, but it is extremely slow.

Review ask: How can I improve performance? How can I vectorize this operation? I am currently using OpenCV, but I can switch to any other library to get a performance gain.

def generate_image_label(input_image_path, output_image_path):
    try:
        print("Processing image " + input_image_path)
        image = cv2.imread(input_image_path, cv2.IMREAD_UNCHANGED)
        image_width, image_height, image_channels = image.shape
        c_b, c_g, c_r, c_a = cv2.split(image)

        for i in range(image_width):
            for j in range(image_height):
                drawing_pixel = is_drawing_pixel(image, j, i) # is_drawing_pixel comes from some other module
                text_pixel = is_text_pixel(image, j, i) # is_text_pixel comes from some other module
                if c_a[i][j] != 0 and drawing_pixel:
                    c_b[i][j] = 255
                    c_g[i][j] = 0
                    c_r[i][j] = 0
                    c_a[i][j] = 255
                elif c_a[i][j] != 0 and text_pixel:
                    c_b[i][j] = 0
                    c_g[i][j] = 0
                    c_r[i][j] = 255
                    c_a[i][j] = 255
                else:
                    c_b[i][j] = 0
                    c_g[i][j] = 255
                    c_r[i][j] = 0
                    c_a[i][j] = 255
        img_label = cv2.merge((c_b, c_g, c_r, c_a))
        cv2.imwrite(os.path.join(output_image_path, os.path.basename(input_image_path)), img_label)
        return (True, input_image_path)
    except:
        return (False, input_image_path)

Answer 1

To vectorize code like this we need to know what is_drawing_pixel and is_text_pixel do. Can these be called with many pixels as input? If they need to be called for a single pixel, then there is no way to vectorize this, because you must call the two functions for each pixel. The only obvious speed gain is to not call is_text_pixel if is_drawing_pixel returned true. If you time these two functions, and determine one is faster than the other, then you can run the faster function first, and avoid calling the slower one if you don't need to.
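
If the classifiers can be made to work on the whole image, for instance by returning one boolean mask per category, the loops disappear entirely. The following is only a sketch under that assumption: drawing_mask and text_mask are hypothetical inputs, and nothing in the question says the library can produce them.

```python
import numpy as np

def label_image(image, drawing_mask, text_mask):
    """Colour a BGRA image from per-category boolean masks.

    image: H x W x 4 uint8 array in BGRA order (as read by OpenCV).
    drawing_mask, text_mask: H x W boolean arrays. These are assumed
    outputs of a vectorized classifier, which the question does not show.
    """
    opaque = image[:, :, 3] != 0
    out = np.empty(image.shape, dtype=np.uint8)
    out[:] = (0, 255, 0, 255)                      # background: green
    out[opaque & text_mask] = (0, 0, 255, 255)     # text: red
    # Assigned last so that drawing wins over text, like the if/elif order.
    out[opaque & drawing_mask] = (255, 0, 0, 255)  # drawing: blue
    return out
```

Each assignment runs over the whole array inside NumPy, so no Python-level loop over pixels remains.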

Because you always set transparent pixels to green, don't call your pixel classification functions for transparent pixels.
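
When the classifiers really do have to be called one pixel at a time, the loop can at least be restricted to the non-transparent pixels. A rough sketch, in which is_drawing and is_text are stand-ins for the question's is_drawing_pixel and is_text_pixel (assumed to be called as f(image, x, y)) and label_opaque_pixels is just an illustrative name:

```python
import numpy as np

def label_opaque_pixels(image, is_drawing, is_text):
    """Classify only the pixels whose alpha is non-zero.

    is_drawing / is_text are placeholders for the per-pixel library
    functions from the question; their signature is an assumption.
    """
    out = np.zeros(image.shape, dtype=np.uint8)
    out[:, :, 1] = 255  # default colour is green
    out[:, :, 3] = 255  # every output pixel is opaque
    # np.argwhere yields the (row, col) index of each opaque pixel.
    for row, col in np.argwhere(image[:, :, 3] != 0):
        if is_drawing(image, col, row):
            out[row, col] = (255, 0, 0, 255)  # drawing: blue
        elif is_text(image, col, row):        # skipped when drawing matched
            out[row, col] = (0, 0, 255, 255)  # text: red
    return out
```

How much this helps depends on how much of the image is transparent. For a fully opaque image it does the same work as the nested loops.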

You should avoid using cv2.split; it is not at all necessary and just complicates your code. If image_channels==4, then you can do image[i][j] = [255,0,0,255], or equivalently image[i,j,:] = [255,0,0,255]. I personally prefer the second form; I find it more intuitive. I have the idea that it's also more efficient, but I don't know for sure. If we don't create the copies to modify using cv2.split, we should create a separate output image to modify in the loop: I'm pretty sure your pixel classification functions read at least a neighborhood of the pixel they are classifying, so we don't want to modify the input while it is still being read.

If you initialize out to [0,255,0,255], then you can additionally skip the last else statement.

It is bad practice to catch exceptions and return an error status. You should use the exception system for error handling. If your function encounters an error, it should raise an exception. It is the calling function that should catch the exception, if it needs to, and attempt to recover from the error. Thus, you should just not catch the exceptions at all.
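
On the calling side, this could look roughly like the sketch below. The names process_all and process_one are made up for illustration; process_one stands for any function that raises on failure, such as a version of generate_image_label without the try/except.

```python
def process_all(paths, process_one):
    """Call process_one(path) for every path and collect the failures.

    Both names are hypothetical. The point is that the caller, not the
    worker function, decides what to do when an exception is raised.
    """
    failures = []
    for path in paths:
        try:
            process_one(path)
        except Exception as exc:
            # Keep the exception itself so the cause is not lost.
            failures.append((path, exc))
    return failures
```

Unlike a (False, path) return value, the collected exception still says what went wrong.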

cv2.imread will return None if it fails to read the image. You should always test for this case and handle the error appropriately. If OpenCV did error handling properly like Python expects, it would raise an exception and you wouldn't have to worry about it. But because it returns an error status instead, you always have to check the error status and handle it if necessary. It is best to raise an exception when cv2.imread returns None.

The code ends up being something like this (obviously not tested, as I don't have access to the pixel testing functions):

import os

import cv2
import numpy as np

# is_drawing_pixel and is_text_pixel come from the asker's other module

def generate_image_label(input_image_path, output_image_path):
    print("Processing image " + input_image_path)
    image = cv2.imread(input_image_path, cv2.IMREAD_UNCHANGED)
    if image is None:  # cv2.imread returns None on failure
        raise RuntimeError('Could not load image')
    if image.ndim != 3:
        raise RuntimeError('Cannot process gray-scale images')
    if image.shape[2] == 3:
        # Add an alpha channel if we don't have one
        image = np.pad(image, ((0,0),(0,0),(0,1)), constant_values=255)
    assert image.shape[2] == 4
    out = np.zeros(image.shape, dtype=np.uint8)  # uint8, so that 255 fits
    out[:,:,1] = 255 # default color is green
    out[:,:,3] = 255 # all pixels have alpha = 255
    rows, cols = image.shape[:2]  # NumPy shape is (rows, columns, channels)
    for i in range(rows):
        for j in range(cols):
            if image[i,j,3] != 0:
                if is_drawing_pixel(image, j, i):
                    out[i,j,:] = [255,0,0,255]
                elif is_text_pixel(image, j, i):
                    out[i,j,:] = [0,0,255,255]
    cv2.imwrite(os.path.join(output_image_path, os.path.basename(input_image_path)), out)

Cris Luengo · Accepted Answer · 2021-08-25 17:21:47Z
