I want to compare two images using NumPy. This is what I have so far.
One of the outputs should be a white image with black pixels where the pixels differ. I am sure it's possible to make this more efficient with better use of NumPy; for example, the for loop can be avoided. Or maybe there is a function/package that has already implemented something similar?
    import gc

    import matplotlib.pyplot as plt  # plt.imread is used below but was missing from the imports
    import numpy as np
    import PIL.Image


    def compare_images(image_to_test_filename, image_benchmark_filename):
        print('comparing', image_to_test_filename, 'and', image_benchmark_filename)
        image_benchmark = plt.imread(image_benchmark_filename)
        image_to_test = plt.imread(image_to_test_filename)
        assert image_to_test.shape[0] == image_benchmark.shape[0] and image_to_test.shape[1] == image_benchmark.shape[1]

        diff_pixel = np.array([0, 0, 0], np.uint8)
        true_array = np.array([True, True, True, True])
        diff_black_white = np.zeros([image_benchmark.shape[0], image_benchmark.shape[1], 3], dtype=np.uint8) + 255

        is_close_pixel_by_pixel = np.isclose(image_to_test, image_benchmark)

        nb_different_rows = 0
        for r, row in enumerate(is_close_pixel_by_pixel):
            diff_indices = [c for c, elem in enumerate(row) if not np.all(elem == true_array)]
            if len(diff_indices):
                diff_black_white[r][diff_indices] = diff_pixel
                nb_different_rows += 1

        dist = np.linalg.norm(image_to_test - image_benchmark) / (image_to_test.shape[0] * image_to_test.shape[1])
        if nb_different_rows > 0:
            print('IS DIFFERENT! THE DIFFERENCE IS (% OF ALL PIXELS)', dist * 100)
            im = PIL.Image.fromarray(diff_black_white)
            im.save(image_to_test_filename + '_diff.png')
            del im

        del image_benchmark
        del image_to_test
        del diff_black_white
        gc.collect()

        return dist, None
- @Reinderien done! (Yulia V, Jun 26, 2021)
1 Answer
First, this of course depends on your definition of different. I believe your comparison is currently far too strict, given the defaults for `isclose`. I did a trivial modification of a .jpg, and with one decode/encode pass it still produced 6% of pixels with an RGB distance of more than 20. `isclose` applies both a relative and an absolute tolerance, but for your purposes an absolute-only tolerance is probably simpler.
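To illustrate the point (a small sketch with made-up channel values; the `atol` of 20 matches the RGB-distance threshold mentioned above), compare the defaults against an absolute-only tolerance:

```python
import numpy as np

a = np.array([100, 100, 100], dtype=np.uint8)
b = np.array([110, 100, 100], dtype=np.uint8)

# Defaults (rtol=1e-05, atol=1e-08) are far too strict for pixel data
print(np.isclose(a, b))                   # [False  True  True]

# Absolute-only: disable rtol, allow per-channel differences up to 20
print(np.isclose(a, b, rtol=0, atol=20))  # [ True  True  True]
```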
I find f-strings a more natural form of string formatting, because the in-line field expressions remove the need for your eyes to scan back and forth between field placeholder and expression. This does not impact performance. Also note that the `%` format specifier in this context removes the need to multiply by 100.
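For example, the `:.2%` presentation type handles both the scaling by 100 and the percent sign in one step (the fraction here is just a sample value):

```python
diff_fraction = 0.0634

# Old style: comma-separated arguments and manual multiplication by 100
print('difference (% of all pixels)', diff_fraction * 100)

# f-string with the % presentation type: scaling and symbol are automatic
print(f'difference of all pixels is {diff_fraction:.2%}')  # ... 6.34%
```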
PEP 484 type hinting also does not impact performance, but it makes the code more legible and verifiable.
Note that it's "almost never" appropriate to call `del` and `gc.collect()` yourself. The garbage collector is there for a reason and, in the vast, vast majority of cases, will act reasonably to free unreferenced memory without you having to intervene. The one thing you should be doing here that you aren't is opening the images in a `with` block for context management, which will guarantee resource cleanup on scope exit.
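A minimal sketch of what that looks like with PIL (the tiny image written here is a throwaway so the example is self-contained; in real code you would open your actual benchmark file):

```python
import numpy as np
from PIL import Image

# Create a tiny 2x2 RGB image on disk purely for demonstration
Image.fromarray(np.zeros((2, 2, 3), dtype=np.uint8)).save('benchmark.png')

# Image.open is a context manager: the file handle is guaranteed closed
# on scope exit, even if an exception is raised while reading
with Image.open('benchmark.png') as f:
    image_benchmark = np.asarray(f)

print(image_benchmark.shape)  # (2, 2, 3)
```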
This problem is fully vectorizable, so it should see no explicit loops at all. Just calculate a black-and-white difference mask from an absolute threshold in one pass. Your original implementation took longer to execute than I had patience for, but the following suggestion executes in less than a second:
    import numpy as np
    from PIL import Image
    from matplotlib.pyplot import imread

    # Maximum allowable Frobenius distance in RGB space
    EPSILON = 20


    def compare_images(
        image_to_test_filename: str,
        image_benchmark_filename: str,
    ) -> float:
        print(f'comparing {image_to_test_filename} and {image_benchmark_filename}')
        image_benchmark = imread(image_benchmark_filename)
        image_to_test = imread(image_to_test_filename)
        assert image_to_test.shape == image_benchmark.shape

        # Cast to float so the subtraction cannot wrap around for uint8 input
        difference = image_to_test.astype(float) - image_benchmark.astype(float)
        diff_black_white = (
            np.linalg.norm(difference, axis=2) > EPSILON
        ).astype(np.uint8)
        n_different = np.sum(diff_black_white)

        if n_different > 0:
            diff_fraction = n_different / image_benchmark.size
            print(f'IS DIFFERENT! THE DIFFERENCE OF ALL PIXELS IS {diff_fraction:.2%}')
            im = Image.fromarray(diff_black_white * 255)
            im.save(f'{image_to_test_filename}_diff.png')

        dist = np.linalg.norm(difference) / image_benchmark.size
        return dist
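As a quick sanity check of the vectorized thresholding step (synthetic 2×2 float images rather than the files from the question):

```python
import numpy as np

EPSILON = 20

benchmark = np.zeros((2, 2, 3), dtype=float)
test = benchmark.copy()
test[0, 0] = [30, 0, 0]  # RGB distance 30 > EPSILON -> flagged
test[1, 1] = [5, 5, 5]   # RGB distance ~8.7 <= EPSILON -> ignored

# One pass: per-pixel RGB distance along axis 2, then an absolute threshold
diff = (np.linalg.norm(test - benchmark, axis=2) > EPSILON).astype(np.uint8)
print(diff)
# [[1 0]
#  [0 0]]
```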
- Thanks! I have found one more actually: image_benchmark.shape[0] * image_benchmark.shape[1] ought to be image_benchmark.size, right? (This is point 1.) (Yulia V, Jun 27, 2021)
- 2. Could I ask why you prefer print(f'comparing {i1} and {i2}') over print('comparing', i1, 'and', i2)? (Yulia V, Jun 27, 2021)
- 3. What would be the benefit of typing the inputs: is it a performance improvement? Is it significant in this case? (Yulia V, Jun 27, 2021)
- np.linalg.norm along axis 2 is brilliant, I did not know it could be used like this, thanks! (Yulia V, Jun 27, 2021)
- Yes, size is better for this purpose; edited for the other points. (Reinderien, Jun 27, 2021)