I wrote a program for analyzing pictures. The problem I'm having is very slow processing time for medium-sized images such as 800x800. I think the root of the problem is the for loop where I fill my NumPy arrays with values. The main task of the program is to count how many times each intensity appears in each color channel and then plot the counts in a histogram. For example, in the end we will be able to see how many times intensity 200 occurs in the red channel, and so on.
My code is probably very hard to read, but I tried to add comments to make things easier.
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
#Load an image
fname = 'picture1.jpg'
image_grey = Image.open(fname).convert("L")
image_rgb = Image.open(fname).convert("RGB")
arrey_grey = np.asarray(image_grey)
arrey_rgb = np.asarray(image_rgb)
#Get image size
with Image.open('picture1.jpg') as img:
    width, height = img.size
    size = width * height
#Plot the uploaded image both grey and rgb
plt.imshow(arrey_grey, cmap='gray')
plt.show()
plt.imshow(arrey_rgb)
plt.show()
#Define numpy arrey for each color channel
matrix_red = np.zeros((256, 4),dtype = object)
matrix_red[:,1] = int(0)
matrix_red[:,2:] = float(0)
matrix_green = np.zeros((256, 4),dtype = object)
matrix_green[:,1] = int(0)
matrix_green[:,2:] = float(0)
matrix_blue = np.zeros((256, 4),dtype = object)
matrix_blue[:,1] = int(0)
matrix_blue[:,2:] = float(0)
# Completing first column with 0-255
for i in range(256):
    matrix_red[i][0] = i
    matrix_green[i][0] = i
    matrix_blue[i][0] = i
# Counting intensity for each color channel
for i in range(width):
    Matrix_Width = arrey_rgb[i]
    for i in range(height):
        Matrix_Height = Matrix_Width[i]
        Red_Value = Matrix_Height[0]
        Green_Value = Matrix_Height[1]
        Blue_Value = Matrix_Height[2]
        for i in range(256):
            if (matrix_red[i][0] == Red_Value):
                matrix_red[i][1] = matrix_red[i][1] + 1
            if (matrix_green[i][0] == Green_Value):
                matrix_green[i][1] = matrix_green[i][1] + 1
            if (matrix_blue[i][0] == Blue_Value):
                matrix_blue[i][1] = matrix_blue[i][1] + 1
# Data for task ahead
Hx = 0
for i in range(256):
    matrix_red[i][2] = matrix_red[i][1] / size
    Hx = (matrix_red[i][2] + matrix_red[i][3]) + Hx
    matrix_red[i][3] = Hx
#Plotting results
Frequencie_Red = np.zeros((256, 1),dtype = object)
Frequencie_Red[:,0] = int(0)
Frequencie_Green = np.zeros((256, 1),dtype = object)
Frequencie_Green[:,0] = int(0)
Frequencie_Blue = np.zeros((256, 1),dtype = object)
Frequencie_Blue[:,0] = int(0)
Intensity = np.zeros((256, 1),dtype = object)
Intensity[:,0] = int(0)
for i in range(256):
    Frequencie_Red[i] = matrix_red[i][1]
    Frequencie_Green[i] = matrix_green[i][1]
    Frequencie_Blue[i] = matrix_blue[i][1]
for i in range(256):
    Intensity[i] = i
pos = Intensity
width = 1.0
ax = plt.axes()
ax.set_xticks(pos + (width / 2))
ax.set_xticklabels(Intensity)
plt.bar(pos, Frequencie_Red, width, color='r')
plt.show()
plt.bar(pos, Frequencie_Green, width, color='g')
plt.show()
plt.bar(pos, Frequencie_Blue, width, color='b')
plt.show()
As was pointed out in the comments, I have added a test image and a capture of my results.
This is the image I'm using for my calculations: Test Image
The result looks like this, where the red histogram is the red channel, and so on:
*Small note: I have no idea why there is a black line under the first histogram.
Results
1 Answer
Usually, when using for loops and numpy together, you’re probably doing it wrong.
Some things you could replace in your code:
- Use np.arange(256) instead of np.zeros + for i in range(256);
- Use slicing instead of manually extracting values out of an array;
- Use comparisons and .sum() instead of manually counting the number of pixels of a certain color intensity ((array_rgb == 42).sum(axis=1).sum(axis=0)).
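A minimal sketch of those three replacements on a small synthetic array. The array contents and the names pixels, intensities, red and count_42 are made up for illustration and do not come from the question:

```python
import numpy as np

# Hypothetical 2x3 RGB "image"; the values are chosen only for illustration.
pixels = np.array([
    [[42, 0, 7], [42, 42, 7], [1, 2, 3]],
    [[0, 42, 7], [42, 5, 42], [9, 9, 9]],
], dtype=np.uint8)

# np.arange replaces np.zeros followed by a fill loop.
intensities = np.arange(256)

# Slicing extracts a whole channel at once; `...` means "all leading axes".
red = pixels[..., 0]

# A comparison yields a boolean array; summing over columns, then rows,
# leaves one count per color channel.
count_42 = (pixels == 42).sum(axis=1).sum(axis=0)
print(count_42)
```

Here count_42 holds one entry per channel, so repeating the comparison for every intensity would still mean 256 passes over the image.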
But all that is nothing compared to the use of numpy.histogram, which does exactly what you want.
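As a small sanity check of that claim, the sketch below compares numpy.histogram with np.bincount on random data. The array channel and the seed are assumptions for illustration only. With bins=256 and range=(0, 256), every bin is one unit wide, so bin k should count exactly the pixels of intensity k:

```python
import numpy as np

# Hypothetical single-channel data standing in for one color channel.
rng = np.random.default_rng(0)
channel = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)

# 256 bins over [0, 256) give one bin per integer intensity.
hist, bin_edges = np.histogram(channel, bins=256, range=(0, 256))

# Independent count of each integer value, padded to 256 entries.
counts = np.bincount(channel.ravel(), minlength=256)

print(np.array_equal(hist, counts))
```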
You also open the same image twice without closing it, and yet you use a with statement for your third open. Always use a with statement to better manage your resources.
I would use several plt.figure() calls and a single plt.show() to get all 5 plots shown at once and be able to compare them. It will also make the whole thing more responsive.
Lastly, I would put all the code in a function that can be parametrized with the image name:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt


def histogram(image_name):
    with Image.open(image_name) as image_file:
        array_grey = np.asarray(image_file.convert('L'))
        array_rgb = np.asarray(image_file.convert('RGB'))

    plt.figure()
    plt.imshow(array_grey, cmap='gray')
    plt.figure()
    plt.imshow(array_rgb)

    for i, color in enumerate('rgb'):
        hist, bins = np.histogram(array_rgb[..., i], bins=256, range=(0, 256))
        plt.figure()
        plt.bar(bins[:-1], hist, color=color)
    plt.show()


if __name__ == '__main__':
    histogram('picture1.jpg')
(Using the ... syntax from this answer)
Or you could keep the 3 different histograms in variables for further processing:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt


def build_histograms(image_array):
    for i in range(image_array.shape[-1]):
        hist, _ = np.histogram(image_array[..., i], bins=256, range=(0, 256))
        yield hist


def histogram(image_name):
    with Image.open(image_name) as image_file:
        array_grey = np.asarray(image_file.convert('L'))
        array_rgb = np.asarray(image_file.convert('RGB'))

    plt.figure()
    plt.imshow(array_grey, cmap='gray')
    plt.figure()
    plt.imshow(array_rgb)

    red_frequencies, green_frequencies, blue_frequencies = build_histograms(array_rgb)
    grey_frequencies, = build_histograms(array_grey[..., np.newaxis])
    x_axis = np.arange(256)

    plt.figure()
    plt.bar(x_axis, red_frequencies, 1.0, color='r')
    plt.figure()
    plt.bar(x_axis, green_frequencies, 1.0, color='g')
    plt.figure()
    plt.bar(x_axis, blue_frequencies, 1.0, color='b')
    plt.figure()
    plt.bar(x_axis, grey_frequencies, 1.0, color='k')
    plt.show()


if __name__ == '__main__':
    histogram('picture1.jpg')
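As for the further processing itself, the "Data for task ahead" loop in the question appears to compute each intensity's relative frequency and a running total of those frequencies. A possible vectorized sketch of that step follows. The names cumulative_distribution and example_counts are hypothetical, and the counts are made up rather than taken from a real image:

```python
import numpy as np


def cumulative_distribution(frequencies):
    """Return the relative frequencies and their running sum.

    Assumes at least one non-zero count (an image with at least one pixel).
    """
    frequencies = np.asarray(frequencies, dtype=float)
    relative = frequencies / frequencies.sum()
    return relative, np.cumsum(relative)


# Made-up counts standing in for something like red_frequencies.
example_counts = np.zeros(256, dtype=int)
example_counts[[0, 100, 255]] = [2, 1, 1]

relative, cumulative = cumulative_distribution(example_counts)
print(relative[0], cumulative[100], cumulative[-1])
```

Dividing by the sum of the counts is equivalent to dividing by width * height, since every pixel lands in exactly one bin.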
- I have two questions. First, why are there white stripes in my plot? Second, the variable hist stores how many times each intensity appears in our array. How could I save these values before we switch to the next color, so that in the end we would have 3 arrays, one for each color channel? – RebelInc, Nov 20, 2017 at 11:09
- @RebelInc The white stripes come from the fact that the bar width is not 1.0 by default. Change line 19 to plt.bar(bins[:-1], hist, 1.0, color=color) to remove them. – 301_Moved_Permanently, Nov 20, 2017 at 11:16
- [...] plt.show anyway?
- [...] ax xticks. They are so dense that all the numbers got merged together. Remove the 3 ax lines to get a more visually appealing result like the blue or green images.