Analyzing Product Photography Quality: Metrics Calculation (Python)

Question 1

I am working on analyzing product photography data on the website and would like to gather feedback on my approach. The goal is to calculate various image metrics to assess the quality of product photos. I have written the following code to calculate these metrics and classify the image quality:

import pandas as pd
import numpy as np
from PIL import Image
from skimage.exposure import is_low_contrast
from sklearn.cluster import KMeans
from skimage.measure import shannon_entropy
import cv2
# Load the input image from disk
image = Image.open("blue_dress.JPG")
# Calculate the contrast using the low contrast function
low_contrast_threshold = 0.35
is_low_contrast_value = is_low_contrast(np.array(image), low_contrast_threshold)
# Calculate the image metrics 
contrast = np.std(np.array(image))
brightness = np.mean(np.array(image))
sharpness = cv2.Laplacian(np.array(image), cv2.CV_64F).var()
entropy = shannon_entropy(np.array(image))
color_difference = np.max(np.array(image)) - np.min(np.array(image))
color_histogram = np.histogram(np.array(image), bins=256, range=(0, 255))
color_saturation = np.mean(color_histogram[0]) / 255
image_edge_detection = np.mean(cv2.Laplacian(np.array(image), cv2.CV_64F))
image_noise = np.var(np.array(image))
# Convert the image to numpy array
image_array = np.array(image)
# Reshape the image array
reshaped_image_array = image_array.reshape(-1, image_array.shape[-1])
# Perform K-means clustering
num_clusters = 2
kmeans = KMeans(n_clusters=num_clusters)
kmeans.fit(reshaped_image_array)
# Assign labels to each pixel in the image
labels = kmeans.labels_
# Calculate the centroid values for each cluster
cluster_centers = kmeans.cluster_centers_
# Calculate the foreground/background similarity
foreground_label = np.argmax(np.bincount(labels))
foreground_background_similarity = np.mean(np.abs(cluster_centers[foreground_label] - cluster_centers[1 - foreground_label]))
# Rescale the similarity value to the range of 0-100
foreground_background_similarity_rescaled = np.clip(foreground_background_similarity, 0, 100)
# Define thresholds for classification
contrast_thresholds = {'low': (0, 35), 'normal': (35, 60), 'high': (60, float('inf'))}
brightness_thresholds = {'low': (0, 100), 'normal': (100, 200), 'high': (200, 255)}
sharpness_thresholds = {'low': (0, 100), 'normal': (100, 200), 'high': (200, float('inf'))}
color_difference_thresholds = {'low': (0, 20), 'normal': (20, 40), 'high': (40, float('inf'))}
color_saturation_thresholds = {'low': (0.2, 0.5), 'normal': (0.5, 0.8), 'high': (0.8, 1)}
image_noise_thresholds = {'low': (20, 50), 'normal': (50, 80), 'high': (80, 100)}
def classify_value(value, thresholds):
    for label, (lower, upper) in thresholds.items():
        if lower <= value < upper:
            return label
    return 'Unknown'
# Create a dataframe with image metrics
data = {
    'Contrast': [contrast],
    'Brightness': [brightness],
    'Sharpness': [sharpness],
    'Entropy': [entropy],
    'Color Difference': [color_difference],
    'Color Saturation': [color_saturation],
    'Foreground_Background_Similarity': [foreground_background_similarity_rescaled],
    'Image Noise': [image_noise],
    'Image Edge Detection': [image_edge_detection],
}
df = pd.DataFrame(data)
# Apply classification
classifications = {
    'Contrast': (contrast_thresholds, 'Contrast_Classification'),
    'Brightness': (brightness_thresholds, 'Brightness_Classification'),
    'Sharpness': (sharpness_thresholds, 'Sharpness_Classification'),
    'Color Difference': (color_difference_thresholds, 'Color_Difference_Classification'),
    'Color Saturation': (color_saturation_thresholds, 'Color_Saturation_Classification'),
    'Image Noise': (image_noise_thresholds, 'Image_Noise_Classification'),
}
for column, (thresholds, classification_column) in classifications.items():
    df[classification_column] = df[column].apply(lambda x: classify_value(x, thresholds))
def determine_image_quality(row):
    if row['Contrast_Classification'] == 'Low' or \
            row['Brightness_Classification'] == 'Low' or \
            row['Sharpness_Classification'] == 'Low' or \
            row['Entropy'] <= 1 or \
            row['Foreground_Background_Similarity'] >= 90:
        return 'Poor'
    elif row['Contrast_Classification'] == 'High' and \
            row['Brightness_Classification'] == 'High' and \
            row['Sharpness_Classification'] == 'High' and \
            row['Entropy'] > 3 and \
            row['Foreground_Background_Similarity'] <= 30:
        return 'Excellent'
    else:
        return 'Good'
df['Image_Quality_Label'] = df.apply(determine_image_quality, axis=1)
df

In this code, I load an input image and calculate metrics such as contrast, brightness, sharpness, entropy, color difference, color saturation, foreground/background similarity, image noise, and image edge detection. Please have a look at the formulas.

I then create a dataframe to store these metrics and apply classifications based on defined thresholds.

Furthermore, I have added a function to determine the overall image quality based on the calculated metrics. The function assigns a label of "Poor," "Good," or "Excellent" depending on the thresholds and criteria defined.

I would appreciate any advice or suggestions on the following points:

Are the selected metrics appropriate for assessing product photography quality?
Are there any additional metrics or factors that I should consider?
Are the thresholds and classifications reasonable? Should I adjust them?
Is there a more efficient or optimized way to perform these calculations?
Any other ideas, tips, or improvements you can suggest?

Please see the test image.

Thank you for your time and expertise. I look forward to your valuable input on this matter.

Question 2

I will be making some assumptions as to reference image

Question 3

I have added a test image here if needed @Reinderien Thank you

Question 4

Your dataframe has only one output row. I hope that the intent is to vectorize this to multiple images; otherwise there isn't a lot of value in using pandas.

Question 5

@Reinderien Yes, currently just one image but eventually it will be more and then will make a cluster to identify which group performs better in terms of visits and sales.

Question 6

Since you already have a review of the code, I’ll look at the image processing specifics.

sharpness = cv2.Laplacian(np.array(image), cv2.CV_64F).var()

The variance of the Laplacian is not necessarily related to sharpness. A noisy image will have a larger value than a noise-free image, even if equally sharp. An image with a larger (flat) background will have a lower value, even if perfectly in focus.

There is no good way to estimate sharpness without knowing what was imaged. What you have is a proxy that correlates with sharpness for sufficiently similar images, but is not valid in general as a measure of sharpness.
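As a rough illustration of both effects, here is a sketch on synthetic data. It assumes a plain NumPy 4-neighbour Laplacian evaluated on interior pixels only; the helper `laplacian_4` is made up for this example and stands in for `cv2.Laplacian`, so the absolute numbers will differ from what OpenCV reports.

```python
import numpy as np

def laplacian_4(img):
    """4-neighbour Laplacian on interior pixels (hypothetical helper)."""
    img = img.astype(np.float64)
    return (img[:-2, 1:-1] + img[2:, 1:-1]
            + img[1:-1, :-2] + img[1:-1, 2:]
            - 4.0 * img[1:-1, 1:-1])

rng = np.random.default_rng(0)

# A perfectly sharp checkerboard: 8x8-pixel squares with values 50 and 200.
yy, xx = np.indices((64, 64))
sharp = np.where(((yy // 8) + (xx // 8)) % 2 == 0, 50.0, 200.0)

# Same scene, equally sharp, with additive Gaussian noise.
noisy = sharp + rng.normal(0.0, 10.0, sharp.shape)

# Same scene, equally sharp, surrounded by a large flat background.
padded = np.full((256, 256), 50.0)
padded[96:160, 96:160] = sharp

score_sharp = laplacian_4(sharp).var()
score_noisy = laplacian_4(noisy).var()
score_padded = laplacian_4(padded).var()

print(score_noisy > score_sharp > score_padded)
```

All three images are equally sharp, yet the score goes up with noise and down with the amount of flat background.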

color_difference = np.max(np.array(image)) - np.min(np.array(image))

I’m not sure how the name applies: you’re not looking at colors, you’re looking at the difference between the largest value and the smallest one. That could be the large value of the green channel in a pixel and the small value of the red channel in the same pixel. For example, an image that is completely green would have a large color difference according to this measure, even though all pixels have the same color.

If you want to compute the largest difference in colors, compute the Euclidean distance between each pair of pixels (n^2 comparisons if the image has n pixels), preferably in a color space such as Lab, then pick the largest result.
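A minimal sketch of that pairwise comparison follows. To keep it NumPy-only it works directly in RGB; converting to Lab first (for example with `skimage.color.rgb2lab`) is left out. The function name `max_color_distance`, the deduplication step, and the random subsampling cap are assumptions made for this illustration to keep the n^2 comparison affordable.

```python
import numpy as np

def max_color_distance(image_array, max_samples=1000, seed=0):
    """Largest Euclidean distance between any two (sampled) pixel colors."""
    pixels = image_array.reshape(-1, image_array.shape[-1]).astype(np.float64)
    # Identical colors cannot contribute to the maximum, so deduplicate first.
    colors = np.unique(pixels, axis=0)
    if len(colors) > max_samples:
        # Assumption: a random subset is good enough for an estimate.
        rng = np.random.default_rng(seed)
        colors = colors[rng.choice(len(colors), max_samples, replace=False)]
    # Pairwise differences via broadcasting: shape (k, k, channels).
    diff = colors[:, None, :] - colors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).max()

# A uniformly green image: max minus min over all channel values is 255,
# while the pairwise color distance is 0.
green = np.zeros((32, 32, 3), dtype=np.uint8)
green[..., 1] = 255
naive = int(green.max()) - int(green.min())
pairwise = max_color_distance(green)

# Half red, half blue: the distance is sqrt(255^2 + 255^2).
two_tone = np.zeros((32, 32, 3), dtype=np.uint8)
two_tone[:, :16, 0] = 255
two_tone[:, 16:, 2] = 255
two_tone_distance = max_color_distance(two_tone)
```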

color_histogram = np.histogram(np.array(image), bins=256, range=(0, 255))

Again, this is a histogram of values where you combine all channels. I would expect you to compute three histograms (one for each channel), or a single 3D histogram (an actual color histogram).
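Both options could look roughly like this. The random `image_array` is only a stand-in for `np.array(image)`, and the choice of 8 bins per channel for the 3D histogram is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for np.array(image): a random 8-bit RGB image (assumption).
image_array = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)
pixels = image_array.reshape(-1, 3).astype(np.float64)

# Option 1: one 256-bin histogram per channel.
# range=(0, 256) gives unit-width bins, one per 8-bit value.
channel_histograms = [
    np.histogram(pixels[:, c], bins=256, range=(0, 256))[0]
    for c in range(3)
]

# Option 2: a single 3D color histogram, here with 8 bins per channel
# (8 * 8 * 8 = 512 cells) to keep it small.
color_histogram, edges = np.histogramdd(
    pixels, bins=(8, 8, 8), range=((0, 256), (0, 256), (0, 256))
)
```

In the 3D histogram each cell counts pixels whose red, green and blue values fall in that cell together, which is what distinguishes it from three independent channel histograms.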

color_saturation = np.mean(color_histogram[0]) / 255

The histogram contains counts of the pixels for each intensity. The mean of these counts is always the number of pixels divided by 256 (the number of bins). So this quantifies the image size.
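This is easy to check on synthetic arrays. In the sketch below, `original_saturation` simply wraps the two lines from the question; the test images are made up. Note that for a multi-channel array the histogram counts every channel value, so the constant is the number of array values (pixels times channels) divided by 256.

```python
import numpy as np

rng = np.random.default_rng(0)

def original_saturation(image_array):
    # The computation from the question, wrapped in a function.
    counts, _ = np.histogram(image_array, bins=256, range=(0, 255))
    return np.mean(counts) / 255

# Two images of the same size with very different content...
gray = np.full((40, 50, 3), 128, dtype=np.uint8)
colorful = rng.integers(0, 256, size=(40, 50, 3), dtype=np.uint8)
# ...and one image with twice as many pixels.
large = np.full((80, 50, 3), 128, dtype=np.uint8)

same_size_a = original_saturation(gray)
same_size_b = original_saturation(colorful)
bigger = original_saturation(large)
expected = gray.size / 256 / 255   # number of values / bins / 255
```

The gray and the colorful image get the same value, and doubling the image size doubles it.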

To measure saturation, convert each pixel to a saturation value, for example by converting to HSV color space and taking the S channel, then compute the mean of these values.
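A minimal sketch, assuming an RGB array of shape (H, W, 3): it applies the usual HSV saturation formula, (max - min) / max per pixel, directly in NumPy rather than calling a color conversion routine. The function name `mean_saturation` is made up for this example.

```python
import numpy as np

def mean_saturation(image_array):
    """Mean HSV saturation in [0, 1] for an RGB array of shape (H, W, 3)."""
    rgb = image_array[..., :3].astype(np.float64)
    c_max = rgb.max(axis=-1)
    c_min = rgb.min(axis=-1)
    # HSV saturation: (max - min) / max, defined as 0 for black pixels.
    saturation = np.divide(
        c_max - c_min, c_max,
        out=np.zeros_like(c_max), where=c_max > 0,
    )
    return saturation.mean()

gray = np.full((16, 16, 3), 128, dtype=np.uint8)   # no color at all
red = np.zeros((16, 16, 3), dtype=np.uint8)
red[..., 0] = 255                                  # fully saturated
half = gray.copy()
half[:, 8:] = red[:, 8:]                           # half gray, half red

sat_gray = mean_saturation(gray)
sat_red = mean_saturation(red)
sat_half = mean_saturation(half)
```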

image_edge_detection = np.mean(cv2.Laplacian(np.array(image), cv2.CV_64F))

I would guess that the mean of the Laplacian is close to 0 for an image with large flat areas and transitions between them. Only thin lines (ridges) would increase or decrease the mean value, depending on their color; dark and bright lines would cancel out in this measure. I’m not sure what name you should give this, but it’s not related to edges.

Note that you are computing the Laplacian again here; you should re-use the earlier result.
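Here is a small sketch of the flat-areas case, which also computes the Laplacian only once. As an assumption for the example it uses a NumPy 4-neighbour Laplacian on interior pixels (the made-up helper `laplacian_4`) instead of `cv2.Laplacian`.

```python
import numpy as np

def laplacian_4(img):
    """4-neighbour Laplacian on interior pixels (hypothetical helper)."""
    img = img.astype(np.float64)
    return (img[:-2, 1:-1] + img[2:, 1:-1]
            + img[1:-1, :-2] + img[1:-1, 2:]
            - 4.0 * img[1:-1, 1:-1])

# Two flat areas separated by a sharp vertical edge.
step = np.zeros((32, 32))
step[:, 16:] = 100.0

laplacian = laplacian_4(step)     # computed once...
lap_mean = laplacian.mean()       # ...and reused for both statistics
lap_var = laplacian.var()

print(lap_mean, lap_var > 0)
```

The positive and negative responses on either side of the edge cancel, so the mean is zero even though the image contains a strong edge.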

image_noise = np.var(np.array(image))

You computed the standard deviation earlier, and called it contrast. The variance is the square of the standard deviation; how is that noise?

To estimate noise, first identify flat regions in the image, then compute their variance. For example the function dip.EstimateNoiseVariance() in DIPlib does this (disclosure: I’m an author of DIPlib).
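To show the idea without any extra dependency, here is a crude sketch. It is not DIPlib's algorithm: it assumes a grayscale image in which more than half of the 8x8 blocks are flat, and takes the median of the per-block variances. The function name `estimate_noise_variance` and the block size are choices made for this example.

```python
import numpy as np

def estimate_noise_variance(gray, block=8):
    """Median of per-block variances (crude; not DIPlib's algorithm)."""
    h, w = gray.shape
    h, w = h - h % block, w - w % block           # crop to whole blocks
    blocks = (gray[:h, :w].astype(np.float64)
              .reshape(h // block, block, w // block, block)
              .swapaxes(1, 2)
              .reshape(-1, block * block))
    block_variances = blocks.var(axis=1, ddof=1)
    # Assumption: more than half of the blocks are flat, so the median
    # block variance reflects noise rather than image structure.
    return np.median(block_variances)

rng = np.random.default_rng(0)
scene = np.full((128, 128), 60.0)
scene[30:98, 30:98] = 180.0                        # a bright square "product"
noisy = scene + rng.normal(0.0, 5.0, scene.shape)  # true noise variance: 25

estimated = estimate_noise_variance(noisy)
global_variance = noisy.var()                      # what the question computes
```

The block-based estimate lands near the true noise variance, while the global variance is dominated by the difference between the square and its background.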

You look for two clusters using k-means, then:

foreground_label = np.argmax(np.bincount(labels))
foreground_background_similarity = np.mean(np.abs(cluster_centers[foreground_label] - cluster_centers[1 - foreground_label]))

First of all, that second line is quite long. Try to break it up across lines, or do part of the computation in a separate statement.

But more importantly, you first assume that the larger cluster is the foreground, even though in the example you give the background is clearly larger. Then you go to great lengths to find the other cluster, subtract the two centroids, and take the absolute value of the result. You’d get the same result no matter which order you pick for these centroids, so you can simply do:

diff = cluster_centers[0] - cluster_centers[1]
foreground_background_similarity = np.mean(np.abs(diff))
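A quick check of the order independence, using made-up centroids in place of `kmeans.cluster_centers_`; the function `original` wraps the expression from the question.

```python
import numpy as np

# Hypothetical centroids, as k-means might return for a two-cluster fit.
cluster_centers = np.array([[250.0, 248.0, 245.0],   # e.g. a light backdrop
                            [30.0, 60.0, 140.0]])    # e.g. a blue garment

def original(centers, foreground_label):
    # The expression from the question, for either choice of foreground.
    return np.mean(np.abs(centers[foreground_label]
                          - centers[1 - foreground_label]))

diff = cluster_centers[0] - cluster_centers[1]
simplified = np.mean(np.abs(diff))

print(original(cluster_centers, 0), original(cluster_centers, 1), simplified)
```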

Question 7

This:

image_array = np.array(image)

needs to happen right after you open(), so that you can use the array without repeating the cast later on (in your "image metrics"). That way you can use the instance methods of ndarray:

contrast = image_array.std()
brightness = image_array.mean()
color_difference = image_array.max() - image_array.min()
image_noise = image_array.var()

and so on.

classify_value should be vectorised. There should be no lambda; even if you did want to keep the apply, then

df[column].apply(lambda x: classify_value(x, thresholds))

should just be

df[column].apply(classify_value, thresholds=thresholds)

but the apply shouldn't be there either. You should be able to broadcast a threshold comparison to all of your classification columns at once.

Don't operate on strings like 'Low' in your intermediate data. Assign quality integers like 0 through 2; that way you can meaningfully run sums, etc. Only convert to human-legible quality metrics at the very end of your program.
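One possible shape for this, as a sketch rather than a drop-in replacement: the metric values are invented, the boundaries are the low/normal and normal/high limits from the question's threshold dictionaries, and only three metrics are shown. Unlike classify_value, it has no 'Unknown' outcome for values outside the outer limits. With a dataframe, `df[columns].to_numpy()` could supply the `metrics` array.

```python
import numpy as np

# Rows are images, columns are metrics (contrast, brightness, sharpness).
# The values are made up for illustration.
metrics = np.array([
    [20.0,  90.0,  50.0],
    [40.0, 150.0, 150.0],
    [75.0, 220.0, 400.0],
])

# Boundaries between low/normal and normal/high, one row per metric.
boundaries = np.array([
    [35.0,  60.0],    # contrast
    [100.0, 200.0],   # brightness
    [100.0, 200.0],   # sharpness
])

# Broadcast (images, metrics, 1) against (1, metrics, 2) and count how many
# boundaries each value has reached: 0 = low, 1 = normal, 2 = high.
codes = (metrics[:, :, None] >= boundaries[None, :, :]).sum(axis=2)

# Integer codes can be aggregated directly...
total_score = codes.sum(axis=1)

# ...and are mapped to text only for display.
labels = np.array(['Low', 'Normal', 'High'])[codes]
```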

Are the selected metrics appropriate for assessing product photography quality?

That's kind of irrelevant, considering that you're using image statistics as a proxy for quality, and quality as a proxy for sales volume. I will suppress my opinions about quantifying fashion and art with machine learning algorithms. If you want to predict how sales-attractive a product is, you should be taking into consideration metadata about the product (the current season, the category of article of clothing, the price, and perhaps the dominant hue).

Accepted answer (the image-processing review): Cris Luengo · 2023-07-14 02:24:02Z

