Next is an algorithm to calculate the histogram for a given YUV420sp image. The YUV data is given by the hardware camera as an unsigned char*
.
The algorithms works correctly, but I'm looking to optimize it for speed.
void calculateHistogram(const unsigned char* yuv420sp, const int yuvWidth, const int yuvHeight,
const int canvasHeight, int* histogramBins)
{
const int BINS = 256;
// Clear the output
memset(histogramBins, 0, BINS * sizeof(int));
// Get the bin values
const unsigned int totalPixels = yuvWidth * yuvHeight;
for (int index = 0; index < totalPixels; index++)
{
histogramBins[yuv420sp[index]]++;
}
// Resolve the maximum count in bins
int maxBinValue = 0;
for (int index = 0; index < BINS; index++)
{
if (histogramBins[index] > maxBinValue)
{
maxBinValue = histogramBins[index];
}
}
maxBinValue = maxBinValue == 0 ? 1 : maxBinValue;
// Normalize to fit into the histogram UI canvas height
for (int index = 0; index < BINS; index++)
{
histogramBins[index] = histogramBins[index] * canvasHeight / maxBinValue;
}
}
2 Answers 2
Style
Everything looks quite good to me.
The only thing that I find a bit disturbing is the empty line after comments : it split relevant things apart and it makes the code harder to read.
You could add a bit of documentation explaining what your code does.
Making things more simple
After the for
-loop to compute the max value, you perform maxBinValue = maxBinValue == 0 ? 1 : maxBinValue;
. As far as I can understand, the point is not to have a division by 0 if max is 0 which corresponds to a situation where histogramBins
consist only of 0s which should happen only of totalPixels <= 0
which is quite unlikely.
In any case :
- I find the usage of a ternary operator a bit inappropriate here as it might as well be a simple :
if (maxBinValue == 0) maxBinValue = 1;
. Alternatively, if you were to define functions (or macros) formin
andmax
, this could be :maxBinValue = max(1, maxBinValue)
. - You could get rid of this is a very very simple way : if you were to assign
1
tomaxBinValue
in the first place (before the loop), you wouldn't have to worry about it being equal to 0.
Please note that if you were to define a max
function/macro, it could be useful inside the loop too as it would become : maxBinValue = max(histogramBins[index],maxBinValue)
.
Getting rid of magic numbers
This 256
in const int BINS = 256;
definitly sounds like a value I've seen in other places. Indeed, it is not just a random setting I can update if I ever want to : it corresponds to the maximum value yuv420sp[index]
can take. Its type is unsigned char
and the maximum value can be retrieved from limits.h : doing const int BINS = UCHAR_MAX
would probably make things easier to understand.
There are a number of things that you could do to make your program run faster. On my machine, all of them together shaved off a little over 8% of the time of your original. Here's what I did:
Use unsigned
where applicable
Your histogram counts can never be less than zero, so it makes sense to use unsigned
instead of int
for their type. This often allows a small but measurable performance increase when doing calculations or comparisons, depending on the particulars of your machine.
Prefer preincrement over postincrement
On my machine it made no difference with this code, but it's often the case that preincrement (++i
) is slightly faster than postincrement (i++
) because the postincrement case has to save the original value and then increment while the preincrement can simply save in place. In some cases (especially if the architecture is register-starved), this results in a performance difference.
Precalculate loop invariants
The value canvasHeight / maxBinValue
doesn't need to be recalculated every loop iteration because the value of it doesn't change within the loop. This kind of calculation is therefore known as a loop invariant and can be precalculated outside the loop for a small but measurable performance increase.
Eliminate unnecessary operations
Your code sets the initial maxBinValue = 0
and then changes it to 1 if the value is 0 after the loop. Why not just start with the value of 1? That way no post-loop adjustment is required.
Use pointers rather than indexing
Pointer arithmetic is often faster than indexing, so rather than having a loop construct for (int index = 0; index < BINS; index++)
it is often faster instead to have something like for (unsigned *bin = histogramBins; bin != lastbin; ++bin)
Measure and verify
All of this is only of theoretical interest until you actually measure the performance on your machine. Your computer, operating system, and compiler are all likely different from mine, so what improves performance on my machine could actually slow things down on yours. The only way to know for sure is to measure.
I did so using the time
command on my Linux machine and found that your original code took an average of 850ms for the program below, while the final optimized version took an average of 779ms. This is the test harness I used:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* test code goes here */
int main()
{
const unsigned height=480;
const unsigned width=640;
unsigned char in[height*width];
unsigned hist[BINS];
for (unsigned i=0; i < height*width; ++i)
in[i] = rand();
for (int i=0; i < 1000; ++i)
calculateHistogram(in, width, height, 1024, hist);
}
This is the optimized version of your code. Note that BINS
is moved outside the function so that it may also be used by the calling code.
const int BINS = 256;
void calculateHistogram(const unsigned char* yuv420sp, const int yuvWidth,
const int yuvHeight, const int canvasHeight, unsigned* histogramBins)
{
// Clear the output
memset(histogramBins, 0, BINS * sizeof(unsigned));
// Get the bin values
const unsigned char *last = &yuv420sp[yuvWidth*yuvHeight];
for ( ; yuv420sp != last; ++yuv420sp)
++histogramBins[*yuv420sp];
// Resolve the maximum count in bins
unsigned maxBinValue = 1;
const unsigned *lastbin = &histogramBins[BINS];
for (unsigned *bin = histogramBins; bin != lastbin; ++bin)
if (*bin > maxBinValue)
maxBinValue = *bin;
// Normalize to fit into the histogram UI canvas height
unsigned scale = canvasHeight / maxBinValue;
for (unsigned *bin = histogramBins; bin != lastbin; ++bin)
*bin *= scale;
}
-
1\$\begingroup\$ Ineresting comments even though I didn't expect any of them to make a difference with any decent compiler. In any case, I am not quite sure that you can pre-compute
canvasHeight / maxBinValue
out of the loop (or at least not with integers operations). Indeed, let's assume for the sake of simplicity thatcanvasHeight < maxBinValue
, thenscale = 0
and I guess you can imagine what will happen when performing the multiplication. The division cannot be factorised out of the loop because what gets divised is notcanvasHeight
butx * canvasHeight
. \$\endgroup\$SylvainD– SylvainD2014年04月18日 16:03:46 +00:00Commented Apr 18, 2014 at 16:03 -
\$\begingroup\$ You are right Josay, the division cannot be pre-computed outside the loop, otherwise would loose the required precision. \$\endgroup\$Perraco– Perraco2014年04月18日 16:21:07 +00:00Commented Apr 18, 2014 at 16:21