Return to Answer

edited body

edited Jul 11, 2024 at 2:09

There is one major issue with the code, and then some performance issues (the other answer already touched on some of these).

Major Issue

In the histogram computation, you loop over pixels in parallel, but update a single set of arrays.

Parallel threads updating the same array invariably leads to race conditions. Here, one thread reads a value, increments it by one, then writes it back. Say another thread does the same thing to the same value, at the same time. It reads the value before the first thread writes, and it writes it out after the first thread writes. The second thread thus clobbered (overwrote) the first one’s work.

And because writing is typically done in a larger unit than a single int, the chance of this type of race condition happening is larger than you’d think.

Either create the histogram in a single thread, or have each thread build its own histogram from its own section of the image, then combine the histograms at the end.

I would keep it simple, have this function be single-threaded, and run it in parallel, each thread processing a different frame.

Performance Issues

You have two loops, "stretching" and "scaling". The "stretching" loop actually just clips the data, the second loop stretches (== scales). But the second loop also clips, since you use cv::saturate_cast<uchar>(). So, that first loop is completely superfluous, leave it out!

As the other answer pointed out, a single loop through data is much faster than two loops through data. Try to do as much of your processing as possible in each loop, so you don’t loop more often than necessary. But in this particular case, the work done in the first loop is not needed at all.

There is one major issue with the code, and then some performance issues (the other answer already touched on some of these).

Major Issue

In the histogram computation, you loop over pixels in parallel, but update a single set of arrays.

And because writing is typically done in a larger unit than a single int, the chance of this type of race condition happening is larger than you’d think.

Either create the histogram in a single thread, or have each thread build its own histogram from its own section of the image, then combine the histograms at the end.

I would keep it simple, have this function be single-threaded, and run it in parallel, each thread processing a different frame.

Performance Issues

There is one major issue with the code, and then some performance issues (the other answer already touched on some of these).

Major Issue

In the histogram computation, you loop over pixels in parallel, but update a single set of arrays.

And because writing is typically done in a larger unit than a single int, the chance of this type of race condition happening is larger than you’d think.

Either create the histogram in a single thread, or have each thread build its own histogram from its own section of the image, then combine the histograms at the end.

I would keep it simple, have this function be single-threaded, and run it in parallel, each thread processing a different frame.

Performance Issues

Source Link

answered Jul 10, 2024 at 23:56

Cris Luengo

answered Jul 10, 2024 at 23:56

Cris Luengo

There is one major issue with the code, and then some performance issues (the other answer already touched on some of these).

Major Issue

In the histogram computation, you loop over pixels in parallel, but update a single set of arrays.

And because writing is typically done in a larger unit than a single int, the chance of this type of race condition happening is larger than you’d think.

Either create the histogram in a single thread, or have each thread build its own histogram from its own section of the image, then combine the histograms at the end.

I would keep it simple, have this function be single-threaded, and run it in parallel, each thread processing a different frame.

Performance Issues

lang-cpp