Optimizing nested loop in run length encoding function

Question 1

I'm currently working on a tool that needs to decode, modify data, then re-encode that data using the Yaz0 format. Everything has been in a working state, but after running some code analysis on my program it turns out it is spending an obscene amount of time on the encoding portion (Over 95%). However, I cannot think of a way to make the loop in this function any more efficient than it already is.

So far I have compacted the inside for loop which used to have a break and two ifs inside it. I also tried using OpenMP, but since I am running async operations on all cores it just hindered performance. I haven't touched anything outside the loop because it is negligible as far as the analysis shows.

Here is a screenshot of the code analysis as well as the function in question. The code is compiled using Microsoft Visual Studio 2013 in debug mode and ran on Windows 8.1.

Code analysis screenshot

uint32_t simpleEnc(uint8_t *src, uint32_t size, uint32_t pos, uint32_t *pMatchPos)
{
 int32_t startPos = pos - 0x1000;
 uint32_t numBytes = 1;
 uint32_t matchPos = 0;
 int32_t diff = size - pos;
 if (startPos < 0)
 {
 startPos = 0;
 }
 for (int i = startPos; i < pos; i++)
 {
 for (int j = 0; j < diff && src[i + j] == src[j + pos]; j++)
 {
 if (j > numBytes)
 {
 numBytes = j;
 matchPos = i;
 }
 }
 }
 *pMatchPos = matchPos;
 if (numBytes == 2)
 {
 numBytes = 1;
 }
 return numBytes;
}

Are there any performance optimizations I've missed? I know there is some barrier where this will just take a fair amount of time, but since this is the biggest bottleneck in the program I want it to be as fast as possible.

Question 2

Please see this meta post on how to have a successful iterative code review

Question 3

@rolfl After reading several posts there I think I'll change this to have previous revisions in a list of links as it progresses as per this answer: meta.codereview.stackexchange.com/a/534

Question 4

@rolfl: Why did you not link to the current FAQ?

Question 5

@Jamal Because I .... failed. This is the right link: For an iterative review, is it okay to edit my own question to include revised code?

Question 6

@ozdrgnaDiies: We apologize for the confusion. You should follow rolfl's second link, which leads to the site's official FAQ on this policy (it's also found in the Help Center). The first post linked was an older one.

Question 7

As seems to be my custom when answering performance questions, let me start with something else than performance (because the other things came to my eye earlier)

Vertical Whitespace

You are using much vertical whitespace without necessity. You could shave off at least 3 lines with that (and another 5 with another brace placement style but that is a too religious topic for me to touch).

Less (vertical) whitespace means less scrolling/distance to travel for the eyes while reading so you should use it only to help guide the reader.

Variable declaration

The variables are declared C-style at the top of the function. While this gives a good overview over all variables it is not good C++ style where you declare variables as late as possible.

The variable matchPos might be removed because it does not add over directly assigning to pMatchPos (besides the indirection through a pointer, which might give a slowdown).

Missing comments

There are no comments on lines that look very suspicious to me like:

int32_t startPos = pos - 0x1000;
//...
if (startPos < 0) {
 startPos = 0;
}

Why is this subtracted? What is the meaning of the number (keyword magic number)? Why must it not be negative?

if (numBytes == 2) {
 numBytes = 1;
}

Why is the 2 a special case? (as an aside, the same thing could be more readable with return numBytes == 2 ? 1 : numBytes;)

Naming

Usually I would suggest better names for variables but so far neither comments nor existing names have guided me enough to know what algorithm you try to implement here (which is bad).

So after some minutes of staring at the code my guess is this:

simpleEnc tries to find the longest "substring" that occurs twice in src and starts at pos.

If I am right then the name simpleEnc is misleading because no encryption is done. Instead it should be named something like longestDuplicateStartingAt or something like this.

From this I would take that i should be named offset because that is where the other substring should start.

j would then be the characterIndex.

Standard library is your friend

Reading your code I see no single "real" function being called. While that might be okay for such "low level" code it could be an indicator that you are reimplementing standard functionality and if you look closer, you will notice that your inner loop does two things: search for the first index where two strings differ and building a maximum/argmax.

The first can be solved using std::mismatch and the latter has some side effect that we cannot achieve with standard algorithms (that I know of).

Another thing that makes me wonder is: Why are you working with raw pointers and sizes and such things? src could be a std::string or std::vector which would save you from passing the size.

Loop bounds

Your outer loop actually might have a shorter bound: It seems that you don't want to go farther than the to the start reference sequence that you want to test for (i < pos) so I guess you don't want to overlap with it. This means you should rephrase this to i + numBytes < pos because then the overlap is already happening.

Invalid output

You start with numBytes = 1 but it could be possible that your input does not allow that. Your sequence must always start with the value at src[pos] so if all previous values are unequal to this the numBytes are actually zero.

The output zero would be a much more natural fit to tell the caller that no match has been found than 1.

Question 8

Thanks for the input. Something I should mention is that problems with numBytes being invalid or having a special case are just parts of the way Yaz0 works. If it is 2 then you act like it is by itself. Also, this code was originally taken from a standalone program made in C and I haven't made too much of an effort to make it more C++-ish or fix any naming; at least in this function. It's hard to explain how the function that calls this interacts with it, but things not conflicting with that I will take into consideration (like mismatch).

Question 9

Output arguments

You currently use a pointer for your output argument. While this is a matter of preference, I found it confusing considering the other pointer argument indicates an array. I would use a reference for pMatchPos instead of a pointer.

`const`-correctness and immutability

In a pointer-math-heavy function like this, it is especially important to document which values can and cannot be changed within the function body. Use const to specify that a value is not mutable. This allows the compiler to do more error-checking, and makes the code easier to reason about.

Most of your function parameters aren't modified inside the function, so they can be marked const. Note especially src, which has two const modifiers, indicating its value cannot be changed, nor can the data it points to.

Also, startPos is only mutated when it is clamped to zero. You can rewrite this using std::max() (from the algorithm header) to put the initialization on one line, and then make it const as well. Note that using max() requires pos be a signed type, otherwise there will be overflow errors.

An improved beginning of your function would look like this:

uint32_t simpleEnc(uint8_t const * const src, uint32_t const size,
 int32_t const pos, uint32_t& pMatchPos) // note type changes, cf. above
{
 const int32_t startPos = std::max(0, pos - 0x1000);
 const int32_t diff = size - pos;

Magic numbers

I don't know what 0x1000 is supposed to represent, and neither will anyone else who doesn't intimately understand the algorithm. It would be better to assign it to a named constant. Something like

constexpr uint32_t BACKTRACK_AMOUNT = 0x1000; // Pick a more accurate name, though.

Naming

pMatchPos uses Hungarian notation. Please don't do this. Instead, try to come up with a more descriptive name, or just leave off the prefix.

Question 10

Thanks for the input. One thing I didn't mention is that this project requires /DNOMINMAX for other portions due to conflicts which prevents me from using std::max. I'm not sure if I've removed the conflict though, so I'll double check if it's possible. As for the rest, I'll get to working on the const-correctness and probably use vectors as per @Nobody's answer.

Question 11

@ozdrgnaDiies I'm pretty sure that /DNOMINMAX disables Microsoft's min and max macros, not the standard library functions. You should still be able to use std::max().

Question 12

Sorry, I haven't slept in a long time and I must have forgotten what min/max was disabled. It works now after casting pos - 0x1000 to an int.

Question 13

Correct me if I am wrong, this code tries to solve a Longest Common Substring problem, and does it in a (naive) O(n*m) way. Consider switching to a better algorithm, such as suffix tree.

Question 14

Just looking at the logic of the code, it looks to me that you could do the same thing by assigning j to numBytes after the j loop exits and moving the matchpos assignment outside the j loop since it is assigned the value of i and doesn't need to be assigned on every iteration of the j loop:

for (int i = startPos; i < pos; i++)
{
 int j = 0;
 for (; j < diff && src[i + j] == src[j + pos]; j++)
 {
 }
 if (j > numBytes)
 {
 numBytes = j;
 matchPos = i;
 }
}

Just some thoughts that could help.

Question 15

It seems this has shaved off a bit of time! I changed the for loop to a while and re-added the if condition for setting numBytes and matchPos and CPU time went from 1328.742s to 1240.980s. I'll change my code in the post and see if someone else can come up with more optimizations.

Question 16

The problem with this proposed change is that it yields different results. The final value of matchPos will not be the same as in the original.

Question 17

using the conditional should work.

Question 18

Not quite. Try both the original and your version with this:

const uint32_t size = 0x8000; uint8_t src[size]; uint32_t pos = 0x2000; uint32_t MatchPos = 0; uint32_t MatchPos2 = 0; uint32_t res, res2; std::fill(src, &src[size], 'a'); for (uint8_t *ptr=&src[123]; ptr < &src[size]; ptr += 500) *ptr = 'b';

Question 19

Actually that pretty much follows exactly the OP's modified code, which the OP says works.

score 4 · Accepted Answer · 2014-07-24 17:56:07Z

As seems to be my custom when answering performance questions, let me start with something else than performance (because the other things came to my eye earlier)

Vertical Whitespace

You are using much vertical whitespace without necessity. You could shave off at least 3 lines with that (and another 5 with another brace placement style but that is a too religious topic for me to touch).

Less (vertical) whitespace means less scrolling/distance to travel for the eyes while reading so you should use it only to help guide the reader.

Variable declaration

The variables are declared C-style at the top of the function. While this gives a good overview over all variables it is not good C++ style where you declare variables as late as possible.

The variable matchPos might be removed because it does not add over directly assigning to pMatchPos (besides the indirection through a pointer, which might give a slowdown).

Missing comments

There are no comments on lines that look very suspicious to me like:

int32_t startPos = pos - 0x1000;
//...
if (startPos < 0) {
 startPos = 0;
}

Why is this subtracted? What is the meaning of the number (keyword magic number)? Why must it not be negative?

if (numBytes == 2) {
 numBytes = 1;
}

Why is the 2 a special case? (as an aside, the same thing could be more readable with return numBytes == 2 ? 1 : numBytes;)

Naming

Usually I would suggest better names for variables but so far neither comments nor existing names have guided me enough to know what algorithm you try to implement here (which is bad).

So after some minutes of staring at the code my guess is this:

simpleEnc tries to find the longest "substring" that occurs twice in src and starts at pos.

If I am right then the name simpleEnc is misleading because no encryption is done. Instead it should be named something like longestDuplicateStartingAt or something like this.

From this I would take that i should be named offset because that is where the other substring should start.

j would then be the characterIndex.

Standard library is your friend

Reading your code I see no single "real" function being called. While that might be okay for such "low level" code it could be an indicator that you are reimplementing standard functionality and if you look closer, you will notice that your inner loop does two things: search for the first index where two strings differ and building a maximum/argmax.

The first can be solved using std::mismatch and the latter has some side effect that we cannot achieve with standard algorithms (that I know of).

Another thing that makes me wonder is: Why are you working with raw pointers and sizes and such things? src could be a std::string or std::vector which would save you from passing the size.

Loop bounds

Your outer loop actually might have a shorter bound: It seems that you don't want to go farther than the to the start reference sequence that you want to test for (i < pos) so I guess you don't want to overlap with it. This means you should rephrase this to i + numBytes < pos because then the overlap is already happening.

Invalid output

You start with numBytes = 1 but it could be possible that your input does not allow that. Your sequence must always start with the value at src[pos] so if all previous values are unequal to this the numBytes are actually zero.

The output zero would be a much more natural fit to tell the caller that no match has been found than 1.

Thanks for the input. Something I should mention is that problems with numBytes being invalid or having a special case are just parts of the way Yaz0 works. If it is 2 then you act like it is by itself. Also, this code was originally taken from a standalone program made in C and I haven't made too much of an effort to make it more C++-ish or fix any naming; at least in this function. It's hard to explain how the function that calls this interacts with it, but things not conflicting with that I will take into consideration (like mismatch).

Stack Exchange Network

Optimizing nested loop in run length encoding function

4 Answers 4

Vertical Whitespace

Variable declaration

Missing comments

Naming

Standard library is your friend

Loop bounds

Invalid output

Output arguments

`const`-correctness and immutability

Magic numbers

Naming

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Optimizing nested loop in run length encoding function

4 Answers 4

Vertical Whitespace

Variable declaration

Missing comments

Naming

Standard library is your friend

Loop bounds

Invalid output

Output arguments

const-correctness and immutability

Magic numbers

Naming

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`const`-correctness and immutability