I'm currently working on a tool that needs to decode, modify data, then re-encode that data using the Yaz0 format. Everything has been in a working state, but after running some code analysis on my program it turns out it is spending an obscene amount of time on the encoding portion (Over 95%). However, I cannot think of a way to make the loop in this function any more efficient than it already is.
So far I have compacted the inside for loop which used to have a break and two ifs inside it. I also tried using OpenMP, but since I am running async operations on all cores it just hindered performance. I haven't touched anything outside the loop because it is negligible as far as the analysis shows.
Here is a screenshot of the code analysis as well as the function in question. The code is compiled using Microsoft Visual Studio 2013 in debug mode and ran on Windows 8.1.
Code analysis screenshot
uint32_t simpleEnc(uint8_t *src, uint32_t size, uint32_t pos, uint32_t *pMatchPos)
{
int32_t startPos = pos - 0x1000;
uint32_t numBytes = 1;
uint32_t matchPos = 0;
int32_t diff = size - pos;
if (startPos < 0)
{
startPos = 0;
}
for (int i = startPos; i < pos; i++)
{
for (int j = 0; j < diff && src[i + j] == src[j + pos]; j++)
{
if (j > numBytes)
{
numBytes = j;
matchPos = i;
}
}
}
*pMatchPos = matchPos;
if (numBytes == 2)
{
numBytes = 1;
}
return numBytes;
}
Are there any performance optimizations I've missed? I know there is some barrier where this will just take a fair amount of time, but since this is the biggest bottleneck in the program I want it to be as fast as possible.
-
\$\begingroup\$ Please see this meta post on how to have a successful iterative code review \$\endgroup\$rolfl– rolfl2014年07月24日 12:00:52 +00:00Commented Jul 24, 2014 at 12:00
-
\$\begingroup\$ @rolfl After reading several posts there I think I'll change this to have previous revisions in a list of links as it progresses as per this answer: meta.codereview.stackexchange.com/a/534 \$\endgroup\$ozdrgnaDiies– ozdrgnaDiies2014年07月24日 13:53:10 +00:00Commented Jul 24, 2014 at 13:53
-
\$\begingroup\$ @rolfl: Why did you not link to the current FAQ? \$\endgroup\$Jamal– Jamal2014年07月24日 15:59:56 +00:00Commented Jul 24, 2014 at 15:59
-
\$\begingroup\$ @Jamal Because I .... failed. This is the right link: For an iterative review, is it okay to edit my own question to include revised code? \$\endgroup\$rolfl– rolfl2014年07月24日 16:03:50 +00:00Commented Jul 24, 2014 at 16:03
-
\$\begingroup\$ @ozdrgnaDiies: We apologize for the confusion. You should follow rolfl's second link, which leads to the site's official FAQ on this policy (it's also found in the Help Center). The first post linked was an older one. \$\endgroup\$Jamal– Jamal2014年07月24日 16:28:57 +00:00Commented Jul 24, 2014 at 16:28
4 Answers 4
As seems to be my custom when answering performance questions, let me start with something else than performance (because the other things came to my eye earlier)
Vertical Whitespace
You are using much vertical whitespace without necessity. You could shave off at least 3 lines with that (and another 5 with another brace placement style but that is a too religious topic for me to touch).
Less (vertical) whitespace means less scrolling/distance to travel for the eyes while reading so you should use it only to help guide the reader.
Variable declaration
The variables are declared C-style at the top of the function. While this gives a good overview over all variables it is not good C++ style where you declare variables as late as possible.
The variable matchPos
might be removed because it does not add over directly assigning to pMatchPos
(besides the indirection through a pointer, which might give a slowdown).
Missing comments
There are no comments on lines that look very suspicious to me like:
int32_t startPos = pos - 0x1000;
//...
if (startPos < 0) {
startPos = 0;
}
Why is this subtracted? What is the meaning of the number (keyword magic number)? Why must it not be negative?
if (numBytes == 2) {
numBytes = 1;
}
Why is the 2 a special case? (as an aside, the same thing could be more readable with return numBytes == 2 ? 1 : numBytes;
)
Naming
Usually I would suggest better names for variables but so far neither comments nor existing names have guided me enough to know what algorithm you try to implement here (which is bad).
So after some minutes of staring at the code my guess is this:
simpleEnc
tries to find the longest "substring" that occurs twice in src
and starts at pos.
If I am right then the name simpleEnc
is misleading because no encryption is done. Instead it should be named something like longestDuplicateStartingAt
or something like this.
From this I would take that i
should be named offset
because that is where the other substring should start.
j
would then be the characterIndex
.
Standard library is your friend
Reading your code I see no single "real" function being called. While that might be okay for such "low level" code it could be an indicator that you are reimplementing standard functionality and if you look closer, you will notice that your inner loop does two things: search for the first index where two strings differ and building a maximum/argmax.
The first can be solved using std::mismatch
and the latter has some side effect that we cannot achieve with standard algorithms (that I know of).
Another thing that makes me wonder is: Why are you working with raw pointers and sizes and such things? src
could be a std::string
or std::vector
which would save you from passing the size.
Loop bounds
Your outer loop actually might have a shorter bound:
It seems that you don't want to go farther than the to the start reference sequence that you want to test for (i < pos
) so I guess you don't want to overlap with it. This means you should rephrase this to i + numBytes < pos
because then the overlap is already happening.
Invalid output
You start with numBytes = 1
but it could be possible that your input does not allow that. Your sequence must always start with the value at src[pos]
so if all previous values are unequal to this the numBytes
are actually zero.
The output zero would be a much more natural fit to tell the caller that no match has been found than 1.
-
\$\begingroup\$ Thanks for the input. Something I should mention is that problems with numBytes being invalid or having a special case are just parts of the way Yaz0 works. If it is 2 then you act like it is by itself. Also, this code was originally taken from a standalone program made in C and I haven't made too much of an effort to make it more C++-ish or fix any naming; at least in this function. It's hard to explain how the function that calls this interacts with it, but things not conflicting with that I will take into consideration (like mismatch). \$\endgroup\$ozdrgnaDiies– ozdrgnaDiies2014年07月24日 19:13:00 +00:00Commented Jul 24, 2014 at 19:13
Output arguments
You currently use a pointer for your output argument. While this is a matter of preference, I found it confusing considering the other pointer argument indicates an array. I would use a reference for pMatchPos
instead of a pointer.
const
-correctness and immutability
In a pointer-math-heavy function like this, it is especially important to document which values can and cannot be changed within the function body. Use const
to specify that a value is not mutable. This allows the compiler to do more error-checking, and makes the code easier to reason about.
Most of your function parameters aren't modified inside the function, so they can be marked const
. Note especially src
, which has two const
modifiers, indicating its value cannot be changed, nor can the data it points to.
Also, startPos
is only mutated when it is clamped to zero. You can rewrite this using std::max()
(from the algorithm
header) to put the initialization on one line, and then make it const
as well. Note that using max()
requires pos
be a signed type, otherwise there will be overflow errors.
An improved beginning of your function would look like this:
uint32_t simpleEnc(uint8_t const * const src, uint32_t const size,
int32_t const pos, uint32_t& pMatchPos) // note type changes, cf. above
{
const int32_t startPos = std::max(0, pos - 0x1000);
const int32_t diff = size - pos;
Magic numbers
I don't know what 0x1000
is supposed to represent, and neither will anyone else who doesn't intimately understand the algorithm. It would be better to assign it to a named constant. Something like
constexpr uint32_t BACKTRACK_AMOUNT = 0x1000; // Pick a more accurate name, though.
Naming
pMatchPos
uses Hungarian notation. Please don't do this. Instead, try to come up with a more descriptive name, or just leave off the prefix.
-
\$\begingroup\$ Thanks for the input. One thing I didn't mention is that this project requires /DNOMINMAX for other portions due to conflicts which prevents me from using std::max. I'm not sure if I've removed the conflict though, so I'll double check if it's possible. As for the rest, I'll get to working on the const-correctness and probably use vectors as per @Nobody's answer. \$\endgroup\$ozdrgnaDiies– ozdrgnaDiies2014年07月24日 19:18:59 +00:00Commented Jul 24, 2014 at 19:18
-
\$\begingroup\$ @ozdrgnaDiies I'm pretty sure that
/DNOMINMAX
disables Microsoft'smin
andmax
macros, not the standard library functions. You should still be able to usestd::max()
. \$\endgroup\$Aurelius– Aurelius2014年07月24日 19:23:21 +00:00Commented Jul 24, 2014 at 19:23 -
\$\begingroup\$ Sorry, I haven't slept in a long time and I must have forgotten what min/max was disabled. It works now after casting pos - 0x1000 to an int. \$\endgroup\$ozdrgnaDiies– ozdrgnaDiies2014年07月24日 19:26:33 +00:00Commented Jul 24, 2014 at 19:26
Correct me if I am wrong, this code tries to solve a Longest Common Substring problem, and does it in a (naive) O(n*m) way. Consider switching to a better algorithm, such as suffix tree.
Just looking at the logic of the code, it looks to me that you could do the same thing by assigning j to numBytes after the j loop exits and moving the matchpos assignment outside the j loop since it is assigned the value of i and doesn't need to be assigned on every iteration of the j loop:
for (int i = startPos; i < pos; i++)
{
int j = 0;
for (; j < diff && src[i + j] == src[j + pos]; j++)
{
}
if (j > numBytes)
{
numBytes = j;
matchPos = i;
}
}
Just some thoughts that could help.
-
\$\begingroup\$ It seems this has shaved off a bit of time! I changed the for loop to a while and re-added the if condition for setting numBytes and matchPos and CPU time went from 1328.742s to 1240.980s. I'll change my code in the post and see if someone else can come up with more optimizations. \$\endgroup\$ozdrgnaDiies– ozdrgnaDiies2014年07月24日 07:46:52 +00:00Commented Jul 24, 2014 at 7:46
-
\$\begingroup\$ The problem with this proposed change is that it yields different results. The final value of
matchPos
will not be the same as in the original. \$\endgroup\$Edward– Edward2014年07月24日 14:36:55 +00:00Commented Jul 24, 2014 at 14:36 -
\$\begingroup\$ using the conditional should work. \$\endgroup\$user33306– user333062014年07月24日 15:53:59 +00:00Commented Jul 24, 2014 at 15:53
-
\$\begingroup\$ Not quite. Try both the original and your version with this:
const uint32_t size = 0x8000; uint8_t src[size]; uint32_t pos = 0x2000; uint32_t MatchPos = 0; uint32_t MatchPos2 = 0; uint32_t res, res2; std::fill(src, &src[size], 'a'); for (uint8_t *ptr=&src[123]; ptr < &src[size]; ptr += 500) *ptr = 'b';
\$\endgroup\$Edward– Edward2014年07月24日 15:55:57 +00:00Commented Jul 24, 2014 at 15:55 -
\$\begingroup\$ Actually that pretty much follows exactly the OP's modified code, which the OP says works. \$\endgroup\$user33306– user333062014年07月24日 16:02:22 +00:00Commented Jul 24, 2014 at 16:02