3
\$\begingroup\$

The first function will always try to correct the final result, while the second uses a conditional to know when to compensate. I think that in general the second function will be faster, but it will look a little worse than the first.

int floor_mod1(int a, int b)
{
 int mod = a % b;
 return (mod + b) % b;
}
int floor_mod2(int a, int b)
{
 int mod = a % b;
 if (mod != 0 && (a ^ b) < 0) {
 return mod + b;
 }
 return mod;
}

What does the condition do?
There's not need to correct already calculated 0 to 0, so we will not do anything in case when mod == 0. If a and b have the same sign, we will not anything, because the result is already correct.


More details:
And in fact, we can use (mod ^ b) < 0 instead of (a ^ b) < 0 to work with the old version of the language, where the modulo operator could round towards either zero or negative infinity.

Moreover, we can replace mod != 0 with (mod ^ b) != b (not with (a ^ b) != b), but this will probably slow down our code, it depends on the compiler.
BUT, if someone can figure out how to replace these parts ((mod ^ b) != b && (mod ^ b) < 0...) with fewer operations, then such implementation will probably be the best one. (keep in mind the most important facts: b will never be equal to 0, for the condition to work a and b must have different signs and a should not be equal to 0)

.. and we could probably write a better version in assembly, but it won't be as portable ..


Should I return value here return mod + b; or use mod += b?
Should I choose the second version because of its speed, regardless of the not-so-pretty code?
Am I correct that using (mod ^ b) < 0 instead of (a ^ b) < 0 doesn't matter for performance?
Any other comments?

asked Jul 24, 2021 at 0:23
\$\endgroup\$
1
  • 2
    \$\begingroup\$ Personally I prefer the first version which - at least for me - is easier to read. If you one day notice that you have a performance issue in the function you can take a look at it again. Always try to write code that is easy to read, the future you and any colleagues will thank you. \$\endgroup\$ Commented Jul 24, 2021 at 5:14

2 Answers 2

3
\$\begingroup\$

Broken corner cases

"The first function will always try to correct the final result, " is unsupported, due to undefined behavior, when mod + b overflows as in floor_mod1(1, INT_MAX).

I think that in general the second function will be faster,

Speed should be a secondary concern, get the function to work correctly for all int a, int b.


Make code functionally correct before spending much effort on speed.

Sample functional test harness that exhibits functional problems

#include <stdio.h>
#include <limits.h>
int floor_mod1(int a, int b) {
 int mod = a % b;
 return (mod + b) % b;
}
int floor_mod2(int a, int b) {
 int mod = a % b;
 if (mod != 0 && (a ^ b) < 0) {
 return mod + b;
 }
 return mod;
}
// Wider math for reference
int floor_modR(int a, int b) {
 long long mod = (1LL * a) % b;
 return (int) ((mod + b) % b);
}
int main() {
 const int t[] = {0, 1, 2, 42, INT_MAX - 1, INT_MAX, -1, -2, INT_MIN + 1,
 INT_MIN};
 size_t n = sizeof t / sizeof t[0];
 for (size_t ia = 0; ia < n; ia++) {
 int a = t[ia];
 for (size_t ib = 0; ib < n; ib++) {
 int b = t[ib];
 if (b == 0) {
 continue;
 }
 // Better floor_mod() would handle this case too 
 // Both floor_mod1(), floor_mod2() fail this case.
 if (a == INT_MIN && b == -1) {
 continue;
 }
 int m1 = floor_mod1(a, b);
 int m2 = floor_mod2(a, b);
 int mR = floor_modR(a, b);
 if (m1 != mR || m2 != mR) {
 printf("f(%d,%d) --> %d %d %d\n", a, b, m1, m2, mR);
 fflush(stdout);
 }
 }
 }
}

Output

f(1,2147483647) --> -1 1 1
f(2,2147483646) --> -2 2 2
f(2,2147483647) --> 0 2 2
f(42,2147483646) --> -2147483608 42 42
f(42,2147483647) --> -2147483607 42 42
f(2147483646,2147483647) --> -3 2147483646 2147483646
f(-1,-2147483648) --> 2147483647 -1 -1
f(-2,-2147483647) --> 0 -2 -2
f(-2,-2147483648) --> 2147483646 -2 -2
f(-2147483647,-2147483648) --> 1 -2147483647 -2147483647
answered Jul 25, 2021 at 17:45
\$\endgroup\$
10
  • \$\begingroup\$ All floor_mods will return 0 in case of a == INT_MIN && b == -1, aren't they? Perhaps it is, as @Quuxplusone said, implementation-defined behavior of (a ^ b). Did you tried functionally identical version with (a < 0) != (b < 0) instead? \$\endgroup\$ Commented Jul 25, 2021 at 21:19
  • \$\begingroup\$ @0dminnimda a%b is only defined when a/b is defined. Since the quotient INT_MIN/-1 attempts a quotient more than INT_MAX, INT_MIN%-1 is undefined behavior (UB). I did not try (a < 0) != (b < 0). Easy enough for anyone to try. \$\endgroup\$ Commented Jul 25, 2021 at 22:19
  • \$\begingroup\$ well, it looks like INT_MIN % -1 itself is implementation-defined behavior, because with another compiler this results in a failure. \$\endgroup\$ Commented Jul 25, 2021 at 22:21
  • \$\begingroup\$ I did not try (a < 0) != (b < 0). Easy enough for anyone to try. this does not change anything. \$\endgroup\$ Commented Jul 25, 2021 at 22:22
  • 1
    \$\begingroup\$ @ReinstateMonica well, definitely I was checking some small values ​​and it seemed to me that the implementation was correct, just forgot about over/underflow. In any case, I cannot leave undefined behaviour here, it must always match the mathematical model, so this check will eventually end up in the implementation. \$\endgroup\$ Commented Jul 26, 2021 at 3:39
4
\$\begingroup\$

You can use Godbolt to play around with the C code and see if anything makes the assembly look particularly better. (Seems like Clang is strangely much better than GCC at making this code branch-free.)

Technically, (a ^ b) has implementation-defined behavior if you end up XORing non-zero sign bit(s) together. It's better (in terms of portability) to write (a < 0) != (b < 0). Let the compiler figure out the bit-twiddling.

Anyway, you're assuming that this function needs optimization... you've profiled your program and seen that it's the hot spot? So you're trying to turn two idiv instructions into one idiv instruction? Have you looked at your program's workload? Maybe you could even turn that one idiv into no idivs... on the happy path.

This exact situation is covered in Chandler Carruth's "Tuning C++" from CppCon 2015. He replaces o = i % ceil; with o = (i >= ceil) ? i % ceil : i; and the code magically gets faster. Consider whether your hot spot would benefit from similar workload analysis.

answered Jul 24, 2021 at 5:44
\$\endgroup\$
2
  • \$\begingroup\$ Thank you very much for the good answer, the other answer had a greater impact on the result, no offense to you, but I accept it. \$\endgroup\$ Commented Jul 31, 2021 at 13:58
  • \$\begingroup\$ Quuxplusone, OP's code use int. o = i % ceil; functions differently from o = (i >= ceil) ? i % ceil : i; when cell > 0, i < -cell. If full range i, cell is important, getting faster is not worth the coding error. \$\endgroup\$ Commented Jul 31, 2021 at 20:36

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.