floor modulo or modulo rounding towards negative infinity

Question 1

The first function will always try to correct the final result, while the second uses a conditional to know when to compensate. I think that in general the second function will be faster, but it will look a little worse than the first.

int floor_mod1(int a, int b)
{
 int mod = a % b;
 return (mod + b) % b;
}
int floor_mod2(int a, int b)
{
 int mod = a % b;
 if (mod != 0 && (a ^ b) < 0) {
 return mod + b;
 }
 return mod;
}

What does the condition do?
There's not need to correct already calculated 0 to 0, so we will not do anything in case when mod == 0. If a and b have the same sign, we will not anything, because the result is already correct.

More details:
And in fact, we can use (mod ^ b) < 0 instead of (a ^ b) < 0 to work with the old version of the language, where the modulo operator could round towards either zero or negative infinity.

Moreover, we can replace mod != 0 with (mod ^ b) != b (not with (a ^ b) != b), but this will probably slow down our code, it depends on the compiler.
BUT, if someone can figure out how to replace these parts ((mod ^ b) != b && (mod ^ b) < 0...) with fewer operations, then such implementation will probably be the best one. (keep in mind the most important facts: b will never be equal to 0, for the condition to work a and b must have different signs and a should not be equal to 0)

.. and we could probably write a better version in assembly, but it won't be as portable ..

Should I return value here return mod + b; or use mod += b?
Should I choose the second version because of its speed, regardless of the not-so-pretty code?
Am I correct that using (mod ^ b) < 0 instead of (a ^ b) < 0 doesn't matter for performance?
Any other comments?

Question 2

Personally I prefer the first version which - at least for me - is easier to read. If you one day notice that you have a performance issue in the function you can take a look at it again. Always try to write code that is easy to read, the future you and any colleagues will thank you.

Question 3

Broken corner cases

"The first function will always try to correct the final result, " is unsupported, due to undefined behavior, when mod + b overflows as in floor_mod1(1, INT_MAX).

I think that in general the second function will be faster,

Speed should be a secondary concern, get the function to work correctly for all int a, int b.

Make code functionally correct before spending much effort on speed.

Sample functional test harness that exhibits functional problems

#include <stdio.h>
#include <limits.h>
int floor_mod1(int a, int b) {
 int mod = a % b;
 return (mod + b) % b;
}
int floor_mod2(int a, int b) {
 int mod = a % b;
 if (mod != 0 && (a ^ b) < 0) {
 return mod + b;
 }
 return mod;
}
// Wider math for reference
int floor_modR(int a, int b) {
 long long mod = (1LL * a) % b;
 return (int) ((mod + b) % b);
}
int main() {
 const int t[] = {0, 1, 2, 42, INT_MAX - 1, INT_MAX, -1, -2, INT_MIN + 1,
 INT_MIN};
 size_t n = sizeof t / sizeof t[0];
 for (size_t ia = 0; ia < n; ia++) {
 int a = t[ia];
 for (size_t ib = 0; ib < n; ib++) {
 int b = t[ib];
 if (b == 0) {
 continue;
 }
 // Better floor_mod() would handle this case too 
 // Both floor_mod1(), floor_mod2() fail this case.
 if (a == INT_MIN && b == -1) {
 continue;
 }
 int m1 = floor_mod1(a, b);
 int m2 = floor_mod2(a, b);
 int mR = floor_modR(a, b);
 if (m1 != mR || m2 != mR) {
 printf("f(%d,%d) --> %d %d %d\n", a, b, m1, m2, mR);
 fflush(stdout);
 }
 }
 }
}

Output

f(1,2147483647) --> -1 1 1
f(2,2147483646) --> -2 2 2
f(2,2147483647) --> 0 2 2
f(42,2147483646) --> -2147483608 42 42
f(42,2147483647) --> -2147483607 42 42
f(2147483646,2147483647) --> -3 2147483646 2147483646
f(-1,-2147483648) --> 2147483647 -1 -1
f(-2,-2147483647) --> 0 -2 -2
f(-2,-2147483648) --> 2147483646 -2 -2
f(-2147483647,-2147483648) --> 1 -2147483647 -2147483647

Question 4

All floor_mods will return 0 in case of a == INT_MIN && b == -1, aren't they? Perhaps it is, as @Quuxplusone said, implementation-defined behavior of (a ^ b). Did you tried functionally identical version with (a < 0) != (b < 0) instead?

Question 5

@0dminnimda a%b is only defined when a/b is defined. Since the quotient INT_MIN/-1 attempts a quotient more than INT_MAX, INT_MIN%-1 is undefined behavior (UB). I did not try (a < 0) != (b < 0). Easy enough for anyone to try.

Question 6

well, it looks like INT_MIN % -1 itself is implementation-defined behavior, because with another compiler this results in a failure.

Question 7

I did not try (a < 0) != (b < 0). Easy enough for anyone to try. this does not change anything.

Question 8

@ReinstateMonica well, definitely I was checking some small values and it seemed to me that the implementation was correct, just forgot about over/underflow. In any case, I cannot leave undefined behaviour here, it must always match the mathematical model, so this check will eventually end up in the implementation.

Question 9

You can use Godbolt to play around with the C code and see if anything makes the assembly look particularly better. (Seems like Clang is strangely much better than GCC at making this code branch-free.)

Technically, (a ^ b) has implementation-defined behavior if you end up XORing non-zero sign bit(s) together. It's better (in terms of portability) to write (a < 0) != (b < 0). Let the compiler figure out the bit-twiddling.

Anyway, you're assuming that this function needs optimization... you've profiled your program and seen that it's the hot spot? So you're trying to turn two idiv instructions into one idiv instruction? Have you looked at your program's workload? Maybe you could even turn that one idiv into no idivs... on the happy path.

This exact situation is covered in Chandler Carruth's "Tuning C++" from CppCon 2015. He replaces o = i % ceil; with o = (i >= ceil) ? i % ceil : i; and the code magically gets faster. Consider whether your hot spot would benefit from similar workload analysis.

Question 10

Thank you very much for the good answer, the other answer had a greater impact on the result, no offense to you, but I accept it.

Question 11

Quuxplusone, OP's code use int. o = i % ceil; functions differently from o = (i >= ceil) ? i % ceil : i; when cell > 0, i < -cell. If full range i, cell is important, getting faster is not worth the coding error.

chux chux 36.2k2 gold badges43 silver badges96 bronze badges · Accepted Answer · 2021-07-25 17:45:55Z

Broken corner cases

"The first function will always try to correct the final result, " is unsupported, due to undefined behavior, when mod + b overflows as in floor_mod1(1, INT_MAX).

I think that in general the second function will be faster,

Speed should be a secondary concern, get the function to work correctly for all int a, int b.

Make code functionally correct before spending much effort on speed.

Sample functional test harness that exhibits functional problems

#include <stdio.h>
#include <limits.h>
int floor_mod1(int a, int b) {
 int mod = a % b;
 return (mod + b) % b;
}
int floor_mod2(int a, int b) {
 int mod = a % b;
 if (mod != 0 && (a ^ b) < 0) {
 return mod + b;
 }
 return mod;
}
// Wider math for reference
int floor_modR(int a, int b) {
 long long mod = (1LL * a) % b;
 return (int) ((mod + b) % b);
}
int main() {
 const int t[] = {0, 1, 2, 42, INT_MAX - 1, INT_MAX, -1, -2, INT_MIN + 1,
 INT_MIN};
 size_t n = sizeof t / sizeof t[0];
 for (size_t ia = 0; ia < n; ia++) {
 int a = t[ia];
 for (size_t ib = 0; ib < n; ib++) {
 int b = t[ib];
 if (b == 0) {
 continue;
 }
 // Better floor_mod() would handle this case too 
 // Both floor_mod1(), floor_mod2() fail this case.
 if (a == INT_MIN && b == -1) {
 continue;
 }
 int m1 = floor_mod1(a, b);
 int m2 = floor_mod2(a, b);
 int mR = floor_modR(a, b);
 if (m1 != mR || m2 != mR) {
 printf("f(%d,%d) --> %d %d %d\n", a, b, m1, m2, mR);
 fflush(stdout);
 }
 }
 }
}

Output

f(1,2147483647) --> -1 1 1
f(2,2147483646) --> -2 2 2
f(2,2147483647) --> 0 2 2
f(42,2147483646) --> -2147483608 42 42
f(42,2147483647) --> -2147483607 42 42
f(2147483646,2147483647) --> -3 2147483646 2147483646
f(-1,-2147483648) --> 2147483647 -1 -1
f(-2,-2147483647) --> 0 -2 -2
f(-2,-2147483648) --> 2147483646 -2 -2
f(-2147483647,-2147483648) --> 1 -2147483647 -2147483647

All floor_mods will return 0 in case of a == INT_MIN && b == -1, aren't they? Perhaps it is, as @Quuxplusone said, implementation-defined behavior of (a ^ b). Did you tried functionally identical version with (a < 0) != (b < 0) instead?
@0dminnimda a%b is only defined when a/b is defined. Since the quotient INT_MIN/-1 attempts a quotient more than INT_MAX, INT_MIN%-1 is undefined behavior (UB). I did not try (a < 0) != (b < 0). Easy enough for anyone to try.
well, it looks like INT_MIN % -1 itself is implementation-defined behavior, because with another compiler this results in a failure.
I did not try (a < 0) != (b < 0). Easy enough for anyone to try. this does not change anything.
@ReinstateMonica well, definitely I was checking some small values and it seemed to me that the implementation was correct, just forgot about over/underflow. In any case, I cannot leave undefined behaviour here, it must always match the mathematical model, so this check will eventually end up in the implementation.

Stack Exchange Network

floor modulo or modulo rounding towards negative infinity

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

floor modulo or modulo rounding towards negative infinity

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions