One of the remarks in this answer to my question concerning IntModulus
says
What's more problematic is that you don't have matching argument symmetry for add and subtract. You should have long versions of those.
and the reason for this part missing is that I hoped to get something faster than the trivial
public int add(long x, long y) {
return mod(mod(x) + mod(y));
}
The optimized solution has only one modulus operation (sure, only benchmarking will tell if it was worth it).
Please have a short look at the original question before reviewing. I mean the comments, not the code.
public final class IntModulus {
///////// Irrelevant code removed, see the original question if you miss anything.
///////// New code below.
/** Return a non-negative value less than modulus and congruent to the exact sum. */
public int add(long x, long y) {
final long sum = x + y;
// The addition overflowed iff both operands have the same sign are the result's sign differs.
final boolean overflow = (x^y) >= 0 & (x^sum) < 0;
return overflow ? overflowedMod(sum) : mod(sum);
}
/** Return a non-negative value less than modulus and congruent to the exact sum. */
public int sub(long x, long y) {
final long diff = x - y;
// The subtraction overflowed iff the operands have opposing signs and the result's sign differs from the minuend.
final boolean overflow = (x^y) < 0 & (x^diff) < 0;
return overflow ? overflowedMod(diff) : mod(diff);
}
/**
* Compute a modulus of the argument under the assumption that it overflowed by exactly {@code 2**64},
* i.e., it's a result of an overflowing {@code long} addition or subtraction.
*
* <p>This means that the sign is surely wrong and that the number would fit into a 65-bit signed integer.
*/
private int overflowedMod(long x) {
// The proper value of the argument divided by two and floored (i.e., rounded towards negative infinity).
// The sign bit got restored by simply flipping it as we know it was wrong.
final long half = (x >> 1) ^ Long.MIN_VALUE;
final long lsb = x & 1; // the least significant bit
long result = 2L * mod(half) + lsb;
// Now the result lies in the range 0 to 2*modulus+1 (both included), so conditionally reduce it twice.
if (result>=modulus) result -= modulus;
if (result>=modulus) result -= modulus;
return (int) result;
}
///////// Old but relevant code below.
/** Return a non-negative value less than modulus and congruent to the operand. */
public int mod(long x) {
// As modulus is an int, the cast is safe.
return fixModulus((int) (x % modulus));
}
/**
* Fix the modulus, i.e., make it non-negative.
* The argument must be between {@code -modulus+1} and {@code modulus-1}.
*/
private int fixModulus(int x) {
return x<0 ? x+modulus : x;
}
@Getter private final int modulus;
}
The benchmark results shows that it's much faster than using BigInteger
(suffix "bi"). It's assuming that the operands are already available in the needed form (i.e., they don't have to be converted to/from BigInteger
).
benchmark
Subtraction is not shown, as it's obviously as fast as addition. The code for multiplication is given in the old question.
-
\$\begingroup\$ Do you have benchmarks of the current code vs the one that uses three mods? \$\endgroup\$JS1– JS12015年07月03日 05:30:30 +00:00Commented Jul 3, 2015 at 5:30
-
\$\begingroup\$ @JS1 Sadly, not anymore. There's some small improvement in case there's no overflow, otherwise the conditional branch makes it even slightly worse. I'd assume, the case of the operands never overflowing is the more important one. I guess, the observation concerning the conditional jump could lead us to a universally better solution. \$\endgroup\$maaartinus– maaartinus2015年07月03日 05:49:09 +00:00Commented Jul 3, 2015 at 5:49
1 Answer 1
Small improvements
Overall, it seems that what you wrote is correct and is an improvement over the mod(mod(x)+mod(y))
solution, especially in the case where there is no overflow. I have three small improvements:
Add instead of multiply by 2
You can replace this line:
long result = 2L * mod(half) + lsb;
with:
long halfmod = mod(half);
long result = halfmod + halfmod + lsb;
The modified version showed a 1% improvement in the overflow case. (See benchmarks below)
Remove one unnecessary check
Following the code above, you have this code:
// Now the result lies in the range 0 to 2*modulus+1 (both included), // so conditionally reduce it twice. if (result>=modulus) result -= modulus; if (result>=modulus) result -= modulus;
But actually, the value of mod(half)
is in the range 0..modulus-1
. So if you multiply that by 2 and add 1, result
will be in the range 0..2*modulus-1
, not 0..2*modulus+1
as you stated in the comment. Therefore, after the first result -= modulus
, the range of result will be 0..modulus-1
and there is no need for the second check.
// Now the result lies in the range 0 to 2*modulus-1 (both included),
// so conditionally reduce it once.
if (result>=modulus) result -= modulus;
By removing the second check, I found a 1% speed improvement in the overflow case.
Checking for overflow
I replaced this overflow check for add:
final boolean overflow = (x^y) >= 0 & (x^sum) < 0;
with this very similar one:
final boolean overflow = ((x^y^Long.MIN_VALUE) & (x^sum)) < 0;
Edit: Later I found that this inverted version is faster:
final boolean overflow = ((x^y) | ~(x^sum)) >= 0;
It tests the same thing but uses one extra xor instead of a comparison. For the overflow case I found a 2% speed improvement, and it was about 4% faster in the non-overflow case.
For subtraction I did a similar thing:
final boolean overflow = (x^y) < 0 & (x^diff) < 0;
became:
final boolean overflow = ((x^y) & (x^diff)) < 0;
Here I didn't even have to add an extra xor.
Benchmarks
I took the suggestion of using JMH to run benchmarks. Here are the results:
Overflow add test
-----------------
Methodology: Add 0x4000000000000000L with itself modulo 0x12345678
in a loop of 100000 iterations.
Method Operations per second Speed
------ --------------------- -----
Original code 238.754 +- 0.778 100.0%
Multiply change 241.527 +- 0.884 101.1%
Remove extra check 241.180 +- 1.007 101.0%
Change overflow check 243.335 +- 1.662 101.9%
All changes 256.452 +- 1.325 107.4%
Non-overflow add test
---------------------
Methodology: Add 0x4000000000000000L with 1 modulo 0x12345678
in a loop of 100000 iterations.
Method Operations per second Speed
------ --------------------- -----
Original code 330.298 +- 1.635 100.0%
Change overflow check 342.496 +- 2.037 103.7%
-
\$\begingroup\$ Well done! Concerning benchmarks, could you give JMH (or [Caliper(github.com/google/caliper)]) a try? I know, maven is an abomination (at least for people like me), but otherwise, you can never really trust it (and even with a proper tool, you can never be sure). +++ I'm not accepting your answer yet so the question can get more attention, but we know how it ends. \$\endgroup\$maaartinus– maaartinus2015年07月03日 08:18:19 +00:00Commented Jul 3, 2015 at 8:18