Intro
I have a simple methods that converts the input instances of java.math.BigInteger
to checksum. The checksum algorithm sums up all the digits in the input BigInteger
. Then, if the checksum \$c\$ consists of only one digit, we are done and return that sole digit as our checksum. Otherwise, we call the checksum algorithm with \$c\$ as its parameter.
Code
io.github.coderodde.util.hash.SimpleBigIntegerChecksum.java:
package io.github.coderodde.util.hash;
import java.math.BigInteger;
/**
* This class provides the facilities for computing simple
* {@link java.math.BigInteger} hash that is a value between 0 and 9, both
* inclusively. The idea is to sum all the digits of an input integer. If it's
* withing range 0 and 9, we are done. Otherwise, we compute the hash
* recursively.
*
* @version 1.0.0 (Jul 1, 2025)
* @since 1.0.0 (Jul 1, 2025)
*/
public final class SimpleBigIntegerChecksum {
private SimpleBigIntegerChecksum() {}
/**
* Computes the checksum for the input {@link java.math.BigInteger}.
*
* @param bi the target big integer.
*
* @return the checksum of {@code bi}.
*/
public static int hash(final BigInteger bi) {
if (bi.compareTo(BigInteger.ZERO) < 0) {
// Omit the leading minus sign:
return hash(bi.negate());
}
final String integerText = bi.toString();
if (integerText.length() == 1) {
// Once the hash is only one character long, we are done:
return Integer.parseInt(integerText);
}
BigInteger nextInteger = BigInteger.ZERO;
// Sum up all the digits in the current big integer:
for (final char ch : integerText.toCharArray()) {
nextInteger = nextInteger.add(BigInteger.valueOf((long)(ch - '0')));
}
// Recur further:
return hash(nextInteger);
}
public static void main(String[] args) {
// Eighteen nines = 90 + 8 * 9 =わ 90 +たす 72 =わ 162, checksum must be 9:
System.out.println(hash(BigInteger.valueOf(999999999999999999L)));
}
}
Critique request
So what do you like? Please, tell me anything that comes to mind.
-
3\$\begingroup\$ This is almost equivalent to taking the number mod 9, except that that produces 0 instead of 9. That's not a criticism but it's .. something. \$\endgroup\$user555045– user5550452025年07月01日 17:27:28 +00:00Commented Jul 1 at 17:27
3 Answers 3
The first very pedantic criticism: I prefer a parameter name n
rather than bi
. The more math like flavor.
The sign -1 / 0 / 1 can be gathered without cumbersom compare
.
if (bi.signum() == -1) {
// Omit the leading minus sign:
return hash(bi.negate());
}
Using BigInteger nextInteger
for the temporary hash value is correct, as it could overflow here. However the recursion could have been cut a bit better (later!).
Using text (String), digit extraction is something to question. Fine, but what about alternatives.
Let us do imaginary alternatives, using int
for better notation (than BigInteger).
public static int hash(int n) {
if (n < 10) {
return n;
}
int h1 = hash(n / 10);
int h2 = n % 10;
int h = h1 + h2; // At most 9+9 = 18
return hash(h); // Or h < 10 ? h : (h - 10 + 1);
}
As you can see here, using the recursive call on n / 10 you do not need BigInteger for nextInteger
.
Last but not least, the hash function could be written mathematical as follows. (Proof of induction it is called I vaguely remember.)
public static int hash(int n) {
int h = n % 9;
return h == 0 && n != 0 ? 9 : h;
}
The complexity of BigInteger division and modulo 10 versus String representation with easier operations is irrelevant, as String represention will need the same division resp. modulo.
In general everything is fine; for a student's grading review I would only mention the signum
function - as useful.
-
1\$\begingroup\$ For large BigIntegers, it's probably faster to use the hopefully-optimized library function to convert to a decimal string than to manually div/mod by
10
yourself. An optimized library function can do far fewer bigint ops by dividing by10^9
, or10^18
, and chopping up the resulting 32 or 64-bit integer using fast single-word native CPU operations. (As a bonus, it can then easily use a multiplicative inverse instead of actual HW division. And the next bigint operation has no data dependency on it, so out-of-order exec can be crunching on that and the next bigint division.) \$\endgroup\$Peter Cordes– Peter Cordes2025年07月02日 14:51:32 +00:00Commented Jul 2 at 14:51 -
\$\begingroup\$ You can of course write your own optimized code for getting base-10 digits which uses u64 integers (or i64 since this is Java :/), but that's more complexity. But explicitly writing naive code makes it very unlikely for a compiler to do something clever. \$\endgroup\$Peter Cordes– Peter Cordes2025年07月02日 14:56:52 +00:00Commented Jul 2 at 14:56
-
1\$\begingroup\$ Also related for handling the base-10^9 or base-10^18 chunks: lemire.me/blog/2022/03/28/… / dougallj.wordpress.com/2022/04/01/… / pvk.ca/Blog/2017/12/22/… tia.mat.br/posts/2014/06/23/integer_to_string_conversion.html (dividing by 100 to break into pairs of digits, although mostly for latency rather than throughput, and this use-case already has ILP) \$\endgroup\$Peter Cordes– Peter Cordes2025年07月02日 14:56:58 +00:00Commented Jul 2 at 14:56
-
\$\begingroup\$ Anyway, Dan Lemire's AVX-512IFMA code is the kind of thing we could hope
bigint.toString()
uses on a highly-optimized JVM (for chopping up u64 chunks after divmod of the bigint by10^16
or something). It's implausible that you could AVX-512 code yourself with portable Java. It's pretty optimistic to hope that JVMs are currently doing this, but it's possible, and future improvements are always possible. Might be interesting to benchmark a.toString()
implementation against naivesum += n % 10;
/n /= 10;
to see if.toString
is any better than naive. \$\endgroup\$Peter Cordes– Peter Cordes2025年07月02日 15:05:26 +00:00Commented Jul 2 at 15:05
This isn't a good checksum if your goal is to detect errors. There are a number of common errors that this simply won't catch. For example, if you write 123 as 132, this method wouldn't catch it. Both have the same checksum of 6.
You might check out the Luhn, Verhoeff, and Damm algorithms if you are limited to a single digit. They are capable of catching more errors than the digital root (the formal name of your method).
As already noted, this algorithm can be represented non-recursively by modulo nine. I suspect that this would be more efficient, and it is certainly shorter, even with the special cases (checking for zero and negative numbers).
Does it have to be recursive?
If I see this right, this could be wrapped in a loop just as easy. Theoretically a BigInteger
is only limited by the amount of memory you have, so theoretically this method could be made to overflow the stack.
BigInteger must support values in the range -2^Integer.MAX_VALUE (exclusive) to +2^Integer.MAX_VALUE (exclusive) and may support values outside of that range. An ArithmeticException is thrown when a BigInteger constructor or method would generate a value outside of the supported range. The range of probable prime values is limited and may be less than the full supported positive range of BigInteger. The range must be at least 1 to 2^500000000.
That said, depending on the size of the number, toString
might consume a lot of memory. The alternative would be to calculate the digits (through divideAndRemainder
), but that will create additional Object
instances during calculation. So, it depends on your use-case, I guess. Right now the implementation (when handed a "malformed" value) could be used to run a DoS by using up a lot of memory or aborting the operation with a stack overflow. Using divideAndRemainder
and a loop instead of recursion would mean more GC pressure but would remedy these issues.
public final class SimpleBigIntegerChecksum {
private SimpleBigIntegerChecksum() {}
Very nice. I see rarely helper/utility classes which are "correctly" defined, so this is nice.
public static int hash(final BigInteger bi) {
I'm not a fan of the name bi
, bigInteger
or value
might be better fits.
if (bi.compareTo(BigInteger.ZERO) < 0) {
// Omit the leading minus sign:
return hash(bi.negate());
}
This will create a new BigInteger
instance and recurse one level. Maybe skipping the minus in the String
representation would be a cheaper option?
final String integerText = bi.toString();
I'm not that happy with that name, maybe valueAsString
or bigIntegerString
would be better ones?
if (integerText.length() == 1) {
// Once the hash is only one character long, we are done:
return Integer.parseInt(integerText);
}
That check could be moved above by testing whether bi
is less than 10. That would also mean you don't need to do BigInteger -> String -> int
conversion but BigInteger -> int
.
// Sum up all the digits in the current big integer:
for (final char ch : integerText.toCharArray()) {
nextInteger = nextInteger.add(BigInteger.valueOf((long)(ch - '0')));
}
You could also use substring
, for example:
for (int index = 0; index < integerText.length(); index++) {
long digit = Long.parseLong(integerText.substring(index, index + 1)));
nextInteger = nextInteger.add(BigInteger.valueOf(digit));
}
That has the upside that it does not rely on the ASCII/UTF-8-Page (and knowledge of it), but has the downside of being more verbose. One could use the BigInteger
constructor to create the BigInteger
directly from the String
, skipping the parse to long
, but that would also skip the BigInteger
cache which is used by valueOf
.
Note that the (correct) code-point aware version would look like this:
for (int index = 0; index < integerText.length(); index += Character.charCount(integerText.codePointAt(index))) {
Long digit = Long.parseLong(integerText.substring(index, index + Character.charCount(integerText.codePointAt(index)))));
nextInteger = nextInteger.add(BigInteger.valueOf(digit));
}
That said, splitting the loop into two lines would add greatly readability:
for (final char ch : integerText.toCharArray()) {
long digit = (long)(ch - '0');
nextInteger = nextInteger.add(BigInteger.valueOf(digit));
}
Also, ch
isn't that great of a name either.
// Eighteen nines = 90 + 8 * 9 =わ 90 +たす 72 =わ 162, checksum must be 9:
System.out.println(hash(BigInteger.valueOf(999999999999999999L)));
I know this is only an example, but this comment/code combination should either be a proper test, or the comment should also be printed to stdout.
-
3\$\begingroup\$ While I don't prefer the recursion, it cannot be deep. I mean consider the numbers that could have 999999999999999999 (the example number here) as their sum-of-digits, they'd be petabytes in size already and that only gives us 1 extra level of recursion \$\endgroup\$user555045– user5550452025年07月02日 11:06:12 +00:00Commented Jul 2 at 11:06
-
\$\begingroup\$ That's true, didn't think that much about it. \$\endgroup\$Bobby– Bobby2025年07月02日 11:43:23 +00:00Commented Jul 2 at 11:43
-
\$\begingroup\$ If you want a recursive version which can go deep, use the
hash(n) = hash(hash(n/10) + n%10)
formula mentioned by Joop, which has one recursion level per decimal digit. \$\endgroup\$Paŭlo Ebermann– Paŭlo Ebermann2025年07月03日 21:40:19 +00:00Commented Jul 3 at 21:40 -
\$\begingroup\$ Why do we need a "codepoint-aware" version for parsing a string which is known to contain just decimal digits? \$\endgroup\$Paŭlo Ebermann– Paŭlo Ebermann2025年07月03日 21:45:51 +00:00Commented Jul 3 at 21:45
Explore related questions
See similar questions with these tags.