5
\$\begingroup\$

Intro

I have a simple methods that converts the input instances of java.math.BigInteger to checksum. The checksum algorithm sums up all the digits in the input BigInteger. Then, if the checksum \$c\$ consists of only one digit, we are done and return that sole digit as our checksum. Otherwise, we call the checksum algorithm with \$c\$ as its parameter.


Code

io.github.coderodde.util.hash.SimpleBigIntegerChecksum.java:

package io.github.coderodde.util.hash;
import java.math.BigInteger;
/**
 * This class provides the facilities for computing simple 
 * {@link java.math.BigInteger} hash that is a value between 0 and 9, both
 * inclusively. The idea is to sum all the digits of an input integer. If it's
 * withing range 0 and 9, we are done. Otherwise, we compute the hash 
 * recursively.
 * 
 * @version 1.0.0 (Jul 1, 2025)
 * @since 1.0.0 (Jul 1, 2025)
 */
public final class SimpleBigIntegerChecksum {
 
 private SimpleBigIntegerChecksum() {}
 
 /**
 * Computes the checksum for the input {@link java.math.BigInteger}.
 * 
 * @param bi the target big integer.
 * 
 * @return the checksum of {@code bi}.
 */
 public static int hash(final BigInteger bi) {
 if (bi.compareTo(BigInteger.ZERO) < 0) {
 // Omit the leading minus sign:
 return hash(bi.negate());
 }
 
 final String integerText = bi.toString();
 
 if (integerText.length() == 1) {
 // Once the hash is only one character long, we are done:
 return Integer.parseInt(integerText);
 }
 
 BigInteger nextInteger = BigInteger.ZERO;
 
 // Sum up all the digits in the current big integer:
 for (final char ch : integerText.toCharArray()) {
 nextInteger = nextInteger.add(BigInteger.valueOf((long)(ch - '0')));
 }
 
 // Recur further:
 return hash(nextInteger);
 }
 
 public static void main(String[] args) {
 // Eighteen nines = 90 + 8 * 9 = 90 +たす 72 = 162, checksum must be 9:
 System.out.println(hash(BigInteger.valueOf(999999999999999999L)));
 }
}

Critique request

So what do you like? Please, tell me anything that comes to mind.

asked Jul 1 at 16:58
\$\endgroup\$
1
  • 3
    \$\begingroup\$ This is almost equivalent to taking the number mod 9, except that that produces 0 instead of 9. That's not a criticism but it's .. something. \$\endgroup\$ Commented Jul 1 at 17:27

3 Answers 3

7
\$\begingroup\$

The first very pedantic criticism: I prefer a parameter name n rather than bi. The more math like flavor.

The sign -1 / 0 / 1 can be gathered without cumbersom compare.

 if (bi.signum() == -1) {
 // Omit the leading minus sign:
 return hash(bi.negate());
 }

Using BigInteger nextInteger for the temporary hash value is correct, as it could overflow here. However the recursion could have been cut a bit better (later!).

Using text (String), digit extraction is something to question. Fine, but what about alternatives.

Let us do imaginary alternatives, using int for better notation (than BigInteger).

public static int hash(int n) {
 if (n < 10) {
 return n;
 }
 int h1 = hash(n / 10);
 int h2 = n % 10;
 int h = h1 + h2; // At most 9+9 = 18
 return hash(h); // Or h < 10 ? h : (h - 10 + 1);
}

As you can see here, using the recursive call on n / 10 you do not need BigInteger for nextInteger.

Last but not least, the hash function could be written mathematical as follows. (Proof of induction it is called I vaguely remember.)

public static int hash(int n) {
 int h = n % 9;
 return h == 0 && n != 0 ? 9 : h;
}

The complexity of BigInteger division and modulo 10 versus String representation with easier operations is irrelevant, as String represention will need the same division resp. modulo.

In general everything is fine; for a student's grading review I would only mention the signum function - as useful.

answered Jul 2 at 9:58
\$\endgroup\$
4
  • 1
    \$\begingroup\$ For large BigIntegers, it's probably faster to use the hopefully-optimized library function to convert to a decimal string than to manually div/mod by 10 yourself. An optimized library function can do far fewer bigint ops by dividing by 10^9, or 10^18, and chopping up the resulting 32 or 64-bit integer using fast single-word native CPU operations. (As a bonus, it can then easily use a multiplicative inverse instead of actual HW division. And the next bigint operation has no data dependency on it, so out-of-order exec can be crunching on that and the next bigint division.) \$\endgroup\$ Commented Jul 2 at 14:51
  • \$\begingroup\$ You can of course write your own optimized code for getting base-10 digits which uses u64 integers (or i64 since this is Java :/), but that's more complexity. But explicitly writing naive code makes it very unlikely for a compiler to do something clever. \$\endgroup\$ Commented Jul 2 at 14:56
  • 1
    \$\begingroup\$ Also related for handling the base-10^9 or base-10^18 chunks: lemire.me/blog/2022/03/28/… / dougallj.wordpress.com/2022/04/01/… / pvk.ca/Blog/2017/12/22/… tia.mat.br/posts/2014/06/23/integer_to_string_conversion.html (dividing by 100 to break into pairs of digits, although mostly for latency rather than throughput, and this use-case already has ILP) \$\endgroup\$ Commented Jul 2 at 14:56
  • \$\begingroup\$ Anyway, Dan Lemire's AVX-512IFMA code is the kind of thing we could hope bigint.toString() uses on a highly-optimized JVM (for chopping up u64 chunks after divmod of the bigint by 10^16 or something). It's implausible that you could AVX-512 code yourself with portable Java. It's pretty optimistic to hope that JVMs are currently doing this, but it's possible, and future improvements are always possible. Might be interesting to benchmark a .toString() implementation against naive sum += n % 10; / n /= 10; to see if .toString is any better than naive. \$\endgroup\$ Commented Jul 2 at 15:05
7
\$\begingroup\$

This isn't a good checksum if your goal is to detect errors. There are a number of common errors that this simply won't catch. For example, if you write 123 as 132, this method wouldn't catch it. Both have the same checksum of 6.

You might check out the Luhn, Verhoeff, and Damm algorithms if you are limited to a single digit. They are capable of catching more errors than the digital root (the formal name of your method).

As already noted, this algorithm can be represented non-recursively by modulo nine. I suspect that this would be more efficient, and it is certainly shorter, even with the special cases (checking for zero and negative numbers).

answered Jul 3 at 5:59
\$\endgroup\$
7
\$\begingroup\$

Does it have to be recursive?

If I see this right, this could be wrapped in a loop just as easy. Theoretically a BigInteger is only limited by the amount of memory you have, so theoretically this method could be made to overflow the stack.

BigInteger must support values in the range -2^Integer.MAX_VALUE (exclusive) to +2^Integer.MAX_VALUE (exclusive) and may support values outside of that range. An ArithmeticException is thrown when a BigInteger constructor or method would generate a value outside of the supported range. The range of probable prime values is limited and may be less than the full supported positive range of BigInteger. The range must be at least 1 to 2^500000000.

From the BigInteger Javadoc.

That said, depending on the size of the number, toString might consume a lot of memory. The alternative would be to calculate the digits (through divideAndRemainder), but that will create additional Object instances during calculation. So, it depends on your use-case, I guess. Right now the implementation (when handed a "malformed" value) could be used to run a DoS by using up a lot of memory or aborting the operation with a stack overflow. Using divideAndRemainder and a loop instead of recursion would mean more GC pressure but would remedy these issues.


public final class SimpleBigIntegerChecksum {
 
 private SimpleBigIntegerChecksum() {}

Very nice. I see rarely helper/utility classes which are "correctly" defined, so this is nice.


 public static int hash(final BigInteger bi) {

I'm not a fan of the name bi, bigInteger or value might be better fits.


 if (bi.compareTo(BigInteger.ZERO) < 0) {
 // Omit the leading minus sign:
 return hash(bi.negate());
 }

This will create a new BigInteger instance and recurse one level. Maybe skipping the minus in the String representation would be a cheaper option?


 final String integerText = bi.toString();

I'm not that happy with that name, maybe valueAsString or bigIntegerString would be better ones?


 if (integerText.length() == 1) {
 // Once the hash is only one character long, we are done:
 return Integer.parseInt(integerText);
 }

That check could be moved above by testing whether bi is less than 10. That would also mean you don't need to do BigInteger -> String -> int conversion but BigInteger -> int.


 // Sum up all the digits in the current big integer:
 for (final char ch : integerText.toCharArray()) {
 nextInteger = nextInteger.add(BigInteger.valueOf((long)(ch - '0')));
 }

You could also use substring, for example:

for (int index = 0; index < integerText.length(); index++) {
 long digit = Long.parseLong(integerText.substring(index, index + 1)));
 nextInteger = nextInteger.add(BigInteger.valueOf(digit));
}

That has the upside that it does not rely on the ASCII/UTF-8-Page (and knowledge of it), but has the downside of being more verbose. One could use the BigInteger constructor to create the BigInteger directly from the String, skipping the parse to long, but that would also skip the BigInteger cache which is used by valueOf.

Note that the (correct) code-point aware version would look like this:

for (int index = 0; index < integerText.length(); index += Character.charCount(integerText.codePointAt(index))) {
 Long digit = Long.parseLong(integerText.substring(index, index + Character.charCount(integerText.codePointAt(index)))));
 nextInteger = nextInteger.add(BigInteger.valueOf(digit));
}

That said, splitting the loop into two lines would add greatly readability:

 for (final char ch : integerText.toCharArray()) {
 long digit = (long)(ch - '0');
 nextInteger = nextInteger.add(BigInteger.valueOf(digit));
 }

Also, ch isn't that great of a name either.


 // Eighteen nines = 90 + 8 * 9 = 90 +たす 72 = 162, checksum must be 9:
 System.out.println(hash(BigInteger.valueOf(999999999999999999L)));

I know this is only an example, but this comment/code combination should either be a proper test, or the comment should also be printed to stdout.

mdfst13
22.4k6 gold badges34 silver badges70 bronze badges
answered Jul 2 at 10:00
\$\endgroup\$
4
  • 3
    \$\begingroup\$ While I don't prefer the recursion, it cannot be deep. I mean consider the numbers that could have 999999999999999999 (the example number here) as their sum-of-digits, they'd be petabytes in size already and that only gives us 1 extra level of recursion \$\endgroup\$ Commented Jul 2 at 11:06
  • \$\begingroup\$ That's true, didn't think that much about it. \$\endgroup\$ Commented Jul 2 at 11:43
  • \$\begingroup\$ If you want a recursive version which can go deep, use the hash(n) = hash(hash(n/10) + n%10) formula mentioned by Joop, which has one recursion level per decimal digit. \$\endgroup\$ Commented Jul 3 at 21:40
  • \$\begingroup\$ Why do we need a "codepoint-aware" version for parsing a string which is known to contain just decimal digits? \$\endgroup\$ Commented Jul 3 at 21:45

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.