I'm making an experimental Java algorithm that translates binary data from a file into a (very large) base ten integer. I am using BigInteger
since this number may have millions of digits. I am trying to translate this number into a sum of powers of 2.
I tried this code, which theoretically works, but is sluggishly slow. Using this code for a 200000 digit number takes about 5 minutes to complete with my i5 @4.5GHz. The execution time for a number with over a billion digits could take a day or more.
How could I optimize this code so that the execution time is significantly lowered?
byte[] bytes = Files.readAllBytes(outFile.toPath());
StringBuilder sb = new StringBuilder();
for (byte b : bytes) {
sb.append(String.format("%02X", b));
}
BigInteger dataInt = new BigInteger(sb.toString(), 16);
BigInteger remainder = dataInt;
double dataLength = dataInt.bitCount();
while (dataLength > BASE.bitCount()) {
int increment = (int) Math.round(Math.pow(dataLength, 1 / BASE.doubleValue()));
BigInteger base = BASE;
int exp = 2;
while (base.compareTo(remainder) < 0) {
base = BASE.pow(exp);
exp += increment;
}
while (base.compareTo(remainder) > 0) {
base = BASE.pow(exp);
exp--;
}
remainder = remainder.subtract(base);
dataLength = remainder.bitCount();
}
In this case, the constant BASE
is 2.
1 Answer 1
byte[] bytes = Files.readAllBytes(outFile.toPath());
Reserving the fully needed space ensures not using extra space, and no reallocations.
StringBuilder sb = new StringBuilder(bytes.length * 2);
for (byte b : bytes) {
sb.append(String.format("%02X", b));
}
BigInteger dataInt = new BigInteger(sb.toString(), 16);
Then however it would be faster to do:
BigInteger dataInt = new BigInteger(1, bytes);
with the caveat, that this is a big endian interpretation. Reverting the bytes would be fast too however.
The actual algorithm unfortunately uses floating point, but that is your choice. You could look at the sources of their implementation of toString(radix).
Also BigInteger has some interesting methods, like modPow
or remainder
.
Whether or not the loops can be optimized, pow multiplied by base, or the loop entirely eliminated. that is something interesting I leave upto you.
Explore related questions
See similar questions with these tags.
translates binary data ... into a ... base ten integer
translate this number into a sum of powers of 2
What is the goal? And what does200000 digit number
mean - 7526 bytes? Multi-word base conversion is costly if none of the bases is an integer multiple of the other. Did you try constructing a BigInteger from a byte array? How do you expectBigInteger
to help? \$\endgroup\$while
-loop supposed to accomplish? \$\endgroup\$