I am looking to submit the following code (adapted to fit into java.util
obviously) which significantly improves performance and reduces useless allocations of java.util.UUID
. Help me find the bugs and stupidity before I submit myself to the judgment of the JDK maintainers! :-)
benchmark instances Bytes ns linear runtime
JdkUuidFromString 51.00 1544.0 608.2 ==============================
NessUuidFromString 2.00 72.0 179.1 ========
JdkUuidToString 31.00 1720.0 321.4 ===============
NessUuidToString 3.00 200.0 51.5 ==
FromString gets a 3x speedup and 1/25th the object allocations. ToString gets a 6x speedup and 1/10th of the object allocations.
And here's the code:
/**
* A class that provides an alternate implementation of {@link
* UUID#fromString(String)} and {@link UUID#toString()}.
*
* <p> The version in the JDK uses {@link String#split(String)}
* which does not compile the regular expression that is used for splitting
* the UUID string and results in the allocation of multiple strings in a
* string array. We decided to write {@link NessUUID} when we ran into
* performance issues with the garbage produced by the JDK class.
*
*/
public class NessUUID {
private NessUUID() {}
private static final int NUM_ALPHA_DIFF = 'A' - '9' - 1;
private static final int LOWER_UPPER_DIFF = 'a' - 'A';
// FROM STRING
public static UUID fromString(String str) {
int dashCount = 4;
int [] dashPos = new int [6];
dashPos[0] = -1;
dashPos[5] = str.length();
for (int i = str.length()-1; i >= 0; i--) {
if (str.charAt(i) == '-') {
if (dashCount == 0) {
throw new IllegalArgumentException("Invalid UUID string: " + str);
}
dashPos[dashCount--] = i;
}
}
if (dashCount > 0) {
throw new IllegalArgumentException("Invalid UUID string: " + str);
}
long mostSigBits = decode(str, dashPos, 0) & 0xffffffffL;
mostSigBits <<= 16;
mostSigBits |= (decode(str, dashPos, 1) & 0xffffL);
mostSigBits <<= 16;
mostSigBits |= (decode(str, dashPos, 2) & 0xffffL);
long leastSigBits = (decode(str, dashPos, 3) & 0xffffL);
leastSigBits <<= 48;
leastSigBits |= (decode(str, dashPos, 4) & 0xffffffffffffL);
return new UUID(mostSigBits, leastSigBits);
}
private static long decode(final String str, final int [] dashPos, final int field) {
int start = dashPos[field]+1;
int end = dashPos[field+1];
if (start >= end) {
throw new IllegalArgumentException("Invalid UUID string: " + str);
}
long curr = 0;
for (int i = start; i < end; i++) {
int x = getNibbleFromChar(str.charAt(i));
curr <<= 4;
if (curr < 0) {
throw new NumberFormatException("long overflow");
}
curr |= x;
}
return curr;
}
static int getNibbleFromChar(final char c)
{
int x = c - '0';
if (x > 9) {
x -= NUM_ALPHA_DIFF; // difference between '9' and 'A'
if (x > 15) {
x -= LOWER_UPPER_DIFF; // difference between 'a' and 'A'
}
if (x < 10) {
throw new IllegalArgumentException(c + " is not a valid character for an UUID string");
}
}
if (x < 0 || x > 15) {
throw new IllegalArgumentException(c + " is not a valid character for an UUID string");
}
return x;
}
// TO STRING
public static String toString(UUID uuid)
{
return toString(uuid.getMostSignificantBits(), uuid.getLeastSignificantBits());
}
/** Roughly patterned (read: stolen) from java.util.UUID and java.lang.Long. */
public static String toString(long msb, long lsb)
{
char[] uuidChars = new char[36];
digits(uuidChars, 0, 8, msb >> 32);
uuidChars[8] = '-';
digits(uuidChars, 9, 4, msb >> 16);
uuidChars[13] = '-';
digits(uuidChars, 14, 4, msb);
uuidChars[18] = '-';
digits(uuidChars, 19, 4, lsb >> 48);
uuidChars[23] = '-';
digits(uuidChars, 24, 12, lsb);
return new String(uuidChars);
}
private static void digits(char[] dest, int offset, int digits, long val) {
long hi = 1L << (digits * 4);
toUnsignedString(dest, offset, digits, hi | (val & (hi - 1)), 4);
}
private final static char[] DIGITS = {
'0' , '1' , '2' , '3' , '4' , '5' ,
'6' , '7' , '8' , '9' , 'a' , 'b' ,
'c' , 'd' , 'e' , 'f' , 'g' , 'h' ,
'i' , 'j' , 'k' , 'l' , 'm' , 'n' ,
'o' , 'p' , 'q' , 'r' , 's' , 't' ,
'u' , 'v' , 'w' , 'x' , 'y' , 'z'
};
private static void toUnsignedString(char[] dest, int offset, int len, long i, int shift) {
int charPos = len;
int radix = 1 << shift;
long mask = radix - 1;
do {
dest[offset + --charPos] = DIGITS[(int)(i & mask)];
i >>>= shift;
} while (i != 0 && charPos > 0);
}
}
2 Answers 2
I could not get Caliper running, but hacked my own test:
My initial results were:
warmup
JdkFrom: 1787.38 JdkTo: 635.12 NessFrom: 460.15 NessTo: 183.67 [-4552303853801426784, 69220000, -4552303853801426784, 69220000]
Real Run
JdkFrom: 1415.68 JdkTo: 553.28 NessFrom: 426.29 NessTo: 94.69 [-4552303853801426784, 69220000, -4552303853801426784, 69220000]
JdkFrom: 1394.24 JdkTo: 387.14 NessFrom: 340.78 NessTo: 59.33 [-4552303853801426784, 69220000, -4552303853801426784, 69220000]
JdkFrom: 1378.38 JdkTo: 339.20 NessFrom: 325.73 NessTo: 59.20 [-4552303853801426784, 69220000, -4552303853801426784, 69220000]
JdkFrom: 1381.61 JdkTo: 334.28 NessFrom: 389.30 NessTo: 59.09 [-4552303853801426784, 69220000, -4552303853801426784, 69220000]
So, at face value, yes, your algorithm is nicely faster.
As for the code review, I have some comments:
fromString()
- I don't like that you ignore the required format for UUID's, essentially you say if it has 4 dashes it's cool, but, really, the number of digites between dashes is significant, and you ignore that.
- I feel that you should be calculating the long bits at the same time as you are validating and counting the dashes. Repeating the loops afterwards seems redundant.
- If you are looking for raw performance, a trick I have found out is that lookup tables make a big difference... I will show an example in a bit.
toString()
- I don't like the public
toString(long,long)
method. This is not 'symmetrical'. Only the toString(UUID) should be public. - The
DIGITS
code appears to be designed to satisfy many different radices (radixes, what's the plural?). This makes it a little bulky for this special case. - There are too many levels of method calls. It can be much shallower.
Consider:
I had a hack at this and decided I could do better.... consider the following results:
warmup
JdkFrom: 1929.14 JdkTo: 542.10 NessFrom: 270.43 NessTo: 175.71 [2254274162472357232, 70459000, 2254274162472357232, 70459000]
Real Run
JdkFrom: 1569.85 JdkTo: 404.93 NessFrom: 249.37 NessTo: 45.94 [2254274162472357232, 70459000, 2254274162472357232, 70459000]
JdkFrom: 1528.79 JdkTo: 279.55 NessFrom: 114.74 NessTo: 44.71 [2254274162472357232, 70459000, 2254274162472357232, 70459000]
JdkFrom: 1657.85 JdkTo: 271.24 NessFrom: 118.20 NessTo: 44.43 [2254274162472357232, 70459000, 2254274162472357232, 70459000]
JdkFrom: 1563.52 JdkTo: 273.69 NessFrom: 140.96 NessTo: 46.46 [2254274162472357232, 70459000, 2254274162472357232, 70459000]
This is almost three times faster than your version for the fromString, and another 0.2-times faster than your toString.
Here is the code that is (in my experience) about as fast as you can get with Java:
package uuid;
import java.util.Arrays;
import java.util.UUID;
/**
* A class that provides an alternate implementation of {@link
* UUID#fromString(String)} and {@link UUID#toString()}.
*
* <p> The version in the JDK uses {@link String#split(String)}
* which does not compile the regular expression that is used for splitting
* the UUID string and results in the allocation of multiple strings in a
* string array. We decided to write {@link NessUUID} when we ran into
* performance issues with the garbage produced by the JDK class.
*
*/
public class NessUUID {
private NessUUID() {}
// lookup is an array indexed by the **char**, and it has
// valid values set with the decimal value of the hex char.
private static final long[] lookup = buildLookup();
private static final int DASH = -1;
private static final int ERROR = -2;
private static final long[] buildLookup() {
long [] lu = new long[128];
Arrays.fill(lu, ERROR);
lu['0'] = 0;
lu['1'] = 1;
lu['2'] = 2;
lu['3'] = 3;
lu['4'] = 4;
lu['5'] = 5;
lu['6'] = 6;
lu['7'] = 7;
lu['8'] = 8;
lu['9'] = 9;
lu['a'] = 10;
lu['b'] = 11;
lu['c'] = 12;
lu['d'] = 13;
lu['e'] = 14;
lu['f'] = 15;
lu['A'] = 10;
lu['B'] = 11;
lu['C'] = 12;
lu['D'] = 13;
lu['E'] = 14;
lu['F'] = 15;
lu['-'] = DASH;
return lu;
}
// FROM STRING
public static UUID fromString(final String str) {
final int len = str.length();
if (len != 36) {
throw new IllegalArgumentException("Invalid UUID string (expected to be 36 characters long)");
}
final long[] vals = new long[2];
int shift = 60;
int index = 0;
for (int i = 0; i < len; i++) {
final int c = str.charAt(i);
if (c >= lookup.length || lookup[c] == ERROR) {
throw new IllegalArgumentException("Invalid UUID string (unexpected '" + str.charAt(i) + "' at position " + i + " -> " + str + " )");
}
if (lookup[c] == DASH) {
if ((i - 8) % 5 != 0) {
throw new IllegalArgumentException("Invalid UUID string (unexpected '-' at position " + i + " -> " + str + " )");
}
continue;
}
vals[index] |= lookup[c] << shift;
shift -= 4;
if (shift < 0) {
shift = 60;
index++;
}
}
return new UUID(vals[0], vals[1]);
}
// TO STRING
// recode is 2-byte arrays representing the hex representation of every byte value (all 256)
private static final char[][] recode = buildByteBlocks();
private static char[][] buildByteBlocks() {
final char[][] ret = new char[256][];
for (int i = 0; i < ret.length; i++) {
ret[i] = String.format("%02x", i).toCharArray();
}
return ret;
}
public static final String toString(final UUID uuid) {
long msb = uuid.getMostSignificantBits();
long lsb = uuid.getLeastSignificantBits();
char[] uuidChars = new char[36];
int cursor = uuidChars.length;
while (cursor > 24 ) {
cursor -= 2;
System.arraycopy(recode[(int)(lsb & 0xff)], 0, uuidChars, cursor, 2);
lsb >>>= 8;
}
uuidChars[--cursor] = '-';
while (cursor > 19) {
cursor -= 2;
System.arraycopy(recode[(int)(lsb & 0xff)], 0, uuidChars, cursor, 2);
lsb >>>= 8;
}
uuidChars[--cursor] = '-';
while (cursor > 14) {
cursor -= 2;
System.arraycopy(recode[(int)(msb & 0xff)], 0, uuidChars, cursor, 2);
msb >>>= 8;
}
uuidChars[--cursor] = '-';
while (cursor > 9) {
cursor -= 2;
System.arraycopy(recode[(int)(msb & 0xff)], 0, uuidChars, cursor, 2);
msb >>>= 8;
}
uuidChars[--cursor] = '-';
while (cursor > 0) {
cursor -= 2;
System.arraycopy(recode[(int)(msb & 0xff)], 0, uuidChars, cursor, 2);
msb >>>= 8;
}
return new String(uuidChars);
}
}
For your amusement, here's my test class (no Caliper, I know):
package uuid;
import java.util.Arrays;
import java.util.UUID;
public class PerformanceComparison
{
private final int N_UUIDS = 1000;
private final UUID[] testUuids = new UUID[N_UUIDS];
private final String[] testStrings = new String[N_UUIDS];
public void setup () {
for (int i = 0; i < N_UUIDS; i++)
{
testUuids[i] = UUID.randomUUID();
testStrings[i] = testUuids[i].toString();
}
}
public static void main(String[] args) {
PerformanceComparison pc = new PerformanceComparison();
final UUID uuidj = UUID.randomUUID();
String valj = uuidj.toString();
String valn = NessUUID.toString(uuidj);
UUID uuidn = NessUUID.fromString(valn);
if (!valj.equals(valn)) {
throw new IllegalStateException("Illegal conversion");
}
if (!uuidj.equals(uuidn)) {
throw new IllegalStateException("Illegal conversion");
}
pc.setup();
final int reps = 1000000;
System.out.println(" warmup");
pc.runAll(reps);
System.out.println(" Real Run");
pc.runAll(reps);
pc.runAll(reps);
pc.runAll(reps);
pc.runAll(reps);
}
private final void runAll(final int reps) {
long[] accum = new long[4];
System.out.printf(" JdkFrom: %6.2f JdkTo: %6.2f NessFrom: %6.2f NessTo: %6.2f %s\n",
timeJdkUuidFromString(reps, accum, 0) / 1000000.0,
timeJdkUuidToString(reps, accum, 1) / 1000000.0,
timeNessUuidFromString(reps, accum, 2) / 1000000.0,
timeNessUuidToString(reps, accum, 3) / 1000000.0,
Arrays.toString(accum));
}
public long timeJdkUuidFromString(int reps, long[] accum2, int j)
{
long accum = 0;
long start = System.nanoTime();
for (int i = 0; i < reps; i++)
{
accum += UUID.fromString(testStrings[i % N_UUIDS]).getMostSignificantBits();
}
accum2[j] = accum;
return System.nanoTime() - start;
}
public long timeJdkUuidToString(int reps, long[] accum2, int j)
{
long accum = 0;
long start = System.nanoTime();
for (int i = 0; i < reps; i++)
{
accum += testUuids[i % N_UUIDS].toString().charAt(0);
}
accum2[j] = accum;
return System.nanoTime() - start;
}
public long timeNessUuidFromString(int reps, long[] accum2, int j)
{
long accum = 0;
long start = System.nanoTime();
for (int i = 0; i < reps; i++)
{
accum += NessUUID.fromString(testStrings[i % N_UUIDS]).getMostSignificantBits();
}
accum2[j] = accum;
return System.nanoTime() - start;
}
public long timeNessUuidToString(int reps, long[] accum2, int j)
{
long accum = 0;
long start = System.nanoTime();
for (int i = 0; i < reps; i++)
{
accum += NessUUID.toString(testUuids[i % N_UUIDS]).charAt(0);
}
accum2[j] = accum;
return System.nanoTime() - start;
}
}
-
\$\begingroup\$ See my answer below for something even faster. Your logic in iterating through the string characters results in a lot of extra operations that impede performance. Also, use of the modulus operator hurts performance. It is a lot quicker to just hardcode all the indexes used in the UUID string. \$\endgroup\$user1585916– user15859162021年04月19日 15:34:31 +00:00Commented Apr 19, 2021 at 15:34
Improvements to the JDK have made the JDK's UUID.toString()
much more performant. It now (JDK 11) significantly outperforms any of these homegrown methods.
Unforntunately, the built-in UUID.fromString()
still performs pretty poorly. The OP and @rolfl's solutions are faster than the JDK's, but there is still room for improvement. Switching to a two char lookup table (1 byte) significantly improves performance (at the cost of some more memory). Two characters appears to be the sweet spot. In my tests, attempting to use a table for more than two hex characters at a time did not improve performance any further.
import java.util.Arrays;
import java.util.UUID;
/**
* Unfortunately, the built-in java UUID.fromString(name) is not very efficient.
* This method is approximately 80% faster. The only cost is the static lookup
* table, which consumes around 52K of memory. Useful for very high throughput
* applications.
*/
public class UUIDUtils {
// Lookup table of all possible byte values.
// Type of array needs to be short in order to be able to use -1 for invalid values.
// We can also use a byte array of nibble values (single hex character), but that
// is about 30% slower - although it does consume significantly less memory.
private static final short[] HEX_LOOKUP = new short[indexVal('f', 'f') + 1];
static {
Arrays.fill(HEX_LOOKUP, (short) -1);
char[] chars = {
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'A', 'B', 'C', 'D', 'E', 'F', 'a', 'b', 'c', 'd', 'e', 'f' };
for (char c : chars) {
for (char c1 : chars) {
int i = indexVal(c, c1);
HEX_LOOKUP[i] = (short) (Character.digit(c, 16) << 4 | Character.digit(c1, 16));
}
}
}
public static UUID fromStringFast(String string) {
if (string.length() != 36) {
throw new IllegalArgumentException("Invalid length for UUID string: " + string + " :" + string.length());
}
long msb = toByte(string.charAt(0), string.charAt(1));
msb = msb << 8 | toByte(string.charAt(2), string.charAt(3));
msb = msb << 8 | toByte(string.charAt(4), string.charAt(5));
msb = msb << 8 | toByte(string.charAt(6), string.charAt(7));
checkDash(string.charAt(8));
msb = msb << 8 | toByte(string.charAt(9), string.charAt(10));
msb = msb << 8 | toByte(string.charAt(11), string.charAt(12));
checkDash(string.charAt(13));
msb = msb << 8 | toByte(string.charAt(14), string.charAt(15));
msb = msb << 8 | toByte(string.charAt(16), string.charAt(17));
checkDash(string.charAt(18));
long lsb = toByte(string.charAt(19), string.charAt(20));
lsb = lsb << 8 | toByte(string.charAt(21), string.charAt(22));
checkDash(string.charAt(23));
lsb = lsb << 8 | toByte(string.charAt(24), string.charAt(25));
lsb = lsb << 8 | toByte(string.charAt(26), string.charAt(27));
lsb = lsb << 8 | toByte(string.charAt(28), string.charAt(29));
lsb = lsb << 8 | toByte(string.charAt(30), string.charAt(31));
lsb = lsb << 8 | toByte(string.charAt(32), string.charAt(33));
lsb = lsb << 8 | toByte(string.charAt(34), string.charAt(35));
return new UUID(msb, lsb);
}
private static final int toByte(char hi, char low) {
try {
short b = HEX_LOOKUP[indexVal(hi, low)];
if (b == -1) {
throw new IllegalArgumentException("Invalid hex chars: " + hi + low);
}
return b;
} catch (IndexOutOfBoundsException e) {
throw new IllegalArgumentException("Invalid hex chars: " + hi + low);
}
}
private static final int indexVal(char hi, char low) {
return hi << 8 | low;
}
private static void checkDash(char c) {
if(c != '-') {
throw new IllegalArgumentException("Expected '-', but found '" + c + "'");
}
}
}
Even more performance can be squeezed out by removing all error checking. (Removing all error checking would also allow byte[]
to be used for the lookup table instead of short[]
, which also saves memory). This would increase throughput by about 25%. However, that is not advisable unless you can guarantee the validity of your input via some other means.
String.split()
has a "fastpath" improvement for single character patterns that don't include meta chars. In this case, splitting on '-' will take the fastpath. This fastpath avoids using the regex engine and improves performance in trivial spilts. Still, good to see these improvements. \$\endgroup\$toString(long msb, long lsb)
should be private. \$\endgroup\$