Revision 949db92b-a6e5-4eab-929c-cc2bc42f2010 - Code Review Stack Exchange

## Background

This question was brought to my attention in [The 2nd Monitor chat room](http://chat.stackexchange.com/transcript/message/36431281#36431281) because in the past I have claimed that using exception handling to handle parse exceptions is "a bad idea and slow". This is exactly what your code is doing, and it's a bad idea, and slow.... at least, that's what I thought, until I benchmarked your code.

Now, in the past, I wrote a CSV parser and it used a similar system to yours to handle the values in a field, and I discovered that I got a significant speed-up (like 100% faster) when I prevalidated the values to an large extent, ***before*** doing a `parseInt` or `parseDouble` on them. I found that it is much better to "identify" a value of a certain type to a high degree of confidence, and thus reduce the number of exceptions thrown.

In your code, if the values are 1/3 integers, 1/3 double, and 1/3 string, then on average you are creating 1 exception for each value (none for ints, 1 for doubles, and 2 for strings). Worst case, if all your values are strings, you'll create 2 exceptions per value.

What if you could ***(almost)*** guarantee that all your `parseInt` and `parseDouble` calls will succeed, and you'll have (almost) no exceptions? Is the work to check the value "worth it"?

My claim is ***yes***, it's worth it.

So, I have tried to prove it, and ... the results are interesting.

I used my MicroBench performance system to run the benchmark, and I built a dummy "load" for the `doFoo` function. Let's look at my test-rig:

 public class ParseVal {
 
 private final LongAdder intsums = new LongAdder();
 private final DoubleAdder doubsums = new DoubleAdder();
 private final LongAdder stringsums = new LongAdder();
 
 private final void doFoo(int val) {
 intsums.add(val);
 }
 
 private final void doFoo(double val) {
 doubsums.add(val);
 }
 
 private final void doFoo(String val) {
 stringsums.add(val.length());
 }
 
 @Override
 public String toString() {
 return String.format("IntSum %d - DoubleSum %.9f - StringLen %d", intsums.longValue(), doubsums.doubleValue(), stringsums.longValue());
 }
 
 public static final String testFunction(BiConsumer<ParseVal, String> fn, String[] data) {
 ParseVal pv = new ParseVal();
 for (String v : data) {
 fn.accept(pv, v);
 }
 return pv.toString();
 }
 
 public static final String[] testData(int count) {
 String[] data = new String[count];
 Random rand = new Random(count);
 for (int i = 0; i < count; i++) {
 String base = String.valueOf(1000000000 - rand.nextInt(2000000000));
 switch(i % 3) {
 case 0:
 break;
 case 1:
 base += "." + rand.nextInt(10000);
 break;
 case 2:
 base += "foobar";
 break;
 }
 data[i] = base;
 }
 return data;
 }
 
 .......
 
 
 public void testStringOP(String val) { 
 String x = val.trim();
 try {
 int i = Integer.parseInt(x);
 doFoo(i);
 } catch (NumberFormatException e) {
 try {
 double d = Double.parseDouble(x);
 doFoo(d);
 } catch (NumberFormatException e2) {
 doFoo(x);
 }
 }
 }
 
 public static void main(String[] args) {
 String[] data = testData(1000);
 String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
 System.out.println(expect);
 
 ....
 }
 
 }

 
The `doFoo` methods have an accumulator mechanism (adding up ints, doubles, and the string lengths) and making the results available in a `toString` method. 

Also, I have put your function in there as `testStringOP`.

There is a `testData` function which builds an array if input strings where there are approximately equal numbers of int, double, and string values.

Finally, the benchmark function:

 public static final String testFunction(BiConsumer<ParseVal, String> fn, String[] data) {
 ParseVal pv = new ParseVal();
 for (String v : data) {
 fn.accept(pv, v);
 }
 return pv.toString();
 }

That function takes an input function and the test data as an argument, and returns the String summary as a result. You would use this function like it's used in the main method....

 String expect = testFunction((pv, v) -> pv.testStringOP(v), data);

which runs the `testStringOP` function on all the input `data` values, and returns the accumulated string results.

What's nice is that I can now create other functions to test performance, for example `testStringMyFn` and call:

 String myresult = testFunction((pv, v) -> pv.testStringMyFn(v), data);

This is the basic tool I can use for the MicroBench system: https://github.com/rolfl/MicroBench

## Scanner option

Let's start by comparing your function to the `Scanner` type system recommended in another answer... Here's the code I used for the `Scanner`:

 public void testStringScanner(String val) {
 val = val.trim();
 try (Scanner scanner = new Scanner(val)) {
 if (scanner.hasNextInt()) {
 doFoo(scanner.nextInt());
 } else if (scanner.hasNextDouble()) {
 doFoo(scanner.nextDouble());
 } else {
 doFoo(val);
 }
 }
 }
 
and here's how I benchmarked that code:

 public static void main(String[] args) {
 String[] data = testData(1000);
 String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
 System.out.println(expect);
 
 UBench bench = new UBench("IntDoubleString Parser")
 .addTask("OP", () -> testFunction((pv, v) -> pv.testStringOP(v), data), s -> expect.equals(s))
 .addTask("Scanner", () -> testFunction((pv, v) -> pv.testStringScanner(v), data), s -> expect.equals(s));
 bench.press(10).report("Warmup");
 bench.press(100).report("Final");
 }

That runs the benchmark on both your function, and the Scanner function, and does a warmup run (to get JIT optimzations done), and a "Final" run to get real results.... what are the results, you ask?

> Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
> Count : 100 Average : 1.6914
> Fastest : 1.5331 Slowest : 3.2561
> 95Pctile : 2.0277 99Pctile : 3.2561
> TimeBlock : 1.794 2.037 1.674 1.654 1.674 1.588 1.665 1.588 1.634 1.606
> Histogram : 99 1
> 
> Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
> Count : 100 Average : 69.9713
> Fastest : 67.2338 Slowest : 98.4322
> 95Pctile : 73.8073 99Pctile : 98.4322
> TimeBlock : 77.028 70.050 69.325 69.860 69.094 68.498 68.547 68.779 69.586 68.945
> Histogram : 100

What does that mean? It means, on average, your code is 40-times faster than the Scanner. Your code runs in 1.7Milliseconds to process 1000 input values, and the scanner runs in 70 milliseconds.

So, a `Scanner` is a bad idea if performance is required, right? I agree.

## Alternative

But, what about a RegEx pre-validation check? Note that the regex will not guarantee a clean parse, but it can go a long way. For example, the regex `[+-]?\d+` will match any integer, right, but is `-999999999999999999999` a valid integer? No, it's too big. But, it is a valid double. We will still need to have a try/catch block even if we pass the regex prevalidation. That's going to eliminate almost all exceptions, though.... 

So, what do we do to prevalidate things? Well, the [`Double.valueOf(String)` function](http://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#valueOf-java.lang.String-) documents a regex for matching double values in Strings. It's complicated, and I made a few modifications because we don't have already trimmed our inputs, but here's a couple of patterns for prevalidating double values, and integer values:


 private static final String Digits = "(\\p{Digit}+)";
 private static final String HexDigits = "(\\p{XDigit}+)";
 private static final String Exp = "[eE][+-]?"+Digits;
 private static final String fpRegex =
 ( //"[\\x00-\\x20]*"+ // Optional leading "whitespace"
 "[+-]?(" + // Optional sign character
 "NaN|" + // "NaN" string
 "Infinity|" + // "Infinity" string
 "((("+Digits+"(\\.)?("+Digits+"?)("+Exp+")?)|"+
 "(\\.("+Digits+")("+Exp+")?)|"+
 "((" +
 "(0[xX]" + HexDigits + "(\\.)?)|" +
 "(0[xX]" + HexDigits + "?(\\.)" + HexDigits + ")" +
 ")[pP][+-]?" + Digits + "))" +
 "[fFdD]?))"); // +
 //"[\\x00-\\x20]*");// Optional trailing "whitespace"
 
 Pattern isDouble = Pattern.compile(fpRegex);
 Pattern isInteger = Pattern.compile("[+-]?[0-9]+");

We can use those functions to build the code:

 public void testStringRegex(String val) { 
 String x = val.trim();
 if (isInteger.matcher(x).matches()) {
 try {
 doFoo(Integer.parseInt(x));
 } catch (NumberFormatException nfe) {
 try {
 doFoo(Double.parseDouble(x));
 } catch (NumberFormatException e) {
 doFoo(x);
 }
 }
 } else if (isDouble.matcher(x).matches()) {
 try {
 doFoo(Double.parseDouble(x));
 } catch (NumberFormatException e) {
 doFoo(x);
 }
 } else {
 doFoo(x);
 }
 }

Now, that's pretty complicated, right? Well, it does a "quick" integer regex check, and if it's likely an integer, it tries to parse it as an integer, and fails over to a double, and then to a string....

If it's not likely an integer, it checks if it's a double, and so on.....

How can this code be faster, you ask? Well, we're almost certainly having clean parses when we do them, and we'll have almost no exceptions... But, is it actually faster?

Here are the results:

> Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
> Count : 100 Average : 1.6689
> Fastest : 1.5580 Slowest : 2.1572
> 95Pctile : 1.8012 99Pctile : 2.1572
> TimeBlock : 1.695 1.752 1.709 1.670 1.641 1.648 1.643 1.639 1.662 1.630
> Histogram : 100
> 
> Task IntDoubleString Parser -> Regex: (Unit: MILLISECONDS)
> Count : 100 Average : 1.9580
> Fastest : 1.8379 Slowest : 2.5713
> 95Pctile : 2.1004 99Pctile : 2.5713
> TimeBlock : 1.978 2.022 1.949 1.966 2.020 1.933 1.890 1.940 1.955 1.928
> Histogram : 100
> 
> Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
> Count : 100 Average : 69.8886
> Fastest : 67.1848 Slowest : 77.2769
> 95Pctile : 71.9153 99Pctile : 77.2769
> TimeBlock : 70.940 69.735 69.879 69.381 69.579 69.180 69.611 70.412 70.123 70.045
> Histogram : 100

If you look, you'll see the regex version is ***Slower*** than the exception version... it runs in 1.95ms but the exception version runs in 1.67ms

## Exceptions

But, there's a catch. In these tests, the stack trace for the exceptions is really small... and the "cost" of an exception depends on the depth of the trace, so let's increase the stack depths for the regex and exception code. Well add a recursive function to simulate a deeper stack:


 public void testStringDeepOP(String val, int depth) {
 if (depth <= 0) {
 testStringOP(val);
 } else {
 testStringDeepOP(val, depth - 1);
 }
 }
 

 public void testStringDeepRegex(String val, int depth) {
 if (depth <= 0) {
 testStringRegex(val);
 } else {
 testStringDeepRegex(val, depth - 1);
 }
 }
 
and we will test the OP and Regex code a different "depths" of nesting, 5, 10, and 20 layers deep. The benchmark code is:

 UBench bench = new UBench("IntDoubleString Parser")
 .addTask("OP", () -> testFunction((pv, v) -> pv.testStringOP(v), data), s -> expect.equals(s))
 .addTask("OP D5", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 5), data), s -> expect.equals(s))
 .addTask("OP D10", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 10), data), s -> expect.equals(s))
 .addTask("OP D20", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 20), data), s -> expect.equals(s))
 .addTask("Regex", () -> testFunction((pv, v) -> pv.testStringRegex(v), data), s -> expect.equals(s))
 .addTask("Regex D5", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 5), data), s -> expect.equals(s))
 .addTask("Regex D10", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 10), data), s -> expect.equals(s))
 .addTask("Regex D20", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 20), data), s -> expect.equals(s))
 .addTask("Scanner", () -> testFunction((pv, v) -> pv.testStringScanner(v), data), s -> expect.equals(s));
 bench.press(10).report("Warmup");
 bench.press(100).report("Final");


What are the results?

> Final
> =====
> 
> Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
> Count : 100 Average : 1.7005
> Fastest : 1.5260 Slowest : 3.9813
> 95Pctile : 1.9346 99Pctile : 3.9813
> TimeBlock : 1.682 1.624 1.612 1.675 1.708 1.658 1.727 1.738 1.672 1.910
> Histogram : 99 1
> 
> Task IntDoubleString Parser -> OP D5: (Unit: MILLISECONDS)
> Count : 100 Average : 1.9288
> Fastest : 1.7325 Slowest : 4.9673
> 95Pctile : 2.0897 99Pctile : 4.9673
> TimeBlock : 2.124 1.812 1.828 1.873 1.925 1.877 1.855 1.869 1.903 2.221
> Histogram : 98 2
> 
> Task IntDoubleString Parser -> OP D10: (Unit: MILLISECONDS)
> Count : 100 Average : 2.2271
> Fastest : 2.0171 Slowest : 4.7395
> 95Pctile : 2.4904 99Pctile : 4.7395
> TimeBlock : 2.392 2.125 2.129 2.152 2.246 2.169 2.189 2.203 2.247 2.420
> Histogram : 98 2
> 
> Task IntDoubleString Parser -> OP D20: (Unit: MILLISECONDS)
> Count : 100 Average : 2.9278
> Fastest : 2.6838 Slowest : 6.3169
> 95Pctile : 3.2415 99Pctile : 6.3169
> TimeBlock : 2.870 2.822 2.860 2.794 2.956 2.861 3.041 3.012 2.853 3.211
> Histogram : 99 1
> 
> Task IntDoubleString Parser -> Regex: (Unit: MILLISECONDS)
> Count : 100 Average : 2.0739
> Fastest : 1.9338 Slowest : 3.8368
> 95Pctile : 2.2744 99Pctile : 3.8368
> TimeBlock : 2.229 2.083 2.034 2.013 2.021 2.004 2.013 2.096 2.059 2.186
> Histogram : 100
> 
> Task IntDoubleString Parser -> Regex D5: (Unit: MILLISECONDS)
> Count : 100 Average : 2.0565
> Fastest : 1.9377 Slowest : 3.2857
> 95Pctile : 2.2646 99Pctile : 3.2857
> TimeBlock : 2.148 2.075 2.035 2.038 2.035 2.031 2.026 2.000 2.032 2.145
> Histogram : 100
> 
> Task IntDoubleString Parser -> Regex D10: (Unit: MILLISECONDS)
> Count : 100 Average : 2.0647
> Fastest : 1.9598 Slowest : 2.6360
> 95Pctile : 2.2906 99Pctile : 2.6360
> TimeBlock : 2.073 2.094 2.051 2.048 2.072 2.029 2.057 2.124 2.057 2.042
> Histogram : 100
> 
> Task IntDoubleString Parser -> Regex D20: (Unit: MILLISECONDS)
> Count : 100 Average : 2.0891
> Fastest : 1.9930 Slowest : 2.6483
> 95Pctile : 2.2587 99Pctile : 2.6483
> TimeBlock : 2.108 2.070 2.078 2.066 2.071 2.091 2.048 2.090 2.137 2.132
> Histogram : 100
> 
> Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
> Count : 100 Average : 71.7199
> Fastest : 67.9621 Slowest : 152.0714
> 95Pctile : 75.2141 99Pctile : 152.0714
> TimeBlock : 71.006 69.896 70.160 69.734 70.824 69.854 71.473 71.888 73.607 78.756
> Histogram : 99 1

Here it is expressed as a table (using the average times):

> 0 5 10 20
> OP 1.7005 1.9288 2.2271 2.9278
> RegEx 2.0739 2.0565 2.0647 2.0891

## Conclusion

So, that's the real problem with exceptions, the performance is unpredictable... and, for example, if you run it inside a Tomcat container, with stacks hundreds of levels deep, you may find this completely destroys your performance.

AltStyle によって変換されたページ (->オリジナル) /